mistercrunch commented on PR #36368:
URL: https://github.com/apache/superset/pull/36368#issuecomment-3821434604

   Cornered an agent, asked a bunch of questions, reviewed some code and we put 
this together:
   
   ---
   
   
   Hey @villebro, really impressive work here — the architecture is well 
thought out and the PR description is one of the most thorough I've seen. 
Excited about where this is heading! A few thoughts after digging through the 
code:
   
   ## Framework without consumers
   
   Just want to make sure this is intentional and clearly communicated: this PR 
ships the full GTF infrastructure (DB tables, API endpoints, UI page, Celery 
bridge) but **zero existing workloads are routed through it**. The `@task` 
decorator is exported via `core_api_injection.py` for extensions, but no core 
Superset code uses it yet.
   
   That means after merge, users get:
   - Two new tables (`tasks`, `task_subscribers`) — empty
   - A "Tasks" menu item under Manage — showing nothing
   - REST API at `/api/v1/task/` — returning empty results
   - No feature flag to gate any of this
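
   On that last bullet: if we want a gate, even a plain feature flag in 
   `superset_config.py` would do. The flag name below is purely hypothetical, 
   the PR doesn't define one:

   ```python
   # superset_config.py: hypothetical gate, not something this PR ships
   FEATURE_FLAGS = {
       "GLOBAL_TASK_FRAMEWORK": False,  # assumed flag name; off by default
   }
   ```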
   
   Is the intent to land the framework first and migrate workloads in follow-up 
PRs? If so, might be worth calling that out explicitly in the PR description so 
reviewers know what to expect. I understand it **could** also be in scope to 
enable some workloads by default or through config (more on that below).
   
   ## Rollout strategy idea
   
   One thing I've been noodling on: what if instead of swapping decorators 
one-by-one in future PRs, we pre-decorated existing Celery tasks with the GTF 
wrapper now, but made it a **no-op by default**? Something like:
   
   ```python
   # Config
   GTF_ENABLED_TASKS: list[str] = []  # empty = all tasks use legacy Celery path
   
   # Usage
   @task(name="cache_chart_thumbnail", scope=TaskScope.PRIVATE)
   def cache_chart_thumbnail(chart_id: int) -> None:
       ...
   ```
   
   Where the `@task` decorator checks if `task_name in GTF_ENABLED_TASKS` — if 
not, it falls through to the existing Celery behavior. This way:
   - 100% no-op by default (the code/logic would have to guarantee that)
   - The code is ready to go, no future code changes needed per task
   - Rollout is a config change, not a deploy
   - Operators/testers can opt in task-by-task at their own pace
   - Easy to roll back individual tasks if issues arise
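
   For concreteness, here's a rough, self-contained sketch of that fallthrough. 
   Everything in it (the module-level `GTF_ENABLED_TASKS` lookup, the 
   `_run_via_gtf` placeholder, the simplified call-time dispatch that glosses 
   over real Celery registration) is an assumption for illustration, not how the 
   decorator in this PR is actually written:

   ```python
   from __future__ import annotations

   from collections.abc import Callable
   from functools import wraps

   # In Superset this would come from current_app.config; hardcoded here to keep
   # the sketch self-contained. Empty list = every task stays on the legacy path.
   GTF_ENABLED_TASKS: list[str] = []


   def _run_via_gtf(name: str, func: Callable, *args, **kwargs):
       """Placeholder for the GTF execution path (task row, abort channel, etc.)."""
       print(f"[GTF] running {name}")
       return func(*args, **kwargs)


   def task(name: str, scope: str | None = None) -> Callable:
       """Hypothetical dual-mode decorator: GTF when opted in, legacy path otherwise."""

       def decorator(func: Callable) -> Callable:
           @wraps(func)
           def wrapper(*args, **kwargs):
               if name in GTF_ENABLED_TASKS:
                   return _run_via_gtf(name, func, *args, **kwargs)
               # Not opted in: behave exactly like today's code path, no GTF bookkeeping.
               return func(*args, **kwargs)

           return wrapper

       return decorator


   @task(name="cache_chart_thumbnail")
   def cache_chart_thumbnail(chart_id: int) -> None:
       ...
   ```

   The real version would also need to keep Celery registration intact on the 
   legacy branch, but hopefully this conveys the shape.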
   
   This might be premature for this PR, but wanted to float the idea. It would 
make testing and progressive rollout much more flexible. Happy to help spec 
this out if you think it's worth exploring.
   
   ## Rollout plan / timeline
   
   The PR body lists migration targets (thumbnails, alerts/reports, SQL Lab, 
GAQ) but there's no sequencing or timeline. A few questions that would help the 
team plan:
   
   - What's the first workload you'd want to migrate? Extensions-only, or a 
core task?
   - Would it help if the Preset team picked up some of the migration work? 
Happy to coordinate on that
   - For alerts/reports specifically — it has its own state machine 
(`ReportScheduleStateMachine`), execution log table, and pruning job. How do 
you see that coexisting with GTF during migration? Two parallel tracking 
systems?
   
   ## UPDATING.md
   
   The `Next` section doesn't mention GTF yet. Given this introduces new DB 
tables, config keys, API endpoints, and a UI page, it would be good to add an 
entry covering:
   - A note that GTF exists (even though it's effectively a no-op as shipped)
   - New config keys: `PUBSUB_BACKEND`, `TASK_ABORT_POLLING_DEFAULT_INTERVAL`, 
`TASKS_ABORT_CHANNEL_PREFIX` (these could also be consolidated under a single 
`GTF_CONFIG = {...}` object; see the sketch at the end of this section)
   - A pointer to the pruning beat job, which operators should consider enabling
   
   Or the entry could just link to a docs page. The docs page you have is great, 
but it's geared towards using the framework (for devs) as opposed to what 
devops/admins need to know. We can probably write an ops-oriented page as things 
settle.
   
   ## Other things
   
   - **`TIMED_OUT` tasks never pruned**: `TaskPruneCommand` only deletes 
`SUCCESS`, `FAILURE`, and `ABORTED` tasks, so timed-out rows would accumulate 
forever. Bug or intentional? (A sketch of the assumed fix is below.)
   - **Scalability for high-frequency tasks**: each task lifecycle involves 
5-7+ DB operations plus lock acquisitions. For something like thumbnails plus, 
eventually, explore/data queries, that table will run pretty hot 🔥. Probably 
OK, but it will have an impact on DB CPU.
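
   For the `TIMED_OUT` bullet, the assumed fix is just widening the set of 
   prunable terminal states; the enum and constant below are stand-ins for 
   whatever the PR actually names them:

   ```python
   from enum import Enum


   class TaskStatus(Enum):  # stand-in for the PR's actual status enum
       SUCCESS = "success"
       FAILURE = "failure"
       ABORTED = "aborted"
       TIMED_OUT = "timed_out"


   # The point is simply that TIMED_OUT belongs alongside the other terminal
   # states in whatever filter TaskPruneCommand uses when deleting old rows.
   PRUNABLE_STATES = frozenset(
       {TaskStatus.SUCCESS, TaskStatus.FAILURE, TaskStatus.ABORTED, TaskStatus.TIMED_OUT}
   )
   ```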

