danielhumanmod commented on PR #1309:
URL:
https://github.com/apache/datafusion-ballista/pull/1309#issuecomment-3409336815
> Sorry for the late reply @danielhumanmod. I'm not quite sure; I guess all the
> metrics are collected on the scheduler side, so the scheduler should have them
> all once the job finishes.
>
> The question is how we get them out of the scheduler. Once the job is
> scheduled, the question is where and how we wait for job completion. Do we
> launch another job that polls the REST API, or do we change the scheduler gRPC
> to wait there? I'm not sure.
>
> #1292 may be related if we get it from the REST API.

Hey @milenkovicm, thanks for your patience. I did some investigation of the
code and have some ideas for a solution:
First, yes, we will introduce a new exec so that the logic can live
entirely outside the scheduler.
The overall execution flow will be:
1. Stage A – Actual query execution:
The target query runs as usual. During this stage, per-task metrics are
collected and reported back to the scheduler, which aggregates them into the
ExecutionGraph.
2. Stage B – New exec:
A new exec node, wrapped in a ShuffleWriterExec (single partition), is added
by the planner when EXPLAIN ANALYZE is triggered.
It uses an UnresolvedShuffleExec pointing to the final stage of Stage A to
express the dependency.
Once Stage A completes, the scheduler activates Stage B, and the executor
running the new exec (AnalyzerExec) simply:
- Builds a request and calls the new API (yes, we need a new one here) to
retrieve the finalized metrics.
- Converts the structured response into one or more RecordBatches, then
writes them out through ShuffleWriterExec.
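
To make the flow above a bit more concrete, here is a minimal, std-only sketch of the idea. It is not Ballista code: `ToyScheduler`, `StageState`, `OperatorMetrics`, `finalized_metrics`, and `metrics_to_rows` are all hypothetical names made up for illustration, and plain strings stand in for RecordBatches. It only shows the dependency (Stage B stays unresolved until Stage A completes) and the metrics-to-rows conversion.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a stage's state in the scheduler's ExecutionGraph.
#[derive(Debug, Clone, PartialEq)]
enum StageState {
    /// Blocked on upstream stages, as an UnresolvedShuffleExec input would be.
    Unresolved { waiting_on: Vec<usize> },
    Runnable,
    Completed,
}

/// Hypothetical per-operator metrics; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct OperatorMetrics {
    operator: String,
    output_rows: u64,
    elapsed_ms: u64,
}

/// Toy scheduler: tracks stage states and the metrics reported per stage.
#[derive(Default)]
struct ToyScheduler {
    stages: HashMap<usize, StageState>,
    metrics: HashMap<usize, Vec<OperatorMetrics>>,
}

impl ToyScheduler {
    fn add_stage(&mut self, id: usize, state: StageState) {
        self.stages.insert(id, state);
    }

    /// Mark a stage completed, store its metrics, and unblock dependents.
    fn complete_stage(&mut self, id: usize, metrics: Vec<OperatorMetrics>) {
        self.stages.insert(id, StageState::Completed);
        self.metrics.insert(id, metrics);
        let done: Vec<usize> = self
            .stages
            .iter()
            .filter(|(_, s)| **s == StageState::Completed)
            .map(|(stage_id, _)| *stage_id)
            .collect();
        for state in self.stages.values_mut() {
            let ready = match &*state {
                StageState::Unresolved { waiting_on } => {
                    waiting_on.iter().all(|dep| done.contains(dep))
                }
                _ => false,
            };
            if ready {
                *state = StageState::Runnable;
            }
        }
    }

    /// Stand-in for the proposed new metrics API (an assumption, not an
    /// existing endpoint): only answers once the stage has completed.
    fn finalized_metrics(&self, id: usize) -> Option<&[OperatorMetrics]> {
        match self.stages.get(&id) {
            Some(StageState::Completed) => self.metrics.get(&id).map(|m| m.as_slice()),
            _ => None,
        }
    }
}

/// What the analyzer node would do: turn metrics into output rows
/// (plain strings here; the real node would build RecordBatches).
fn metrics_to_rows(metrics: &[OperatorMetrics]) -> Vec<String> {
    metrics
        .iter()
        .map(|m| {
            format!(
                "{}: output_rows={}, elapsed_ms={}",
                m.operator, m.output_rows, m.elapsed_ms
            )
        })
        .collect()
}

fn main() {
    const STAGE_A: usize = 1;
    const STAGE_B: usize = 2;

    let mut scheduler = ToyScheduler::default();
    scheduler.add_stage(STAGE_A, StageState::Runnable);
    scheduler.add_stage(STAGE_B, StageState::Unresolved { waiting_on: vec![STAGE_A] });

    // Metrics are not available before Stage A finishes.
    assert!(scheduler.finalized_metrics(STAGE_A).is_none());

    let reported = vec![OperatorMetrics {
        operator: "FilterExec".to_string(),
        output_rows: 42,
        elapsed_ms: 7,
    }];
    scheduler.complete_stage(STAGE_A, reported);

    // Stage B is now runnable and can fetch and convert the metrics.
    assert_eq!(scheduler.stages.get(&STAGE_B), Some(&StageState::Runnable));
    let rows = metrics_to_rows(scheduler.finalized_metrics(STAGE_A).unwrap());
    for row in &rows {
        println!("{}", row);
    }
}
```

The point of the sketch is that Stage B never polls: it only becomes runnable after Stage A completes, so by the time it asks for metrics they are already final.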
Would love to hear your thoughts here :)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]