mattcuento commented on issue #1359:
URL: 
https://github.com/apache/datafusion-ballista/issues/1359#issuecomment-3729122521

   Hey @milenkovic, also happy to help here!
   
   A lot of the proposal makes sense to me, though I should say my experience 
here is limited to Flink's adaptive scheduling which is more strictly focused 
on parallelism.
   
   +1 for the second approach, testability, rollout, and falling back to a 
static planner sound more simple in this way. Agreed that physical optimizer 
passes between stages with stage-based execution.
   
   Two questions:
   
   - Im curious about optimizations like two-phased aggregations, which would 
span stages (local agg in one, global in the next). I’ve seen limited examples 
of adaptive two-phase aggregations where the local agg can be disabled in the 
presence of a large keyspace. Have you given any thought to how the design 
might allow/disallow future extensions for rules that span > 1 stage and aim to 
adapt a running stage? Sounds like such rules wouldn’t be possible with a pass 
after each stage, which is fine, but wanted to make that thought explicit.
   
   - For observability sake, do we plan to log out or keep in memory each round 
of optimizations and the original execution graph?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to