paul-rogers commented on issue #2817:
URL: https://github.com/apache/drill/issues/2817#issuecomment-1667274517

   What is your use case?
   
   Drill differs from Spark. Spark allocates a set of workers per job, sized to 
the job's estimated complexity. By contrast, Drill uses a shared cluster: a 
query runs on whatever workers are available at the moment it starts. Spark is 
designed for large, complex, long-running jobs; Drill is designed for many 
concurrent, short-running queries. In Drill, a query would normally finish long 
before any new nodes could spin up and join the cluster.
   
   Back in the day, Drill provided Drill-on-YARN to manage a Drill cluster in 
Hadoop. Scaling was manual, though the API left room for someone to design a 
controller that would observe load and scale the cluster up or down to track 
average load. In modern times, K8s is the preferred alternative. The Drill 
operator handles the mechanics of scale-up and scale-down. Again, a separate 
controller would be needed to adjust cluster size based on load trends and/or 
local policies. [Feedback Control for Computer 
Systems](https://learning.oreilly.com/library/view/feedback-control-for/9781449362638/)
 explains the kind of PID controller that could do the job.
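
   For the curious, here is a minimal sketch of what such a controller might 
look like. Everything here is hypothetical: the class and method names are 
invented, and a real version would read load metrics from the Drillbits and 
drive the K8s operator (or a Drill-on-YARN client) to resize the cluster.

```java
// Hypothetical PID controller for autoscaling a Drill cluster.
// Gains (kp, ki, kd) and the load target must be tuned per deployment.
public class ClusterScaler {

  private final double kp, ki, kd;
  private final double targetLoadPerNode; // e.g. active queries per Drillbit
  private double integral;
  private double prevError;

  public ClusterScaler(double kp, double ki, double kd, double targetLoadPerNode) {
    this.kp = kp;
    this.ki = ki;
    this.kd = kd;
    this.targetLoadPerNode = targetLoadPerNode;
  }

  /**
   * Called once per control interval with the observed average load per node.
   * Returns the recommended cluster size.
   */
  public int nextClusterSize(double observedLoadPerNode, int currentSize, double dtSeconds) {
    // Positive error means the cluster is overloaded and should grow.
    double error = observedLoadPerNode - targetLoadPerNode;
    integral += error * dtSeconds; // production code would add anti-windup
    double derivative = (error - prevError) / dtSeconds;
    prevError = error;

    double adjustment = kp * error + ki * integral + kd * derivative;

    // Round the adjustment to whole nodes; never scale below one node.
    return Math.max(1, currentSize + (int) Math.round(adjustment));
  }
}
```

   In practice the controller would also need hysteresis (to avoid thrashing 
on brief spikes), an upper bound on cluster size, and a cooldown after each 
scaling action, since a new Drillbit takes time to start, register with 
ZooKeeper, and begin accepting query fragments.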

