SaurabhChawla100 edited a comment on pull request #29413: URL: https://github.com/apache/spark/pull/29413#issuecomment-673398423
> sorry I'm still not seeing any difference here then increasing the size of the current queue? If both are not really allocating memory for the entire amount until runtime then either way you have to set memory of driver to be the maximum amount used. why not set the queue size to size + size * spark.set.optmized.event.queue.threshold?
>
> If you look at the driver memory used, I don't think that is very reliable. They could change very quickly.

**If you look at the driver memory used, I don't think that is very reliable. They could change very quickly.**

Yes, I agree that used driver memory can change quickly. That is why I added the validation threshold of 90 percent of used driver memory. We could make it 95 percent, or make it configurable through a conf instead of hard-coding it. With the `VariableLinkedBlockingQueue` we can change the size of the queue at run time.

If we leave the `VariableLinkedBlockingQueue` and the driver-memory validation out of consideration, I can think of the following two approaches:

1) Set the queue size to `size + size * spark.set.optmized.event.queue.threshold`. The problem with this approach is that events can still be dropped after increasing the size if incoming events arrive faster than the consumer can drain them; in that case critical events (executorManagement queue, appStatus queue, etc.) can still be dropped. It does, however, help in the scenario where there is a burst of events at one point in time: increasing the capacity by `spark.set.optmized.event.queue.threshold` can prevent the queue from overflowing.

2) Make the queue unbounded. The only problem with this approach is the driver running out of memory (OOM), but it guarantees no drops when the events are really important and we cannot afford to lose them while running a Spark job.
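To make the run-time resizing idea concrete, here is a minimal sketch of a blocking queue whose capacity can be raised while the application runs, in the spirit of the `VariableLinkedBlockingQueue` discussed above. All names here are illustrative; this is not Spark's actual implementation.

```java
import java.util.ArrayDeque;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of an event queue with a capacity that can change at run time.
// offer() is non-blocking: it returns false (the event is "dropped") when
// the queue is at capacity, mirroring how a bounded listener queue behaves.
class ResizableEventQueue<E> {
    private final ArrayDeque<E> buf = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private volatile int capacity;

    ResizableEventQueue(int capacity) {
        this.capacity = capacity;
    }

    // Try to enqueue; false means the event would have been dropped.
    boolean offer(E e) {
        lock.lock();
        try {
            if (buf.size() >= capacity) {
                return false;
            }
            buf.add(e);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Dequeue the oldest event, or null if empty.
    E poll() {
        lock.lock();
        try {
            return buf.poll();
        } finally {
            lock.unlock();
        }
    }

    // Grow (or shrink) the queue at run time, e.g. when occupancy crosses a
    // threshold and driver memory usage is still below a safety limit
    // such as the 90-95 percent check discussed above.
    void setCapacity(int newCapacity) {
        this.capacity = newCapacity;
    }

    int size() {
        lock.lock();
        try {
            return buf.size();
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a monitor thread can call `setCapacity` when the queue fills up instead of dropping events, which is the difference from simply starting with a larger fixed-size queue: memory grows only if and when the burst actually happens.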
For example:

a) Some users cannot afford to drop events from the appStatus queue, e.g. when running Spark jobs from notebooks (Jupyter/Zeppelin).

b) Other users need dynamic allocation of resources (upscaling/downscaling) to behave consistently.

c) Other users want all the events in the event log file, which they later feed into ML models to analyse their Spark jobs.

An unbounded queue requires a larger driver memory, but some Spark jobs do not need that much. We cannot raise the default driver memory either, since many applications run fine with the default of 1 GB. We could instead make any such approach configurable on a per-queue basis according to user requirements, as I have already done in this PR with the conf (`spark.set.optmized.event.queue`) that enables the `VariableLinkedBlockingQueue` at the start of the application.

Happy to discuss any other idea for tuning these queues @Ngone51 @tgravescs
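The per-queue idea can be sketched as follows: each queue gets its own threshold, and the effective capacity is computed as `size + size * threshold`. The conf keys and the base size of 10000 below are assumptions for illustration only, not actual Spark configuration.

```java
import java.util.Map;

// Hypothetical per-queue sizing: capacity = base + base * threshold.
// Queue names match the listener queues mentioned above; thresholds and
// the base size are made-up example values.
class QueueSizing {
    static int sizedCapacity(int baseSize, double threshold) {
        return baseSize + (int) (baseSize * threshold);
    }

    public static void main(String[] args) {
        int base = 10000; // assumed default event-queue capacity
        Map<String, Double> perQueueThreshold = Map.of(
            "appStatus", 0.5,          // notebook users: keep UI events
            "executorManagement", 0.2  // dynamic-allocation events
        );
        perQueueThreshold.forEach((queue, t) ->
            System.out.println(queue + " -> " + sizedCapacity(base, t)));
    }
}
```

This keeps the default footprint for jobs that do not need it while letting users who depend on a particular queue (notebooks, dynamic allocation, event-log consumers) pay for extra headroom only where it matters.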