Re: Spark structured streaming - performance tuning

2020-04-17 Thread Srinivas V
Thank you Alex. I will check it out and let you know if I have any questions. On Fri, Apr 17, 2020 at 11:36 PM Alex Ott wrote: > http://shop.oreilly.com/product/0636920047568.do has quite good information on it. For Kafka, you need to start with approximation that processing of each

Re: Memory allocation

2020-04-17 Thread Muhib Khan
spark.executor.memory and spark.driver.memory specify the size of the JVM heap for the executor and the driver, respectively. You can understand a bit more about memory usage from here. On Fri, Apr 17, 2020 at 4:07 PM
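
As a rough illustration of the heap setting described above, the following sketch estimates how much memory a resource manager would be asked for per executor. It is not part of the original message. It assumes the documented default for spark.executor.memoryOverhead on YARN and Kubernetes (10% of the executor heap, with a 384 MiB floor); the function name and constants are hypothetical, and off-heap settings such as spark.memory.offHeap.size are deliberately left out.

```python
# Hypothetical helper, not a Spark API: estimates the per-executor
# container size from the heap size given in spark.executor.memory.
# Assumption: default spark.executor.memoryOverhead, i.e. 10% of the
# heap with a minimum of 384 MiB (YARN / Kubernetes deployments).

MIN_OVERHEAD_MIB = 384   # assumed default floor for the overhead
OVERHEAD_FACTOR = 0.10   # assumed default fraction of the heap


def executor_container_mib(heap_mib: int) -> int:
    """Return heap plus default non-heap overhead, in MiB."""
    if heap_mib <= 0:
        raise ValueError("heap_mib must be positive")
    overhead = max(MIN_OVERHEAD_MIB, int(heap_mib * OVERHEAD_FACTOR))
    return heap_mib + overhead


if __name__ == "__main__":
    for heap in (2048, 4096, 16384):
        print(heap, "MiB heap ->", executor_container_mib(heap), "MiB requested")
```

The point of the sketch is only that the heap configured by spark.executor.memory is one part of what an executor occupies; actual numbers depend on the cluster manager and on any explicit overhead or off-heap settings.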

Memory allocation

2020-04-17 Thread Pat Ferrel
I have used Spark for several years and realize from recent chatter on this list that I don’t really understand how it uses memory. Specifically, are spark.executor.memory and spark.driver.memory taken from the JVM heap? When does Spark take memory from the JVM heap, and when does it take it from off-heap?

Re: Spark structured streaming - performance tuning

2020-04-17 Thread Alex Ott
http://shop.oreilly.com/product/0636920047568.do has quite good information on it. For Kafka, you need to start with the approximation that processing each partition is a separate task that needs to be executed, so you need to plan the number of cores accordingly. Srinivas V at "Thu, 16 Apr 2020
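
The one-task-per-partition approximation above can be turned into a small planning calculation. This is a minimal sketch, not something from the original message: the function names are hypothetical, and it assumes each task occupies exactly one core (the default spark.task.cpus of 1).

```python
import math

# Hypothetical planning helpers, not Spark APIs.
# Assumption: one task per Kafka partition, one core per task.


def executors_for_single_wave(kafka_partitions: int, cores_per_executor: int) -> int:
    """Executors needed so that all partition tasks can run at the same time."""
    if kafka_partitions < 1 or cores_per_executor < 1:
        raise ValueError("both arguments must be >= 1")
    return math.ceil(kafka_partitions / cores_per_executor)


def waves_per_batch(kafka_partitions: int, total_cores: int) -> int:
    """Rounds of task scheduling one micro-batch needs with the given cores."""
    if kafka_partitions < 1 or total_cores < 1:
        raise ValueError("both arguments must be >= 1")
    return math.ceil(kafka_partitions / total_cores)


if __name__ == "__main__":
    print(executors_for_single_wave(48, 4))  # executors for 48 partitions, 4 cores each
    print(waves_per_batch(48, 16))           # scheduling rounds with 16 cores in total
```

With fewer cores than partitions, each micro-batch runs in several waves, so batch latency grows roughly with the number of waves; this is only a first-order estimate and ignores skew between partitions.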

Re: How does spark sql evaluate case statements?

2020-04-17 Thread kant kodali
Thanks! On Thu, Apr 16, 2020 at 9:57 PM ZHANG Wei wrote: > Are you looking for this: > https://spark.apache.org/docs/2.4.0/api/sql/#when ? > > The code generated will look like this in a `do { ... } while (false)` > loop: > > do { > ${cond.code} > if (!${cond.isNull} && ${cond.value})

Re: Spark-3.0.0 GA

2020-04-17 Thread Sean Owen
The second release candidate will come soon. I would guess it all completes by the end of May, myself, but no guarantees. On Fri, Apr 17, 2020 at 6:30 AM Marshall Markham wrote: > > Hi, > > > > I realize this was probably not responded to because either the date is > unclear or explicitly

Re: Spark-3.0.0 GA

2020-04-17 Thread Marshall Markham
Hi, I realize this was probably not responded to because either the date is unclear or explicitly confidential. However, since the information is pretty important to me and there is some chance it just got lost in the mailing list, I’ll bump it up. Does anyone know an approximate release date