Re: Best way to read batch from Kafka and Offsets

2020-02-15 Thread Ruijing Li
Thought I'd update this thread. I figured out my issue with foreachBatch and Structured Streaming: I was calling count() before write(), so my streaming query branched into two. I am now using a Trigger with Structured Streaming to handle checkpointing instead of doing it myself. Thanks, all.

Spark 2.4.4 has bigger memory impact than 2.3?

2020-02-15 Thread Ruijing Li
Hi all, we recently upgraded our jobs from Spark 2.3 to Spark 2.4.4 and noticed that some jobs are failing due to lack of resources, particularly lack of executor memory, which causes some executors to fail. However, no code change was made other than the upgrade. Does Spark 2.4.4 require more executor