I ran a Spark Streaming job with 100 executors, a 30 GB heap per executor, and 4 cores per executor.
The version I used is 1.3.0-cdh5.1.0. The job reads from a directory on HDFS (with files arriving continuously) and does some joins on the data. I set the batch interval to 15 minutes, and the job worked fine for the first few batches. However, it stalled after 7-8 batches. Below are some symptoms.

* In the Spark UI, every tab worked fine except the "Streaming" tab. When I clicked on it, it just hung forever.
* I did not see any GC activity on the driver.
* Nothing was printed to the driver log.

Has anyone seen this before?

--
Chen Song