Here are some examples and details of the scenarios. The KafkaRDD is the most
error prone to polling
timeouts and concurrentm modification errors.
*Using KafkaRDD* - This takes a list of channels and processes them in
parallel using the KafkaRDD directly. they all use the same consumer group
I am getting the issues using Spark 2.0.1 and Kafka 0.10. I have two jobs,
one that uses a Kafka stream and one that uses just the KafkaRDD.
With the KafkaRDD, I continually get the "Failed to get records .. after
polling". I have adjusted the polling with
We were using 1.6, but now we are on 2.0.1. Both versions show the same
issue.
I dove deep into the Spark code and have identified that the extra java
options are /not/ added to the process on the executors. At this point, I
believe you have to use spark-defaults.conf to set any values that will
I am getting excessive memory leak warnings when running multiple mapping and
aggregations and using DataSets. Is there anything I should be looking for
to resolve this or is this a known issue?
WARN [Executor task launch worker-0]
org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory