Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread vonnagy
Here are some examples and details of the scenarios. The KafkaRDD is the most error prone to polling timeouts and concurrentm modification errors. *Using KafkaRDD* - This takes a list of channels and processes them in parallel using the KafkaRDD directly. they all use the same consumer group

Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread vonnagy
I am getting the issues using Spark 2.0.1 and Kafka 0.10. I have two jobs, one that uses a Kafka stream and one that uses just the KafkaRDD. With the KafkaRDD, I continually get the "Failed to get records .. after polling". I have adjusted the polling with

Re: Submit job with driver options in Mesos Cluster mode

2016-10-27 Thread vonnagy
We were using 1.6, but now we are on 2.0.1. Both versions show the same issue. I dove deep into the Spark code and have identified that the extra java options are /not/ added to the process on the executors. At this point, I believe you have to use spark-defaults.conf to set any values that will

Memory leak warnings in Spark 2.0.1

2016-10-12 Thread vonnagy
I am getting excessive memory leak warnings when running multiple mapping and aggregations and using DataSets. Is there anything I should be looking for to resolve this or is this a known issue? WARN [Executor task launch worker-0] org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory