RE: StackOverflow in Spark

2016-06-13 Thread Michel Hubert
>; Matthew Young <taige...@gmail.com>; Michel Hubert <mich...@phact.nl>; user@spark.apache.org Subject: Re: StackOverflow in Spark A StackOverflowError is generated when the DAG is too long, as there are many transformations across a lot of iterations. Please use checkpointing to store the DAG and

submitMissingTasks - serialize throws StackOverflow exception

2016-05-27 Thread Michel Hubert
Hi, My Spark application throws StackOverflowError exceptions after a while. The DAGScheduler function submitMissingTasks tries to serialize a Tuple (MapPartitionsRDD, EsSpark..saveToEs), which is handled with a recursive algorithm. The recursion is too deep and results in a stack overflow
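The recursive-serialization failure described in this thread can be reproduced outside Spark with plain Java serialization. Below is a minimal sketch: the hypothetical `Node` class stands in for an RDD's reference to its parent, so a long chain of nodes plays the role of a long lineage. Default Java serialization walks the chain recursively, one stack frame set per object, and overflows once the chain is deep enough.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for RDD lineage: each node references its parent,
// just as each RDD references the RDD(s) it was derived from.
public class DeepLineageDemo {
    static class Node implements Serializable {
        final Node parent;
        Node(Node parent) { this.parent = parent; }
    }

    // Returns true if serializing a parent chain of the given depth
    // throws StackOverflowError, false if serialization succeeds.
    static boolean overflows(int depth) {
        Node head = null;
        for (int i = 0; i < depth; i++) {
            head = new Node(head);           // grow the chain by one link
        }
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(head);           // recurses through every parent
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (StackOverflowError e) {
            return true;                     // the failure mode from the thread
        }
    }

    public static void main(String[] args) {
        System.out.println("depth 100:       overflow = " + overflows(100));
        System.out.println("depth 1,000,000: overflow = " + overflows(1_000_000));
    }
}
```

This is why the checkpointing advice quoted above works: `rdd.checkpoint()` (after `sc.setCheckpointDir(...)`, or `checkpoint(interval)` on a DStream) periodically persists the data and cuts the parent chain, so the structure handed to the serializer stays shallow.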

StackOverflow in Spark

2016-05-25 Thread Michel Hubert
Hi, I have a Spark application which generates StackOverflowError exceptions after 30+ min. Does anyone have any ideas? It seems like a problem with deserialization of checkpoint data? 16/05/25 10:48:51 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 55449.0 (TID 5584,

Kafka 0.9 and spark-streaming-kafka_2.10

2016-05-09 Thread Michel Hubert
Hi, I'm thinking of upgrading our Kafka cluster to 0.9. Will this be a problem for the Spark Streaming + Kafka Direct Approach integration using artifact spark-streaming-kafka_2.10 (1.6.1)? groupId = org.apache.spark artifactId = spark-streaming-kafka_2.10 version = 1.6.1 Because the
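For reference, the coordinates quoted above correspond to this Maven dependency (a sketch; note the 1.6.x connector was built against the 0.8-line Kafka client, so running it against 0.9 brokers relies on the brokers' backward compatibility with older clients):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
```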

RE: run-example streaming.KafkaWordCount fails on CDH 5.7.0

2016-05-04 Thread Michel Hubert
We're running Kafka 0.8.2.2. Is that the problem, and why? -Original message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Wednesday, 4 May 2016 10:41 To: Michel Hubert <mich...@phact.nl> CC: user@spark.apache.org Subject: Re: run-example streaming.KafkaWordCount fails

RE: Kafka exception in Apache Spark

2016-04-26 Thread Michel Hubert
This is production. From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Tuesday, 26 April 2016 12:01 To: Michel Hubert <mich...@phact.nl> CC: user@spark.apache.org Subject: Re: Kafka exception in Apache Spark Hi Michael, Is this production or test? Dr Mich Tale

Kafka exception in Apache Spark

2016-04-26 Thread Michel Hubert
Hi, I use the Kafka direct stream approach. My Spark application was running fine. This morning we upgraded to CDH 5.7.0, and when I restarted my Spark application I got exceptions. It seems to be a problem with the direct stream approach. Any ideas how to fix this? User class threw exception:

RE: apache spark errors

2016-03-24 Thread Michel Hubert
foreachRDD(new VoidFunction<JavaRDD>() { public void call(JavaRDD rdd) throws Exception { for (TopData t: rdd.take(top)) { jedis … } Might this have resulted in a memory leak? From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, 24 March 2016 15:15 To: Mic
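The pattern quoted above, opening a client inside `foreachRDD`'s `call()` and never closing it, is a classic leak: `foreachRDD` runs every batch interval, so unclosed connections accumulate and garbage collection never calls `close()` for you. A minimal pure-Java sketch of the difference, using a hypothetical `FakeClient` as a stand-in for a connection object such as a Jedis client:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LeakDemo {
    // Tracks how many clients are currently open.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for a connection such as a Jedis client.
    static class FakeClient implements AutoCloseable {
        FakeClient() { OPEN.incrementAndGet(); }
        void send(String s) { /* pretend to talk to a server */ }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Leaky variant: a new client per batch, never closed -- like opening
    // a connection inside call() and simply dropping the reference.
    static void leaky(int batches) {
        for (int i = 0; i < batches; i++) {
            FakeClient c = new FakeClient();
            c.send("batch " + i);
        }
    }

    // Safe variant: try-with-resources closes the client after each batch.
    static void safe(int batches) {
        for (int i = 0; i < batches; i++) {
            try (FakeClient c = new FakeClient()) {
                c.send("batch " + i);
            }
        }
    }

    public static void main(String[] args) {
        leaky(1000);
        System.out.println("open after leaky: " + OPEN.get());
        OPEN.set(0);
        safe(1000);
        System.out.println("open after safe:  " + OPEN.get());
    }
}
```

In a real streaming job the usual alternative is a per-JVM connection pool rather than a per-batch client, so each executor reuses a small, bounded set of connections.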

RE: apache spark errors

2016-03-24 Thread Michel Hubert
Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, 24 March 2016 14:33 To: Michel Hubert <mich...@phact.nl> CC: user@spark.apache.org Subject: Re: apache spark errors Which release of Spark are you using? Have you looked at the tasks whose IDs were printed to see if there was more of a clue?

apache spark errors

2016-03-24 Thread Michel Hubert
Hi, I constantly get these errors: 0 [Executor task launch worker-15] ERROR org.apache.spark.executor.Executor - Managed memory leak detected; size = 6564500 bytes, TID = 38969 310002 [Executor task launch worker-12] ERROR org.apache.spark.executor.Executor - Managed memory leak detected;

Spark 1.6.0 on CDH 5.6.0

2016-03-22 Thread Michel Hubert
Hi, I'm trying to run a Spark 1.6.0 application on a CDH 5.6.0 cluster. How do I submit the uber-jar so it's totally self-reliant? With kind regards, Mitchel spark-submit --class TEST --master yarn-cluster ./uber-TEST-1.0-SNAPSHOT.jar Spark 1.6.1 Version: Cloudera Express 5.6.0 16/03/22
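One common way to make the submitted jar self-reliant is the maven-shade-plugin, which bundles the application's dependencies into the uber-jar while Spark itself is marked `provided` so the cluster's own Spark is used. A sketch of the relevant `pom.xml` fragment (plugin version is an assumption for the 2016 timeframe):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

With `<scope>provided</scope>` on the spark-core dependency, the resulting `./uber-TEST-1.0-SNAPSHOT.jar` can be submitted exactly as shown in the command above.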

updateStateByKey schedule time

2015-07-15 Thread Michel Hubert
() will not be the time the batch was scheduled. I want to retrieve the job/task schedule time of the batch for which my updateStateByKey(..) routine is called. Is this possible? With kind regards, Michel Hubert

spark streaming performance

2015-07-09 Thread Michel Hubert
Hi, I've developed a POC Spark Streaming application, but it seems to perform better on my development machine than on our cluster. I submit it to YARN on our Cloudera cluster. But my first question is more detailed: in the application UI (:4040) I see in the streaming section that the batch

RE: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Michel Hubert
Hi, I was just wondering how you generated the second image with the charts. What product? From: Anand Nalya [mailto:anand.na...@gmail.com] Sent: Thursday, 9 July 2015 11:48 To: spark users Subject: Breaking lineage and reducing stages in Spark Streaming Hi, I've an application in which an RDD