Good to see I am not the only one who cannot get incoming DStreams to repartition. I tried repartition(512), but still no luck: the app stubbornly runs on only two nodes. This is on 1.0.0, but looking at the release notes for 1.0.1 and 1.0.2, I don't see anything saying this was a known issue that has been fixed.
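One way to start debugging is to log what each batch actually looks like after the repartition. This is only a sketch against the Spark 1.x Java API, assuming a repartitioned pair stream named `newMessages` like the one in the quoted code below; the partition count printed per batch should match the repartition argument, and the stages tab of the Spark UI will then show which executors the downstream tasks actually land on:

```java
// Hedged sketch: print the post-repartition partition count for every batch.
// If this prints the expected number but work still runs on two nodes, the
// skew is in task scheduling, not in repartition() itself.
newMessages.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
    @Override
    public Void call(JavaPairRDD<String, String> rdd) {
        System.out.println("partitions after repartition: "
            + rdd.partitions().size());
        return null;
    }
});
```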
How do I debug the repartition() statement to see what the flow is after the job hits that statement?

On Fri, Aug 29, 2014 at 8:31 AM, bharatvenkat <bvenkat.sp...@gmail.com> wrote:

> Chris,
>
> I did the DStream.repartition mentioned in the document on parallelism in
> receiving, as well as set "spark.default.parallelism", and it still uses
> only 2 nodes in my cluster. I notice there is another email thread on the
> same topic:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/DStream-repartitioning-performance-tuning-processing-td13069.html
>
> My code is in Java, and here is what I have:
>
>     JavaPairReceiverInputDStream<String, String> messages =
>         KafkaUtils.createStream(ssc, zkQuorum,
>             "cse-job-play-consumer", kafkaTopicMap);
>
>     JavaPairDStream<String, String> newMessages =
>         messages.repartition(partitionSize); // partitionSize = 30
>
>     JavaDStream<String> lines = newMessages.map(
>         new Function<Tuple2<String, String>, String>() {
>             ...
>             public String call(Tuple2<String, String> tuple2) {
>                 return tuple2._2();
>             }
>         });
>
>     JavaDStream<String> words = lines.flatMap(new MetricsComputeFunction());
>
>     JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
>         new PairFunction<String, String, Integer>() {
>             ...
>         });
>
>     wordCounts.foreachRDD(
>         new Function<JavaPairRDD<String, Integer>, Void>() {...});
>
> Thanks,
> Bharat
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p13131.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
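One thing worth checking beyond repartition(): a single KafkaUtils.createStream call creates a single receiver, and that receiver pins all incoming data to one executor; repartition() only redistributes the batches that one receiver produced. The Spark Streaming programming guide's suggestion for parallelism in receiving is to create several receivers and union them. A hedged sketch of that, reusing the variable names from the quoted code (numReceivers is an illustrative value, not something from the original post):

```java
// Sketch (untested): several Kafka receivers unioned into one DStream,
// so ingestion itself is spread across executors before repartition().
int numReceivers = 4; // illustrative; tune to your cluster
List<JavaPairDStream<String, String>> streams =
    new ArrayList<JavaPairDStream<String, String>>();
for (int i = 0; i < numReceivers; i++) {
    streams.add(KafkaUtils.createStream(ssc, zkQuorum,
        "cse-job-play-consumer", kafkaTopicMap));
}

// Union the receiver streams, then repartition for processing parallelism.
JavaPairDStream<String, String> unified =
    ssc.union(streams.get(0), streams.subList(1, streams.size()));
JavaPairDStream<String, String> newMessages =
    unified.repartition(partitionSize);
```

Each receiver occupies a core, so make sure the cluster has more cores than receivers, or the processing tasks will starve.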