Good to see I am not the only one who cannot get incoming DStreams to
repartition. I tried repartition(512), but still no luck - the app
stubbornly runs on only two nodes. This is on Spark 1.0.0, but looking
at the release notes for 1.0.1 and 1.0.2, I don't see anything saying
this was a known issue that has since been fixed.
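
In case repartition() after receive simply never spreads the load, my
next experiment is the receiver-side parallelism that the tuning guide
suggests: create several Kafka input streams and union them before
doing any work. Something like this, sketched against the Spark 1.0
Java API (numReceivers is an arbitrary placeholder; ssc, zkQuorum, and
kafkaTopicMap as in Bharat's code below):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.spark.streaming.api.java.JavaPairDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    int numReceivers = 4; // placeholder: one receiver per executor to start
    List<JavaPairDStream<String, String>> kafkaStreams =
            new ArrayList<JavaPairDStream<String, String>>();
    for (int i = 0; i < numReceivers; i++) {
        // Each createStream() call runs its own receiver on some executor.
        kafkaStreams.add(KafkaUtils.createStream(ssc, zkQuorum,
                "cse-job-play-consumer", kafkaTopicMap));
    }
    // Union the per-receiver streams into one DStream before processing.
    JavaPairDStream<String, String> messages =
            ssc.union(kafkaStreams.get(0),
                    kafkaStreams.subList(1, kafkaStreams.size()));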

How do I debug the repartition() call to see what the flow looks like
after the job hits that statement?
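
The only visibility I have found so far is to log the partition count
of each batch RDD right after the repartition, and then check in the
web UI which executors the stage's tasks actually land on. Roughly (a
sketch against the Spark 1.0 Java API, using the newMessages stream
from Bharat's code below):

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;

    // Log the partition count of every batch RDD after the repartition().
    newMessages.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
        public Void call(JavaPairRDD<String, String> rdd) {
            // With repartition(512) applied, this should print 512.
            System.out.println("partitions this batch: " + rdd.partitions().size());
            return null;
        }
    });

Note that even with the right partition count, the tasks can still end
up scheduled on only a couple of executors, so the UI is what really
tells you where the work runs.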


On Fri, Aug 29, 2014 at 8:31 AM, bharatvenkat <bvenkat.sp...@gmail.com> wrote:
> Chris,
>
> I did the DStream.repartition mentioned in the document on parallelism in
> receiving, and also set "spark.default.parallelism", but it still uses only
> 2 nodes in my cluster.  I noticed there is another email thread on the same
> topic:
>
> http://apache-spark-user-list.1001560.n3.nabble.com/DStream-repartitioning-performance-tuning-processing-td13069.html
>
> My code is in Java and here is what I have:
>
>         JavaPairReceiverInputDStream<String, String> messages =
>                 KafkaUtils.createStream(ssc, zkQuorum,
>                         "cse-job-play-consumer", kafkaTopicMap);
>
>         JavaPairDStream<String, String> newMessages =
>                 messages.repartition(partitionSize); // partitionSize = 30
>
>         JavaDStream<String> lines = newMessages.map(
>                 new Function<Tuple2<String, String>, String>() {
>                     ...
>                     public String call(Tuple2<String, String> tuple2) {
>                         return tuple2._2();
>                     }
>                 });
>
>         JavaDStream<String> words = lines.flatMap(new MetricsComputeFunction());
>
>         JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
>                 new PairFunction<String, String, Integer>() {
>                     ...
>                 });
>
>         wordCounts.foreachRDD(
>                 new Function<JavaPairRDD<String, Integer>, Void>() {...});
>
> Thanks,
> Bharat

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
