Chris,

I applied the DStream.repartition mentioned in the documentation on parallelism
in receiving, and also set "spark.default.parallelism", but the job still uses
only 2 nodes in my cluster.  I notice there is another email thread on the same
topic:

http://apache-spark-user-list.1001560.n3.nabble.com/DStream-repartitioning-performance-tuning-processing-td13069.html

My code is in Java and here is what I have:

        JavaPairReceiverInputDStream<String, String> messages =
                KafkaUtils.createStream(ssc, zkQuorum, "cse-job-play-consumer", kafkaTopicMap);

        JavaPairDStream<String, String> newMessages =
                messages.repartition(partitionSize); // partitionSize = 30

        JavaDStream<String> lines = newMessages.map(
            new Function<Tuple2<String, String>, String>() {
                ...

                public String call(Tuple2<String, String> tuple2) {
                    return tuple2._2();
                }
            });

        JavaDStream<String> words = lines.flatMap(new MetricsComputeFunction());

        JavaPairDStream<String, Integer> wordCounts = words.mapToPair(
            new PairFunction<String, String, Integer>() {
                ...
            });

        wordCounts.foreachRDD(new Function<JavaPairRDD<String, Integer>, Void>() {...});
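For reference, my understanding of the receiver-parallelism suggestion in the tuning guide is to create multiple Kafka input streams and union them, rather than repartitioning the output of a single receiver (a lone receiver runs on one executor, so repartition alone may not spread the receiving load). A rough Java sketch of that, reusing ssc, zkQuorum, and kafkaTopicMap from above (numReceivers = 4 is just an assumption for illustration):

```java
// Sketch only: several receivers, one unioned DStream, per the
// "Level of Parallelism in Data Receiving" section of the tuning guide.
int numReceivers = 4; // assumption: roughly one receiver per executor
List<JavaPairDStream<String, String>> streams =
        new ArrayList<JavaPairDStream<String, String>>(numReceivers);
for (int i = 0; i < numReceivers; i++) {
    // Each createStream call creates an independent receiver.
    streams.add(KafkaUtils.createStream(ssc, zkQuorum,
            "cse-job-play-consumer", kafkaTopicMap));
}
// Union so downstream operators see a single DStream fed by all receivers.
JavaPairDStream<String, String> unioned =
        ssc.union(streams.get(0), streams.subList(1, streams.size()));
```

If something in that sketch is off, corrections welcome.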

Thanks,
Bharat



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Low-Level-Kafka-Consumer-for-Spark-tp11258p13131.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
