Thanks for the quick response. I tried the direct word count Python example, and it also seems slow. Often it does not fetch the words sent by the producer. I am using Spark 1.4.1 and Kafka 2.10-0.8.2.0.
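
For reference, a minimal sketch of what a direct-API word count looks like in Python with pyspark 1.4.x, modeled loosely on the example that ships with Spark. The broker list and topic come from the command line and are placeholders, and the helper names `split_words` and `build_word_counts` are made up for illustration. The 2-second batch interval is likewise an assumption, not a recommendation.

```python
from __future__ import print_function

import sys


def split_words(line):
    """Split one message value into words on single spaces."""
    return line.split(" ")


def build_word_counts(lines):
    """Turn a DStream of text lines into a DStream of (word, count) pairs."""
    return (lines.flatMap(split_words)
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))


def main(brokers, topic):
    # pyspark is imported lazily so the helpers above work without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="PythonDirectKafkaWordCount")
    ssc = StreamingContext(sc, 2)  # assumed batch interval: 2 seconds

    # Direct (receiver-less) stream: Spark reads from the Kafka brokers itself.
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers})

    # Each record is a (key, value) pair; only the value is counted.
    lines = stream.map(lambda kv: kv[1])
    build_word_counts(lines).pprint()

    ssc.start()
    ssc.awaitTermination()


# Only start streaming when both arguments are given, e.g. localhost:9092 test
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

The script would be launched through spark-submit with the Kafka integration package for the matching Spark version on the classpath. As far as I understand, the direct stream starts from the latest offset unless `auto.offset.reset` is set to `smallest` in the Kafka parameters, so words produced before the job starts would not show up.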

On Tue, Aug 25, 2015 at 2:05 AM, Tathagata Das <t...@databricks.com> wrote:

> The Scala version of the Kafka integration is something that we have been
> working on for a while, and it is likely to be more optimized than the
> Python one. The Python one definitely requires passing the data back and
> forth between the JVM and the Python VM and decoding the raw bytes to
> Python strings (probably less efficient than Java's byte-to-UTF-8
> decoder), so that may cause some extra overhead compared to Scala.
>
> Also consider trying the direct API. Read more in the Kafka integration
> guide:
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> Overall, it has a much higher throughput than the earlier receiver-based
> approach.
>
> BTW, a disclaimer: do not treat this difference as a generalization of the
> performance difference between Scala and Python for all of Spark. For
> example, DataFrames provide performance parity between the Scala and
> Python APIs.
>
> On Mon, Aug 24, 2015 at 5:22 AM, utk.pat <utkarsh.pat...@gmail.com> wrote:
>
>> I am new to Spark Streaming. I was running the "kafka_wordcount" example
>> with a local Kafka and Spark instance. It was very easy to set this up
>> and get going :) I tried running both the Scala and Python versions of
>> the word count example. The Python version seems to be extremely slow.
>> Sometimes it has delays of more than a couple of minutes. The Scala
>> version, on the other hand, seems to be far better. I am running on a
>> Windows machine. I am trying to understand what causes the slowness in
>> Python streaming. Is there anything that I am missing? For real-time
>> streaming analysis, should I prefer Scala?
>> ------------------------------
>> View this message in context: Performance - Python streaming v/s Scala streaming
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Performance-Python-streaming-v-s-Scala-streaming-tp24415.html>
>> Sent from the Apache Spark User List mailing list archive
>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.