Thanks for the quick response.
I have tried the direct word count Python example, and it also seems to be
slow. Often it does not fetch the words that are sent by the producer.
I am using Spark version 1.4.1 and Kafka 2.10-0.8.2.0.
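
For reference, the direct word count being discussed looks roughly like the sketch below. It is modelled on the direct Kafka word count example shipped with Spark 1.4 and uses the `pyspark.streaming.kafka` module from the Spark 1.x line. The function names `split_words` and `run_direct_wordcount`, the application name, and the 2-second batch interval are illustrative assumptions, not something stated in this thread.

```python
def split_words(line):
    """Split one Kafka message value into words, dropping empty strings."""
    return [word for word in line.split(" ") if word]


def run_direct_wordcount(brokers, topic, batch_seconds=2):
    """Count words per batch using the direct (receiver-less) Kafka stream.

    Hypothetical helper: `brokers` is a "host:port" list and `topic` is a
    single topic name; both are supplied by the caller.
    """
    # Imported here so split_words stays usable without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x Kafka module

    sc = SparkContext(appName="DirectKafkaWordCountSketch")
    ssc = StreamingContext(sc, batch_seconds)

    # The direct stream yields (key, value) pairs already decoded as UTF-8.
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers})

    counts = (stream.map(lambda kv: kv[1])          # keep the message value
                    .flatMap(split_words)           # one record per word
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first few counts of each batch

    ssc.start()
    ssc.awaitTermination()
```

Under these assumptions, a script that calls `run_direct_wordcount("localhost:9092", "test")` (placeholder broker and topic) would be launched through `spark-submit` with the Spark Streaming Kafka integration available on the classpath.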


On Tue, Aug 25, 2015 at 2:05 AM, Tathagata Das <t...@databricks.com> wrote:

> The Scala version of the Kafka integration is something that we have been
> working on for a while, and it is likely to be more optimized than the Python
> one. The Python one definitely requires passing the data back and forth
> between the JVM and the Python VM and decoding the raw bytes to Python strings
> (probably less efficient than Java's byte-to-UTF-8 decoder), so that may cause
> some extra overhead compared to Scala.
>
> Also consider trying the direct API. Read more in the Kafka integration
> guide:
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> Overall, it has much higher throughput than the earlier receiver-based
> approach.
>
> BTW, a disclaimer: do not take this difference as a generalization of the
> performance difference between Scala and Python for all of Spark. For
> example, DataFrames provide performance parity between the Scala and Python
> APIs.
>
>
> On Mon, Aug 24, 2015 at 5:22 AM, utk.pat <utkarsh.pat...@gmail.com> wrote:
>
>> I am new to Spark Streaming. I was running the "kafka_wordcount" example
>> with a local Kafka and Spark instance. It was very easy to set this up and
>> get going :) I tried running both the Scala and Python versions of the word
>> count example. The Python version seems to be extremely slow. Sometimes it
>> has delays of more than a couple of minutes. The Scala version, on the other
>> hand, seems to be much better. I am running on a Windows machine. I am trying
>> to understand what causes the slowness in Python streaming. Is there
>> anything that I am missing? For real-time streaming analysis, should I
>> prefer Scala?
>> ------------------------------
>> View this message in context: Performance - Python streaming v/s Scala
>> streaming
>> <http://apache-spark-user-list.1001560.n3.nabble.com/Performance-Python-streaming-v-s-Scala-streaming-tp24415.html>
>>
>
>
