Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
The exception comes when the client also has many connections to another external server, so I think the exception is due to a client-side issue only; on the server side there is no issue. I want to understand: is the executor (SimpleConsumer) not making a new connection to the Kafka broker at the start of each

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-22 Thread satish chandra j
Hi All, currently using DSE 4.7 and Spark version 1.2.2. Regards, Satish On Fri, Aug 21, 2015 at 7:30 PM, java8964 java8...@hotmail.com wrote: What version of Spark are you using, or does it come with DSE 4.7? We just cannot reproduce it in Spark. yzhang@localhost$ more test.spark val pairs =
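The quoted test script is cut off at `val pairs =`, so the exact repro is unknown. As a hedged, Spark-free sketch of the semantics being debugged (names and data here are illustrative, not from the thread), reduceByKey groups pairs by key and folds each group's values with the supplied function:

```python
# Plain-Python model of reduceByKey over (key, value) pairs.
pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]

def reduce_by_key(kv_pairs, fn):
    out = {}
    for k, v in kv_pairs:
        # Fold the new value into the running result for this key.
        out[k] = fn(out[k], v) if k in out else v
    return out

result = reduce_by_key(pairs, lambda x, y: x + y)
# result == {"a": 4, "b": 6}
```

If this plain version gives the expected totals but the Spark job does not, the transformation itself is not the problem and attention shifts to how the pairs RDD is built.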

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Akhil Das
Hmm, for a single-core VM you will have to run it in local mode (specifying master=local[4]). The flag is available in all versions of Spark, I guess. On Aug 22, 2015 5:04 AM, Sateesh Kavuri sateesh.kav...@gmail.com wrote: Thanks Akhil. Does this mean that the executor running in the VM can
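A minimal sketch of the setting being discussed, in spark-defaults.conf form (the value 4 is the thread count from the thread, not a recommendation):

```
# spark-defaults.conf -- run the driver and executor in one JVM
# with 4 worker threads, even on a single-core machine.
spark.master    local[4]
```

The same thing can be passed per job as `--master local[4]` on spark-submit, or via `setMaster` on the SparkConf.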

Spark Streaming: how to work with partitions, how to create partitions

2015-08-22 Thread Gaurav Agarwal
1. How to work with partitions in Spark Streaming from Kafka. 2. How to create partitions in Spark Streaming from Kafka when I send messages from a Kafka topic having three partitions. Will Spark listen to the messages when I say KafkaUtils.createStream or createDirectStream with local[4]? Now I
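One point the replies in this archive touch on: with the direct (receiver-less) approach, each Kafka topic partition becomes exactly one RDD partition per batch. A hedged, Kafka-free sketch of that 1:1 mapping (the message values are made up for illustration):

```python
# Three Kafka partitions, as in the question above.
kafka_partitions = {
    0: ["m1", "m2"],
    1: ["m3"],
    2: ["m4", "m5", "m6"],
}

# Direct approach: one RDD partition (and so one task) per Kafka
# partition, each holding that partition's offset range of messages.
rdd_partitions = [msgs for _, msgs in sorted(kafka_partitions.items())]

parallelism = len(rdd_partitions)   # == number of Kafka partitions
total_messages = sum(len(p) for p in rdd_partitions)
```

So the batch's parallelism is fixed by the topic's partition count unless you repartition afterwards.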

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Rishitesh, We are not using any RDDs to parallelize the processing, and all of the algorithm runs on a single core (and in a single thread). The parallelism is done at the user level. The disk I/O can be started in a separate thread, but then the executor will not be able to take up more jobs, since

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
When trying the consumer without external connections, or with a low number of external connections, it works fine - so the doubt is how the socket got closed - java.io.EOFException: Received -1 when reading from channel, socket has likely been closed. On Sat, Aug 22, 2015 at 7:24 PM, Akhil Das

subscribe

2015-08-22 Thread Lars Hermes
subscribe - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

pickling error with PySpark and Elasticsearch-py analyzer

2015-08-22 Thread pkphlam
Reposting my question from SO: http://stackoverflow.com/questions/32161865/elasticsearch-analyze-not-compatible-with-spark-in-python I'm using the elasticsearch-py client within PySpark using Python 3 and I'm running into a problem using the analyze() function with ES in conjunction with an RDD.
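The usual cause of this class of error is that the Elasticsearch client object holds a live network connection, which cannot be pickled into the closure Spark ships to executors. A hedged stdlib-only demonstration (FakeClient stands in for the real client; the workaround shown, building the client inside the per-partition function, is the common pattern, not something confirmed in this thread):

```python
import pickle
import threading

class FakeClient:
    """Stand-in for a network client: holds an unpicklable handle."""
    def __init__(self):
        # A lock, like a socket, cannot be pickled.
        self._lock = threading.Lock()

client = FakeClient()
try:
    pickle.dumps(client)      # what Spark does to closure variables
    shipped = True
except TypeError:
    shipped = False           # the serialization fails, as in the SO post

# Workaround pattern: construct the client inside the function that runs
# on the executor (e.g. the function passed to mapPartitions), so it is
# never pickled on the driver.
def process_partition(rows):
    local_client = FakeClient()   # created per partition, on the worker
    return [(row, type(local_client).__name__) for row in rows]

result = process_partition(["doc1", "doc2"])
```

With the real elasticsearch-py client the same idea applies: create it per partition rather than capturing it from the driver.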

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Hi Akhil, Think of the scenario as running a piece of code in plain Java with multiple threads. Let's say there are 4 threads spawned by a Java process to handle reading from a database, some processing, and storing back to the database. In this process, while a thread is performing database I/O, the CPU
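The scenario described above can be sketched outside Spark: while one thread is blocked on (simulated) database I/O, another thread uses the CPU, so the wall time is close to the I/O wait alone rather than the sum of the two. A minimal Python sketch of that overlap (the 0.2 s sleep stands in for a database call):

```python
import threading
import time

results = {}

def io_task():
    # Simulated blocking database I/O; the CPU is free during the wait.
    time.sleep(0.2)
    results["io"] = "rows-loaded"

def cpu_task():
    results["cpu"] = sum(range(100_000))

start = time.monotonic()
t = threading.Thread(target=io_task)
t.start()
cpu_task()          # runs while io_task is blocked on "I/O"
t.join()
elapsed = time.monotonic() - start
# elapsed is close to the 0.2 s I/O wait, not I/O wait + compute time,
# because the two overlapped.
```

The question in the thread is whether a Spark executor offers the same overlap on a single core, which the replies discuss in terms of local[N] mode.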

Re: Using spark streaming to load data from Kafka to HDFS

2015-08-22 Thread Xu (Simon) Chen
Last time I checked, Camus doesn't support storing data as parquet, which is a deal breaker for me. Otherwise it works well for my Kafka topics with low data volume. I am currently using spark streaming to ingest data, generate semi-realtime stats and publish to a dashboard, and dump full dataset

Re: subscribe

2015-08-22 Thread Brandon White
https://www.youtube.com/watch?v=umDr0mPuyQc On Sat, Aug 22, 2015 at 8:01 AM, Ted Yu yuzhih...@gmail.com wrote: See http://spark.apache.org/community.html Cheers On Sat, Aug 22, 2015 at 2:51 AM, Lars Hermes li...@hermes-it-consulting.de wrote: subscribe

spark 1.4.1 - LZFException

2015-08-22 Thread Yadid Ayzenberg
Hi All, We have a Spark standalone cluster running 1.4.1 and we are setting spark.io.compression.codec to lzf. I have a long-running interactive application which behaves normally, but after a few days I get the following exception in multiple jobs. Any ideas on what could be causing this
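For reference, the configuration described above in spark-defaults.conf form, plus a common isolation step (the codec switch is an assumption for narrowing down the failure, not a fix confirmed in this thread):

```
# spark-defaults.conf -- codec in use when the LZFException appears
spark.io.compression.codec    lzf

# Isolation step: switch codecs and see whether the failure follows
# the codec or stays with the job.
# spark.io.compression.codec  snappy
```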

how to migrate from spark 0.9 to spark 1.4

2015-08-22 Thread sai rakesh
Currently I am using Spark 0.9; I wrote code in Java for Spark SQL on my data. Now I want to use Spark 1.4, so how do I migrate, and what changes do I have to make for tables? I have a .sql file, a pom file, and a .py file. I am using S3 for storage.

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, can you please throw some light on why this is, and point to the relevant code in the Spark repo? It will help in a better understanding of things that can affect a long-running streaming job. On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote: Could you periodically

Re: Spark streaming multi-tasking during I/O

2015-08-22 Thread Sateesh Kavuri
Thanks Akhil. Does this mean that the executor running in the VM can spawn two concurrent jobs on the same core? If this is the case, this is what we are looking for. Also, which version of Spark is this flag in? Thanks, Sateesh On Sat, Aug 22, 2015 at 1:44 AM, Akhil Das

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Shushant Arora
When trying the consumer without external connections, or with a low number of external connections, it works fine - so the doubt is how the socket got closed - 15/08/21 08:54:54 ERROR executor.Executor: Exception in task 262.0 in stage 130.0 (TID 16332) java.io.EOFException: Received -1 when reading

Re: subscribe

2015-08-22 Thread Ted Yu
See http://spark.apache.org/community.html Cheers On Sat, Aug 22, 2015 at 2:51 AM, Lars Hermes li...@hermes-it-consulting.de wrote: subscribe

Re: How can I save the RDD result as Orcfile with spark1.3?

2015-08-22 Thread Ted Yu
In Spark 1.4, there was considerable refactoring around interaction with Hive, such as SPARK-7491. It would not be straightforward to port ORC support back to 1.3. FYI On Fri, Aug 21, 2015 at 10:21 PM, dong.yajun dongt...@gmail.com wrote: hi Ted, thanks for your reply, are there any other way to

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Akhil Das
Can you try some other consumer and see if the issue still exists? On Aug 22, 2015 12:47 AM, Shushant Arora shushantaror...@gmail.com wrote: Exception comes when client has so many connections to some another external server also. So I think Exception is coming because of client side issue

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Dibyendu Bhattacharya
I think you can also give this consumer a try: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer in your environment. This has been running fine for topics with a large number of Kafka partitions ( 200 ) like yours without any issue - no issue with connections, as this consumer re-use

Re: spark streaming 1.3 kafka error

2015-08-22 Thread Cody Koeninger
To be perfectly clear, the direct kafka stream will also recover from any failures, because it does the simplest thing possible - fail the task and let spark retry it. If you're consistently having socket closed problems on one task after another, there's probably something else going on in your
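The "fail the task and let Spark retry it" behavior described above can be sketched generically (this is an illustration of the retry idea only; Spark's scheduler, governed by spark.task.maxFailures, is more involved):

```python
def run_with_retries(task, max_failures=4):
    """Re-run a task until it succeeds or max_failures attempts are
    used up, mimicking per-task retry after e.g. a closed socket."""
    last_err = None
    for attempt in range(max_failures):
        try:
            return task(attempt)
        except IOError as err:
            last_err = err        # e.g. an EOFException-style socket error
    raise last_err                # consistent failure surfaces to the job

calls = []

def flaky_task(attempt):
    # Fails twice with a socket-style error, then succeeds.
    calls.append(attempt)
    if attempt < 2:
        raise IOError("Received -1 when reading from channel")
    return "batch-processed"

result = run_with_retries(flaky_task)
# result == "batch-processed" after two failed attempts
```

As the reply notes, retries mask a transient failure; a socket error on task after task points to a persistent environmental problem instead.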