Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-15 Thread Abraham Jacob
removing print? that would cause it to do nothing, which of course generates no error. On Wed, Oct 15, 2014 at 12:11 AM, Abraham Jacob abe.jac...@gmail.com wrote: Hi All, I am trying to understand what is going on in my simple WordCount Spark Streaming application. Here is the setup

Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
Hi All, I am trying to understand what is going on in my simple WordCount Spark Streaming application. Here is the setup - I have a Kafka producer that is streaming words (lines of text). On the flip side, I have a spark streaming application that uses the high-level Kafka/Spark connector to
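
For context, a minimal sketch (not the poster's actual code) of the kind of Java Kafka WordCount being described, using the receiver-based KafkaUtils.createStream connector; the topic name, ZooKeeper quorum, and consumer group below are placeholders:

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import scala.Tuple2;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaWordCount {
  public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("Streaming WordCount");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(2000));

    // One receiver; topic name, ZooKeeper quorum and group id are placeholders.
    Map<String, Integer> topicMap = new HashMap<String, Integer>();
    topicMap.put("words", 1);
    JavaPairDStream<String, String> messages =
        KafkaUtils.createStream(jssc, "zkhost:2181", "wordcount-group", topicMap);

    // Kafka messages arrive as (key, line) pairs; keep only the line.
    JavaDStream<String> lines = messages.map(new Function<Tuple2<String, String>, String>() {
      public String call(Tuple2<String, String> tuple) { return tuple._2(); }
    });

    JavaDStream<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
      public Iterable<String> call(String line) { return Arrays.asList(line.split(" ")); }
    });

    JavaPairDStream<String, Integer> wordCounts = words
        .mapToPair(new PairFunction<String, String, Integer>() {
          public Tuple2<String, Integer> call(String word) {
            return new Tuple2<String, Integer>(word, 1);
          }
        })
        .reduceByKey(new Function2<Integer, Integer, Integer>() {
          public Integer call(Integer first, Integer second) { return first + second; }
        });

    // Without an output operation (print, foreachRDD, saveAs*Files, ...) nothing runs.
    wordCounts.print();

    jssc.start();
    jssc.awaitTermination();
  }
}

The print() at the end matters: DStream output operations are what trigger execution, which is why removing it, as noted in the reply above, makes the job do nothing and report no error.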

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
, Integer second){ return first + second; } } On Tue, Oct 14, 2014 at 4:16 PM, Stephen Boesch java...@gmail.com wrote: Is ReduceWords serializable? 2014-10-14 16:11 GMT-07:00 Abraham Jacob abe.jac...@gmail.com: Hi All, I am trying to understand what is going on in my simple WordCount

Re: Spark Streaming Empty DStream / RDD and reduceByKey

2014-10-14 Thread Abraham Jacob
, Oct 14, 2014 at 4:56 PM, Michael Campbell michael.campb...@gmail.com wrote: Do you get any different results if you have ReduceWords actually implement java.io.Serializable? On Tue, Oct 14, 2014 at 7:35 PM, Abraham Jacob abe.jac...@gmail.com wrote: Yeah... it totally should
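
On the serialization question, a minimal sketch of ReduceWords written as a standalone class that explicitly implements java.io.Serializable; the class name comes from the thread, but whether this was the actual fix is not shown in the excerpts:

import java.io.Serializable;

import org.apache.spark.api.java.function.Function2;

// Standalone (or static nested) class: it carries no reference to an enclosing
// object, so serializing the closure only serializes this small class.
public class ReduceWords implements Function2<Integer, Integer, Integer>, Serializable {
  public Integer call(Integer first, Integer second) {
    return first + second;
  }
}

It would be used as wordCounts = pairs.reduceByKey(new ReduceWords()). Declaring it instead as a non-static inner class pulls the enclosing (possibly non-serializable) class into the closure, a common cause of NotSerializableException.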

Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Hi Folks, I am seeing some strange behavior when using the Spark Kafka connector in Spark streaming. I have a Kafka topic which has 8 partitions. I have a kafka producer that pumps some messages into this topic. On the consumer side I have a spark streaming application that has 8 executors
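
As a point of reference for the behavior described above, a sketch of a single-receiver setup (all names are placeholders). The integer in the topic map is the number of consumer threads inside one receiver, not a number of receivers:

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public final class SingleReceiverExample {
  // Even with 8 consumer threads for an 8-partition topic, all consumption
  // happens inside the single receiver task, which runs on one executor.
  public static JavaPairDStream<String, String> singleReceiver(
      JavaStreamingContext jssc, String zkQuorum, String group, String topic) {
    Map<String, Integer> topicMap = new HashMap<String, Integer>();
    topicMap.put(topic, 8);
    return KafkaUtils.createStream(jssc, zkQuorum, group, topicMap);
  }
}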

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
? Are you also setting group.id? Thanks, Sean On Oct 10, 2014, at 4:31 PM, Abraham Jacob abe.jac...@gmail.com wrote: Hi Folks, I am seeing some strange behavior when using the Spark Kafka connector in Spark streaming. I have a Kafka topic which has 8 partitions. I have a kafka
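
Regarding group.id, a sketch of how consumer properties can be passed explicitly through the kafkaParams overload of KafkaUtils.createStream; all values below are placeholders, and the simpler overload's third argument serves the same purpose:

import java.util.HashMap;
import java.util.Map;

import kafka.serializer.StringDecoder;

import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public final class ExplicitGroupId {
  // Passes Kafka consumer properties, including group.id, to the receiver.
  public static JavaPairDStream<String, String> create(
      JavaStreamingContext jssc, Map<String, Integer> topicMap) {
    Map<String, String> kafkaParams = new HashMap<String, String>();
    kafkaParams.put("zookeeper.connect", "zkhost:2181");
    kafkaParams.put("group.id", "wordcount-group");
    return KafkaUtils.createStream(jssc, String.class, String.class,
        StringDecoder.class, StringDecoder.class, kafkaParams, topicMap,
        StorageLevel.MEMORY_AND_DISK_SER_2());
  }
}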

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Jerry *From:* Abraham Jacob [mailto:abe.jac...@gmail.com] *Sent:* Saturday, October 11, 2014 6:57 AM *To:* Sean McNamara *Cc:* user@spark.apache.org *Subject:* Re: Spark Streaming KafkaUtils Issue Probably this is the issue - http://www.michael-noll.com/blog/2014/10/01/kafka-spark
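
The blog post referenced above describes consuming a partitioned topic with several receivers and unioning them into one DStream; a hedged sketch of that approach in Java (names are placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public final class ParallelKafkaReceivers {
  // One receiver per Kafka partition (8 in the thread above), unioned so the
  // downstream word-count logic sees a single stream.
  public static JavaPairDStream<String, String> receiveInParallel(
      JavaStreamingContext jssc, String zkQuorum, String group,
      Map<String, Integer> topicMap, int numReceivers) {
    List<JavaPairDStream<String, String>> streams =
        new ArrayList<JavaPairDStream<String, String>>(numReceivers);
    for (int i = 0; i < numReceivers; i++) {
      streams.add(KafkaUtils.createStream(jssc, zkQuorum, group, topicMap));
    }
    return jssc.union(streams.get(0), streams.subList(1, streams.size()));
  }
}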

Re: Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
and recompile the Spark. Thanks Jerry *From:* Abraham Jacob [mailto:abe.jac...@gmail.com] *Sent:* Saturday, October 11, 2014 8:49 AM *To:* Shao, Saisai *Cc:* user@spark.apache.org; Sean McNamara *Subject:* Re: Spark Streaming KafkaUtils Issue Thanks Jerry, So, from what I can

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-07 Thread Abraham Jacob
) { dstream.saveAsNewAPIHadoopFiles(prefix, suffix, keyClass, valueClass, outputFormatClass, conf) } Less confusion, more readability and better consistency... -abe On Mon, Oct 6, 2014 at 1:51 PM, Abraham Jacob abe.jac...@gmail.com wrote: Sean, Thanks a ton Sean... This is exactly what I was looking

Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Hi All, Does anyone know if CDH5.1.2 packages the spark streaming kafka connector under the spark externals project?

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
-streaming_2.10-1.0.0-cdh5.1.3.jar file in the project. Where can I find it in the CDH5.1.3 spark distribution? On Tue, Oct 7, 2014 at 3:40 PM, Sean Owen so...@cloudera.com wrote: Yes, it is the entire Spark distribution. On Oct 7, 2014 11:36 PM, Abraham Jacob abe.jac...@gmail.com wrote: Hi All

Re: SparkStreaming program does not start

2014-10-07 Thread Abraham Jacob
Try using spark-submit instead of spark-shell On Tue, Oct 7, 2014 at 3:47 PM, spr s...@yarcdata.com wrote: I'm probably doing something obviously wrong, but I'm not seeing it. I have the program below (in a file try1.scala), which is similar but not identical to the examples. import
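
A hypothetical spark-submit invocation for a program like try1.scala, assuming it has been compiled and packaged into a jar (the class name and jar path below are made up for illustration):

# Package try1.scala into a jar first (e.g. with sbt package), then submit it.
spark-submit --class Try1 --master local[2] target/scala-2.10/try1_2.10-0.1.jar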

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Never mind... my bad... made a typo. looks good. Thanks, On Tue, Oct 7, 2014 at 3:57 PM, Abraham Jacob abe.jac...@gmail.com wrote: Thanks Sean, Sorry in my earlier question I meant to type CDH5.1.3 not CDH5.1.2 I presume it's included in spark-streaming_2.10-1.0.0-cdh5.1.3 But for some

Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Abraham Jacob
Hi All, Would really appreciate it if anyone in the community has implemented the saveAsNewAPIHadoopFiles method in Java, found in org.apache.spark.streaming.api.java.JavaPairDStream<K,V>. Any code snippet or online link would be greatly appreciated. Regards, Jacob
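
For what it's worth, a minimal sketch of calling saveAsNewAPIHadoopFiles from Java on a (word, count) pair DStream; the HDFS path, prefix, and suffix are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.spark.streaming.api.java.JavaPairDStream;

public final class SaveCounts {
  // Writes each batch of the (word, count) DStream to HDFS with the new-API
  // TextOutputFormat. The destination path here is a placeholder.
  public static void save(JavaPairDStream<String, Integer> wordCounts) {
    Configuration hadoopConf = new Configuration();
    wordCounts.saveAsNewAPIHadoopFiles(
        "hdfs://namenode:8020/user/abe/wordcounts/wc", "txt",
        String.class, Integer.class, TextOutputFormat.class, hadoopConf);
  }
}

Each batch is written under its own directory named prefix-TIME_IN_MS.suffix, so the prefix effectively chooses the parent path and naming pattern.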

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Abraham Jacob
; Thanks again Sean... On Mon, Oct 6, 2014 at 12:23 PM, Sean Owen so...@cloudera.com wrote: Here's an example: https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L131 On Mon, Oct 6, 2014 at 7:39 PM, Abraham Jacob abe.jac

Re: Spark Streaming writing to HDFS

2014-10-04 Thread Abraham Jacob
...@cloudera.com wrote: Are you importing the '.mapred.' version of TextOutputFormat instead of the new API '.mapreduce.' version? On Sat, Oct 4, 2014 at 1:08 AM, Abraham Jacob abe.jac...@gmail.com wrote: Hi All, Would really appreciate if someone in the community can help me with this. I
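
The distinction being asked about, shown as imports; saveAsNewAPIHadoopFiles expects an OutputFormat from the new (mapreduce) API:

// Old "mapred" API: not the class saveAsNewAPIHadoopFiles expects.
// import org.apache.hadoop.mapred.TextOutputFormat;

// New "mapreduce" API: this is the one to import for saveAsNewAPIHadoopFiles.
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;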

Spark Streaming writing to HDFS

2014-10-03 Thread Abraham Jacob
Hi All, Would really appreciate if someone in the community can help me with this. I have a simple Java spark streaming application - NetworkWordCount SparkConf sparkConf = new SparkConf().setMaster("yarn-cluster").setAppName("Streaming WordCount"); JavaStreamingContext jssc = new