Re: Spark kafka integration issues

2016-09-13 Thread Cody Koeninger
1. See http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers and look for HasOffsetRanges. If you really want the info per-message rather than per-partition, createRDD has an overload that takes a messageHandler from MessageAndMetadata to
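
[Editor's note: a minimal Scala sketch of the per-partition offset access described above, following the linked 0.8 direct-stream documentation. The broker list and topic are placeholders, and an existing StreamingContext named ssc is assumed.]

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // placeholder broker
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))                              // placeholder topic

    stream.foreachRDD { rdd =>
      // Each RDD produced by the direct stream carries its Kafka offset ranges.
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }
    }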

Spark kafka integration issues

2016-09-13 Thread Mukesh Jha
Hello fellow sparkers, I'm using spark to consume messages from kafka in a non-streaming fashion. I'm using spark-streaming-kafka-0-8_2.10 & spark v2.0 to do the same. I have a few queries on this; please get back if you have any clues. 1) Is there any way to get the

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

2016-09-03 Thread Cody Koeninger
g kafka consumer pluggable, so you may be able to wrap a >> consumer to do what you need. >> >> On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . <sagarcas...@gmail.com> >> wrote: >> > Hello, this is for >> > Pausing spark kafka streaming (direct) or exclude/includ

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

2016-09-02 Thread sagarcasual .
oing to need to write some code. The 0.10 integration makes the > underlying kafka consumer pluggable, so you may be able to wrap a > consumer to do what you need. > > On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . <sagarcas...@gmail.com> > wrote: > > Hello, this

Re: Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

2016-09-02 Thread Cody Koeninger
a consumer to do what you need. On Fri, Sep 2, 2016 at 12:28 PM, sagarcasual . <sagarcas...@gmail.com> wrote: > Hello, this is for > Pausing spark kafka streaming (direct) or exclude/include some partitions on > the fly per batch >

Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch

2016-09-02 Thread sagarcasual .
Hello, this is for Pausing spark kafka streaming (direct) or exclude/include some partitions on the fly per batch. I have the following code that creates a direct stream using the Kafka connector for Spark. final JavaInputDStream msgRecords
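
[Editor's note: a hedged sketch related to this thread. With the 0.8 direct API, the createDirectStream overload that takes fromOffsets only ever consumes the TopicAndPartitions listed in that map, so leaving a partition out excludes it; the set is fixed when the stream is created, though, not adjustable per batch, which is why the thread ends at wrapping a 0.10 consumer. Topic names and offsets are placeholders; ssc and kafkaParams are assumed to exist.]

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Partition 1 is deliberately left out of the map, so it is never read.
    val fromOffsets = Map(
      TopicAndPartition("mytopic", 0) -> 0L,
      TopicAndPartition("mytopic", 2) -> 0L)

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))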

Re: Spark Kafka stream processing time increasing gradually

2016-06-22 Thread Roshan Singh
Thanks for the detailed explanation. Just tested it, worked like a charm. On Mon, Jun 20, 2016 at 1:02 PM, N B wrote: > It's actually necessary to retire keys that become "Zero" or "Empty" so to > speak. In your case, the key is "imageURL" and values are a dictionary, one >

Re: Spark Kafka stream processing time increasing gradually

2016-06-20 Thread N B
It's actually necessary to retire keys that become "Zero" or "Empty", so to speak. In your case, the key is "imageURL" and the values are a dictionary, one of whose fields is a "count" that you are maintaining. For simplicity and illustration's sake I will assume imageURL to be a string like "abc". Your
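
[Editor's note: a minimal Scala sketch of the fix described in this thread, simplified from the dictionary value above to a plain count. The window and slide durations, the partition count, and the input DStream pairs (assumed to be a DStream[(String, Long)]) are placeholders; note the invertible variant also requires checkpointing to be enabled on the StreamingContext.]

    import org.apache.spark.streaming.{Minutes, Seconds}

    val windowedCounts = pairs.reduceByKeyAndWindow(
      (a: Long, b: Long) => a + b,           // add counts entering the window
      (a: Long, b: Long) => a - b,           // subtract counts leaving the window
      Minutes(10),                           // window duration (placeholder)
      Seconds(10),                           // slide duration (placeholder)
      4,                                     // number of partitions (placeholder)
      (kv: (String, Long)) => kv._2 != 0L)   // retire keys whose count reached zero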

Re: Spark Kafka stream processing time increasing gradually

2016-06-16 Thread Roshan Singh
Hi, According to the docs ( https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.DStream.reduceByKeyAndWindow), filterFunc can be used to retain expiring keys. I do not want to retain any expiring key, so I do not understand how this can help me stabilize it.

Re: Spark Kafka stream processing time increasing gradually

2016-06-16 Thread N B
We had this same issue with the reduceByKeyAndWindow API that you are using. To fix it, you have to use a different flavor of that API, specifically the 2 versions that allow you to give a 'filter function' to them. Putting in the filter functions helped stabilize our application too.

Re: RESOLVED - Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Mich Talebzadeh
anonfun$process$1.apply(ILoop.scala:837) >>>> at >>>> >>>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >>>> at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837) >>>> at s

RESOLVED - Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
7) >>>> at >>>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >>>> at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837) >>>> at scala.tools.nsc.interpreter.ILoop.main(ILoop.s

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
soleRunner.java:64) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMeth

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
> >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:483) >>> at >>> com.intellij.rt.execution

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
>> val confParams: Map[String, String] = Map( >> "metadata.broker.list" -> ":9092", >> "auto.offset.reset" -> "largest" >> ) >> >> val topics: Set[String] = Set("") >> >>

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Todd Nist
>> As for the Spark configuration: >> >>val conf: SparkConf = new >> SparkConf().setAppName("AppName").setMaster("local[2]") >> >> val confParams: Map[String, String] = Map( >> "metadata.broker.list" -> ":

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
(conf, Seconds(1)) > val kafkaStream = KafkaUtils.createDirectStream(context,confParams, > topics) > > kafkaStream.foreachRDD(rdd => { > rdd.collect().foreach(println) > }) > > context.awaitTermination() > context.start() > > The

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Jacek Laskowski
KafkaUtils.createDirectStream(context,confParams, > topics) > > kafkaStream.foreachRDD(rdd => { > rdd.collect().foreach(println) > }) > > context.awaitTermination() > context.start() > > The Kafka topic does exist, Kafka server is up and running a
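
[Editor's note: whatever the root cause of the leader-offset error, the driver code quoted throughout this thread calls context.awaitTermination() before context.start(), and a StreamingContext must be started first or no job ever runs. A corrected skeleton under that assumption, reusing the thread's conf, confParams, and topics values:]

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val context = new StreamingContext(conf, Seconds(1))
    val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      context, confParams, topics)

    kafkaStream.foreachRDD(rdd => rdd.collect().foreach(println))

    context.start()             // start the computation first...
    context.awaitTermination()  // ...then block until it is stopped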

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Mich Talebzadeh
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>>> at >>>> >>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) >>>> at >>>> >>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(Delegati

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
17) >>> at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581) >>> at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588) >>> at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591) >>> at >>> scala.tools.nsc.inte

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Mich Talebzadeh
1.apply(ILoop.scala:837) >>> at >>> >>> scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) >>> at scala.tools.nsc.interpreter.ILoop.pro

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
ang.reflect.Method.invoke(Method.java:483) >> at >> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) >> >> As for the Spark configuration: >> >>val conf: SparkConf = new >> SparkConf().setAppName("AppName").se

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Mich Talebzadeh
ke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:483) >> at >> com.intellij.rt.execution.application.AppMain.main(AppMain.java:144) >> >> As for the Spark configuration: >> >>val conf: Spark

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
ontext: StreamingContext = new StreamingContext(conf, Seconds(1)) > val kafkaStream = KafkaUtils.createDirectStream(context,confParams, > topics) > > kafkaStream.foreachRDD(rdd => { > rdd.collect().foreach(println) > }) > > context.awaitTermin

Re: Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Mich Talebzadeh
reamingContext = new StreamingContext(conf, Seconds(1)) > val kafkaStream = KafkaUtils.createDirectStream(context,confParams, > topics) > > kafkaStream.foreachRDD(rdd => { > rdd.collect().foreach(println) > }) > > context.awaitTermination() > c

Apache Spark Kafka Integration - org.apache.spark.SparkException: Couldn't find leader offsets for Set()

2016-06-07 Thread Dominik Safaric
t the problem actually be? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Apache-Spark-Kafka-Integration-org-apache-spark-SparkException-Couldn-t-find-leader-offsets-for-Set-tp271

Re: Spark + Kafka processing trouble

2016-05-31 Thread Malcolm Lockyer
Thanks for the suggestions. I agree that there isn't some magic configuration setting, or that the sql options have some flaw - I just intended to explain the frustration of having a non-trivial (but still simple) Spark streaming job running on tiny amounts of data performing absolutely horribly.

Re: Spark + Kafka processing trouble

2016-05-31 Thread Cody Koeninger
> 500ms is I believe the minimum batch interval for Spark micro batching. It's better to test than to believe, I've run 250ms jobs. Same applies to the comments around JDBC, why assume when you could (dis)prove? It's not like it's a lot of effort to set up a minimal job that does

Re: Spark + Kafka processing trouble

2016-05-31 Thread Mich Talebzadeh
500ms is I believe the minimum batch interval for Spark micro batching. However, a JDBC call uses a Unix file descriptor and a context switch, and it does have performance implications. That is irrespective of Kafka; what is happening is that one is actually going through Hive JDBC. It is a classic

Re: Spark + Kafka processing trouble

2016-05-31 Thread Cody Koeninger
There isn't a magic spark configuration setting that would account for multiple-second-long fixed overheads, you should be looking at maybe 200ms minimum for a streaming batch. 1024 kafka topicpartitions is not reasonable for the volume you're talking about. Unless you have really extreme

Re: Spark + Kafka processing trouble

2016-05-31 Thread Alonso Isidoro Roman
Mich's idea is quite fine; if I were you, I would follow his idea... Alonso Isidoro Roman about.me/alonso.isidoro.roman 2016-05-31 6:37 GMT+02:00 Mich Talebzadeh

Re: Spark + Kafka processing trouble

2016-05-30 Thread Mich Talebzadeh
How are you getting your data from the database? Are you using JDBC? Can you actually call the database first (assuming the same data), put it in a temp table in Spark, cache it for the duration of the window length, and use the data from the cached table? Dr Mich Talebzadeh LinkedIn *
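
[Editor's note: a hedged sketch of that suggestion against the 1.x SQL API. The Hive JDBC URL and table names are placeholders; the idea is to read the table once, register and cache it, then join against the cached copy inside the streaming job instead of hitting the database on every batch.]

    // Assumes an existing SQLContext named sqlContext.
    val lookupDF = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:hive2://dbhost:10000/default",  // placeholder URL
      "dbtable" -> "lookup_table")).load()               // placeholder table

    lookupDF.registerTempTable("lookup_cached")
    sqlContext.cacheTable("lookup_cached")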

Re: Spark + Kafka processing trouble

2016-05-30 Thread Malcolm Lockyer
On Tue, May 31, 2016 at 3:14 PM, Darren Govoni wrote: > Well that could be the problem. A SQL database is essentially a big > synchronizer. If you have a lot of spark tasks all bottlenecking on a single > database socket (is the database clustered or colocated with spark

Re: Spark + Kafka processing trouble

2016-05-30 Thread Darren Govoni
From: Malcolm Lockyer <malcolm.lock...@hapara.com> Date: 05/30/2016 10:40 PM (GMT-05:00) To: user@spark.apache.org Subject: Re: Spark + Kafka processing trouble On Tue, May 31, 2016 at 1:56 PM, Darren Govon

Re: Spark + Kafka processing trouble

2016-05-30 Thread Malcolm Lockyer
On Tue, May 31, 2016 at 1:56 PM, Darren Govoni wrote: > So you are calling a SQL query (to a single database) within a spark > operation distributed across your workers? Yes, but currently with very small sets of data (1-10,000) and on a single (dev) machine right now.

RE: Spark + Kafka processing trouble

2016-05-30 Thread Darren Govoni
Hopefully this is not off topic for this list, but I am hoping to reach some people who have used Kafka + Spark before. We are new to Spark and are setting up our first production environment and hitting a speed issue that

Spark + Kafka processing trouble

2016-05-30 Thread Malcolm Lockyer
Hopefully this is not off topic for this list, but I am hoping to reach some people who have used Kafka + Spark before. We are new to Spark and are setting up our first production environment and hitting a speed issue that may be configuration related - and we have little experience in configuring

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-29 Thread swetha kasireddy
etting in the app for it to > wait > > till the leader election and rebalance is done from the Kafka side > assuming > > that Kafka has rebalance.backoff.ms of 2000 ? > > > > Also, does Spark Kafka Direct try to restart the app when the leader is > lost > > or

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-29 Thread Cody Koeninger
creasing to 2500 I still get Leader Lost Errors. > > Is refresh.leader.backoff.ms the right setting in the app for it to wait > till the leader election and rebalance is done from the Kafka side assuming > that Kafka has rebalance.backoff.ms of 2000 ? > > Also, does Spark Kafk

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-29 Thread swetha kasireddy
Is refresh.leader.backoff.ms the right setting in the app for it to wait till the leader election and rebalance are done on the Kafka side, assuming that Kafka has a rebalance.backoff.ms of 2000? Also, does Spark Kafka Direct try to restart the app when the leader is lost, or will it just wait till
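
[Editor's note: a hedged sketch of where these knobs live in the 0.8 integration; the values are placeholders, and whether they fully cover leader loss is exactly the open question in this thread. refresh.leader.backoff.ms is a Kafka 0.8 consumer property passed through kafkaParams, while the direct stream's driver-side retry count is a separate Spark setting.]

    // Kafka consumer property, passed to createDirectStream via kafkaParams:
    val kafkaParams = Map(
      "metadata.broker.list"      -> "broker1:9092",  // placeholder broker
      "refresh.leader.backoff.ms" -> "2500")          // placeholder backoff

    // Spark-side retry count used by the direct stream's driver:
    sparkConf.set("spark.streaming.kafka.maxRetries", "3")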

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-29 Thread swetha kasireddy
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/What-is-the-default-value-of-rebalance-backoff-ms-in-Spark-Kafka-Direct-tp26840.html

Re: What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-27 Thread Cody Koeninger

What is the default value of rebalance.backoff.ms in Spark Kafka Direct?

2016-04-27 Thread SRK

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Shahbaz
- Do you happen to see how busy the nodes are in terms of CPU and how much heap each executor is allocated? - If there is enough capacity, you may want to increase the number of cores per executor to 2 and do the needed heap tweaking. - How much time did it take to process 4M+

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
I have 2 machines in my cluster with the below specifications: 128 GB RAM and 8 cores per machine. Regards, ~Vinti On Sun, Mar 6, 2016 at 7:54 AM, Vinti Maheshwari wrote: > Thanks Supreeth and Shahbaz. I will try adding > spark.streaming.kafka.maxRatePerPartition. > > Hi

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Vinti Maheshwari
Thanks Supreeth and Shahbaz. I will try adding spark.streaming.kafka.maxRatePerPartition. Hi Shahbaz, please see comments inline: - Which version of Spark are you using? ==> *1.5.2* - How big is the Kafka cluster? ==> *2 brokers* - What is the message size and type? ==> *String, 9,550

Re: Spark + Kafka all messages being used in 1 batch

2016-03-05 Thread Supreeth
Try setting spark.streaming.kafka.maxRatePerPartition, this can help control the number of messages read from Kafka per partition on the spark streaming consumer. -S > On Mar 5, 2016, at 10:02 PM, Vinti Maheshwari wrote: > > Hello, > > I am trying to figure out why my
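
[Editor's note: a short sketch of that setting; the value is a placeholder to be tuned against the batch interval and partition count.]

    import org.apache.spark.SparkConf

    // Caps records pulled per Kafka partition per second, so the first batch
    // after startup does not swallow the entire backlog in one go.
    val sparkConf = new SparkConf()
      .setAppName("kafka-stream")  // placeholder app name
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")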

Spark + Kafka all messages being used in 1 batch

2016-03-05 Thread Vinti Maheshwari
Hello, I am trying to figure out why my kafka+spark job is running slow. I found that spark is consuming all the messages out of kafka into a single batch itself and not sending any messages to the other batches. 2016/03/05 21:57:05

Number of executors in Spark - Kafka

2016-01-21 Thread Guillermo Ortiz
I'm using Spark Streaming and Kafka with the Direct Approach. I have created a topic with 6 partitions, so when I execute Spark there are six RDDs. I understand that ideally it should have six executors to process each one. To do it, when I execute spark-submit (I use YARN) I specify the

Re: Number of executors in Spark - Kafka

2016-01-21 Thread Cody Koeninger
6 kafka partitions will result in 6 spark partitions, not 6 spark rdds. The question of whether you will have a backlog isn't just a matter of having 1 executor per partition. If a single executor can process all of the partitions fast enough to complete a batch in under the required time, you

Re: Spark Kafka Direct Error

2015-11-24 Thread swetha kasireddy
d returning messages before reaching that ending offset. >>> >>> That probably means something got screwed up with Kafka - e.g. you lost >>> a leader and lost messages in the process. >>> >>> On Mon, Nov 23, 2015 at 12:57 PM, swetha <swethakasir

Re: Spark Kafka Direct Error

2015-11-24 Thread Cody Koeninger
. >>>> >>>> That probably means something got screwed up with Kafka - e.g. you lost >>>> a leader and lost messages in the process. >>>> >>>> On Mon, Nov 23, 2015 at 12:57 PM, swetha <swethakasire...@gmail.com> >>>

Re: Spark Kafka Direct Error

2015-11-24 Thread swetha kasireddy
;> >>>> On Mon, Nov 23, 2015 at 11:01 AM, Cody Koeninger <c...@koeninger.org> >>>> wrote: >>>> >>>>> No, that means that at the time the batch was scheduled, the kafka >>>>> leader reported the ending offset w

Re: Spark Kafka Direct Error

2015-11-23 Thread swetha kasireddy
s. > > On Mon, Nov 23, 2015 at 12:57 PM, swetha <swethakasire...@gmail.com> > wrote: > >> Hi, >> >> I see the following error in my Spark Kafka Direct. Would this mean that >> Kafka Direct is not able to catch up with the messages and is failing? >> >> Job a

Spark Kafka Direct Error

2015-11-23 Thread swetha
Hi, I see the following error in my Spark Kafka Direct. Would this mean that Kafka Direct is not able to catch up with the messages and is failing? Job aborted due to stage failure: Task 20 in stage 117.0 failed 4 times, most recent failure: Lost task 20.3 in stage 117.0 (TID 2114, 10.227.64.52

Re: Spark Kafka Direct Error

2015-11-23 Thread Cody Koeninger
messages in the process. On Mon, Nov 23, 2015 at 12:57 PM, swetha <swethakasire...@gmail.com> wrote: > Hi, > > I see the following error in my Spark Kafka Direct. Would this mean that > Kafka Direct is not able to catch up with the messages and is failing? > > Job abor

RE: Spark/Kafka Streaming Job Gets Stuck

2015-10-29 Thread Sabarish Sasidharan
m). We're in the process of re-evaluating. > -- > Nick > > -Original Message- > From: Adrian Tanase [mailto:atan...@adobe.com] > Sent: Wednesday, October 28, 2015 4:53 PM > To: Afshartous, Nick <nafshart...@turbine.com> > Cc: user@spark.apache.org > Subject: Re: Sp

Re: Spark/Kafka Streaming Job Gets Stuck

2015-10-29 Thread Adrian Tanase
Original Message- >From: Adrian Tanase [mailto:atan...@adobe.com] >Sent: Wednesday, October 28, 2015 4:53 PM >To: Afshartous, Nick <nafshart...@turbine.com> >Cc: user@spark.apache.org >Subject: Re: Spark/Kafka Streaming Job Gets Stuck > >Does it work as expected with smalle

Re: Spark/Kafka Streaming Job Gets Stuck

2015-10-29 Thread Cody Koeninger
void. That >> was the explanation of the developer who made this decision (who's no >> longer on the team). We're in the process of re-evaluating. >> -- >> Nick >> >> -Original Message- >> From: Adrian Tanase [mailto:atan...@adobe.com] >>

Re: Spark/Kafka Streaming Job Gets Stuck

2015-10-29 Thread srungarapu vamsi
Other than @Adrian's suggestions, check whether the processing time is more than the batch interval. On Thu, Oct 29, 2015 at 2:23 AM, Adrian Tanase wrote: > Does it work as expected with smaller batch or smaller load? Could it be > that it's accumulating too many events over

Spark/Kafka Streaming Job Gets Stuck

2015-10-28 Thread Afshartous, Nick
Hi, we are load testing our Spark 1.3 streaming (reading from Kafka) job and seeing a problem. This is running in AWS/YARN, the streaming batch interval is set to 3 minutes, and this is a ten node cluster. Testing at 30,000 events per second we are seeing the streaming job get stuck

Re: Spark/Kafka Streaming Job Gets Stuck

2015-10-28 Thread Adrian Tanase
Does it work as expected with smaller batch or smaller load? Could it be that it's accumulating too many events over 3 minutes? You could also try increasing the parallelism via repartition to ensure smaller tasks that can safely fit in working memory. > On 28 Oct 2015, at
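
[Editor's note: a one-line sketch of the repartition suggestion; the partition count is a placeholder to be tuned, and kafkaStream is an assumed existing DStream.]

    // Redistribute each batch into more, smaller tasks before the heavy work.
    val repartitioned = kafkaStream.repartition(100)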

Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Rachana Srivastava
Hello all, I am trying to test JavaKafkaWordCount on YARN; to make sure YARN is working fine I am saving the output to HDFS. The example works fine in local mode but not in yarn mode. I cannot see any output logged when I change the mode to yarn-client or yarn-cluster, nor can I find any

Re: Where are logs for Spark Kafka Yarn on Cloudera

2015-09-29 Thread Marcelo Vanzin
(-dev@) Try using the "yarn logs" command to read logs for finished applications. You can also browse the RM UI to find more information about the applications you ran. On Mon, Sep 28, 2015 at 11:37 PM, Rachana Srivastava wrote: > Hello all, > > > > I am

Re: Spark-Kafka Connector issue

2015-09-29 Thread Cody Koeninger
From: Cody Koeninger <c...@koeninger.org> > Sent: Tuesday, September 29, 2015 12:33 am > Subject: Re: Spark-Kafka Connector issue > To: Ratika Prasad <rpra...@couponsinc.

Re: spark kafka partitioning

2015-08-21 Thread Gaurav Agarwal
When I send the message from a Kafka topic having three partitions, Spark will listen for the message when I say kafkautils.createStream or createDirectStream with local[4]. Now I want to see whether Spark will create partitions when it receives messages from Kafka using a DStream; how, where, and in which method

Re: spark kafka partitioning

2015-08-20 Thread ayan guha
If you have 1 topic, that means you have 1 DStream, which will have a series of RDDs, one for each batch interval. In receiver-based integration, there is no direct relationship between Kafka partitions and Spark partitions. In the direct approach, 1 Spark partition will be created for each Kafka partition. On Fri,
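
[Editor's note: a side-by-side sketch of the two 0.8-era calls being contrasted. The ZooKeeper host, broker, group id, and topic are placeholders; ssc is an assumed StreamingContext.]

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Receiver-based: Kafka partitions bear no direct relationship to the
    // resulting Spark partitions.
    val receiverStream = KafkaUtils.createStream(
      ssc, "zkhost:2181", "my-group", Map("mytopic" -> 1))

    // Direct: one Spark partition per Kafka partition.
    val directStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, Map("metadata.broker.list" -> "broker1:9092"), Set("mytopic"))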

spark kafka partitioning

2015-08-20 Thread Gaurav Agarwal
Hello, Regarding Spark Streaming and Kafka partitioning: when I send messages on a Kafka topic with 3 partitions and listen on a Kafka receiver with local[4], how will I come to know in Spark Streaming that different DStreams are created according to the partitions of the Kafka messages? Thanks

Re: spark kafka partitioning

2015-08-20 Thread Cody Koeninger
I'm not clear on your question, can you rephrase it? Also, are you talking about createStream or createDirectStream? On Thu, Aug 20, 2015 at 9:48 PM, Gaurav Agarwal gaurav130...@gmail.com wrote: Hello Regarding Spark Streaming and Kafka Partitioning When i send message on kafka topic with

spark-kafka directAPI vs receivers based API

2015-08-10 Thread Mohit Durgapal
Hi All, I just wanted to know how the direct API for Spark Streaming compares with the earlier receiver-based API. Has anyone used the direct API based approach in production, or is it still being used for POCs? Also, since I'm new to Spark, could anyone share a starting point from where I could find a

Re: spark-kafka directAPI vs receivers based API

2015-08-10 Thread Cody Koeninger
For direct stream questions: https://github.com/koeninger/kafka-exactly-once Yes, it is used in production. For general spark streaming question: http://spark.apache.org/docs/latest/streaming-programming-guide.html On Mon, Aug 10, 2015 at 7:51 AM, Mohit Durgapal durgapalmo...@gmail.com

Re: Spark Kafka Direct Streaming

2015-07-07 Thread Tathagata Das
checkpointing is required. Can I achieve this, i.e. enabling metadata checkpointing and not the data checkpointing? Thanks Abhishek Patel -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Kafka-Direct-Streaming-tp23685.html

Spark Kafka Direct Streaming

2015-07-07 Thread abi_pat
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Kafka-Direct-Streaming-tp23685.html

Spark + Kafka

2015-04-01 Thread James King
I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying arrivals to the Kafka topic, so I know messages are being received. But when I try to read from the Kafka topic I get no messages; here are some logs below. I'm thinking there aren't enough threads. How do I

Re: Spark + Kafka

2015-04-01 Thread bit1...@163.com
Please make sure that you have given more cores than the number of receivers. From: James King Date: 2015-04-01 15:21 To: user Subject: Spark + Kafka I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying arrivals to the Kafka topic, so I know messages are being
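
[Editor's note: an illustration of that point; the app name is a placeholder. A streaming receiver permanently occupies one core, so with master local[1] nothing is left to process batches and the job appears to receive nothing. A single receiver needs at least local[2].]

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("kafka-test").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(2))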

Re: Spark + Kafka

2015-04-01 Thread James King
Can you help please? On Wed, Apr 1, 2015 at 9:39 AM, bit1...@163.com bit1...@163.com wrote: Please make sure that you have given more cores than the number of receivers. *From:* James King jakwebin...@gmail.com *Date:* 2015-04-01 15:21 *To:* user user@spark.apache.org *Subject:* Spark + Kafka

Re: Spark + Kafka

2015-04-01 Thread Saisai Shao
On Wed, Apr 1, 2015 at 9:39 AM, bit1...@163.com bit1...@163.com wrote: Please make sure that you have given more cores than the number of receivers. *From:* James King jakwebin...@gmail.com *Date:* 2015-04-01 15:21 *To:* user user@spark.apache.org *Subject:* Spark + Kafka I have a simple

RE: Spark + Kafka

2015-04-01 Thread Shao, Saisai
: James King [mailto:jakwebin...@gmail.com] Sent: Wednesday, April 1, 2015 6:59 PM To: Saisai Shao Cc: bit1...@163.com; user Subject: Re: Spark + Kafka This is the code. And I couldn't find anything like the log you suggested. public KafkaLogConsumer(int duration, String master

Re: Spark + Kafka

2015-04-01 Thread James King
Please make sure that you have given more cores than the number of receivers. *From:* James King jakwebin...@gmail.com *Date:* 2015-04-01 15:21 *To:* user user@spark.apache.org *Subject:* Spark + Kafka I have a simple setup/runtime of Kafka and Spark. I have a command line consumer displaying

Re: Spark + Kafka

2015-04-01 Thread James King
...@163.com wrote: Please make sure that you have given more cores than the number of receivers. *From:* James King jakwebin...@gmail.com *Date:* 2015-04-01 15:21 *To:* user user@spark.apache.org *Subject:* Spark + Kafka I have a simple setup/runtime of Kafka and Spark. I have a command line

Re: Spark + Kafka

2015-03-19 Thread James King
Thanks Khanderao. On Wed, Mar 18, 2015 at 7:18 PM, Khanderao Kand Gmail khanderao.k...@gmail.com wrote: I have used various versions of Spark (1.0, 1.2.1) without any issues. Though I have not significantly used Kafka with 1.3.0, preliminary testing revealed no issues. - khanderao

Re: Spark + Kafka

2015-03-19 Thread James King
Many thanks all for the good responses, appreciated. On Thu, Mar 19, 2015 at 8:36 AM, James King jakwebin...@gmail.com wrote: Thanks Khanderao. On Wed, Mar 18, 2015 at 7:18 PM, Khanderao Kand Gmail khanderao.k...@gmail.com wrote: I have used various version of spark (1.0, 1.2.1) without

Re: Spark + Kafka

2015-03-18 Thread Jeffrey Jedele
Probably 1.3.0 - it has some improvements in the included Kafka receiver for streaming. https://spark.apache.org/releases/spark-release-1-3-0.html Regards, Jeff 2015-03-18 10:38 GMT+01:00 James King jakwebin...@gmail.com: Hi All, Which build of Spark is best when using Kafka? Regards jk

Re: Spark + Kafka

2015-03-18 Thread James King
Thanks Jeff, I'm planning to use it in standalone mode, OK will use the hadoop 2.4 package. Ciao! On Wed, Mar 18, 2015 at 10:56 AM, Jeffrey Jedele jeffrey.jed...@gmail.com wrote: What you call sub-category are packages pre-built to run on certain Hadoop environments. It really depends on where

Re: Spark + Kafka

2015-03-18 Thread Jeffrey Jedele
What you call sub-category are packages pre-built to run on certain Hadoop environments. It really depends on where you want to run Spark. As far as I know, this is mainly about the included HDFS binding - so if you just want to play around with Spark, any of the packages should be fine. I

Re: Spark + Kafka

2015-03-18 Thread Khanderao Kand Gmail
I have used various versions of Spark (1.0, 1.2.1) without any issues. Though I have not significantly used Kafka with 1.3.0, preliminary testing revealed no issues. - khanderao On Mar 18, 2015, at 2:38 AM, James King jakwebin...@gmail.com wrote: Hi All, Which build of Spark is

Re: spark kafka batch integration

2014-12-15 Thread Cody Koeninger
://github.com/tresata/spark-kafka best, koert

Re: spark kafka batch integration

2014-12-15 Thread Koert Kuipers
://github.com/tresata/spark-kafka best, koert

spark kafka batch integration

2014-12-14 Thread Koert Kuipers
using spark * make non-streaming data available in kafka with periodic data drops from hdfs using spark. this is to facilitate merging the speed and batch layer in spark-streaming * distributed writes from spark-streaming see here: https://github.com/tresata/spark-kafka best, koert

Re: Spark Kafka Performance

2014-11-04 Thread Eduardo Alfaia
...@spark.incubator.apache.org Subject: Re: Spark Kafka Performance Not sure about the throughput, but: "I mean that the words counted in spark should grow up" - the Spark word-count example doesn't accumulate. It gets an RDD every n seconds and counts the words in that RDD, so we don't expect the count to go up

Spark Kafka Performance

2014-11-03 Thread Eduardo Costa Alfaia
Hi Guys, Could anyone explain to me how to make Kafka work with Spark? I am using JavaKafkaWordCount.java as a test and the command line is: ./run-example org.apache.spark.streaming.examples.JavaKafkaWordCount spark://192.168.0.13:7077 computer49:2181 test-consumer-group unibs.it 3 and like a

Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Hi All, Does anyone know if CDH5.1.2 packages the spark streaming kafka connector under the spark externals project?

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Thanks Sean, sorry, in my earlier question I meant to type CDH5.1.3, not CDH5.1.2. I presume it's included in spark-streaming_2.10-1.0.0-cdh5.1.3, but for some reason Eclipse complains that import org.apache.spark.streaming.kafka cannot be resolved, even though I have included the

Re: Spark / Kafka connector - CDH5 distribution

2014-10-07 Thread Abraham Jacob
Never mind... my bad... made a typo. Looks good. Thanks. On Tue, Oct 7, 2014 at 3:57 PM, Abraham Jacob abe.jac...@gmail.com wrote: Thanks Sean, sorry in my earlier question I meant to type CDH5.1.3 not CDH5.1.2 I presume it's included in spark-streaming_2.10-1.0.0-cdh5.1.3 But for some

Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-11 Thread gaurav.dasgupta
and fetches data. I have tested the code in local mode and it is working fine. But when I am executing the same code in YARN mode, I am getting a KafkaReceiver class not found exception. I am providing the Spark Kafka jar in the classpath and ensured that the path is correct for all the nodes in my

Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-08 Thread Tobias Pfeiffer
written my own custom Spark streaming code which connects to a Kafka server and fetches data. I have tested the code in local mode and it is working fine. But when I am executing the same code in YARN mode, I am getting a KafkaReceiver class not found exception. I am providing the Spark Kafka jar

Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-05 Thread Gaurav Dasgupta
Hi, I have written my own custom Spark streaming code which connects to a Kafka server and fetches data. I have tested the code in local mode and it is working fine. But when I am executing the same code in YARN mode, I am getting a KafkaReceiver class not found exception. I am providing the Spark
