Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-20 Thread Cody Koeninger
You want spark.streaming.kafka.maxRatePerPartition for the direct stream. On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote: > > Hi, > You can enable backpressure to handle this. > > spark.streaming.backpressure.enabled > spark.streaming.receiver.maxRate > > Thanks, > Edwin > > On Mar 18, 2017, 12
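To make the effect of that setting concrete, here is a rough sketch of the arithmetic, assuming the direct stream caps each batch at maxRatePerPartition records per second for every Kafka partition. The function names, the partition count, the backlog size and the rate below are all made-up illustration values, not numbers from this thread, and the estimate ignores messages that arrive while the backlog is draining.

```python
import math


def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_s):
    """Upper bound on records in one batch when the per-partition rate cap is set."""
    return max_rate_per_partition * num_partitions * batch_interval_s


def batches_to_drain(backlog, max_rate_per_partition, num_partitions, batch_interval_s):
    """Batches needed to work through a backlog, ignoring newly arriving messages."""
    cap = max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_s)
    return math.ceil(backlog / cap)


# Property names are the ones mentioned in this thread; the values are hypothetical.
conf = {
    "spark.streaming.kafka.maxRatePerPartition": "1000",  # records/sec per Kafka partition
    "spark.streaming.backpressure.enabled": "true",
}

if __name__ == "__main__":
    rate = int(conf["spark.streaming.kafka.maxRatePerPartition"])
    # Hypothetical job: 10 partitions, 5 second batches, 5 million message backlog.
    print(max_records_per_batch(rate, 10, 5))
    print(batches_to_drain(5_000_000, rate, 10, 5))
```

With a cap like this the initial backlog is split across many bounded batches instead of landing in one enormous first batch.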

Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-18 Thread Mal Edwin
Hi, You can enable backpressure to handle this. spark.streaming.backpressure.enabled spark.streaming.receiver.maxRate Thanks, Edwin On Mar 18, 2017, 12:53 AM -0400, sagarcasual . , wrote: > Hi, we have spark 1.6.1 streaming from Kafka (0.10.1) topic using direct > approach. The streaming part

Spark Streaming from Kafka, deal with initial heavy load.

2017-03-17 Thread sagarcasual .
Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct approach. The streaming part works fine, but when we initially start the job we have to deal with a really huge Kafka message backlog, millions of messages, and that first batch runs for over 40 hours, and after 12 hours or so

Re: Spark streaming from Kafka best fit

2016-03-07 Thread pratik khadloya
Would using mapPartitions instead of map help here? ~Pratik On Tue, Mar 1, 2016 at 10:07 AM Cody Koeninger wrote: > You don't need an equal number of executor cores to partitions. An > executor can and will work on multiple partitions within a batch, one after > the other. The real issue is w

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
You don't need an equal number of executor cores to partitions. An executor can and will work on multiple partitions within a batch, one after the other. The real issue is whether you are able to keep your processing time under your batch time, so that delay doesn't increase. On Tue, Mar 1, 2016
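A small simulation may help illustrate both points, as a sketch only. It assumes one core per task, batches that are processed strictly one after another, and made-up processing times; `task_waves` and `scheduling_delays` are hypothetical helper names, not Spark APIs.

```python
import math


def task_waves(num_partitions, total_cores):
    """Sequential rounds of tasks needed when partitions outnumber cores (1 core per task assumed)."""
    return math.ceil(num_partitions / total_cores)


def scheduling_delays(batch_interval_s, processing_times_s):
    """Delay before each batch starts, given how long each batch takes to process.

    Batch i becomes ready at the end of its interval and has to wait until the
    previous batch has finished.
    """
    delays = []
    free_at = 0  # time at which the previous batch finishes
    for i, processing_time in enumerate(processing_times_s):
        ready_at = (i + 1) * batch_interval_s
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + processing_time
    return delays


if __name__ == "__main__":
    # Fewer cores than partitions is fine; the tasks simply run in several waves.
    print(task_waves(224, 56))
    # Processing time above the batch interval: the delay keeps growing.
    print(scheduling_delays(10, [12, 12, 12]))
    # Processing time below the batch interval: the delay stays at zero.
    print(scheduling_delays(10, [8, 8, 8]))
```

The number of waves only matters through its effect on processing time: as long as each batch finishes within the batch interval, the delay does not accumulate.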

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Jatin Kumar
Thanks Cody! I understand what you said and if I am correct it will be using 224 executor cores just for fetching + stage-1 processing of 224 partitions. I will obviously need more cores for processing further stages and fetching next batch. I will start with a higher number of executor cores and s

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
> "How do I keep a balance of executors which receive data from Kafka and which process data" I think you're misunderstanding how the direct stream works. The executor which receives data is also the executor which processes data; there aren't separate receivers. If it's a single stage worth of

Re: How to monitor Spark Streaming from Kafka?

2015-06-02 Thread Ruslan Dautkhanov
2015 at 5:23 PM, dgoldenberg >> wrote: >> >>> Hi, >>> >>> What are some of the good/adopted approaches to monitoring Spark >>> Streaming >>> from Kafka? I see that there are things like >>> http://quantifind.github.io/KafkaOffsetMonitor, f

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Dmitry Goldenberg
>> >> What are some of the good/adopted approaches to monitoring Spark Streaming >> from Kafka? I see that there are things like >> http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all >> assume that Receiver-based streaming is used? >> >> Th

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/ Otis On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg wrote: > Hi, > > What are some of the good/adopted approaches to monitoring Spark Streaming > from Kafka? I see that there are things like > http://quant

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Cody Koeninger
oup name of your choice. > > TD > > On Mon, Jun 1, 2015 at 2:23 PM, dgoldenberg > wrote: > >> Hi, >> >> What are some of the good/adopted approaches to monitoring Spark Streaming >> from Kafka? I see that there are things like >> http://quantifind.github

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Tathagata Das
re some of the good/adopted approaches to monitoring Spark Streaming > from Kafka? I see that there are things like > http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all > assume that Receiver-based streaming is used? > > Then "Note that one disadvantage

How to monitor Spark Streaming from Kafka?

2015-06-01 Thread dgoldenberg
Hi, What are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then "Note that one disadvantage of this app

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Dmitry Goldenberg
fka, is there a way to >>> set something like spark.streaming.receiver.maxRate so as not to >>> overwhelm >>> the Spark consumers? >>> >>> What would be some of the ways to throttle the streamed messages so that >>> the >>> consumers don'

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Tathagata Das
ming.receiver.maxRate so as not to overwhelm >> the Spark consumers? >> >> What would be some of the ways to throttle the streamed messages so that >> the >> consumers don't run out of memory? >> >> >> >> >> >> -- >> View this

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Ted Yu
mething like spark.streaming.receiver.maxRate so as not to overwhelm > the Spark consumers? > > What would be some of the ways to throttle the streamed messages so that > the > consumers don't run out of memory? > > > > > > -- > View this message in context: > http://apac

Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread dgoldenberg
emory? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html Sent from the Apache Spark User List mailing list archive at Nabbl

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Andrew Psaltis
Mohit, >I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. Are you wanting to process it and then put it into HDFS or just put the raw data into HDFS? If the latter, then why not just use Camus ( https://github.com/linkedin/camus), it will easil

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Charles Feduke
Good questions, some of which I'd like to know the answer to. >> Is it okay to update a NoSQL DB with aggregated counts per batch interval or is it generally stored in hdfs? This depends on how you are going to use the aggregate data. 1. Is there a lot of data? If so, and you are going to use t

spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark streamin
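For the year/month/day/hour layout mentioned above, a minimal sketch of how such a directory path could be derived from a record or batch timestamp is shown below. The base path `/data/events` and the helper name `hourly_path` are hypothetical, and the sketch only builds the path string; it does not write to HDFS.

```python
from datetime import datetime, timezone


def hourly_path(base, ts):
    """Build a base/year/month/day/hour/ directory path for the given timestamp."""
    return base.rstrip("/") + ts.strftime("/%Y/%m/%d/%H/")


if __name__ == "__main__":
    # Hypothetical base directory and timestamp.
    ts = datetime(2015, 2, 6, 13, 45, tzinfo=timezone.utc)
    print(hourly_path("/data/events", ts))
```

Whichever consumer ends up writing the files, deriving the directory from the timestamp in one place keeps the real-time and batch sides agreeing on the layout.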

spark streaming from kafka real time + batch processing in java

2015-02-05 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark streamin

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
I'm using kafka_2.10-1.1.0.jar on spark 1.1.0 — Sent from Mailbox On Wed, Oct 29, 2014 at 12:31 AM, null wrote: > Thanks! How do I find out which Kafka jar to use for scala 2.10.4? > — > Sent from Mailbox > On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das > wrote: >> Looks like the kafka jar that you

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
Thanks! How do I find out which Kafka jar to use for scala 2.10.4? — Sent from Mailbox On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das wrote: > Looks like the kafka jar that you are using isn't compatible with your > scala version. > Thanks > Best Regards > On Wed, Oct 29, 2014 at 11:50 AM, Harold

Re: Spark Streaming from Kafka

2014-10-29 Thread Akhil Das
Looks like the kafka jar that you are using isn't compatible with your scala version. Thanks Best Regards On Wed, Oct 29, 2014 at 11:50 AM, Harold Nguyen wrote: > Hi, > > Just wondering if you've seen the following error when reading from Kafka: > > ERROR ReceiverTracker: Deregistered receiver

Spark Streaming from Kafka

2014-10-28 Thread Harold Nguyen
Hi, Just wondering if you've seen the following error when reading from Kafka: ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest at kafka.utils.Log4jController$.(Log4jController.scala:29) at kafka.uti