You want spark.streaming.kafka.maxRatePerPartition for the direct stream.
On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote:
Hi,
You can enable backpressure to handle this.
spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate
Thanks,
Edwin
On Mar 18, 2017, 12:53 AM -0400, sagarcasual . , wrote:
Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct
approach. The streaming part works fine, but when we initially start the job we
have to deal with a really huge Kafka message backlog, millions of messages, and
that first batch runs for over 40 hours, and after 12 hours
or so
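For reference, here is a rough sketch of how the settings mentioned above fit together on Spark 1.6.x with the direct stream; it is not a tested job, and the rate numbers are made-up placeholders you would tune for your cluster:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch only. With the direct stream, the first batch over a big backlog is
// capped at roughly maxRatePerPartition * number-of-partitions * batch-interval-seconds
// records; backpressure then adapts the rate from the second batch onward.
val conf = new SparkConf()
  .setAppName("kafka-backlog-example")                      // placeholder app name
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.kafka.maxRatePerPartition", "1000") // records/sec per partition, direct stream only
  .set("spark.streaming.receiver.maxRate", "10000")         // only applies to receiver-based streams

val ssc = new StreamingContext(conf, Seconds(10))
// e.g. with 50 partitions: 1000 * 50 * 10s = at most ~500,000 records in one batch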
Would using mapPartitions instead of map help here?
~Pratik
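In case it helps, this is roughly what mapPartitions does and does not change; a sketch only, with the date formatter standing in for whatever per-record setup is actually expensive in your job:

import org.apache.spark.streaming.dstream.DStream

// mapPartitions does not change how partitions are scheduled across executor
// cores; it only lets setup cost be paid once per partition instead of once
// per record. The SimpleDateFormat is just a stand-in for expensive setup.
def withPerPartitionSetup(lines: DStream[String]): DStream[String] =
  lines.mapPartitions { records =>
    val formatter = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss") // created once per partition
    records.map(r => formatter.format(new java.util.Date()) + " " + r)
  }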
On Tue, Mar 1, 2016 at 10:07 AM Cody Koeninger wrote:
You don't need as many executor cores as partitions. An
executor can and will work on multiple partitions within a batch, one after
the other. The real issue is whether you are able to keep your processing
time under your batch time, so that the delay doesn't increase.
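One way to watch that from inside the running job, as a sketch (the listener API below is standard Spark Streaming; the log format is made up):

import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Sketch: log per-batch delays. If processing time stays above the batch
// interval, scheduling delay keeps growing and the job falls further behind.
def logBatchDelays(ssc: StreamingContext): Unit =
  ssc.addStreamingListener(new StreamingListener {
    override def onBatchCompleted(completed: StreamingListenerBatchCompleted): Unit = {
      val info = completed.batchInfo
      println(s"batch ${info.batchTime}: processing = ${info.processingDelay.getOrElse(-1L)} ms, " +
        s"scheduling delay = ${info.schedulingDelay.getOrElse(-1L)} ms")
    }
  })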
Thanks Cody!
I understand what you said, and if I am correct it will be using 224 executor
cores just for fetching plus stage-1 processing of the 224 partitions. I will
obviously need more cores for processing further stages and for fetching the
next batch.
I will start with a higher number of executor cores and s
> "How do I keep a balance of executors which receive data from Kafka and
which process data"
I think you're misunderstanding how the direct stream works. The executor
which receives data is also the executor which processes data, there aren't
separate receivers. If it's a single stage worth of
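For concreteness, a minimal direct-stream setup looks roughly like this (Spark 1.x spark-streaming-kafka API; the broker list and topic are placeholders). There is no receiver anywhere in it; each batch, the executor tasks fetch their partitions' offset ranges themselves:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

// Sketch: one Kafka partition maps to one Spark partition per batch, and the
// task that fetches a partition's offset range is the task that processes it.
def createStream(ssc: StreamingContext) =
  KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc,
    Map("metadata.broker.list" -> "broker1:9092,broker2:9092"), // placeholder brokers
    Set("mytopic"))                                             // placeholder topic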
/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/
Otis
On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg wrote:
> What are some of the good/adopted approaches to monitoring Spark Streaming
> from Kafka?
oup name of your choice.
> TD
> On Mon, Jun 1, 2015 at 2:23 PM, dgoldenberg wrote:
>> What are some of the good/adopted approaches to monitoring Spark Streaming
>> from Kafka?
Hi,
What are some of the good/adopted approaches to monitoring Spark Streaming
from Kafka? I see that there are things like
http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all
assume that Receiver-based streaming is used?
Then "Note that one disadvantage of this app
With Spark Streaming from Kafka (no receivers), is there a way to set something
like spark.streaming.receiver.maxRate so as not to overwhelm the Spark
consumers?

What would be some of the ways to throttle the streamed messages so that the
consumers don't run out of memory?

--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Mohit,
> I want to process the data in real-time as well as store the data in hdfs
> in year/month/day/hour/ format.
Are you wanting to process it and then put it into HDFS, or just put the raw
data into HDFS? If the latter, then why not just use Camus
(https://github.com/linkedin/camus); it will easil
Good questions, some of which I'd like to know the answer to.

>> Is it okay to update a NoSQL DB with aggregated counts per batch
>> interval or is it generally stored in hdfs?

This depends on how you are going to use the aggregate data.
1. Is there a lot of data? If so, and you are going to use t
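As a sketch of the per-batch-update option (the CounterStore trait and openStore below are hypothetical stand-ins for whatever NoSQL client you would actually use):

import org.apache.spark.streaming.dstream.DStream

// Hypothetical client interface; substitute your real NoSQL client here.
trait CounterStore { def increment(key: String, by: Long): Unit; def close(): Unit }
def openStore(): CounterStore = ??? // placeholder

// Sketch: aggregate within the batch first, then open one connection per
// partition rather than one per record when writing the counts out.
def updateCountsPerBatch(events: DStream[String]): Unit =
  events.map(e => (e, 1L)).reduceByKey(_ + _).foreachRDD { rdd =>
    rdd.foreachPartition { counts =>
      val store = openStore()
      try counts.foreach { case (k, n) => store.increment(k, n) }
      finally store.close()
    }
  }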
I want to write a spark streaming consumer for kafka in java. I want to
process the data in real-time as well as store the data in hdfs in
year/month/day/hour/ format. I am not sure how to achieve this. Should I
write separate kafka consumers, one for writing data to HDFS and one for
spark streamin
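You don't necessarily need two consumers; one streaming job can drive both the real-time processing and the HDFS archiving from the same stream. A rough sketch, where the base path and the "processing" step are placeholders:

import java.text.SimpleDateFormat
import java.util.Date
import org.apache.spark.streaming.dstream.DStream

// Sketch: the same DStream feeds both branches; cache it so Kafka isn't
// re-read once per branch. The directory layout is derived from the batch time.
def processAndArchive(lines: DStream[String]): Unit = {
  lines.cache()

  // real-time branch (placeholder processing)
  lines.filter(_.nonEmpty).count().print()

  // archive branch: year/month/day/hour directories under a placeholder base path
  lines.foreachRDD { (rdd, batchTime) =>
    if (!rdd.isEmpty()) {
      val dir = new SimpleDateFormat("yyyy/MM/dd/HH").format(new Date(batchTime.milliseconds))
      rdd.saveAsTextFile(s"hdfs:///data/raw/$dir/batch-${batchTime.milliseconds}")
    }
  }
}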
I am using kafka_2.10-1.1.0.jar on Spark 1.1.0.
—
Sent from Mailbox
On Wed, Oct 29, 2014 at 12:31 AM, null wrote:
Thanks! How do I find out which Kafka jar to use for scala 2.10.4?
—
Sent from Mailbox
On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das wrote:
Looks like the kafka jar that you are using isn't compatible with your
scala version.
Thanks
Best Regards
On Wed, Oct 29, 2014 at 11:50 AM, Harold Nguyen wrote:
Hi,
Just wondering if you've seen the following error when reading from Kafka:
ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting
receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest
    at kafka.utils.Log4jController$.<init>(Log4jController.scala:29)
    at kafka.uti
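That error pattern usually comes from mixing artifacts built for different Scala versions on the classpath. A build.sbt sketch of a matching combination for Spark 1.1.0 on Scala 2.10, letting sbt's %% pick the _2.10 artifacts and letting spark-streaming-kafka pull in a compatible Kafka client transitively:

// build.sbt sketch, assuming sbt and Scala 2.10.x
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, so these resolve to *_2.10 artifacts
  "org.apache.spark" %% "spark-streaming"       % "1.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.0"  // brings a compatible kafka_2.10 client
)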