Scaling Kafka Streaming to Thousands of Partitions

2019-05-25 Thread Charles Chao
Hi, We have been using Spark Kafka streaming for real-time processing with success. The scale of this stream has been increasing with data growth, and we have been able to scale up by adding more brokers to the Kafka cluster, adding more partitions to the topic, and adding more executors to the

Re: Shuffle memory woes

2016-02-07 Thread Charles Chao
"The dataset is 100gb at most, the spills can up to 10T-100T" -- I have had the same experience, although not to this extreme (the spills were < 10T while the input was ~ 100s of GB), and haven't found any solution yet. I don't believe this is related to input data format. In my case, I got my

Re: Use KafkaRDD to Batch Process Messages from Kafka

2016-01-22 Thread Charles Chao
ve to redeploy spark) > - write / find equivalent code yourself > > If you want to build a patched version of the subproject and need a hand, > just ask on the list. > > > On Fri, Jan 22, 2016 at 1:30 PM, Charles Chao <charles.c...@bluecava.com> > wrote: > >>

Use KafkaRDD to Batch Process Messages from Kafka

2016-01-22 Thread Charles Chao
Hi, I have been using DirectKafkaInputDStream in Spark Streaming to consume Kafka messages and it's been working very well. Now I need to batch process messages from Kafka, for example, retrieve all messages every hour, process them, and output to destinations like Hive or HDFS. I

Re: Event logging not working when worker machine terminated

2015-09-09 Thread Charles Chao
I have encountered the same problem after migrating from 1.2.2 to 1.3.0. After some searching this appears to be a bug introduced in 1.3. Hopefully it's fixed in 1.4. Thanks, Charles On 9/9/15, 7:30 AM, "David Rosenstrauch" wrote: >Standalone. > >On 09/08/2015 11:18

Re: Event logging not working when worker machine terminated

2015-09-09 Thread Charles Chao
r it. > >Thanks, > >DR > >On 09/09/2015 11:50 AM, Charles Chao wrote: >> I have encountered the same problem after migrating from 1.2.2 to 1.3.0. >> After some searching this appears to be a bug introduced in 1.3. >>Hopefully >> it's fixed in 1.4. >>