For the second question:
I am comparing 2 situations of processing a KafkaRDD.
case I - When I used foreachPartition to process the Kafka stream, I am not able
to see any streaming job timing interval, like "Time: 142905487 ms",
displayed on the driver console at the start of each stream batch. But it processed
Regarding your first question: having more partitions than you do executors
usually means you'll have better utilization, because the workload will be
distributed more evenly. There's some degree of per-task overhead, but as
long as you don't have a huge imbalance between the number of tasks and
Hi Sushant/Cody,
For question 1, the following is my understanding (I am not 100% sure and
this is only my understanding; I have asked this question in other words
to TD for confirmation, which is not confirmed as of now).
1. Spark Streaming 1.3 creates as many RDD partitions as there are Kafka
partitions in the topic. Say I have 300 partitions in the topic and 10 executors,
each with 3 cores. Does that mean that at a time only 10*3 = 30 partitions
are processed, and then the next 30, and so on, since executors launch tasks per RDD
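If that understanding holds, the arithmetic in the question can be sketched as follows (a plain-Python illustration of the numbers above, not Spark API code): 10 executors with 3 cores each give 30 concurrent task slots, so a batch of 300 partitions would be processed in 10 successive "waves" of tasks.

```python
import math

# Hypothetical numbers from the question above.
kafka_partitions = 300       # one Spark task per Kafka partition
executors = 10
cores_per_executor = 3

# Tasks that can run simultaneously across the cluster.
concurrent_tasks = executors * cores_per_executor

# How many successive "waves" of tasks it takes to process one batch.
waves = math.ceil(kafka_partitions / concurrent_tasks)

print(concurrent_tasks, waves)  # 30 task slots, 10 waves
```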
One receiver basically runs on 1 core, so if your single node has 4
cores, there are still 3 cores left for the processing (for the executors). And
yes, the receiver remains on the same machine unless some failure happens.
Thanks
Best Regards
On Tue, May 19, 2015 at 10:57 PM, Shushant Arora
So can I explicitly specify the number of receivers and executors in receiver-based
streaming? Can you share a sample program, if any?
Also, in low-level non-receiver-based streaming, will data be fetched and processed
by the same worker executor node? Also, if I have concurrent jobs set to 1 - so
in low level
fetching
On Wed, May 20, 2015 at 1:12 PM, Shushant Arora shushantaror...@gmail.com
wrote:
So can I explicitly specify the number of receivers and executors in receiver-based
streaming? Can you share a sample program, if any?
- You can look at the low-level consumer repo
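Since a sample program was asked for: a minimal sketch of receiver-based Kafka streaming in PySpark (Spark 1.3+), assuming the KafkaUtils.createStream API. The number of receivers is controlled by how many input streams you create and union (each receiver occupies one core); the number of executors is set at submit time (e.g. --num-executors on YARN). The topic name, ZooKeeper quorum, and group id below are placeholders.

```python
# Sketch only: requires a Spark installation to actually run.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="receiver-based-kafka")
ssc = StreamingContext(sc, batchDuration=1)  # 1-second batches

# Each createStream call registers one receiver; create several
# and union them to increase ingest parallelism.
num_receivers = 3
streams = [
    KafkaUtils.createStream(ssc, "zkhost:2181", "my-consumer-group",
                            {"mytopic": 1})  # 1 consumer thread per receiver
    for _ in range(num_receivers)
]
unified = ssc.union(*streams)

unified.count().pprint()

ssc.start()
ssc.awaitTermination()
```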
What happens if, in a streaming application, one job is not yet finished when
the stream interval is reached? Does it start the next job, or wait for the first to
finish, with the remaining jobs accumulating in a queue?
Say I have a streaming application with a stream interval of 1 sec, but my
job takes 2 min to
It will be a single job running at a time by default (you can also
configure spark.streaming.concurrentJobs to run jobs in parallel, which is
not recommended in production).
Now, with your batch duration being 1 sec and processing time being 2 minutes,
if you are using a receiver based
spark.streaming.concurrentJobs takes an integer value, not a boolean. If you
set it to 2, then 2 jobs will run in parallel. The default value is 1, and the next
job will start once the current one completes.
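The queueing behaviour described above can be illustrated with a toy simulation (plain Python, not Spark): batches are generated every interval, but with spark.streaming.concurrentJobs at its default of 1 only one batch is processed at a time, so a 1-second interval with 120-second processing lets the backlog grow rapidly.

```python
# Toy model of Spark Streaming's scheduling: batches are generated every
# `interval_s` seconds, but only `concurrent_jobs` are processed at once.
def queued_batches(interval_s, processing_s, elapsed_s, concurrent_jobs=1):
    generated = elapsed_s // interval_s
    # Each concurrent slot finishes one batch every `processing_s` seconds.
    completed = min(generated, (elapsed_s // processing_s) * concurrent_jobs)
    return generated - completed

# 1 s interval, 120 s per batch: after 10 minutes, 600 batches were
# generated but only 5 finished, leaving 595 waiting in the queue.
print(queued_batches(interval_s=1, processing_s=120, elapsed_s=600))
```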
Actually, in the current implementation of Spark Streaming and under
default configuration,
So for Kafka + Spark Streaming, receiver-based streaming uses the high-level API
and non-receiver-based streaming uses the low-level API.
1. In high-level receiver-based streaming, does it register consumers at
each job start (whenever a new job is launched by the streaming application, say
each second)?
2. No
On Tue, May 19, 2015 at 8:10 PM, Shushant Arora shushantaror...@gmail.com
wrote:
So for Kafka + Spark Streaming, receiver-based streaming uses the high-level API
and non-receiver-based streaming uses the low-level API.
1. In high-level receiver-based streaming, does it register consumers at
each job
Just to add, there is a receiver-based Kafka consumer which uses the Kafka Low
Level Consumer API.
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
Regards,
Dibyendu
On Tue, May 19, 2015 at 9:00 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
On Tue, May 19, 2015 at 8:10 PM,
Thanks Akhil and Dibyendu.
In high-level receiver-based streaming, do the executors run on the receiver
nodes themselves to get data localisation? Or is the data always transferred to
executor nodes, with the executor nodes differing in each run of a job while the
receiver node remains the same (same machines) throughout the life of