Re: How to address seemingly low core utilization on a spark workload?

2018-11-15 Thread Shahbaz
30k SQL shuffle partitions is extremely high. Core to partition is 1 to 1, and the default value of spark.sql.shuffle.partitions is 200. Set it to 300 or leave it at the default, see which one gives the best performance, and after you do that, see how the cores are being used. Regards, Shahbaz On Thu, Nov 15, 2018 at 10:58
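The shuffle-partition advice above can be sanity-checked with simple arithmetic: each stage runs its tasks in "waves" of at most one task per core, so 30,000 partitions on a modest cluster means hundreds of waves of tiny tasks. A minimal sketch, assuming a 100-core cluster (the core count is an illustrative assumption, not from the thread):

```python
import math

def task_waves(shuffle_partitions: int, total_cores: int) -> int:
    """Number of scheduling 'waves' a shuffle stage needs:
    each wave can run at most one task per core."""
    return math.ceil(shuffle_partitions / total_cores)

# Assumed cluster size of 100 cores, purely for illustration.
cores = 100
print(task_waves(30_000, cores))  # 300 waves of very small tasks
print(task_waves(200, cores))     # 2 waves with the default of 200
print(task_waves(300, cores))     # 3 waves with the suggested 300
```

With far fewer waves, per-task scheduling overhead stops dominating, which is why dropping from 30k toward the default tends to improve core utilization.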

Re: How to increase the parallelism of Spark Streaming application?

2018-11-07 Thread Shahbaz
ext level of parallelism. If you are not having any data skew, then you should get good performance. Regards, Shahbaz On Wed, Nov 7, 2018 at 12:58 PM JF Chen wrote: > I have a Spark Streaming application which reads data from Kafka and saves > the transformation result to HDFS. >
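For context on raising parallelism in a setup like the asker's: with the direct Kafka stream, the read stage's parallelism equals the number of Kafka topic partitions, and a `repartition` can spread the downstream work across more cores. A common rule of thumb (an assumption here, not stated in the thread) is to aim for 2 to 3 tasks per core:

```python
def repartition_target(num_executors: int, cores_per_executor: int, tasks_per_core: int = 3) -> int:
    """Rule-of-thumb partition count: a few tasks per core, so one
    slow task does not leave the rest of the cluster idle."""
    return num_executors * cores_per_executor * tasks_per_core

# Assumed cluster shape (10 executors x 4 cores), for illustration only.
print(repartition_target(10, 4))     # 120 partitions for 40 cores
print(repartition_target(10, 4, 2))  # 80 with the more conservative factor
```

As the reply notes, this only helps if there is no data skew; a skewed key keeps one task long-running regardless of the partition count.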

Re: Spark Streaming heap space out of memory

2016-05-30 Thread Shahbaz
Hi Christian, can you please try if 30 seconds works for your case? I think your batches are getting queued up. Regards, Shahbaz On Tuesday 31 May 2016, Dancuart, Christian <christian.dancu...@rbc.com> wrote: > While it has heap space, batches run well below 15 seconds. > > >
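The queueing intuition behind the 30-second suggestion: every batch that takes longer than the batch interval adds the difference to a growing backlog of queued batches, and those queued batches are what eventually exhaust the heap. A small sketch, with the 22-second processing time being an assumed figure for illustration:

```python
def backlog_after(num_batches: int, batch_interval_s: float, processing_time_s: float) -> float:
    """Seconds of unprocessed work queued up after `num_batches` intervals.
    If processing time exceeds the interval, the backlog grows without bound."""
    growth_per_batch = max(0.0, processing_time_s - batch_interval_s)
    return num_batches * growth_per_batch

# Assumed steady 22 s of processing per batch, for illustration.
print(backlog_after(100, 15, 22))  # 700.0 s behind after 100 batches at a 15 s interval
print(backlog_after(100, 30, 22))  # 0.0 - batches keep up at a 30 s interval
```

So widening the interval past the worst-case processing time is one way to stop the queue (and the heap) from growing.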

Re: Spark Streaming heap space out of memory

2016-05-30 Thread Shahbaz
Hi Christian, - What is the processing time of each of your batches? Is it exceeding 15 seconds? - How many jobs are queued? - Can you take a heap dump and see which objects are occupying the heap? Regards, Shahbaz On Tue, May 31, 2016 at 12:21 AM, christian.dancu...@rbc.com
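For the heap-dump step, the standard JDK tools can capture the dump from the driver or an executor JVM; `<pid>` below is a placeholder for the actual process id (these are generic JDK commands, not something prescribed in the thread):

```shell
# List running JVMs with their main classes to find the right <pid>.
jps -lm

# Dump only live objects to a binary heap file (triggers a full GC first).
jmap -dump:live,format=b,file=heap.hprof <pid>

# Equivalent using the newer jcmd tool:
jcmd <pid> GC.heap_dump heap.hprof
```

The resulting `heap.hprof` can then be opened in Eclipse MAT or VisualVM to see which object types dominate the heap.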

Re: Spark + Kafka all messages being used in 1 batch

2016-03-06 Thread Shahbaz
+ events (in the Spark UI, you can look at the Duration column). - I believe the reader is quite fast; however, processing could be slower. If you click on the job, it gives you a breakdown of execution time, result serialization, etc. You may want to look at that and drive from there. Regards, Shahbaz On Sun
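On the thread's underlying problem (all pending Kafka messages landing in one huge first batch), Spark's streaming configuration can cap the per-batch intake and let backpressure adjust the rate dynamically. A sketch of the relevant settings; the rate of 1000 records per partition per second and the jar name are placeholders:

```shell
spark-submit \
  --conf spark.streaming.backpressure.enabled=true \
  --conf spark.streaming.kafka.maxRatePerPartition=1000 \
  your-streaming-app.jar
```

`spark.streaming.kafka.maxRatePerPartition` applies to the direct Kafka stream and bounds even the first batch, which backpressure alone does not, since backpressure needs completed batches to estimate a rate.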