30k SQL shuffle partitions is extremely high. The core-to-partition ratio is
roughly 1 to 1. The default value of spark.sql.shuffle.partitions is 200; set
it to 300 or leave it at the default, see which one gives the best
performance, and after you do that, check how the cores are being used.
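As a rough sketch of the core-to-partition heuristic above (the helper name, the wave factor, and the cluster numbers are illustrative assumptions, not a Spark API):

```python
# Hypothetical sizing sketch: pick shuffle partitions as a small multiple of
# total cores rather than an arbitrarily large number like 30,000.
def suggested_shuffle_partitions(executors: int, cores_per_executor: int,
                                 waves: int = 2) -> int:
    """Partitions ~= total cores * a small wave factor (2-3 is a common heuristic)."""
    return executors * cores_per_executor * waves

# e.g. 25 executors with 4 cores each gives 200, which is also Spark's default
# for spark.sql.shuffle.partitions:
print(suggested_shuffle_partitions(25, 4))  # -> 200
```

The resulting value would then be set via `spark.conf.set("spark.sql.shuffle.partitions", ...)` or the equivalent `--conf` flag; which exact value wins still has to be confirmed empirically, as the email suggests.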
Regards,
Shahbaz
On Thu, Nov 15, 2018 at 10:58
ext level of parallelism. If you do not have any data skew, then you should
get good performance.
Regards,
Shahbaz
On Wed, Nov 7, 2018 at 12:58 PM JF Chen wrote:
> I have a Spark Streaming application which reads data from Kafka and saves
> the transformation result to HDFS.
>
Hi Christian, can you please try whether 30 seconds works for your case? I
think your batches are getting queued up.
Regards,
Shahbaz
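The queueing behaviour described above can be sketched with a toy steady-state model (plain Python, not Spark code; the function and its parameters are illustrative assumptions):

```python
# Toy model: batches queue up whenever average processing time exceeds the
# batch interval, so lengthening the interval can let the queue drain.
def queued_batches(batch_interval_s: float, processing_time_s: float,
                   elapsed_s: float) -> int:
    """Batches waiting after `elapsed_s` seconds of steady-state operation."""
    arrived = int(elapsed_s // batch_interval_s)
    completed = int(elapsed_s // max(processing_time_s, batch_interval_s))
    return max(arrived - completed, 0)

# A 15 s interval with 20 s processing time builds a backlog...
print(queued_batches(15, 20, 300))  # -> 5
# ...while a 30 s interval keeps up:
print(queued_batches(30, 20, 300))  # -> 0
```

In real Spark Streaming the batch interval is fixed when the `StreamingContext` is created, so changing it means restarting the application.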
On Tuesday 31 May 2016, Dancuart, Christian <christian.dancu...@rbc.com>
wrote:
> While it has heap space, batches run well below 15 seconds.
>
Hi Christian,
- What is the processing time of each of your batches? Is it exceeding 15
seconds?
- How many jobs are queued?
- Can you take a heap dump and see which objects are occupying the heap?
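The heap-dump step can be sketched with the standard JDK tools (a sketch, assuming JDK tools are on the PATH; `<driver-pid>` is a placeholder you must replace with your Spark driver's JVM pid):

```shell
# List candidate JVM processes to find the Spark driver's pid:
jps -lm

# Capture a heap dump of live objects (this triggers a full GC first):
jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>

# Quick class histogram to see which object types occupy the heap:
jmap -histo:live <driver-pid> | head -n 20
```

The resulting `.hprof` file can then be opened in a heap analyzer such as Eclipse MAT or VisualVM to see which objects dominate.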
Regards,
Shahbaz
On Tue, May 31, 2016 at 12:21 AM, christian.dancu...@rbc.com
+ events (in the Spark UI, you can
look at the Duration column).
- I believe the reader is quite fast; however, processing could be slower. If
you click on the job, it gives you a breakdown of execution time, result
serialization, etc. You may want to look at that and drive from there.
Regards,
Shahbaz
On Sun