unsubscribe

2024-05-01 Thread Nebi Aydin
unsubscribe

About shuffle partition size

2023-12-20 Thread Nebi Aydin
Hi all, What happens when the number of unique join keys is less than the number of shuffle partitions? Do we end up with lots of empty partitions? If so, is there any point in having more shuffle partitions than unique join keys?
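Here is a minimal Python sketch of the counting argument, assuming plain hash partitioning. Each row goes to hash(key) mod numPartitions, so with k distinct keys at most k partitions can receive any rows. CRC32 stands in for the hash here (Spark SQL's shuffle exchange uses Murmur3), and the key names and row counts are made up for illustration.

```python
import zlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash: CRC32 instead of Spark's Murmur3, so the exact
    # key-to-partition mapping differs, but the counting argument holds.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_sizes(keys, num_partitions: int) -> list:
    # Count how many rows land in each partition id 0..num_partitions-1.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

# Hypothetical data: 10,000 rows that share only 5 distinct join keys.
keys = [f"key{i % 5}" for i in range(10_000)]
sizes = partition_sizes(keys, num_partitions=200)
non_empty = sum(1 for s in sizes if s > 0)
print(f"non-empty partitions: {non_empty} of {len(sizes)}")
```

With 200 partitions (the default for spark.sql.shuffle.partitions) and 5 keys, at most 5 partitions hold data and at least 195 stay empty. If two keys hash to the same partition, even fewer are non-empty. In Spark 3.x, adaptive query execution can coalesce small post-shuffle partitions (spark.sql.adaptive.coalescePartitions.enabled).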

Thread dump only shows 10 shuffle clients

2023-09-28 Thread Nebi Aydin
Hi all, I set spark.shuffle.io.serverThreads and spark.shuffle.io.clientThreads to 800, but when I click Thread dump for the executor in the Spark UI, I only see 10 shuffle client threads. Is that normal, or am I missing something?

Files io threads vs shuffle io threads

2023-09-27 Thread Nebi Aydin
Hi all, Can someone explain the difference between files IO threads and shuffle IO threads? I couldn't find any explanation. I'm specifically asking about these: spark.rpc.io.serverThreads, spark.rpc.io.clientThreads, spark.rpc.io.threads, spark.files.io.serverThreads, spark.files.io.clientThreads.
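Here is a sketch of how the per-module settings appear to resolve. It is based on my reading of Spark's SparkTransportConf and NettyUtils and has not been verified against every release. Each transport module (shuffle, rpc, files) reads its own spark.<module>.io.serverThreads and spark.<module>.io.clientThreads. When a key is unset, the default is the usable core count capped at 8. The helper names, the cap constant, and the 16-processor fallback are illustrative assumptions, not Spark APIs.

```python
# Assumed cap, mirroring NettyUtils.MAX_DEFAULT_NETTY_THREADS as I understand it.
MAX_DEFAULT_NETTY_THREADS = 8

def default_num_threads(num_usable_cores: int, available_processors: int) -> int:
    # Fall back to the JVM's processor count when no usable-core hint is given,
    # then cap the default.
    cores = num_usable_cores if num_usable_cores > 0 else available_processors
    return min(cores, MAX_DEFAULT_NETTY_THREADS)

def effective_io_threads(conf: dict, module: str, suffix: str,
                         num_usable_cores: int,
                         available_processors: int = 16) -> int:
    # Hypothetical helper: an explicit spark.<module>.io.<suffix> wins,
    # otherwise the capped default applies.
    key = f"spark.{module}.io.{suffix}"
    default = default_num_threads(num_usable_cores, available_processors)
    return int(conf.get(key, default))

conf = {"spark.shuffle.io.clientThreads": "800"}
for module in ("shuffle", "rpc", "files"):
    for suffix in ("serverThreads", "clientThreads"):
        print(f"spark.{module}.io.{suffix} =",
              effective_io_threads(conf, module, suffix, num_usable_cores=4))
```

In this model, setting a shuffle key leaves the rpc and files modules at their defaults, which is why the three families are configured separately. spark.rpc.io.threads and spark.shuffle.io.threads are not modeled here.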

About Peak Jvm Memory Onheap

2023-09-17 Thread Nebi Aydin
Hi all, I couldn't find any useful doc that explains the `Peak JVM Memory Onheap` field in the Spark UI. Most of the time my applications have very low On Heap Storage Memory and Peak Execution Memory On Heap, but a very large `Peak JVM Memory Onheap` in the Spark UI. Can someone please explain the

[Spark Core]: How does rpc threads influence shuffle?

2023-09-15 Thread Nebi Aydin
Hello all, I know that these parameters exist for shuffle tuning: spark.shuffle.io.serverThreads, spark.shuffle.io.clientThreads, spark.shuffle.io.threads. But we also have spark.rpc.io.serverThreads, spark.rpc.io.clientThreads, spark.rpc.io.threads. So specifically talking about Shuffling,

Re: [External Email] Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
>> ) >> >> On Fri, Sep 8, 2023 at 14:56 Jack Wells wrote: >> >>> Hi Nebi, can you share the code you’re using to read and write from S3? >>> >>> On Sep 8, 2023 at 10:59:59, Nebi Aydin >>> wrote: >>> >>>> H

Re: [External Email] Re: About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
> > On Sep 8, 2023 at 10:59:59, Nebi Aydin > wrote: > >> Hi all, >> I am using spark on EMR to process data. Basically i read data from AWS >> S3 and do the transformation and post transformation i am loading/writing >> data to s3. >> >> Recently we

About /mnt/hdfs/current/BP directories

2023-09-08 Thread Nebi Aydin
Hi all, I am using Spark on EMR to process data. Basically, I read data from AWS S3, transform it, and then write the transformed data back to S3. Recently we have found that HDFS (/mnt/hdfs) utilization is getting too high. I disabled `yarn.log-aggregation-enable` by setting it

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-19 Thread Nebi Aydin
> On Fri, 18 Aug 2023 at 23:30, Nebi Aydin wrote: >> Hi, sorry for duplicates. First time user :) >> I keep getting fetchfailedexception 733

Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
> On Fri, 18 Aug 2023 at 20:39, Nebi Aydin

[Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
I want to learn the differences among the thread configurations below: spark.shuffle.io.serverThreads, spark.shuffle.io.clientThreads, spark.shuffle.io.threads, spark.rpc.io.serverThreads, spark.rpc.io.clientThreads, spark.rpc.io.threads. Thanks.