Re: [External Email] Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
Hi, sorry for duplicates. First time user :) I keep getting fetchfailedexception 7337 port closed. Which is external shuffle service port. I was trying to tune these parameters. I have around 1000 executors and 5000 cores. I tried to set spark.shuffle.io.serverThreads to 2k. Should I also set

Re: [Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Mich Talebzadeh
Hi, These two threads that you sent seem to be duplicates of each other? Anyhow I trust that you are familiar with the concept of shuffle in Spark. Spark Shuffle is an expensive operation since it involves the following - Disk I/O - Involves data serialization and deserialization

[Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
I want to learn differences among below thread configurations. spark.shuffle.io.serverThreads spark.shuffle.io.clientThreads spark.shuffle.io.threads spark.rpc.io.serverThreads spark.rpc.io.clientThreads spark.rpc.io.threads Thanks.

[Spark Core]: What's difference among spark.shuffle.io.threads

2023-08-18 Thread Nebi Aydin
I want to learn differences among below thread configurations. spark.shuffle.io.serverThreads spark.shuffle.io.clientThreads spark.shuffle.io.threads spark.rpc.io.serverThreads spark.rpc.io.clientThreads spark.rpc.io.threads Thanks.

[no subject]

2023-08-18 Thread Dipayan Dev
Unsubscribe -- With Best Regards, Dipayan Dev Author of *Deep Learning with Hadoop * M.Tech (AI), IISc, Bangalore

Re: read dataset from only one node in YARN cluster

2023-08-18 Thread Mich Talebzadeh
Hi, Where do you see this? In spark UI. So data is skewed most probably as one node gets all the data and others nothing as I understand? HTH Mich Talebzadeh, Solutions Architect/Engineering Lead London United Kingdom view my Linkedin profile

read dataset from only one node in YARN cluster

2023-08-18 Thread marc nicole
Hi, Spark 3.2, Hadoop 3.2, using YARN cluster mode, if one wants to read a dataset that is found in one node of the cluster and not in the others, how to tell Spark that? I expect through DataframeReader and using path like *IP:port/pathOnLocalNode* PS: loading the dataset in HDFS is not an

RE: Re: Spark Vulnerabilities

2023-08-18 Thread Sankavi Nagalingam
Hi @Bjørn Jørgensen, Thank you for your quick response. Based on the PR shared , we are doing analysis from our side. For few jars you have requested for the CVE id, I have updated it in the attached document. Kindly verify it from your side and revert us back.

Re: Spark-SQL - Query Hanging, How To Troubleshoot

2023-08-18 Thread Mich Talebzadeh
Yes, it sounds like it. So the broadcast DF size seems to be between 1 and 4GB. So I suggest that you leave it as it is. I have not used the standalone mode since spark-2.4.3 so I may be missing a fair bit of context here. I am sure there are others like you that are still using it! HTH Mich