Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-14 Thread Gary Liu
…repartition the input dataset to have fewer partitions? I used df.rdd.getNumPartitions() to check the input data partitions; they have 9 and 17 partitions respectively. Should I decrease them further? I also read a pos…
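The thread above contains opposite suggestions (fewer vs. more partitions); both trade shuffle-block size against connection count. A minimal pure-Python sketch of a common rule-of-thumb sizing heuristic (the 128 MB target and the function name are illustrative assumptions, not Spark APIs):

```python
# Back-of-envelope heuristic for choosing a shuffle partition count.
# Pure Python, no Spark needed; ~128 MB per partition is a widely cited
# rule of thumb, not an official Spark recommendation.
def suggested_partitions(total_bytes, target_bytes=128 * 1024 * 1024):
    # Ceiling division: at least one partition, each near the target size.
    return max(1, -(-total_bytes // target_bytes))

# Example: a 10 GB shuffle at ~128 MB per partition
n = suggested_partitions(10 * 1024**3)   # -> 80
```

The result would then be applied with `df.repartition(n)` or `spark.conf.set("spark.sql.shuffle.partitions", str(n))` before the shuffle-heavy stage.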

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Mich Talebzadeh
…xception-when-processing-big-data-se), saying increasing partitions may help. Which one makes more sense? I repartitioned the input data to 20 and 30 partitions, but still no luck. Any suggestions? 23/03/10 14…

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Gary Liu
…Any suggestions? 23/03/10 14:32:19 WARN TaskSetManager: Lost task 58.1 in stage 27.0 (TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=58,…

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Mich Talebzadeh
…(TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=58, message= org.apache.spark.shuffle.FetchFailedException at org.apache.spark.errors.SparkCoreErrors$.fetchFailedError(Spark…

org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Gary Liu
in stage 27.0 (TID 3783) (10.1.0.116 executor 33): FetchFailed(BlockManagerId(72, 10.1.15.199, 36791, None), shuffleId=24, mapIndex=77, mapId=3457, reduceId=58, message= org.apache.spark.shuffle.FetchFailedException at org.apache.spark.errors.SparkCoreErrors$.fetchFailedError

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-03 Thread Ryan Blue
…above error in Spark SQL. I have increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 42…

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-02 Thread Pralabh Kumar
…I am getting the above error in Spark SQL. I have increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too…

Re: org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Ryan Blue
…increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829 at org.apache.spark.storage.ShuffleBloc…

org.apache.spark.shuffle.FetchFailedException: Too large frame:

2018-05-01 Thread Pralabh Kumar
Hi, I am getting the above error in Spark SQL. I have increased the number of partitions (using 5000) but am still getting the same error. My data is most probably skewed. org.apache.spark.shuffle.FetchFailedException: Too large frame: 4247124829
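A frame of 4247124829 bytes exceeds the roughly 2 GB shuffle frame limit, which is why adding partitions alone does not help when one key holds most of the rows. A pure-Python sketch of the key-salting idea suggested for such skew (the partitioning model here is a deliberate simplification of Spark's hash partitioning, kept simple so the arithmetic is visible):

```python
from collections import Counter

# Simplified model (an assumption, not Spark's exact internals): rows with the
# same join/grouping key land in the same shuffle partition, so one hot key
# produces one oversized shuffle block.
NUM_PARTITIONS = 8
HOT_KEY_HASH = 42            # stand-in for hash(hot_key)

def partition_for(key_hash, salt, num_partitions=NUM_PARTITIONS):
    # Salting shifts where each row lands without changing its logical key.
    return (key_hash + salt) % num_partitions

rows = list(range(1000))     # 1000 rows that all share the hot key

# Without salting: every row hashes to the same partition.
plain = Counter(partition_for(HOT_KEY_HASH, 0) for _ in rows)

# With 4 salt buckets: the hot key's rows spread over 4 partitions, 250 each.
SALT_BUCKETS = 4
salted = Counter(partition_for(HOT_KEY_HASH, i % SALT_BUCKETS) for i in rows)
# In a real job you would aggregate per (key, salt) first, then combine the
# partial results on the unsalted key in a second, much smaller shuffle.
```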

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-29 Thread 周康
…che.org *Subject:* Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232 Thanks Juan for taking the time. Here's more info: - This is running on YARN in master mode - See config para…

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
…ct: Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232 Thanks Juan for taking the time. Here's more info: - This is running on YARN in master mode - See config params below - This is a corporate environment. In general nodes…

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
…network.timeout=1000s From: Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com> Sent: Friday, July 28, 2017 4:20:40 PM To: jeff saremi Cc: user@spark.apache.org Subject: Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Faile…
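The setting quoted above belongs to a small family of fetch-related timeout and retry keys. A sketch of that family as a plain config mapping (the 1000s value is the poster's; the retry values are illustrative, though the keys and defaults are standard Spark settings):

```python
# Timeout/retry settings relevant to FetchFailedException. Raising timeouts
# masks slow shuffle fetches (long GC pauses, overloaded nodes) rather than
# curing them, so pair them with retries and investigate the root cause.
shuffle_conf = {
    "spark.network.timeout": "1000s",      # default 120s; value used in this thread
    "spark.shuffle.io.maxRetries": "10",   # default 3
    "spark.shuffle.io.retryWait": "30s",   # default 5s
}

# Typically passed on the command line, e.g.
#   spark-submit --conf spark.network.timeout=1000s ...
```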

Re: Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread Juan Rodríguez Hortalá
Hi Jeff, Can you provide more information about how you are running your job? In particular: - which cluster manager are you using? Is it YARN, Mesos, or Spark Standalone? - which configuration options are you using to submit the job? In particular, are you using dynamic allocation or external…
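The question about dynamic allocation matters here: with dynamic allocation but no external shuffle service, executors can be removed while their shuffle output is still needed, and later fetches fail. A configuration sketch of the combination being asked about (the keys are standard Spark settings; the sketch assumes the shuffle service is actually deployed on each node, e.g. as the YARN NodeManager aux-service):

```python
from pyspark.sql import SparkSession

# Sketch only: requires a cluster with the external shuffle service running.
# The service keeps serving an executor's shuffle files after that executor
# is torn down by dynamic allocation.
spark = (
    SparkSession.builder
    .appName("fetch-failed-sketch")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)
```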

Job keeps aborting because of org.apache.spark.shuffle.FetchFailedException: Failed to connect to server/ip:39232

2017-07-28 Thread jeff saremi
We have a not-too-complex and not-too-large Spark job that keeps dying with this error. I have researched it and have not seen any convincing explanation of why. I am not using a shuffle service. Which server is the one that is refusing the connection? If I go to the server that is being…

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-05-04 Thread HLee
I had the same problem. One forum post elsewhere suggested that too much network communication might be using up available ports. I reduced the partition count via repartition(int) and it solved the problem.

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-20 Thread craigiggy
Also, this is the command I use to submit the Spark application: ** where *recommendation_engine-0.1-py2.7.egg* is a Python egg of my own library I've written for this application, and *'file'* and *'/home/spark/enigma_analytics/tests/msg-epims0730_small.json'* are input arguments for the

Re: PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-19 Thread craigiggy
Slight update I suppose? For some reason, sometimes it will connect and continue and the job will be completed. But most of the time I still run into this error and the job is killed and the application doesn't finish. Still have no idea why this is happening. I could really use some help here.

PySpark Issue: "org.apache.spark.shuffle.FetchFailedException: Failed to connect to..."

2016-03-15 Thread craigiggy
attempts, but I still get the same error. The error code that stands out to me is: *org.apache.spark.shuffle.FetchFailedException: Failed to connect to spark-mastr-1:xx* The following is the error that I receive on my most recent attempted run of the application: Traceback (most recent call last

org.apache.spark.shuffle.FetchFailedException: Failed to connect to ..... on worker failure

2015-10-28 Thread kundan kumar
…org.apache.spark.shuffle.FetchFailedException: Failed to connect to …. Now, when I restart the same worker (2 workers were running on the machine and I killed just one of them), the execution resumes and the process is completed. Please help me in understanding why, on a worker failure, my…

Re: org.apache.spark.shuffle.FetchFailedException

2015-08-25 Thread kundan kumar
…this query on a data size of 4 billion rows and getting an org.apache.spark.shuffle.FetchFailedException error: select adid, position, userid, price from (select adid, position, userid, price, dense_rank() OVER (PARTITION BY adlocationid ORDER BY price DESC) as rank FROM trainInfo) as tmp WHERE rank…

Re: org.apache.spark.shuffle.FetchFailedException

2015-08-25 Thread Raghavendra Pandey
Did you try increasing sql partitions? On Tue, Aug 25, 2015 at 11:06 AM, kundan kumar iitr.kun...@gmail.com wrote: I am running this query on a data size of 4 billion rows and getting org.apache.spark.shuffle.FetchFailedException error. select adid,position,userid,price from ( select adid
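A fragment in the spirit of this suggestion (assumes an existing SparkSession `spark` with the trainInfo table registered; `spark.sql.shuffle.partitions` is the standard key, and the WHERE threshold is hypothetical since the original query is truncated):

```python
# Sketch only: raise the shuffle partition count before the window query so
# each post-shuffle partition of the 4-billion-row input stays small.
spark.conf.set("spark.sql.shuffle.partitions", "2000")   # default is 200

ranked = spark.sql("""
    SELECT adid, position, userid, price
    FROM (
        SELECT adid, position, userid, price,
               dense_rank() OVER (PARTITION BY adlocationid
                                  ORDER BY price DESC) AS rank
        FROM trainInfo
    ) tmp
    WHERE rank <= 10      -- hypothetical cutoff; truncated in the original
""")
```

Note that dense_rank over a skewed adlocationid still pulls each partition's rows to one reducer, so skew handling may be needed in addition to more partitions.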

Re: org.apache.spark.shuffle.FetchFailedException :: Migration from Spark 1.2 to 1.3

2015-05-19 Thread Akhil Das
from Spark 1.2 to Spark 1.3 15/05/18 18:22:39 WARN TaskSetManager: Lost task 0.0 in stage 1.6 (TID 84, cloud8-server): FetchFailed(BlockManagerId(1, cloud4-server, 7337), shuffleId=0, mapId=9, reduceId=1, message= org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException

org.apache.spark.shuffle.FetchFailedException :: Migration from Spark 1.2 to 1.3

2015-05-18 Thread zia_kayani
= org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: Failed to open file: /tmp/spark-fff63849-a318-4e48-bdea-2f563076ad5d/spark-40ba3a41-0f4d-446e-b806-e788e210d394/spark-a3d61f7a-22e9-4b3b-9346-ff3b70d0e43d/blockmgr-0e3b2b5d-f677-4e91-b98b-ed913adbd15f/39/shuffle_0_9_0.index