Re: Thread spilling sort issue with single task
Well, if your data is skewed, I don't think the problem can be avoided, but it can be mitigated with skew-handling techniques. I'd recommend taking a look at the "salted join" technique.

On Tue, 26 Jan 2021 at 11:29, rajat kumar wrote:
> Hi,
>
> Yes, I understand it's a skew-based problem, but how can it be avoided?
> Could you please suggest something?
>
> I am on Spark 2.4.
>
> Thanks
> Rajat
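A minimal sketch of the salted join idea mentioned in this reply, written in plain Python rather than Spark so that it runs anywhere. The data (`orders`, `customers`), the helper names and `NUM_SALTS = 4` are all hypothetical and not from the thread. The skewed side gets a random salt appended to its key, the other side is replicated once per salt value, and the join runs on (key, salt), so the hot key is split into several smaller groups while the left join result stays the same.

```python
import random
from collections import Counter, defaultdict

NUM_SALTS = 4  # hypothetical number of salt buckets; tune to the observed skew


def left_join(left, right):
    """Left-join two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    out = []
    for key, left_value in left:
        matches = index.get(key)
        if matches:
            out.extend((key, left_value, rv) for rv in matches)
        else:
            out.append((key, left_value, None))  # keep unmatched left rows
    return out


def add_salt(rows, num_salts, rng):
    """Skewed side: turn each key into (key, random salt)."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def replicate(rows, num_salts):
    """Other side: emit every row once per salt so each salted key finds its match."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def largest_group(rows):
    """Size of the biggest group of rows sharing one join key."""
    return max(Counter(key for key, _ in rows).values())


# Hypothetical data: one hot key with 1000 rows, a few normal keys, one unmatched key.
orders = [("hot", i) for i in range(1000)] + [("a", 1), ("b", 2), ("missing", 3)]
customers = [("hot", "H"), ("a", "A"), ("b", "B")]

plain = left_join(orders, customers)

rng = random.Random(0)  # seeded only to make the sketch reproducible
salted_orders = add_salt(orders, NUM_SALTS, rng)
salted_customers = replicate(customers, NUM_SALTS)
# Join on (key, salt), then drop the salt from the result.
salted = [(key, lv, rv) for (key, _salt), lv, rv in left_join(salted_orders, salted_customers)]

print("same result:", salted == plain)
print("largest key group before salting:", largest_group(orders))
print("largest key group after salting:", largest_group(salted_orders))
```

In Spark 2.4 the same idea is usually expressed by adding a salt column to the skewed DataFrame and replicating the other DataFrame once per salt value before joining on both columns. The replication cost grows with the number of salts, so this is a trade-off rather than a free fix.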
Re: Thread spilling sort issue with single task
Hi,

Yes, I understand it's a skew-based problem, but how can it be avoided? Could you please suggest something?

I am on Spark 2.4.

Thanks
Rajat

On Tue, Jan 26, 2021 at 3:58 PM German Schiavon wrote:
> Hi,
>
> One word: SKEW
>
> It seems to be the classic skew problem. You would have to apply skew
> techniques to repartition your data properly or, if you are on Spark 3.0+,
> try the skewJoin optimization.
Re: Thread spilling sort issue with single task
Hi,

One word: SKEW

It seems to be the classic skew problem. You would have to apply skew techniques to repartition your data properly or, if you are on Spark 3.0+, try the skewJoin optimization.

On Tue, 26 Jan 2021 at 11:20, rajat kumar wrote:
> Hi Everyone,
>
> I am running a Spark application where I have applied 2 left joins. The 1st
> join is a broadcast join and the other one is a normal join.
> Out of 200 tasks, the last task is stuck. It is running at the "ANY"
> locality level. It seems to be a data skew issue.
> It is spilling too much and the shuffle write is too large. The following
> message appears in the executor logs:
>
> INFO UnsafeExternalSorter: Thread spilling sort data of 10.4 GB to disk (10 times so far)
>
> Can anyone please suggest what could be wrong?
>
> Thanks
> Rajat
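For readers on Spark 3.0+, the skewJoin optimization mentioned in this reply is part of Adaptive Query Execution and is controlled through configuration. The sketch below lists the relevant settings as a Python dict and renders them as `spark-submit --conf` arguments. The helper `to_submit_args` is hypothetical, and the factor and threshold values are only illustrative, so check the documentation of your Spark version for its defaults. The skewJoin settings are not available in Spark 2.4.

```python
# Settings that control the AQE skew join handling in Spark 3.0+.
# The values are illustrative assumptions, not recommendations from the thread.
AQE_SKEW_JOIN_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    # A partition counts as skewed if it is this many times larger than the median...
    "spark.sql.adaptive.skewJoin.skewedPartitionFactor": "5",
    # ...and also larger than this absolute size.
    "spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes": "256MB",
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit arguments (hypothetical helper)."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_submit_args(AQE_SKEW_JOIN_CONF)))
```

The same keys can also be set on an existing session or in `spark-defaults.conf`; the command-line form is shown only because it is easy to copy into a job submission.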