Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-14 Thread Gary Liu
> damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.

Re: org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-13 Thread Gary Liu
> such loss, damage or destruction.
>
> On Fri, 10 Mar 2023 at 15:35, Gary Liu wrote:
>
>> Hi,
>>
>> I have a job in GCP dataproc server spark session (spark 3.3.2), it is a
>> job involving multiple joins, as well as a complex UDF. I always

org.apache.spark.shuffle.FetchFailedException in dataproc

2023-03-10 Thread Gary Liu
)
	at org.apache.spark.sql.execution.ShuffledRowRDD.compute(ShuffledRowRDD.scala:240)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	... 40 more
)
--
Gary Liu

Re: may I need a join here?

2022-01-24 Thread Gary Liu
one
>
> pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4
> missing from word#0,count#1L in operator !Filter NOT word#0 IN
> (stopword#4).;
>
> !Filter NOT word#0 IN (stopword#4)
>
> +- LogicalRDD [word#0, count#1L], false
>
> The filter method doesn't work here.
>
> Maybe I need a join for two DF?
>
> What's the syntax for this?
>
> Thank you and regards,
>
> Bitfox
--
Gary Liu