Re: job uuid not unique

2024-02-24 Thread Xin Zhang
unsubscribe

On Sat, Feb 17, 2024 at 3:04 AM Рамик И wrote:
> Hi
> I'm using Spark Streaming to read from Kafka and write to S3. Sometimes I
> get errors when writing: org.apache.hadoop.fs.FileAlreadyExistsException.
>
> Spark version: 3.5.0
> Scala version: 2.13.8
> Cluster: k8s
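For context, here is a minimal sketch of the kind of pipeline the quoted message describes (Spark 3.5.0, Scala 2.13), assuming Structured Streaming; the broker, topic, bucket, and checkpoint paths are hypothetical placeholders. On S3, FileAlreadyExistsException on the output path is commonly tied to task retries or a checkpoint location that was lost or reused across runs, which is why the sketch pins a stable checkpointLocation.

    import org.apache.spark.sql.SparkSession

    object KafkaToS3 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-s3")
          .getOrCreate()

        // Read from Kafka; broker and topic names are placeholders.
        val stream = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()

        // Write to S3 as Parquet. A stable checkpointLocation lets Spark
        // track which batches were already committed; losing or reusing it
        // is a common source of FileAlreadyExistsException on the sink.
        val query = stream
          .selectExpr("CAST(value AS STRING) AS value")
          .writeStream
          .format("parquet")
          .option("path", "s3a://my-bucket/output")
          .option("checkpointLocation", "s3a://my-bucket/checkpoints")
          .start()

        query.awaitTermination()
      }
    }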

Re: AQE coalesce 60G shuffle data into a single partition

2024-02-24 Thread Enrico Minack
Hi Shay,

Maybe this is related to the small number of output rows (1,250) of the last exchange step that consumes those 60 GB of shuffle data. It looks like your outer transformation is something like:

    df.groupBy($"id").agg(collect_list($"prop_name"))

Have you tried adding a repartition as an attempt
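The reply is cut off above, so the intended placement of the repartition is unknown. One reading, sketched below with a hypothetical partition count and the column names from the message, is to repartition on the grouping key before the aggregation so the exchange stays wide:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.collect_list

    // `df` is assumed to have columns `id` and `prop_name`.
    def aggregate(df: DataFrame): DataFrame = {
      import df.sparkSession.implicits._
      df.repartition(200, $"id")   // hypothetical partition count
        .groupBy($"id")
        .agg(collect_list($"prop_name").as("prop_names"))
    }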