> Hi,
>
> I am using Spark with Iceberg to update a table that has 1700 columns.
> We are loading 0.6 million rows from Parquet files (in the future it
> will be 16 million rows) and trying to update the data in a table that
> has 16 buckets.
> We use Spark's default partitioner and do not repartition the dataset
> on the bucketing column.
> One of the executors fails with an OOME; it recovers and then fails
> again. This happens when we use Iceberg's MERGE INTO strategy:
>
> MERGE INTO target t
> USING (SELECT * FROM source) s
> ON t.id = s.id
> WHEN MATCHED THEN UPDATE SET *
> WHEN NOT MATCHED THEN INSERT *
>
> But when we do a blind append, it works.
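One difference between the two paths is that a blind append just writes new files, while MERGE INTO shuffles and (in the default copy-on-write mode) rewrites whole data files, so clustering of the source on the join key matters. A sketch of two things that may spread that work more evenly, assuming the table's partition spec is `bucket(16, id)`; the `write.distribution-mode` table property and the `REPARTITION` hint are real Iceberg/Spark features, but the partition counts here are illustrative, not tuned values:

```sql
-- Ask Iceberg to hash-distribute rows according to the table's
-- partition spec before writing, instead of writing from wherever
-- the rows happen to land after the join.
ALTER TABLE target SET TBLPROPERTIES ('write.distribution-mode' = 'hash');

-- Optionally also repartition the source on the join key via a
-- Spark SQL hint, so each task handles a narrower slice of ids.
MERGE INTO target t
USING (SELECT /*+ REPARTITION(16, id) */ * FROM source) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```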
>
> Questions:
>
> How do we find out what the issue is? We are running Spark on an EKS
> cluster; when an executor hits the OOME it dies and its logs are gone
> as well, so we are unable to see them.
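On Spark on Kubernetes, executor pods are deleted when they terminate by default, which takes their logs with them. A sketch of how to keep them around, assuming you control the submit configuration and have `kubectl` access to the namespace (pod and namespace names are placeholders):

```
# Keep failed executor pods instead of deleting them, so logs survive:
--conf spark.kubernetes.executor.deleteOnTermination=false

# Read the logs of the previous (crashed) container of an executor pod:
kubectl logs <executor-pod-name> --previous -n <namespace>

# Check whether Kubernetes itself OOM-killed the container
# (look for "OOMKilled" in the last state), which is different
# from a JVM java.lang.OutOfMemoryError:
kubectl describe pod <executor-pod-name> -n <namespace>
```

The OOMKilled-vs-JVM distinction matters: a container killed by the kernel usually means the pod memory limit was too small relative to off-heap usage, while a JVM OOME points at heap pressure inside the task.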
>
> Do we need to repartition the dataset on the bucketing column, either
> at load time or after the data is loaded?
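Independently of repartitioning, very wide rows make MERGE memory-hungry: with 1700 columns, each task holds large row batches plus Parquet writer buffers for every column while it rewrites files. Some settings worth experimenting with; the parameter names are real Spark/Iceberg options, but the values below are illustrative assumptions, not tuned recommendations:

```
# Spark submit conf: extra off-heap headroom for shuffle and
# Parquet native buffers, and more (smaller) shuffle partitions.
spark.executor.memoryOverhead=2g
spark.sql.shuffle.partitions=200

# Iceberg table property: target smaller data files so each
# rewrite task buffers less at once (default is 512 MB).
write.target-file-size-bytes=134217728
```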
>
> Any help understanding this would be appreciated.
>
>
