Hi,

I am using Spark with Iceberg, updating a table that has 1700 columns.
We are loading 0.6 million rows from Parquet files (in the future it will
be 16 million rows) and trying to update the data in a table that has 16
buckets.
We use Spark's default partitioner and do not repartition the dataset on
the bucketing column.
One of the executors fails with an OOME, recovers, and then fails again
when we use Iceberg's MERGE INTO strategy:

MERGE INTO target t
USING (SELECT * FROM source) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *

But when we do a blind append instead, it works.

Questions:

How do we find out what the issue is? We are running Spark on an EKS
cluster; when an executor gets an OOME it dies and its logs are gone, so
we are unable to see them.
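For context, this is roughly what I have been trying to recover the logs with; the pod name and namespace here are hypothetical, and I assume this only works while Kubernetes still has the pod object for the dead executor:

```shell
# --previous fetches the logs of the last terminated container in the pod,
# if the pod object has not been garbage-collected yet.
kubectl logs spark-exec-12 -n spark --previous
```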

Do we need to repartition the dataset on the bucket column? If so, at
load time, or after the data is loaded?
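One alternative I came across (not sure whether it applies to our case) is Iceberg's write distribution table property, which, as I understand it, asks Iceberg to shuffle rows to match the table's partition/bucket spec during the merge instead of us repartitioning manually; a sketch:

```sql
-- Assumption: 'hash' distributes incoming rows by the table's
-- partition spec (the 16 id buckets) before the MERGE writes them.
ALTER TABLE target SET TBLPROPERTIES (
  'write.merge.distribution-mode' = 'hash'
);
```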

Any help understanding this would be appreciated.
