koderka2020 opened a new issue, #11426:
URL: https://github.com/apache/iceberg/issues/11426
Hi Iceberg team,
I've been searching for some time for information on the maximum insert rate per second or per minute on an Iceberg table. We've been ingesting large amounts of data (in tandem with Trino and Nessie) by running AWS Glue jobs concurrently. These jobs are failing at a pretty high rate ("SystemExit: ERROR: An error occurred while calling o213.append.") even with increased commit retry table property settings (25 retries, min 1000 ms wait, max 1500 ms wait).
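For reference, a minimal sketch of how those retry settings can be applied from a (Py)Spark session, assuming the standard Iceberg commit retry properties; the catalog and table names are hypothetical:

```python
# Sketch: raising Iceberg's commit retry settings on a table.
# Property keys are Iceberg's standard commit retry table properties;
# my_catalog.db.events is a hypothetical table name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
        'commit.retry.num-retries' = '25',    -- retry a failed commit up to 25 times
        'commit.retry.min-wait-ms' = '1000',  -- minimum wait between retries
        'commit.retry.max-wait-ms' = '1500'   -- maximum wait between retries
    )
""")
```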
If the parallelism is too high (1,000-2,500 concurrently running jobs writing a total of about 100k rows / 500 MB to Iceberg within 30 minutes), would you recommend some way around it? I was thinking of either creating a staging table in Postgres, or creating multiple staging tables in Iceberg to distribute the load and then, after migrating the data into the main Iceberg table at the end, dropping the staging tables. What are your thoughts on that?
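To make the second idea concrete, here is a rough PySpark sketch of that fan-in pattern under some assumptions: each Glue job appends to its own per-job staging table (so jobs never contend on the same table's commit), and a single consolidation job later folds the staging tables into the main table one commit at a time and drops them. All catalog, table, and path names below are hypothetical.

```python
# Sketch of the staging-table fan-in the question describes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# --- In each concurrent Glue job ---
# Write to a per-job staging table so concurrent jobs never commit
# to the same Iceberg table.
job_id = "job_0001"  # e.g. derived from the Glue job run ID (hypothetical)
staging = f"my_catalog.staging.events_{job_id}"
df = spark.read.json("s3://my-bucket/incoming/job_0001/")  # hypothetical source
df.writeTo(staging).createOrReplace()

# --- In a single periodic consolidation job ---
# Fold each staging table into the main table serially (one commit per
# staging table, so no commit conflicts), then drop the staging table.
staging_tables = [
    row.tableName
    for row in spark.sql("SHOW TABLES IN my_catalog.staging").collect()
    if row.tableName.startswith("events_")
]
for name in staging_tables:
    spark.sql(
        f"INSERT INTO my_catalog.db.events "
        f"SELECT * FROM my_catalog.staging.{name}"
    )
    spark.sql(f"DROP TABLE my_catalog.staging.{name}")
```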
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]