waterlx edited a comment on issue #732: [DISCUSS] Update metadata serially
URL: https://github.com/apache/incubator-iceberg/issues/732#issuecomment-585637463

Thanks for sharing your insight! It is a great help for us. I ran some tests on the "metadata retries" using the Hive catalog. It seems that they are not that "cheap" 8-) The report is below:

**1. Environment**

(1) Linux, 8 cores + 16 GB memory, x1
(2) Mac, 6 cores + 16 GB memory, x1

**2. Setup**

(1) Iceberg table: partitioned by identity(field_a), 10,000 partitions. Each partition contains only 1 row.
(2) Local Spark, master set to local[*]
(3) Hive metastore, co-located on the same machine as Spark.

**3. Workload**

N concurrent deletes. Each one deletes a single row, so the deletes do not conflict with each other. Our delete is Copy-on-Write, so each delete is mainly an overwrite or a delete call against Iceberg.

**4. Result** (Linux and Mac show similar results)

(1) 5 concurrent deletes: done in 13s
(2) 10 concurrent deletes:
  a. With commit.retry.num-retries set to 10: done in 1m12s.
  b. With commit.retry.num-retries left at the default (4): 5 deletes failed after 5 retries.
(3) 50 concurrent deletes:
  a. With commit.retry.num-retries set to 50 (a smaller value might also be OK): done in about 20-25 min. But this is achieved only with a fresh metastore (process restarted).
  b. With a long-running metastore: most of the deletes (80%) stop retrying because the default commit.retry.total-timeout-ms (30 min) is reached. The failed ones retried 7 times on average.

I am not sure whether some bad settings (in Iceberg or the Hive metastore...) lead to the results above. Please kindly share your insights or your setup/settings if that is OK.
It seems that there is no real conflict here, only competition for the "metadata swap" and the resulting retries, because the following error message shows up many times:

```
org.apache.iceberg.exceptions.CommitFailedException: Base metadata location 'aaa' is not same as the current table metadata location 'bbb' for table
```

So I wonder whether the metadata swap is really that "cheap"... ^_^ Correct me if I am wrong.
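
To make the contention pattern concrete, here is a minimal Python sketch of a worst-case model. It is not Iceberg code: it assumes all pending writers refresh to the same base at the same moment and then race to swap the pointer, so exactly one wins per round. The names `simulate_lockstep_commits` and `CommitStats` are hypothetical, and backoff, timeouts, and metastore latency are all ignored.

```python
from dataclasses import dataclass


@dataclass
class CommitStats:
    succeeded: int
    failed: int
    total_attempts: int


def simulate_lockstep_commits(num_writers: int, num_retries: int) -> CommitStats:
    """Worst-case model of optimistic commits: one winner per round."""
    current_version = 0            # stands in for the table's metadata location
    attempts = [0] * num_writers   # attempts made so far by each writer
    pending = list(range(num_writers))
    succeeded = 0
    failed = 0

    while pending:
        # Assumption: every pending writer refreshes to the same base.
        base = current_version
        still_pending = []
        for writer in pending:
            attempts[writer] += 1
            if base == current_version:
                # Compare-and-swap succeeds: base still matches the pointer.
                current_version += 1
                succeeded += 1
            elif attempts[writer] > num_retries:
                # One initial attempt plus num_retries retries are used up.
                failed += 1
            else:
                # Base is stale; retry in the next round.
                still_pending.append(writer)
        pending = still_pending

    return CommitStats(succeeded, failed, sum(attempts))


if __name__ == "__main__":
    for writers, retries in [(5, 4), (10, 4), (10, 10), (50, 50)]:
        print(writers, retries, simulate_lockstep_commits(writers, retries))
```

In this model the number of swap attempts grows quadratically with the number of writers (N(N+1)/2 when every writer eventually succeeds), even though no two writers touch the same data. Real runs will differ because writers are not in lockstep and each retry also waits and rewrites metadata.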