waterlx edited a comment on issue #732: [DISCUSS] Update metadata serially
URL: 
https://github.com/apache/incubator-iceberg/issues/732#issuecomment-585637463
 
 
   Thanks for sharing your insight! It is a great help to us.
   I ran some tests of the "metadata retries" using the Hive catalog. It 
seems that they are not that "cheap" 8-)
   
   The report is below:
   **1. Environment**
   (1) Linux, 8 core + 16G mem, *1
   (2) Mac,  6 core + 16G mem, *1
   
   **2. Setup**
   (1) Iceberg table: Partitioned by identity(field_a), 10,000 partitions. Each 
partition only contains 1 row. 
   (2) Local spark, master as local[*]
   (3) Hive metastore, co-located on the same machine as Spark.
   
   **3. Workload**
   N concurrent deletes. Each deletes a single row, so there are no data 
conflicts.
   Our delete is Copy-on-Write, so each delete is mainly an overwrite or a 
delete call against Iceberg.
   
   **4. Result**
   (Linux and Mac show similar results)
   (1) 5 concurrent deletes: done in 13s
   (2) 10 concurrent deletes:
   a. commit.retry.num-retries had to be raised to 10; done in 1m12s.
   b. With commit.retry.num-retries left at the default (4), 5 deletes failed 
after 5 retries.
   (3) 50 concurrent deletes: 
   a. With commit.retry.num-retries set to 50 (a smaller value might also be 
OK), done in about 20-25 min. But this was achieved only with a fresh 
metastore (process restarted).
   b. When tested with a long-running metastore, most of the deletes (80%) 
stopped retrying because the default commit.retry.total-timeout-ms (30 min) 
was reached. The failed ones retried 7 times on average.
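   
   To put the retry settings above in perspective, here is a rough sketch of 
how much time the backoff sleeps alone could take for a given 
commit.retry.num-retries. It assumes exponential backoff with a factor of 2, 
a minimum wait of 100 ms and a maximum wait of 60 s (what I believe the 
documented defaults of commit.retry.min-wait-ms and commit.retry.max-wait-ms 
are), and no jitter. The function name is made up for illustration; this is 
not Iceberg's code.

```python
def backoff_waits_ms(num_retries, min_wait_ms=100, max_wait_ms=60_000, scale=2):
    """Sleep before each retry under capped exponential backoff.

    Assumed model (not taken from Iceberg's source): the wait starts at
    min_wait_ms, is multiplied by `scale` after every retry, and is capped
    at max_wait_ms. Jitter and the commit duration itself are ignored.
    """
    waits = []
    wait = min_wait_ms
    for _ in range(num_retries):
        waits.append(min(wait, max_wait_ms))
        wait = min(wait * scale, max_wait_ms)
    return waits


TOTAL_TIMEOUT_MS = 30 * 60 * 1000  # commit.retry.total-timeout-ms default (30 min)

for retries in (4, 10, 50):
    total = sum(backoff_waits_ms(retries))
    print(retries, "retries:", total, "ms of sleep,",
          "exceeds" if total > TOTAL_TIMEOUT_MS else "within", "the 30 min timeout")
```

   Under these assumptions the sleeps for 50 retries add up to more than the 
30 min total timeout, so a writer may never get to use all 50 retries even 
before the cost of each commit attempt is counted.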
   
   I am not sure whether some bad settings (in Iceberg or the Hive 
metastore...) led to the results above. Please kindly share your insights or 
your setup/settings if that is OK.
   
   It seems that there is no data conflict here, only competition for the 
"metadata swap" and the resulting retries, because many error messages like 
the following could be seen:
   ```
   org.apache.iceberg.exceptions.CommitFailedException: Base metadata location 
'aaa' is not same as the current table metadata location 'bbb' for table
   ```
   So I wonder whether the metadata swap is really that "cheap"...  ^_^ 
Correct me if I am wrong.
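   
   A toy model of this competition, as a minimal sketch: N writers race to 
swap a single metadata pointer, and in every round all pending writers start 
from the same base, so exactly one swap wins and the rest must retry. This is 
a worst-case lock-step simplification with made-up names, not Iceberg's 
implementation, and it ignores backoff and timing.

```python
def simulate_lockstep_commits(num_writers, num_retries):
    """Return (succeeded, failed) for writers racing on one metadata pointer.

    Hypothetical worst-case model: all pending writers read the same base
    version, then attempt a compare-and-swap one after another. Only the
    first attempt in a round still matches the current version.
    """
    current_version = 0
    attempts = [0] * num_writers
    pending = list(range(num_writers))
    succeeded = failed = 0

    while pending:
        base_version = current_version  # every pending writer refreshes to this base
        still_pending = []
        for writer in pending:
            attempts[writer] += 1
            if base_version == current_version:
                current_version += 1    # swap wins, pointer moves on
                succeeded += 1
            elif attempts[writer] > num_retries:
                failed += 1             # initial attempt plus all retries used up
            else:
                still_pending.append(writer)  # stale base, retry next round
        pending = still_pending

    return succeeded, failed


print(simulate_lockstep_commits(5, 4))    # (5, 0)
print(simulate_lockstep_commits(10, 4))   # (5, 5)
print(simulate_lockstep_commits(10, 10))  # (10, 0)
```

   In this model at most num-retries + 1 writers can succeed, which is at 
least consistent with 5 of the 10 deletes failing at the default of 4 
retries, although the real runs also involve backoff and metastore latency.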
