Re: [PR] HIVE-28578: Concurrency issue in updateTableColumnStatistics [hive]

via GitHub Sun, 06 Jul 2025 21:49:56 -0700


dengzhhu653 commented on PR #5929:
URL: https://github.com/apache/hive/pull/5929#issuecomment-3043467323


   > There was another PR for the same: #5567 cc @InvisibleProgrammer I lean 
toward using retries rather than introducing locking. Non-blocking, better 
throughput under normal conditions.
   > 
   > ```
   > SqlRetryHandler sqlRetryHandler = new SqlRetryHandler(conf, dbType);
   > ...
   > return sqlRetryHandler.executeWithRetry(
   >     new SqlRetryCallProperties()
   >         .withCallerId("updateTableColumnStatistics")
   >         .withRetryOnDuplicateKey(true),
   >     () -> updateTableColumnStatisticsInternal(
   >             colStats, validWriteIds, writeId, catName, dbName, tableName));
   > ```
   > 
   > And since HMS provides `RetryingMetaStoreClient`, this might be redundant.
   > 
   > -1 on the locking approach
   
   Why isn't the locking approach good?
   Retries might bring another problem: they can leave the param key 
`COLUMN_STATS_ACCURATE` inaccurate, or missing the marker for some columns. For 
example:
     step 1: client1 updates col1 and sets `COLUMN_STATS_ACCURATE` to 
`col1: true`.
     step 2: client2 (for col2) and client3 (for col3) both hit the duplicate 
key exception, then retry.
     step 3: client2 and client3 both read `COLUMN_STATS_ACCURATE` as `col1: 
true`, then each updates the param key separately.
     step 4: whichever write lands last overwrites the other, so after step 3 
either `col2: true` or `col3: true` is missing from `COLUMN_STATS_ACCURATE`.
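
The interleaving in steps 1 to 4 is a plain lost update on a read-modify-write. A minimal sketch of it follows, replayed sequentially so the outcome is deterministic. It assumes the parameter is rewritten as a whole on each update. The class `LostUpdateSketch`, its `read`/`write` helpers, and the in-memory map standing in for the stored `COLUMN_STATS_ACCURATE` value are all hypothetical and are not the HMS code path.

```java
import java.util.Map;
import java.util.TreeMap;

public class LostUpdateSketch {
    // Hypothetical stand-in for the stored COLUMN_STATS_ACCURATE value:
    // column name -> "stats are accurate" marker.
    static Map<String, Boolean> stored = new TreeMap<>();

    // Read half of the read-modify-write: each client gets its own snapshot.
    static Map<String, Boolean> read() {
        return new TreeMap<>(stored);
    }

    // Write half: the client adds its column to its snapshot and
    // overwrites the whole stored value with it.
    static void write(Map<String, Boolean> snapshot, String column) {
        snapshot.put(column, true);
        stored = snapshot;
    }

    // Replays steps 1-4 in a fixed order and returns the final stored value.
    static Map<String, Boolean> simulateRetryRace() {
        stored = new TreeMap<>();

        // step 1: client1 updates col1.
        write(read(), "col1");

        // steps 2-3: client2 and client3 retry and both read {col1=true}
        // before either of them writes.
        Map<String, Boolean> client2View = read();
        Map<String, Boolean> client3View = read();

        write(client2View, "col2"); // stored is now {col1, col2}
        write(client3View, "col3"); // overwrites it with {col1, col3}

        // step 4: the marker for col2 has been lost.
        return stored;
    }

    public static void main(String[] args) {
        System.out.println(simulateRetryRace()); // prints {col1=true, col3=true}
    }
}
```

Swapping the order of the last two writes loses `col3` instead. A retry alone does not close this window unless the read and the write of the parameter are serialized or the write is made conditional on the value that was read.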
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


