dengzhhu653 commented on PR #5929: URL: https://github.com/apache/hive/pull/5929#issuecomment-3043467323
> There was another PR for the same: #5567 cc @InvisibleProgrammer
> I lean toward using retries rather than introducing locking. Non-blocking, better throughput under normal conditions.
>
> ```
> SqlRetryHandler sqlRetryHandler = new SqlRetryHandler(conf, dbType);
> ...
> return sqlRetryHandler.executeWithRetry(
>     new SqlRetryCallProperties()
>         .withCallerId("updateTableColumnStatistics")
>         .withRetryOnDuplicateKey(true),
>     () -> updateTableColumnStatisticsInternal(
>         colStats, validWriteIds, writeId, catName, dbName, tableName));
> ```
>
> And since HMS provides `RetryingMetaStoreClient`, this might be redundant.
>
> -1 on the locking approach

Why isn't the locking approach good?
Retries might introduce another problem: the `COLUMN_STATS_ACCURATE` parameter could become inaccurate, or lose the marker for some columns. For example:
step 1: client1 updates col1 and sets `COLUMN_STATS_ACCURATE` to `col1: true`.
step 2: client2 (for col2) and client3 (for col3) both hit the duplicate key exception, then retry.
step 3: client2 and client3 both read `COLUMN_STATS_ACCURATE` as `col1: true`, then each updates the parameter separately.
step 4: After step 3, either `col2: true` or `col3: true` is missing from `COLUMN_STATS_ACCURATE`, because whichever client writes last overwrites the other's update.
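
To make the four steps above concrete, here is a minimal in-memory sketch of the lost update. It is not HMS code: the class `ColumnStatsAccurateRace`, its helper methods, and the use of a `Set<String>` held in an `AtomicReference` as a stand-in for the `COLUMN_STATS_ACCURATE` parameter are all assumptions made for illustration. The first method replays the interleaving from steps 2 to 4; the second shows the same two clients when the read-modify-write is serialized, which is roughly what a lock would enforce.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: a set of column names stands in for the
// COLUMN_STATS_ACCURATE parameter. Real HMS stores this in the backing DB.
class ColumnStatsAccurateRace {

    // Returns a copy of the snapshot with one more column marked accurate.
    static Set<String> with(Set<String> snapshot, String col) {
        Set<String> copy = new TreeSet<>(snapshot);
        copy.add(col);
        return copy;
    }

    // Steps 2-4: both retrying clients read the same snapshot, then write blindly.
    static Set<String> blindWrites() {
        // Step 1 has already happened: col1 is marked accurate.
        AtomicReference<Set<String>> param = new AtomicReference<>(Set.of("col1"));
        Set<String> seenByClient2 = param.get();   // step 3: reads {col1}
        Set<String> seenByClient3 = param.get();   // step 3: reads {col1}
        param.set(with(seenByClient2, "col2"));    // writes {col1, col2}
        param.set(with(seenByClient3, "col3"));    // overwrites with {col1, col3}
        return param.get();                        // step 4: col2 marker is lost
    }

    // Same two clients, but each read-modify-write runs under a shared lock,
    // so the second client always reads what the first one wrote.
    static Set<String> lockedWrites() {
        AtomicReference<Set<String>> param = new AtomicReference<>(Set.of("col1"));
        Object lock = new Object();
        Thread client2 = new Thread(() -> {
            synchronized (lock) { param.set(with(param.get(), "col2")); }
        });
        Thread client3 = new Thread(() -> {
            synchronized (lock) { param.set(with(param.get(), "col3")); }
        });
        client2.start();
        client3.start();
        try {
            client2.join();
            client3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return param.get();
    }

    public static void main(String[] args) {
        System.out.println("blind:  " + blindWrites());   // prints "blind:  [col1, col3]"
        System.out.println("locked: " + lockedWrites());  // prints "locked: [col1, col2, col3]"
    }
}
```

The sketch only shows why a retry that re-reads a stale value can drop a marker; it says nothing about the throughput cost of the lock, which is the other side of the trade-off discussed in this thread.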