dengzhhu653 commented on PR #5929: URL: https://github.com/apache/hive/pull/5929#issuecomment-3451454074
> > I don't think retry is a good approach. It introduces another issue: the `COLUMN_STATS` in `TABLE_PARAMS` would get overwritten and some column markers would go missing, which might make the stats import useless. For updating the stats we already have the in-process lock, but it doesn't solve parallel imports across HMS instances. We need some kind of distributed lock to achieve that, and the DB lock is one way.
>
> Locking TBLS and PARTITIONS isn’t necessarily a better solution. In large-scale, highly concurrent systems, locking often becomes a scalability bottleneck. Instead, most systems adopt optimistic locking strategies. Each operation assumes no conflict and only validates at commit, reducing contention and avoiding blocking.
>
> The idea is to balance consistency with throughput. Extending this principle to stats import (which btw is not a core functionality) we could consider a mechanism based on versioning rather than traditional locking. Such an approach generally scales far better in multi-instance HMS environments than depending solely on table or partition locks.

Currently the HMS lacks versioning. Modern RDBMSs provide MVCC, so writes do not block reads; writes are the main concern here. The lock proposed here is a row-level lock. It only blocks write operations against the same table, such as altering or dropping the same table/partition, dropping the database, or updating the stats of this table/partition. Even without the explicit lock, we still take some exclusive row locks under the hood. The RDBMS in the background ensures the strong consistency and reliability that the HMS benefits from and relies on, and with query execution times in the range of milliseconds to seconds it can also deliver satisfactory throughput.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
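For readers following the versioning-versus-locking exchange, here is a minimal sketch of what a version-checked (optimistic) stats write could look like. It is illustrative only: the `tbl_stats` table, its `version` column, and the `update_stats_if_unchanged` helper are hypothetical and are not part of the HMS schema or of this PR, and SQLite stands in for the backing RDBMS purely to keep the example runnable.

```python
import sqlite3


def update_stats_if_unchanged(conn, tbl_id, expected_version, new_stats):
    """Compare-and-set: write only if the row still has the version we read.

    Returns True if the update was applied, False if another writer got
    there first (the caller would then re-read, merge, and retry).
    """
    cur = conn.execute(
        "UPDATE tbl_stats SET column_stats = ?, version = version + 1 "
        "WHERE tbl_id = ? AND version = ?",
        (new_stats, tbl_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical table, not the real HMS layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_stats ("
    "tbl_id INTEGER PRIMARY KEY, column_stats TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO tbl_stats VALUES (1, 'a', 0)")
conn.commit()

# Two importers both read version 0, then try to write.
first = update_stats_if_unchanged(conn, 1, 0, "a,b")   # applies, bumps version
second = update_stats_if_unchanged(conn, 1, 0, "a,c")  # stale version, rejected
row = conn.execute(
    "SELECT column_stats, version FROM tbl_stats WHERE tbl_id = 1"
).fetchone()
```

The second writer is rejected rather than silently overwriting the first, which is the property the retry concern above is about. Note that the `UPDATE` itself still takes an exclusive lock on the row for the duration of the statement in a typical RDBMS, so the difference from an explicit row lock is mainly how long the lock is held.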
