dengzhhu653 commented on PR #5929:
URL: https://github.com/apache/hive/pull/5929#issuecomment-3451454074

   > > I don't think retrying is a good approach. It introduces another issue: the 
`COLUMN_STATS` in `TABLE_PARAMS` would get overwritten and some 
column markers would go missing, which could make the stats import useless. For updating 
the stats we already have the in-process lock, but it doesn't cover 
parallel imports across HMS instances. We need some kind of distributed lock 
for that, and the DB lock is one way to get it.
   > 
   > Locking TBLS and PARTITIONS isn’t necessarily a better solution. In 
large-scale, highly concurrent systems, locking often becomes a scalability 
bottleneck. Instead, most systems adopt optimistic locking strategies. Each 
operation assumes no conflict and only validates at commit, reducing contention 
and avoiding blocking.
   > 
   > The idea is to balance consistency with throughput. Extending this 
principle to stats import (which btw is not a core functionality) we could 
consider a mechanism based on versioning rather than traditional locking. Such 
an approach generally scales far better in multi-instance HMS environments than 
depending solely on table or partition locks.
   
   Currently the HMS lacks versioning. Modern RDBMSs provide MVCC, so 
writes do not block reads; writes are therefore the main concern here.
   The lock provided here is a row-level lock. It only blocks write 
operations against the same table, such as altering or dropping the same 
table/partition, dropping the db, or updating the stats of this table/partition.
   
   Even without the explicit lock, we still take exclusive row locks under 
the hood. The RDBMS ensures the strong consistency and reliability 
that the HMS benefits from and relies on, and its queries run in milliseconds 
to seconds, so the throughput can still be satisfactory.
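
   For comparison, here is a minimal sketch of the version-based (optimistic) 
update suggested in the quoted comment. It is not HMS code: the `tbl_stats` 
table, its `version` column, and the helper names are hypothetical, and SQLite 
only stands in for the backing RDBMS. It shows how a version check would reject 
the second of two concurrent importers instead of letting it silently overwrite 
`COLUMN_STATS`.

```python
import sqlite3

# Hypothetical schema: the HMS has no such version column today.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_stats ("
    " tbl_id INTEGER PRIMARY KEY,"
    " column_stats TEXT,"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO tbl_stats VALUES (1, 'a', 0)")
conn.commit()


def read_stats(conn, tbl_id):
    """Return (column_stats, version) for one table row."""
    return conn.execute(
        "SELECT column_stats, version FROM tbl_stats WHERE tbl_id = ?",
        (tbl_id,),
    ).fetchone()


def update_stats_if_unchanged(conn, tbl_id, expected_version, new_stats):
    """Write new_stats only if the row still carries expected_version.

    Returns True when the update was applied, False when another writer
    bumped the version first (the caller must re-read and merge).
    """
    cur = conn.execute(
        "UPDATE tbl_stats SET column_stats = ?, version = version + 1 "
        "WHERE tbl_id = ? AND version = ?",
        (new_stats, tbl_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Two importers read the same version, then both try to write.
_, seen_by_first = read_stats(conn, 1)
_, seen_by_second = read_stats(conn, 1)

first_applied = update_stats_if_unchanged(conn, 1, seen_by_first, "a,b")
second_applied = update_stats_if_unchanged(conn, 1, seen_by_second, "a,c")
final_row = read_stats(conn, 1)
```

   In this sketch the second writer is rejected rather than blocked, so it has 
to re-read and merge its column markers before retrying. That retry-and-merge 
step is the cost of the optimistic approach, whereas the row-level lock makes 
the second writer wait.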


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

