Hi,

I am reading the source code of Hive 3.x and have a question regarding the transaction IDs that form the span of a transaction: its begin ID (TXN ID) and its commit ID (NEXT_TXN_ID at the time of commit).
Why is it that we have a global timeline for transactions rather than a timeline partitioned at the granularity of a database, similar to how write IDs are partitioned per table, but at database scope? E.g.,

NEXT_TXN_ID
+-------+-----------+
| DB    | NTXN_NEXT |
+-------+-----------+
| test1 | 23        |
| test2 | 4         |
+-------+-----------+

The same question could also be applied to NEXT_LOCK_ID.

I am just curious because it seems like partitioning the transaction IDs (and lock IDs) would make the locking in the various transactional methods finer-grained. For example, openTxn invocations are mutexed with all other openTxn invocations, even if they are for transactions running in distinct databases. Similarly, openTxn is mutexed with commitTxn when there is a write-write conflict, which I would have thought could only be the case when both apply to the same database.

I'm sure that this would have the side effect of increasing the complexity of other subsystems, but I had to ask what the rationale behind the current design was. (I'm new to Hive, so please forgive me if the answer is obvious.)

Regards,
Granville
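
P.S. To make the idea concrete, here is a minimal in-memory sketch of what I mean by a per-database counter. It is not Hive code: the class PerDbTxnIdAllocator and its methods are hypothetical names of mine, and I am assuming IDs start at 1. The real NEXT_TXN_ID lives in the metastore database, so this only illustrates the contention argument, namely that allocating an ID in one database need not serialize with allocating one in another.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of Hive.
public class PerDbTxnIdAllocator {
    // One independent counter per database name, instead of a single global one.
    private final ConcurrentHashMap<String, AtomicLong> nextTxnId = new ConcurrentHashMap<>();

    // Allocates the next transaction ID for the given database.
    // Callers working on different databases touch different counters.
    public long openTxn(String db) {
        return nextTxnId
                .computeIfAbsent(db, k -> new AtomicLong(1)) // assumed starting ID
                .getAndIncrement();
    }

    // Returns the ID that the next openTxn(db) call would receive,
    // analogous to reading NTXN_NEXT for that database's row.
    public long peekNext(String db) {
        AtomicLong counter = nextTxnId.get(db);
        return counter == null ? 1L : counter.get();
    }
}
```

With this layout, the commit ID of a transaction would be peekNext(db) for its own database only, which is exactly where I suspect the complexity appears for transactions that span more than one database.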
