Hi,

I have been reading the source code of Hive 3.x and have a question about
the transaction IDs that form the span of a transaction: its begin ID (TXN ID)
and its commit ID (NEXT_TXN_ID at the time of commit).

Why is it that we have a global timeline for transactions rather than a
timeline partitioned at the granularity of a database, kind of similar to
how write IDs are partitioned per table but at the database scope?

E.g.,

NEXT_TXN_ID
+-------+-----------+
| DB    | NTXN_NEXT |
+-------+-----------+
| test1 | 23        |
| test2 | 4         |
+-------+-----------+

The same question applies to NEXT_LOCK_ID.

I am just curious because it seems like partitioning the transaction (and
lock) IDs would make the locking in the various transactional methods
finer-grained. For example, openTxn invocations are mutexed with all other
openTxn invocations, even if they are for transactions running in distinct
database domains. The same goes for openTxn being mutexed against commitTxn
when there is a write-write conflict, which I would have thought could only
happen if both apply to the same database. I'm sure that this would have
the side effect of increasing the complexity of other subsystems, but I had
to ask what the rationale behind the current design is.
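
To make the comparison concrete, here is a rough Java sketch of the two
schemes I have in mind. It is not Hive's code: the class and method names
are made up for illustration, and in-memory counters stand in for the
metastore's NEXT_TXN_ID table and whatever locking the real implementation
uses around it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical illustration only; none of these names exist in Hive. */
public class TxnIdAllocators {

    /** Global timeline: every openTxn call contends on the same monitor. */
    public static final class GlobalAllocator {
        private long next = 1;

        public synchronized long openTxn() {
            return next++;
        }
    }

    /** Per-database timeline: calls contend only within the same database. */
    public static final class PerDatabaseAllocator {
        private final Map<String, AtomicLong> nextByDb = new ConcurrentHashMap<>();

        public long openTxn(String db) {
            // Each database lazily gets its own counter, starting at 1.
            return nextByDb
                .computeIfAbsent(db, k -> new AtomicLong(1))
                .getAndIncrement();
        }
    }
}
```

In the second scheme, IDs are only comparable within one database, which I
assume is exactly where the extra complexity for other subsystems would
come from.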

(I'm new to Hive, so please forgive me if the answer is obvious.)

Regards,

Granville
