Eugene Koifman created HIVE-16564:
-------------------------------------
Summary: StreamingAPI is locking too much?
Key: HIVE-16564
URL: https://issues.apache.org/jira/browse/HIVE-16564
Project: Hive
Issue Type: Bug
Components: HCatalog, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
Currently _TransactionBatchImpl.beginNextTransactionImpl()_ acquires Shared
locks for each Transaction in the batch.
Especially under high load this creates pressure on the LockManager (i.e.
Metastore) and degrades performance of Ingest itself.
All transactions in a batch write to the same physical file, and for Acid
tables (which are required for Streaming Ingest) shared locks only protect
against Exclusive locks (like drop table), so acquiring/releasing locks for
each txn doesn't achieve much.
One possibility is to acquire all locks (i.e. for all txns) at the time the
batch is created (same as is done for openTxn() for all txns in the batch).
Locks for each txn in the batch will be released automatically when commit is
called for the respective txn.
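
As a rough illustration of the difference in lock traffic, here is a minimal
sketch using a toy model. ToyLockManager and ToyTransactionBatch are
hypothetical names invented for this example; they are not Hive classes and
do not reflect the real TransactionBatchImpl or metastore API. The toy model
assumes one lock request can cover several txns, and that commit releases a
txn's lock without a separate lock call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the metastore lock manager; counts lock requests. */
class ToyLockManager {
    private final Map<Long, String> sharedLockByTxn = new HashMap<>();
    private int lockRequests = 0;

    /** One call models one request to the lock manager, however many txns it covers. */
    void lockShared(List<Long> txnIds, String table) {
        lockRequests++;
        for (long id : txnIds) {
            sharedLockByTxn.put(id, table);
        }
    }

    /** Committing a txn releases its lock; no extra lock request is counted. */
    void commit(long txnId) {
        sharedLockByTxn.remove(txnId);
    }

    int lockRequests() { return lockRequests; }
    int heldLocks() { return sharedLockByTxn.size(); }
}

/** Hypothetical batch; lockUpFront selects between the current and proposed behavior. */
class ToyTransactionBatch {
    private final ToyLockManager lockManager;
    private final String table;
    private final List<Long> txnIds;
    private final boolean lockUpFront;
    private int next = 0;

    ToyTransactionBatch(ToyLockManager lockManager, String table,
                        List<Long> txnIds, boolean lockUpFront) {
        this.lockManager = lockManager;
        this.table = table;
        this.txnIds = txnIds;
        this.lockUpFront = lockUpFront;
        if (lockUpFront) {
            // Proposed: a single request covering every txn in the batch.
            lockManager.lockShared(txnIds, table);
        }
    }

    long beginNextTransaction() {
        long id = txnIds.get(next++);
        if (!lockUpFront) {
            // Current behavior as described above: one request per txn.
            lockManager.lockShared(Collections.singletonList(id), table);
        }
        return id;
    }

    void commit(long txnId) {
        lockManager.commit(txnId);
    }
}
```

In this model a batch of N txns issues N lock requests with lockUpFront set
to false and a single request with it set to true, while the per-txn release
on commit is unchanged.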
Alternatively, don't acquire any locks - this means someone may drop a table
while it's being written to, but using locks here doesn't buy much. Say a Drop
request is issued while a write is in progress. It will block until the write
releases its lock and then execute immediately after that. Thus none of the
data of that write is visible for any meaningful length of time anyway.
A third option is to allow a "meta lock" - a lock not associated with any
specific txn, that is held for the duration of the TransactionBatch. This sort
of breaks the model (especially since HIVE-12636). Perhaps each batch can open
one "extra" txn for internal purposes, just to acquire this "meta lock". No
data will ever be tagged with this "extra" txn.
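
The "extra" txn idea can be sketched with another toy model. MetaLockTable
and MetaLockBatch are hypothetical names made up for illustration, not Hive
classes, and the txn id allocation here (the id right after the data txns) is
purely an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical lock state for one table: shared holders block an exclusive request. */
class MetaLockTable {
    private final Set<Long> sharedHolders = new HashSet<>();
    private int lockRequests = 0;

    void acquireShared(long txnId) {
        lockRequests++;
        sharedHolders.add(txnId);
    }

    void release(long txnId) {
        sharedHolders.remove(txnId);
    }

    /** An exclusive lock (e.g. for drop table) is grantable only with no shared holders. */
    boolean canGrantExclusive() {
        return sharedHolders.isEmpty();
    }

    int lockRequests() { return lockRequests; }
}

/** Hypothetical batch that opens one extra txn solely to hold the "meta lock". */
class MetaLockBatch {
    private final MetaLockTable locks;
    private final List<Long> dataTxnIds = new ArrayList<>();
    private final long metaTxnId;

    MetaLockBatch(MetaLockTable locks, long firstTxnId, int size) {
        this.locks = locks;
        for (int i = 0; i < size; i++) {
            dataTxnIds.add(firstTxnId + i);
        }
        // Assumption: the extra txn takes the next id; no data is tagged with it.
        this.metaTxnId = firstTxnId + size;
        locks.acquireShared(metaTxnId);
    }

    List<Long> dataTxnIds() { return dataTxnIds; }

    /** Committing a data txn touches no locks in this model. */
    void commit(long dataTxnId) {
        if (!dataTxnIds.contains(dataTxnId)) {
            throw new IllegalArgumentException("not a data txn of this batch: " + dataTxnId);
        }
    }

    /** Closing the batch ends the extra txn, which releases the meta lock. */
    void close() {
        locks.release(metaTxnId);
    }
}
```

Here the batch makes exactly one lock request regardless of its size, and an
exclusive request stays blocked until close() is called, even after every
data txn has committed.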
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)