[ https://issues.apache.org/jira/browse/HIVE-21052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739823#comment-16739823 ]
Eugene Koifman commented on HIVE-21052: --------------------------------------- it would be useful to add some tests of partitioned tables, with > 1 partition column AcidUtils.list() - how does this work when there are very many files (which I think would be common here)? Should it use some form of RemoteFileIterator? e.g. FileUtils.RemoteIteratorWithFilter > Make sure transactions get cleaned if they are aborted before addPartitions > is called > ------------------------------------------------------------------------------------- > > Key: HIVE-21052 > URL: https://issues.apache.org/jira/browse/HIVE-21052 > Project: Hive > Issue Type: Bug > Components: Transactions > Affects Versions: 3.1.1 > Reporter: Jaume M > Assignee: Jaume M > Priority: Critical > Attachments: Aborted Txn w_Direct Write.pdf, HIVE-21052.1.patch, > HIVE-21052.2.patch > > > If the transaction is aborted between openTxn and addPartitions and data has > been written on the table the transaction manager will think it's an empty > transaction and no cleaning will be done. > This is currently an issue in the streaming API and in micromanaged tables. > As proposed by [~ekoifman] this can be solved by: > * Writing an entry with a special marker to TXN_COMPONENTS at openTxn and > when addPartitions is called remove this entry from TXN_COMPONENTS and add > the corresponding partition entry to TXN_COMPONENTS. > * If the cleaner finds and entry with a special marker in TXN_COMPONENTS that > specifies that a transaction was opened and it was aborted it must generate > jobs for the worker for every possible partition available. > cc [~ewohlstadter] -- This message was sent by Atlassian JIRA (v7.6.3#76005)