[ https://issues.apache.org/jira/browse/PHOENIX-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16458377#comment-16458377 ]
Ohad Shacham commented on PHOENIX-4484:
---------------------------------------

[~jamestaylor], I think that I was wrong in this case and disabling the GC is not required.

A general transaction might miss data if the low watermark exceeds the transaction timestamp during its run. This is caused by the GC, which removes all versions of a key below the low watermark except for the last one.

During index population, the transaction has the fence id, writes the data using auto commit (version and commit timestamp are the same), and does not need to commit. It is true that this transaction might miss data if the low watermark exceeds the fence id. However, if it misses data of a key K, it means that there exists another record of K with a version higher than the fence and lower than the low watermark. Because every entry written after the fence is automatically added to the index (using the incremental mechanism), that entry of K will be added to the index as well.

It is true that we miss data; however, every transaction that might be interested in this data started below the low watermark and will be aborted on commit, so we don't really care.

To sum up, the fact that we enable, at the fence, the mechanism that updates the index with every mutation to the data table removes the need to disable the GC.

> Write directly to HBase when creating an index for transactional table
> ----------------------------------------------------------------------
>
>                 Key: PHOENIX-4484
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4484
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Ohad Shacham
>            Assignee: Ohad Shacham
>            Priority: Major
>
> Today, when creating an index table for a non-empty data table, the writes
> are performed using the transaction API, which both consumes client-side
> memory for storing the write set and performs conflict analysis upon commit.
> This is redundant and can be replaced by a direct write to HBase. For this
> reason, a new function should be added to the transaction abstraction layer
> that writes directly to HBase in the Tephra case and adds shadow cells with
> the fence id in the Omid case.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
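
The argument in the comment above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Phoenix, Omid, or Tephra code: the class `FenceGcSketch` and its methods (`write`, `gc`, `populateIndex`, `readData`, `readIndex`) are hypothetical names, timestamps are plain longs, and the "index" is simply a second map keyed the same way as the data table. It shows a key K whose old version is collected by the GC before population reads it, and which is nevertheless present in the index because a newer write above the fence was indexed incrementally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not the Phoenix/Omid API): versions per key are
// kept in a TreeMap from timestamp to value.
public class FenceGcSketch {
    private final long fence;
    private final Map<String, TreeMap<Long, String>> dataTable = new HashMap<>();
    private final Map<String, TreeMap<Long, String>> index = new HashMap<>();

    public FenceGcSketch(long fence) {
        this.fence = fence;
    }

    // A committed write to the data table. Writes above the fence are also
    // applied to the index (the incremental mechanism enabled at the fence).
    public void write(String key, long ts, String value) {
        dataTable.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
        if (ts > fence) {
            index.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
        }
    }

    // GC: for each key, drop every version below the low watermark except
    // the last (newest) one below it.
    public void gc(long lowWatermark) {
        for (TreeMap<Long, String> versions : dataTable.values()) {
            Long keep = versions.lowerKey(lowWatermark);
            if (keep != null) {
                versions.headMap(keep, false).clear();
            }
        }
    }

    // Index population: read each key as of the fence and write the result
    // directly with the fence id as its version (auto commit).
    public void populateIndex() {
        for (Map.Entry<String, TreeMap<Long, String>> e : dataTable.entrySet()) {
            Map.Entry<Long, String> visible = e.getValue().floorEntry(fence);
            if (visible != null) {
                index.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                     .put(fence, visible.getValue());
            }
        }
    }

    // Snapshot read of the data table at readTs; null if nothing is visible.
    public String readData(String key, long readTs) {
        return read(dataTable, key, readTs);
    }

    // Snapshot read of the index at readTs; null if nothing is visible.
    public String readIndex(String key, long readTs) {
        return read(index, key, readTs);
    }

    private static String read(Map<String, TreeMap<Long, String>> table,
                               String key, long readTs) {
        TreeMap<Long, String> versions = table.get(key);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> e = versions.floorEntry(readTs);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        FenceGcSketch s = new FenceGcSketch(10L);
        s.write("K", 5L, "v5");    // before the fence: not indexed incrementally
        s.write("K", 12L, "v12");  // after the fence: indexed incrementally
        s.gc(15L);                 // low watermark passes the fence; K@5 is removed
        s.populateIndex();         // population at the fence no longer sees K
        System.out.println("data read of K at fence: " + s.readData("K", 10L));
        System.out.println("index read of K at ts 20: " + s.readIndex("K", 20L));
    }
}
```

In this model, the only readers that could have wanted the collected version of K are those with a timestamp below the low watermark, which, as the comment notes, will be aborted on commit.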