[ https://issues.apache.org/jira/browse/PHOENIX-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208751#comment-16208751 ]
Thomas D'Silva commented on PHOENIX-4277:
-----------------------------------------

+1

> Treat delete markers consistently with puts for point-in-time scans
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-4277
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4277
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Vincent Poon
>         Attachments: PHOENIX-4277_v2.patch, PHOENIX-4277_wip.patch
>
>
> The IndexScrutinyTool relies on doing point-in-time scans to determine
> consistency between the index and data tables. Unfortunately, deletes to the
> tables cause a problem with this approach, since delete markers take effect
> even if they're at a later time stamp than the point-in-time at which the
> scan is being done (unless KEEP_DELETED_CELLS is true). The logic of this is
> that scans should get the same results before and after a compaction takes
> place.
> Taking snapshots does not help with this since they cannot be taken at a
> point-in-time and the delete markers will act the same way - there's no way
> to guarantee that the index and data table snapshots have the same "logical"
> set of data.
> Using raw scans would allow us to see the delete markers and do the correct
> point-in-time filtering ourselves. We'd need to write the filters to do this
> correctly (see the Tephra TransactionVisibilityFilter for an implementation
> of this that could be adapted). We'd also need to hook this into Phoenix or
> potentially dip down to the HBase level to do this.
> Thanks for brainstorming on this with me, [~lhofhansl].
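
For illustration only, the point-in-time filtering over raw cells that the description proposes might look roughly like the sketch below. It is not taken from either attached patch and does not use the HBase or Phoenix APIs. PointInTimeScan, RawCell, and visibleAt are hypothetical names, and the model is deliberately simplified: each cell is keyed by a single "row/column" string, every delete marker is assumed to mask all puts of that key at or below its own timestamp, and the scan time is treated as an inclusive upper bound.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified model of point-in-time filtering over raw cells (hypothetical, not the HBase API). */
class PointInTimeScan {

    /** One raw cell: either a put carrying a value or a delete marker. */
    static final class RawCell {
        final String key;           // row and column collapsed into one string for brevity
        final long timestamp;
        final boolean deleteMarker;
        final String value;         // null for delete markers

        private RawCell(String key, long timestamp, boolean deleteMarker, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
            this.value = value;
        }

        static RawCell put(String key, long timestamp, String value) {
            return new RawCell(key, timestamp, false, value);
        }

        static RawCell delete(String key, long timestamp) {
            return new RawCell(key, timestamp, true, null);
        }
    }

    /** Returns the values visible as of the given time (inclusive), sorted by key. */
    static Map<String, String> visibleAt(List<RawCell> rawCells, long asOf) {
        Map<String, RawCell> latestPut = new HashMap<>();
        Map<String, Long> latestDelete = new HashMap<>();
        for (RawCell cell : rawCells) {
            if (cell.timestamp > asOf) {
                continue; // later than the scan time: skip puts and delete markers alike
            }
            if (cell.deleteMarker) {
                latestDelete.merge(cell.key, cell.timestamp, Long::max);
            } else {
                RawCell current = latestPut.get(cell.key);
                if (current == null || cell.timestamp > current.timestamp) {
                    latestPut.put(cell.key, cell);
                }
            }
        }
        Map<String, String> visible = new TreeMap<>();
        for (RawCell put : latestPut.values()) {
            Long deletedUpTo = latestDelete.get(put.key);
            // A marker masks puts at or below its own timestamp.
            if (deletedUpTo == null || put.timestamp > deletedUpTo) {
                visible.put(put.key, put.value);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<RawCell> raw = Arrays.asList(
                RawCell.put("row1/cf:a", 10, "v1"),
                RawCell.delete("row1/cf:a", 30),
                RawCell.put("row2/cf:a", 15, "v2"));
        System.out.println(visibleAt(raw, 20)); // {row1/cf:a=v1, row2/cf:a=v2}
        System.out.println(visibleAt(raw, 40)); // {row2/cf:a=v2}
    }
}
```

In this model the delete at timestamp 30 is ignored by the scan as of 20, which is the behaviour the index and data table comparison needs. A real filter would also have to distinguish the different HBase delete marker types, which this sketch does not attempt.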