[ https://issues.apache.org/jira/browse/HBASE-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129211#comment-16129211 ]
ramkrishna.s.vasudevan commented on HBASE-18471:
------------------------------------------------

My suspicion was correct. Assume we have qual1 and qual0. We first do a put for qual1 and then add a DeleteFamily. If, in the same test case, we do puts for qual0 after the DeleteFamily (instead of the empty qualifier used now), everything works fine. I think the simple reason is that after Put(qual1), DeleteFamily, Put(qual0, val0), Put(qual0, val1), the DeleteFamily always sorts first (qual0 and qual1 both sort after it, with qual0 before qual1). So while scanning, the DeleteFamily is always the first Cell peeked when we call StoreScanner#next(). But when an empty qualifier is added, the sorting follows a different pattern and the DeleteFamily is no longer the first cell. Ideally, when comparing a cell with qualifier qual0 against a cell without a qualifier, the cell without a qualifier should sort first, followed by the cell with the qualifier. I think if we handle this in CellComparator#compareColumns, we can solve this issue:

{code}
if (lclength != 0 && rclength == 0) {
  // means the right hand side should be sorted lower.
  return 1;
}
if (lclength == 0 && rclength != 0) {
  // means the right hand side should be sorted higher.
  return -1;
}
{code}

What do you think, [~chia7712]?
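The proposed length checks can be tried in isolation. This is a minimal standalone sketch, not HBase's actual CellComparator code: QualifierOrder and compareQualifiers are hypothetical names, qualifiers are modeled as plain byte arrays, and the fallback after the two length checks assumes unsigned lexicographic byte order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone sketch of the proposed rule; not HBase's CellComparator code.
class QualifierOrder {

  /** Orders two qualifiers so that an empty qualifier always sorts before a non-empty one. */
  static int compareQualifiers(byte[] left, byte[] right) {
    int lclength = left.length;
    int rclength = right.length;
    if (lclength != 0 && rclength == 0) {
      return 1; // the (empty) right-hand side sorts lower
    }
    if (lclength == 0 && rclength != 0) {
      return -1; // the (non-empty) right-hand side sorts higher
    }
    // Assumed fallback: unsigned lexicographic byte comparison, shorter prefix first.
    int n = Math.min(lclength, rclength);
    for (int i = 0; i < n; i++) {
      int diff = (left[i] & 0xff) - (right[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return lclength - rclength;
  }

  public static void main(String[] args) {
    List<byte[]> qualifiers = new ArrayList<>(Arrays.asList(
        "qual1".getBytes(StandardCharsets.UTF_8),
        new byte[0],
        "qual0".getBytes(StandardCharsets.UTF_8)));
    qualifiers.sort(QualifierOrder::compareQualifiers);
    for (byte[] q : qualifiers) {
      System.out.println("'" + new String(q, StandardCharsets.UTF_8) + "'");
    }
  }
}
```

Running main prints '', 'qual0' and 'qual1' on separate lines, so the cell without a qualifier sorts ahead of both qualified cells. Note that the assumed unsigned lexicographic fallback on its own already places an empty array first; the two explicit checks simply state that ordering outright.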
> The DeleteFamily cell is skipped when StoreScanner seeks to next column
> -----------------------------------------------------------------------
>
> Key: HBASE-18471
> URL: https://issues.apache.org/jira/browse/HBASE-18471
> Project: HBase
> Issue Type: Bug
> Components: Deletes, hbase, scan
> Affects Versions: 3.0.0, 1.3.0, 1.3.1, 2.0.0-alpha-1
> Reporter: Thomas Martens
> Assignee: Chia-Ping Tsai
> Priority: Critical
> Fix For: 2.0.0, 1.4.0, 1.3.2, 1.5.0, 1.2.7
>
> Attachments: HBASE-18471.branch-1.2.v0.patch, HBASE-18471.v0.patch, HBASE-18471.v1.patch, HBaseDmlTest.java
>
> The qualifier of a deleted row (with keep deleted cells set to true) re-appears after re-inserting the same row multiple times (with different timestamps) with an empty qualifier.
> Scenario:
> # Put a row with a family and a qualifier (timestamp 1).
> # Delete the entire row (timestamp 2).
> # Put the same row again with the family but without a qualifier (timestamp 3).
> A scan (latest version) returns the row with the family and no qualifier, version 3 (which is correct).
> # Put the same row again with the family but without a qualifier (timestamp 4).
> A scan (latest version) returns multiple rows:
> * the row with the family and no qualifier, version 4 (which is correct).
> * the row with the family and the qualifier, version 1 (which is wrong).
> There is a test scenario attached.
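The scenario steps can be replayed in a toy model to see why step 4 behaves differently from step 3. This is a sketch under stated assumptions, not HBase code: ToyStore, Cell and scan are hypothetical names, only a single row and family are modeled, cells are assumed to sort by qualifier ascending and then timestamp descending, the DeleteFamily marker is assumed to carry an empty qualifier, and the scan is assumed to seek to the next column only when it meets a version beyond maxVersions (the real StoreScanner and ScanQueryMatcher logic is more involved). Under those assumptions, the second empty-qualifier put triggers the seek, which jumps over the DeleteFamily marker at timestamp 2, so nothing masks myQualifier at timestamp 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT HBase code: one row, one family. ToyStore, Cell and scan are hypothetical.
class ToyStore {

  static final class Cell {
    final String qualifier;      // "" stands for an empty qualifier
    final long ts;
    final boolean deleteFamily;  // true for a DeleteFamily marker (empty qualifier)

    Cell(String qualifier, long ts, boolean deleteFamily) {
      this.qualifier = qualifier;
      this.ts = ts;
      this.deleteFamily = deleteFamily;
    }

    @Override
    public String toString() {
      return "'" + qualifier + "'@" + ts;
    }
  }

  // Assumed order: qualifier ascending, timestamp descending, delete marker first on a tie.
  static int compare(Cell a, Cell b) {
    int c = a.qualifier.compareTo(b.qualifier);
    if (c != 0) {
      return c;
    }
    c = Long.compare(b.ts, a.ts);
    if (c != 0) {
      return c;
    }
    return Boolean.compare(b.deleteFamily, a.deleteFamily);
  }

  // Returns visible cells; once a column exceeds maxVersions, the scan seeks past the whole column.
  static List<Cell> scan(List<Cell> cells, int maxVersions) {
    List<Cell> sorted = new ArrayList<>(cells);
    sorted.sort(ToyStore::compare);
    List<Cell> result = new ArrayList<>();
    long deleteFamilyTs = Long.MIN_VALUE; // newest DeleteFamily marker actually visited
    String column = null;
    int versions = 0;
    int i = 0;
    while (i < sorted.size()) {
      Cell cell = sorted.get(i);
      if (cell.deleteFamily) {
        deleteFamilyTs = Math.max(deleteFamilyTs, cell.ts); // remember the marker
        i++;
      } else if (cell.ts <= deleteFamilyTs) {
        i++; // masked by a marker that was visited earlier in the scan
      } else {
        if (!cell.qualifier.equals(column)) {
          column = cell.qualifier;
          versions = 0;
        }
        versions++;
        if (versions <= maxVersions) {
          result.add(cell);
          i++;
        } else {
          // Seek to the next column: skip every remaining cell with this qualifier,
          // including a DeleteFamily marker that shares the empty qualifier.
          while (i < sorted.size() && sorted.get(i).qualifier.equals(column)) {
            i++;
          }
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Cell> cells = new ArrayList<>();
    cells.add(new Cell("myQualifier", 1, false)); // 1. put with a qualifier
    cells.add(new Cell("", 2, true));             // 2. delete the row (DeleteFamily marker)
    cells.add(new Cell("", 3, false));            // 3. put with an empty qualifier
    System.out.println(scan(cells, 1));           // prints [''@3]
    cells.add(new Cell("", 4, false));            // 4. put with an empty qualifier again
    System.out.println(scan(cells, 1));           // prints [''@4, 'myQualifier'@1]
  }
}
```

In the same model, replacing the empty-qualifier puts with qual0 puts (and myQualifier with qual1) makes the DeleteFamily marker sort ahead of every put, so it is visited before any seek and qual1 at timestamp 1 stays hidden, which matches the observation in the comment.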
> output:
> <LOG> 13:42:53,952 [main] client.HBaseAdmin - Started disable of test_dml
> <LOG> 13:42:55,801 [main] client.HBaseAdmin - Disabled test_dml
> <LOG> 13:42:57,256 [main] client.HBaseAdmin - Deleted test_dml
> <LOG> 13:42:58,592 [main] client.HBaseAdmin - Created test_dml
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'myQualifier' with timestamp: '1'
> Scan printout =>
> Row: 'myRow', Timestamp: '1', Family: 'myFamily', Qualifier: 'myQualifier', Value: 'myValue'
> Delete row: 'myRow'
> Scan printout =>
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'null' with timestamp: '3'
> Scan printout =>
> Row: 'myRow', Timestamp: '3', Family: 'myFamily', Qualifier: '', Value: 'myValue'
> Put row: 'myRow' with family: 'myFamily' with qualifier: 'null' with timestamp: '4'
> Scan printout =>
> Row: 'myRow', Timestamp: '4', Family: 'myFamily', Qualifier: '', Value: 'myValue'
> {color:red}Row: 'myRow', Timestamp: '1', Family: 'myFamily', Qualifier: 'myQualifier', Value: 'myValue'{color}

--
This message was sent by Atlassian JIRA (v6.4.14#64029)