Hi HBase Devs,

    Nicolae Popa found inconsistent behavior when scanning with a filter on a 
column family that has maxVersions configured.
    Here is an example.

hbase(main):001:0> create 't1', {NAME => 'f1', VERSIONS => 1}
hbase(main):002:0> put 't1', 'r1', 'f1:q1', 'a'
hbase(main):003:0> put 't1', 'r1', 'f1:q1', 'b'

// There are two versions for r1, f1:q1

hbase(main):004:0> scan 't1'
ROW                   COLUMN+CELL
 r1                   column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// A scan with a value filter for 'a' returns the cell for 'a', even though 
// maxVersions is configured to be 1.
hbase(main):006:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW                   COLUMN+CELL
 r1                   column=f1:q1, timestamp=1488244087738, value=a
1 row(s)
hbase(main):007:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW                   COLUMN+CELL
 r1                   column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// After a flush and major compaction, the older version is deleted from the HFile.
hbase(main):011:0> flush 't1'
hbase(main):012:0> major_compact 't1'
hbase(main):013:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW                   COLUMN+CELL
 r1                   column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// A scan with a value filter for 'a' now returns nothing.
hbase(main):014:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW                   COLUMN+CELL
0 row(s)
hbase(main):015:0> 

In the above example, the scan result for ValueFilter 'a' differs before and 
after the flush and major compaction. The reason is that when the filter 
returns SKIP, the version count is not incremented, so the older version is 
treated as the latest version.
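
To make the ordering issue concrete, here is a toy model in Python. It is only 
a sketch and not HBase's scanner code: the function names and the list-of-tuples 
cell representation are made up for illustration. It assumes, as described 
above, that a cell the filter SKIPs does not consume a version slot, and it 
contrasts that with counting versions before applying the filter.

```python
# Toy model only; names and data layout are hypothetical, not HBase internals.
# A cell is (timestamp, value); all cells belong to the same row and column.

def scan_filter_before_count(cells, max_versions, keep):
    """Apply the filter first; SKIPped cells do not count as a version."""
    result = []
    count = 0
    for ts, value in sorted(cells, reverse=True):  # newest first
        if not keep(value):
            continue  # SKIP: version count is not incremented
        count += 1
        if count > max_versions:
            break
        result.append((ts, value))
    return result


def scan_count_before_filter(cells, max_versions, keep):
    """Count versions first; the filter only sees the newest max_versions."""
    newest = sorted(cells, reverse=True)[:max_versions]
    return [(ts, value) for ts, value in newest if keep(value)]


def is_a(value):
    return value == "a"


cells = [(1488244087738, "a"), (1488244089712, "b")]

# Before compaction both cells are still stored, so the filter can reach 'a'.
before_compaction = scan_filter_before_count(cells, 1, is_a)

# Major compaction keeps only the newest max_versions cells in the store.
compacted = sorted(cells, reverse=True)[:1]
after_compaction = scan_filter_before_count(compacted, 1, is_a)

# Counting versions before filtering gives the same answer in both states.
alternative = scan_count_before_filter(cells, 1, is_a)

print(before_compaction, after_compaction, alternative)
```

In this model the filter-first ordering returns the 'a' cell before compaction 
and nothing afterwards, while the count-first ordering returns nothing in both 
cases.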

Is this the expected behavior? When maxVersions is specified in the 
HColumnDescriptor (HCD), is the user supposed to see only the latest 
maxVersions versions, or can the result be affected by filters? Note that 
this is not a raw scan.

Thanks,
Huaxiang Sun
