Hi HBase Devs,
Nicolae Popa found inconsistent behavior when scanning with a filter on a
column family that has maxVersions configured.
Here is an example.
hbase(main):001:0> create 't1', {NAME => 'f1', VERSIONS => 1}
hbase(main):002:0> put 't1', 'r1', 'f1:q1', 'a'
hbase(main):003:0> put 't1', 'r1', 'f1:q1', 'b'
// There are two versions for r1, f1:q1
hbase(main):004:0> scan 't1'
ROW COLUMN+CELL
r1 column=f1:q1,
timestamp=1488244089712, value=b
1 row(s)
// Scan with value filter ‘a’ returns the cell for ‘a’, even though maxVersions is
configured to be 1
hbase(main):006:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW COLUMN+CELL
r1 column=f1:q1,
timestamp=1488244087738, value=a
1 row(s)
hbase(main):007:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW COLUMN+CELL
r1 column=f1:q1,
timestamp=1488244089712, value=b
1 row(s)
// After flush and major compaction, the older version is deleted from the HFile.
hbase(main):011:0> flush 't1'
hbase(main):012:0> major_compact 't1'
hbase(main):013:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW COLUMN+CELL
r1 column=f1:q1,
timestamp=1488244089712, value=b
1 row(s)
// Scan with value filter ‘a’ returns nothing now.
hbase(main):014:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW COLUMN+CELL
0 row(s)
hbase(main):015:0>
In the above example, the scan result for ValueFilter ‘a’ differs before and
after the flush and major compaction. The reason is that when the filter returns
SKIP, the version count is not incremented, so the older version is treated as
the latest version.
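To make the ordering issue concrete, here is a minimal, self-contained sketch. It is a toy model, not HBase's actual scan code: the class VersionFilterSketch, its Cell type, and the two method names are hypothetical and exist only for illustration. It compares evaluating the filter before counting versions (a skipped cell consumes no version slot) with counting versions first and filtering afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy model only: all names here are hypothetical, not HBase classes.
public class VersionFilterSketch {

    /** Stand-in for one version of a single column: a timestamp and a value. */
    public static final class Cell {
        public final long timestamp;
        public final String value;

        public Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Filter first: a cell rejected by the filter does not consume a version slot. */
    public static List<String> filterThenCount(List<Cell> newestFirst, int maxVersions,
                                               Predicate<Cell> filter) {
        List<String> out = new ArrayList<>();
        int counted = 0;
        for (Cell c : newestFirst) {
            if (!filter.test(c)) {
                continue; // SKIP: version count is not incremented
            }
            if (counted++ >= maxVersions) {
                break;
            }
            out.add(c.value);
        }
        return out;
    }

    /** Count first: only the newest maxVersions cells are ever shown to the filter. */
    public static List<String> countThenFilter(List<Cell> newestFirst, int maxVersions,
                                               Predicate<Cell> filter) {
        List<String> out = new ArrayList<>();
        int counted = 0;
        for (Cell c : newestFirst) {
            if (counted++ >= maxVersions) {
                break;
            }
            if (filter.test(c)) {
                out.add(c.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Same data as the shell session, newest version first.
        List<Cell> cells = Arrays.asList(
                new Cell(1488244089712L, "b"),
                new Cell(1488244087738L, "a"));
        Predicate<Cell> valueIsA = c -> c.value.equals("a");

        System.out.println(filterThenCount(cells, 1, valueIsA)); // prints [a]
        System.out.println(countThenFilter(cells, 1, valueIsA)); // prints []
    }
}
```

In this model, the first ordering reproduces the pre-compaction result from the shell session (the cell for ‘a’ is returned), while the second ordering returns nothing both before and after compaction.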
Is this the expected behavior? When maxVersions is specified in the HCD
(HColumnDescriptor), is the user supposed to see only the latest maxVersions
versions, or can the result be affected by filters? Note that this is not a raw
scan.
Thanks,
Huaxiang Sun