[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672429#comment-15672429 ]

Duo Zhang commented on HBASE-3562:
----------------------------------

Do you mean we should commit the UTs in this patch? In master we now call columns.checkColumn before evaluating the filter, so I think the problem described here is gone.

But in general, I think we should also count versions before evaluating filters. The current implementation (filter, then count versions) may return different results on the same data set because of major compaction.

Consider this: you set maxVersions to 3, and there are 4 versions. Your filter filters out the 3 newer versions, so you get the oldest version when doing a get or scan. Then a major compaction runs and the oldest version is reclaimed. Now you get nothing when doing the same get or scan.

I think we need to fix this, although it is an 'incompatible change'.

Thanks.

> ValueFilter is being evaluated before performing the column match
> -----------------------------------------------------------------
>
>                 Key: HBASE-3562
>                 URL: https://issues.apache.org/jira/browse/HBASE-3562
>             Project: HBase
>          Issue Type: Bug
>          Components: Filters
>    Affects Versions: 0.90.0, 0.94.7
>            Reporter: Evert Arckens
>         Attachments: HBASE-3562.patch
>
>
> When performing a Get operation where both a column and a ValueFilter are
> specified, the ValueFilter is evaluated before the column match is made,
> contrary to what the javadoc of Get.setFilter() states: " {@link
> Filter#filterKeyValue(KeyValue)} is called AFTER all tests for ttl, column
> match, deletes and max versions have been run. "
> This is shown in the little test below, which uses a TestComparator extending
> WritableByteArrayComparable.
>
> public void testFilter() throws Exception {
>   byte[] cf = Bytes.toBytes("cf");
>   byte[] row = Bytes.toBytes("row");
>   byte[] col1 = Bytes.toBytes("col1");
>   byte[] col2 = Bytes.toBytes("col2");
>   Put put = new Put(row);
>   put.add(cf, col1, new byte[]{(byte)1});
>   put.add(cf, col2, new byte[]{(byte)2});
>   table.put(put);
>   Get get = new Get(row);
>   get.addColumn(cf, col2); // We only want to retrieve col2
>   TestComparator testComparator = new TestComparator();
>   Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
>   get.setFilter(filter);
>   Result result = table.get(get);
> }
>
> public class TestComparator extends WritableByteArrayComparable {
>   /**
>    * Nullary constructor, for Writable
>    */
>   public TestComparator() {
>     super();
>   }
>
>   @Override
>   public int compareTo(byte[] theirValue) {
>     if (theirValue[0] == (byte)1) {
>       // If the column match was done before evaluating the filter,
>       // we should never get here.
>       throw new RuntimeException(
>           "I only expect (byte)2 in col2, not (byte)1 from col1");
>     }
>     if (theirValue[0] == (byte)2) {
>       return 0;
>     }
>     else return 1;
>   }
> }
>
> When only one column should be retrieved, this can be worked around by using
> a SingleColumnValueFilter instead of the ValueFilter.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
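The compaction scenario in Duo Zhang's comment above can be sketched with a toy model. This is not HBase code: the class `VersionOrderSketch` and its methods are hypothetical names, versions are plain integer timestamps listed newest first, and major compaction is assumed to keep exactly the newest maxVersions entries (that is, the column family's version limit is assumed to equal the read's maxVersions of 3).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of one column's versions, newest first. Hypothetical, not HBase code. */
class VersionOrderSketch {

    /** Filter first, then stop once maxVersions cells have passed the filter. */
    static List<Integer> filterThenCount(List<Integer> versions, int maxVersions,
                                         Predicate<Integer> keep) {
        List<Integer> out = new ArrayList<>();
        for (Integer ts : versions) {
            if (out.size() == maxVersions) break;
            if (keep.test(ts)) out.add(ts);
        }
        return out;
    }

    /** Count every version seen, and only filter the newest maxVersions of them. */
    static List<Integer> countThenFilter(List<Integer> versions, int maxVersions,
                                         Predicate<Integer> keep) {
        List<Integer> out = new ArrayList<>();
        int seen = 0;
        for (Integer ts : versions) {
            if (seen == maxVersions) break;
            seen++;
            if (keep.test(ts)) out.add(ts);
        }
        return out;
    }

    /** Assumption: major compaction drops everything beyond the newest maxVersions. */
    static List<Integer> majorCompact(List<Integer> versions, int maxVersions) {
        return new ArrayList<>(versions.subList(0, Math.min(maxVersions, versions.size())));
    }

    public static void main(String[] args) {
        List<Integer> versions = List.of(4, 3, 2, 1);   // four versions, newest first
        Predicate<Integer> keepOldest = ts -> ts == 1;  // filter rejects the 3 newer ones
        List<Integer> compacted = majorCompact(versions, 3);

        System.out.println("filter-then-count before compaction: "
                + filterThenCount(versions, 3, keepOldest));
        System.out.println("filter-then-count after compaction:  "
                + filterThenCount(compacted, 3, keepOldest));
        System.out.println("count-then-filter before compaction: "
                + countThenFilter(versions, 3, keepOldest));
        System.out.println("count-then-filter after compaction:  "
                + countThenFilter(compacted, 3, keepOldest));
    }
}
```

In this model, filter-then-count returns the oldest version before compaction and nothing afterwards, while count-then-filter returns nothing in both cases, which is the stability argument made in the comment.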
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671544#comment-15671544 ]

Esteban Gutierrez commented on HBASE-3562:
------------------------------------------

[~Apache9] any thoughts on this? It seems that the unit tests in the patch would be helpful, and there is an addition to the ColumnTracker that might be helpful in other cases.
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13704813#comment-13704813 ]

Federico Gaule commented on HBASE-3562:
---------------------------------------

Hi everyone, I'm facing a similar issue.

I'm requesting a number of contiguous columns using #addColumn, together with a FilterList that I expect to be applied only to the columns I request. However, the filters are being applied to all requested columns PLUS the column that follows the last one I requested.

As a workaround, setting a ColumnRangeFilter seems to solve the problem.
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13017371#comment-13017371 ]

Evert Arckens commented on HBASE-3562:
--------------------------------------

Jonathan,

I'm making the changes you propose.

Concerning the unit tests, however, I'm not sure whether an end-result test is useful, or what exactly to test in one. After all, the result of a read query will be the same whether the column is selected first or the filter is applied first. The only difference is that if the filter is applied first, it has to be able to run against any value from any column (i.e. it has to be robust). But I guess that is already covered by the test I included.

Can you go into a bit more detail about what kind of tests you think would be useful?
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13014748#comment-13014748 ]

Jonathan Gray commented on HBASE-3562:
--------------------------------------

Thanks for looking into this Evert. This is definitely some tricky stuff.

A few comments on your patch...

- Our convention in conditionals is to put the variable first. I find it a little tricky to read the code when the constant is first. For example:
{code}
if (MatchCode.INCLUDE == mc)
{code}
should be
{code}
if (mc == MatchCode.INCLUDE)
{code}
(And all the other places where you have this type of logic)

- The way the unit test {{TestColumnMatchAndFilterOrder}} checks correctness is clever, but I think it would be good to actually do a read query and verify the results for a few different combinations of the query, to prove correctness of the overall algorithm. Other changes to SQM down the road might change more behavior / order of operations, so this test may no longer apply or give full coverage for correctness. Having some tests which don't rely on the precise server-side interactions but rather confirm the end results will be more applicable as we move forward.

- You have some lines that are > 80 characters, especially in some of the javadoc. Just wrap them so all lines are <= 80 chars.

- There was a comment in SQM that described why the filter was checked first. Can you write some inline comments to describe how this works now? There are a couple of lines at the end, but it will be useful to have some explanation of why this has changed and what the behavior is now.

- Is there any particular reason that you had includeLatestColumn take timestamp as a parameter? The timestamp is passed in the check call, and we could just hang on to that. It just feels a little strange to me since you should never pass a different timestamp, and the tracker can know which was the latest column.

Overall this is really solid! Great work Evert!
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011175#comment-13011175 ]

Evert Arckens commented on HBASE-3562:
--------------------------------------

In ScanQueryMatcher.match I would do the columns.checkColumn call first and execute the filters only if that returns MatchCode.INCLUDE. I think this would be more efficient as well, since deciding whether to skip a column will usually be faster than evaluating one or more filters.

However, the code states explicitly:

  /**
   * Filters should be checked before checking column trackers. If we do
   * otherwise, as was previously being done, ColumnTracker may increment its
   * counter for even that KV which may be discarded later on by Filter. This
   * would lead to incorrect results in certain cases.
   */

It is not completely clear to me what the exact purpose of the counter on the ColumnTracker is, or what the problem would be if it were incremented. Maybe explicitly calling ((ExplicitColumnTracker)columns).doneWithColumn (as is done in getNextRowOrNextColumn) when a filter skips a column can help here?
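The ordering proposed in Evert Arckens's comment above (column check first, filter only for included cells) can be illustrated with a small stand-alone model. This is a sketch, not the ScanQueryMatcher code: `ColumnFirstSketch`, `Cell` and `match` are hypothetical names, and the column check is reduced to a set lookup on the qualifier. The filter mirrors the TestComparator from the issue description by throwing if it ever sees the value stored in col1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Toy model of "check the column first, then run the filter". Hypothetical, not HBase code. */
class ColumnFirstSketch {

    /** Minimal stand-in for a KeyValue: a column qualifier and a one-byte value. */
    static final class Cell {
        final String qualifier;
        final byte value;

        Cell(String qualifier, byte value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** The filter is only consulted for cells whose column was explicitly requested. */
    static List<Cell> match(List<Cell> cells, Set<String> requested, Predicate<Cell> filter) {
        List<Cell> out = new ArrayList<>();
        for (Cell cell : cells) {
            if (!requested.contains(cell.qualifier)) {
                continue; // column check rejects the cell before any filter runs
            }
            if (filter.test(cell)) {
                out.add(cell);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> row = List.of(new Cell("col1", (byte) 1), new Cell("col2", (byte) 2));
        // Like TestComparator in the issue: fail loudly if a col1 value reaches the filter.
        Predicate<Cell> strictFilter = cell -> {
            if (cell.value == (byte) 1) {
                throw new IllegalStateException("filter saw a value from col1");
            }
            return cell.value == (byte) 2;
        };
        List<Cell> result = match(row, Set.of("col2"), strictFilter);
        System.out.println(result.size() + " cell(s) returned, qualifier " + result.get(0).qualifier);
    }
}
```

With this ordering the strict filter never throws, because the col1 cell is discarded by the column check. The sketch deliberately ignores version counting, which is the concern raised by the code comment quoted in the message.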
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011248#comment-13011248 ]

Jonathan Gray commented on HBASE-3562:
--------------------------------------

The counter in ColumnTracker is responsible for tracking setMaxVersions. You may have queried for only the latest version, so once the ColumnTracker sees a given column, it will reject subsequent versions of that column.

Currently there's no way for the CT to know that a subsequent filter actually prevented a KeyValue from being returned, so that it should not be included in the count of returned versions. We would need to introduce something like {{skippedPreviousKeyValue}} that could be sent back to the CT so it could undo the previous count.
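The undo idea in Jonathan Gray's comment above can be sketched as a version counter with a callback. Everything here is hypothetical: `VersionCounterSketch`, `checkVersion` and `read` are illustrative names, and `skippedPreviousKeyValue` only borrows the name floated in the comment; no such method is known to exist on ColumnTracker. Values are plain strings listed newest first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy version counter with an undo hook. Hypothetical, not the HBase ColumnTracker. */
class VersionCounterSketch {
    private final int maxVersions;
    private int counted = 0;

    VersionCounterSketch(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Counts the version and returns true, or returns false once the limit is reached. */
    boolean checkVersion() {
        if (counted >= maxVersions) {
            return false;
        }
        counted++;
        return true;
    }

    /** Hypothetical callback: a filter dropped the version that was just counted. */
    void skippedPreviousKeyValue() {
        if (counted > 0) {
            counted--;
        }
    }

    /** Column check (version count) first, filter second, optionally undoing the count. */
    static List<String> read(List<String> versions, int maxVersions,
                             Predicate<String> filter, boolean undoOnSkip) {
        VersionCounterSketch tracker = new VersionCounterSketch(maxVersions);
        List<String> out = new ArrayList<>();
        for (String value : versions) {
            if (!tracker.checkVersion()) {
                break; // version limit reached, later versions are rejected
            }
            if (filter.test(value)) {
                out.add(value);
            } else if (undoOnSkip) {
                tracker.skippedPreviousKeyValue();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> versions = List.of("new", "old"); // newest first
        Predicate<String> wantOld = "old"::equals;     // filter rejects the newest version
        System.out.println("without undo: " + read(versions, 1, wantOld, false));
        System.out.println("with undo:    " + read(versions, 1, wantOld, true));
    }
}
```

With maxVersions set to 1, the run without undo returns nothing because the rejected newest version already used up the count, while the run with undo returns the older version that passes the filter.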
[jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
[ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011096#comment-13011096 ]

stack commented on HBASE-3562:
------------------------------

Do you have a fix Evert? (This might be a bit tricky to fix).