[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-trunk-v9.txt
            9778-0.94-v9.txt

What I am going to commit. (mostly spelling/naming fixes)

Avoid seeking to next column in ExplicitColumnTracker when possible
---
Key: HBASE-9778
URL: https://issues.apache.org/jira/browse/HBASE-9778
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 9778-0.94-v9.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk-v9.txt, 9778-trunk.txt

The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list.

My idea here is to avoid the seeking if we know that there aren't many versions to skip. How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call SKIP repeatedly.

HBASE-9769 has some initial numbers for this approach. Interestingly, it depends on which column(s) is (are) selected.

Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. Everything measured in seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +/- 0.2s.

So with this patch scanning is 2x faster than without in some cases, and never slower. No special hint needed, beyond declaring VERSIONS correctly.

--
This message was sent by Atlassian JIRA (v6.2#6252)
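The "2x faster in some cases, and never slower" summary can be checked directly against the two timing tables in the description. The sketch below only redoes that arithmetic; the class and method names (`Hbase9778Speedup`, `speedup`) are made up for illustration and are not part of HBase or of the patch.

```java
import java.util.Locale;

final class Hbase9778Speedup {
    // Column selections and timings (seconds) copied from the tables in the issue description.
    static final String[] SELECTION = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Ratio of unpatched to patched scan time; values above 1.0 mean the patched scan was faster. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        // Print one line per column selection with both timings and the resulting ratio.
        for (int i = 0; i < SELECTION.length; i++) {
            System.out.printf(Locale.ROOT, "%-8s %5.1fs -> %5.1fs  (%.2fx)%n",
                    SELECTION[i], WITHOUT_PATCH[i], WITH_PATCH[i], speedup(i));
        }
    }
}
```

Given the reported +/- 0.2s variation, ratios very close to 1.0 (Wildcard, Col 1) are best read as "no measurable change" rather than as a small gain.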
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-0.94-v7.txt

Same for 0.94. Please have a look at the book changes. If no further comments I'll commit tomorrow.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-trunk-v7.txt

Trunk patch with doc changes.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Release Note:
Introduces a new scan attribute to allow a scan operation at a RegionServer to opportunistically look ahead a few KeyValues (columns) before scheduling a seek operation during a scan with explicit columns (Scan.addColumn). A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_EAGER_NEXT, Bytes.toBytes(2));
{code}

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-trunk-v8.txt
            9778-0.94-v8.txt

for 0.94 and trunk with LOOK_AHEAD attribute and fixed documentation.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Release Note:
Introduces a new scan attribute to allow a scan operation at a RegionServer to opportunistically look ahead a few KeyValues (columns) before scheduling a seek operation during a scan with explicit columns (Scan.addColumn). A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}

was:
Introduces a new scan attribute to allow a scan operation at a RegionServer to opportunistically look ahead a few KeyValues (columns) before scheduling a seek operation during a scan with explicit columns (Scan.addColumn). A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_EAGER_NEXT, Bytes.toBytes(2));
{code}

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Release Note:
Introduces a new scan attribute to allow a scan operation with explicit columns (Scan.addColumn) to opportunistically look ahead a few KeyValues (columns/versions) before scheduling a seek operation. A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}

was:
Introduces a new scan attribute to allow a scan operation at a RegionServer to opportunistically look ahead a few KeyValues (columns) before scheduling a seek operation during a scan with explicit columns (Scan.addColumn). A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Release Note:
Introduces a new scan attribute to allow a scan operation with explicit columns (Scan.addColumn) to opportunistically look ahead a few KeyValues (columns/versions) before scheduling a seek operation to seek between columns. A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes. With small rows and few versions look ahead is typically more efficient.
API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
// instructs the RegionServer to attempt two iterations of next before scheduling a seek
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
table.getScanner(s);
{code}

was:
Introduces a new scan attribute to allow a scan operation with explicit columns (Scan.addColumn) to opportunistically look ahead a few KeyValues (columns/versions) before scheduling a seek operation. A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 bytes.
API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt
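The look-ahead idea in the release note (try a few next() calls before scheduling a seek) can be illustrated with a toy model. This is a minimal sketch under loud assumptions, not the ExplicitColumnTracker code from the patch: a row is modelled as a sorted list of column names (repeated names stand for extra versions), only the newest version of each wanted column is returned, and every name here (`LookAheadSketch`, `scanRow`, `Cost`) is hypothetical. It only counts how many single-step advances and how many seeks a given look-ahead budget would cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LookAheadSketch {
    /** Counters for one simulated row scan (hypothetical model, not HBase internals). */
    static final class Cost {
        int nexts;  // single-step advances, cheap
        int seeks;  // jumps to the next wanted column or the end of the row, expensive
        final List<String> included = new ArrayList<>();
    }

    /** A KeyValue is wanted if its column was requested and was not just returned. */
    static boolean isWanted(String col, Set<String> wanted, String lastIncluded) {
        return wanted.contains(col) && !col.equals(lastIncluded);
    }

    static Cost scanRow(List<String> row, Set<String> wanted, int lookAhead) {
        Cost cost = new Cost();
        String lastIncluded = null;
        int pos = 0;
        while (pos < row.size()) {
            String col = row.get(pos);
            if (isWanted(col, wanted, lastIncluded)) {
                // Return this KeyValue and step to the one after it.
                cost.included.add(col);
                lastIncluded = col;
                pos++;
                cost.nexts++;
                continue;
            }
            // Unwanted KeyValue: step forward up to lookAhead times, hoping to land on a wanted one.
            int tried = 0;
            while (tried < lookAhead && pos < row.size()
                    && !isWanted(row.get(pos), wanted, lastIncluded)) {
                pos++;
                cost.nexts++;
                tried++;
            }
            // Budget exhausted and still on an unwanted KeyValue: fall back to one seek.
            if (pos < row.size() && !isWanted(row.get(pos), wanted, lastIncluded)) {
                cost.seeks++;
                while (pos < row.size() && !isWanted(row.get(pos), wanted, lastIncluded)) {
                    pos++;
                }
            }
        }
        return cost;
    }

    public static void main(String[] args) {
        // Five columns per row, as in the benchmark setup; columns 2 and 4 selected.
        List<String> row = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        Set<String> wanted = new HashSet<>(Arrays.asList("c2", "c4"));
        for (int lookAhead : new int[] {0, 2}) {
            Cost c = scanRow(row, wanted, lookAhead);
            System.out.println("lookAhead=" + lookAhead + " included=" + c.included
                    + " nexts=" + c.nexts + " seeks=" + c.seeks);
        }
    }
}
```

In this model a narrow row trades every seek for a few extra next() calls, while a wide row with distant wanted columns pays the look-ahead budget and still seeks. That is the trade-off the hint value is meant to tune.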
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-0.94-v6.txt

Added some tests to 0.94 patch. If approach is cool, I'll make a trunk patch.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-trunk-v6.txt

Same for trunk. Used the same attribute-based approach (should probably protobuf it, if we use this as official API). I can also see not changing the Scan API at all, and just allowing that scan attribute to be set. Maybe that'd be best anyway.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-0.94-v5.txt

Please have a look. This is nice in that it lets a user tune the risk of seeking vs. the risk of performing too many next() calls followed by a seek.
Somebody *please* come up with a better name than eager next.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94-v5.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Fix Version/s: (was: 0.94.15)
               (was: 0.96.1)
               (was: 0.98.0)

Lemme just unschedule for now.

Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Fix Version/s: (was: 0.94.14)
               0.94.15

Fix For: 0.98.0, 0.96.1, 0.94.15
Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Fix Version/s: 0.94.14
               0.96.1
               0.98.0

Fix For: 0.98.0, 0.96.1, 0.94.14
Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Attachment: 9778-0.94-v4.txt

New sample patch. Using NARROW_ROW_HINT to optimize seeking in *both* ExplicitColumnTracker and ScanWildcardColumnTracker.
This makes ExplicitColumnTracker go around one more time for a version (like ScanWildcardColumnTracker, but then allows it to SKIP following versions, if any).

Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
Lars Hofhansl updated HBASE-9778:
---
Status: Open (was: Patch Available)

Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Fix Version/s: (was: 0.96.1) (was: 0.94.13) (was: 0.98.0) Unscheduling for now. Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Status: Open (was: Patch Available) Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Status: Patch Available (was: Open) Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 9778-trunk-v2.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-trunk-v2.txt Same for trunk. Looking around more. Seems there are more seeks that can be avoided if we know there won't be many versions around. Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 9778-trunk-v2.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-0.94-v2.txt New 0.94 version.
* fixes TestExplicitColumnTracker
* adds a simple test case, to make sure it works correctly if there are multiple versions in the store
* uses the store's (CF's) MAX_VERSIONS setting as the hint
Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt
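The idea of using the column family's maximum-versions setting as a hint can be sketched roughly as follows. This is not the code from the attached patches: the class name, the enum (a stand-in that only borrows the SKIP and SEEK_NEXT_COL names), and in particular the threshold value are assumptions made for illustration.

```java
// Hypothetical sketch of the "use MAX_VERSIONS as a hint" idea.
// Not taken from the 9778 patches; names and threshold are assumptions.
final class SeekHintSketch {
    /** What to do with further cells of a column the scan is finished with. */
    enum Action { SKIP, SEEK_NEXT_COL }

    // Assumed cut-off. The issue only says VERSIONS=1 or "maybe" some other small value.
    static final int SKIP_THRESHOLD = 10;

    /**
     * With few versions per column, stepping over the leftover cells one by one
     * is expected to be cheaper than a seek; with many versions, seek instead.
     */
    static Action afterColumnDone(int maxVersions) {
        return maxVersions < SKIP_THRESHOLD ? Action.SKIP : Action.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1   -> " + afterColumnDone(1));
        System.out.println("VERSIONS=100 -> " + afterColumnDone(100));
    }
}
```

The design point is that the hint costs nothing at scan time: it is read once from the column family's declared setting, which is why the description notes that no special hint is needed beyond declaring VERSIONS correctly.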
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Status: Open (was: Patch Available) Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 9778-trunk-v2.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-0.94-v3.txt Arggh... Didn't see TestQueryMatcher before. Fixed that test as well, and added a simple test with 2 versions (to verify the existing behavior). Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-trunk.txt, 9778-trunk-v2.txt
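Why a test with two versions is worth having can be seen in a toy model. The sketch below is not HBase code and does not reproduce TestQueryMatcher; it uses a hypothetical in-memory cell list, sorted by column and then newest timestamp first, and steps over every unwanted cell one at a time instead of seeking. It shows that the extra version is passed over and the returned cells are still the expected ones.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a hypothetical sorted cell list scanned with SKIP-style stepping.
final class SkipScanSketch {
    /** A toy cell: column qualifier plus timestamp. */
    static final class Cell {
        final String column;
        final long ts;
        Cell(String column, long ts) { this.column = column; this.ts = ts; }
    }

    /** Timestamps returned for the wanted column, and how many cells were stepped over. */
    static final class Result {
        final List<Long> returned = new ArrayList<>();
        int skipped;
    }

    /** Cells must be sorted by column, then newest timestamp first. */
    static Result scan(List<Cell> cells, String wanted, int maxVersions) {
        Result r = new Result();
        for (Cell c : cells) {
            int cmp = c.column.compareTo(wanted);
            if (cmp > 0) {
                break; // past the wanted column, nothing more to return in this row
            }
            if (cmp == 0 && r.returned.size() < maxVersions) {
                r.returned.add(c.ts); // one of the newest maxVersions versions
            } else {
                r.skipped++; // earlier column or surplus version: step over it
            }
        }
        return r;
    }

    public static void main(String[] args) {
        List<Cell> row = new ArrayList<>();
        for (String col : new String[] {"c1", "c2", "c3"}) {
            row.add(new Cell(col, 2L)); // newest version
            row.add(new Cell(col, 1L)); // older version
        }
        Result r = scan(row, "c2", 1);
        System.out.println("returned=" + r.returned + " skipped=" + r.skipped);
    }
}
```

In this model, stepping costs one comparison per surplus cell, which is the trade the issue proposes to accept when the number of versions is known to be small.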
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Status: Patch Available (was: Open) Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-trunk-v3.txt And trunk Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-0.94.txt 0.94 patch. Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Status: Patch Available (was: Open) Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Attachment: 9778-trunk.txt And a trunk version for HadoopQA Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-trunk.txt
[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible
[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Hofhansl updated HBASE-9778: - Description:

The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list. My idea here is to avoid the seeking if we know that there aren't many versions to skip. How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call SKIP repeatedly.

HBASE-9769 has some initial numbers for this approach. Interestingly, the result depends on which column(s) are selected. Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. Everything measured in seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +/- 0.2s. So with this patch scanning is 2x faster than without in some cases, and never slower. No special hint needed, beyond declaring VERSIONS correctly.

was:

The issue of slow seeking in ExplicitColumnTracker was brought up by [~vrodionov] on the dev list. My idea here is to avoid the seeking if we know that there aren't many rows to skip. How do we know? We'll use the column family's VERSIONS setting as a hint. If VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call SKIP repeatedly.

HBASE-9769 has some initial numbers for this approach. Interestingly, the result depends on which column(s) are selected. Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, everything filtered at the server with a ValueFilter. Everything measured in seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +/- 0.2s. So with this patch scanning is 2x faster than without in some cases, and never slower. No special hint needed, beyond declaring VERSIONS correctly.

Avoid seeking to next column in ExplicitColumnTracker when possible --- Key: HBASE-9778 URL: https://issues.apache.org/jira/browse/HBASE-9778 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Fix For: 0.98.0, 0.94.13, 0.96.1 Attachments: 9778-0.94.txt, 9778-trunk.txt