[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-11 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v9.txt
9778-0.94-v9.txt

This is what I am going to commit (mostly spelling/naming fixes).

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 
 9778-0.94-v9.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 
 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk-v9.txt, 
 9778-trunk.txt


 The issue of slow seeking in ExplicitColumnTracker was brought up by 
 [~vrodionov] on the dev list.
 My idea here is to avoid the seeking if we know that there aren't many 
 versions to skip.
 How do we know? We'll use the column family's VERSIONS setting as a hint. If 
 VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
 SKIP repeatedly.
 HBASE-9769 has some initial numbers for this approach:
 Interestingly, it depends on which column(s) is (are) selected.
 Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1, 
 everything filtered at the server with a ValueFilter. Everything measured in 
 seconds.
 Without patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.5|14.3|14.6|11.1|20.3|
 With patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.4|8.9|9.9|6.4|10.0|
 Variation here was +- 0.2s.
 So with this patch scanning is 2x faster than without in some cases, and 
 never slower. No special hint needed, beyond declaring VERSIONS correctly.
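
 The speedup factors implied by the two tables can be worked out directly. The following is a minimal sketch that only divides the "without patch" time by the "with patch" time for each case; the class and method names are made up for illustration and are not part of any patch on this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpeedupTable {
    // Scan times in seconds, copied from the two tables in the description.
    static final String[] CASES = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Speedup factor per case: time without the patch divided by time with it. */
    static Map<String, Double> speedups() {
        Map<String, Double> result = new LinkedHashMap<>();
        for (int i = 0; i < CASES.length; i++) {
            result.put(CASES[i], WITHOUT_PATCH[i] / WITH_PATCH[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : speedups().entrySet()) {
            System.out.printf("%-8s %.2fx%n", e.getKey(), e.getValue());
        }
    }
}
```

 The largest factor is for Col 2+4 (20.3s down to 10.0s, roughly 2x), and no case has a factor below 1, which matches the "never slower" observation within the stated variation.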



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v7.txt

Same for 0.94.
Please have a look at the book changes. If there are no further comments, I'll 
commit tomorrow.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94.txt, 
 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 
 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v7.txt

Trunk patch with doc changes.


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 
 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Release Note: 
Introduces a new scan attribute to allow a scan operation at a RegionServer to 
opportunistically look ahead a few KeyValues (columns) before scheduling a seek 
operation during a scan with explicit columns (Scan.addColumn).

A seek is efficient when it can seek past 5-10 KeyValues (columns) or 512-1024 
bytes.

Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_EAGER_NEXT, Bytes.toBytes(2));
{code}


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94.txt, 
 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk-v7.txt, 
 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v8.txt
9778-0.94-v8.txt

Patches for 0.94 and trunk with the LOOK_AHEAD attribute and fixed documentation.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 
 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Release Note: 
Introduces a new scan attribute to allow a scan operation at a RegionServer to 
opportunistically look ahead a few KeyValues (columns) before scheduling a seek 
operation during a scan with explicit columns (Scan.addColumn).

A seek is efficient when it can seek past 5-10 KeyValues (columns) or 512-1024 
bytes.

Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}


  was:
Introduces a new scan attribute to allow a scan operation at a RegionServer to 
opportunistically look ahead a few KeyValues (columns) before scheduling a seek 
operation during a scan with explicit columns (Scan.addColumn).

A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 
bytes.

Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_EAGER_NEXT, Bytes.toBytes(2));
{code}



 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 
 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Release Note: 
Introduces a new scan attribute to allow a scan operation with explicit columns 
(Scan.addColumn) to opportunistically look ahead a few KeyValues 
(columns/versions) before scheduling a seek operation.

A seek is efficient when it can seek past 5-10 KeyValues (columns) or 512-1024 
bytes.

API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}


  was:
Introduces a new scan attribute to allow a scan operation at a RegionServer to 
opportunistically look ahead a few KeyValues (columns) before scheduling a seek 
operation during a scan with explicit columns (Scan.addColumn).

A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 
bytes.

Api:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}



 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 
 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Release Note: 
Introduces a new scan attribute to allow a scan operation with explicit columns 
(Scan.addColumn) to opportunistically look ahead a few KeyValues 
(columns/versions) before scheduling a seek operation to seek between columns.

A seek is efficient when it can seek past 5-10 KeyValues (columns) or 512-1024 
bytes. With small rows and few versions, look-ahead is typically more efficient.

API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
// Instructs the RegionServer to attempt two iterations of next() before
// scheduling a seek.
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
table.getScanner(s);
{code}
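
To illustrate what "look ahead before scheduling a seek" means, here is a self-contained sketch that does not use any HBase classes. It models one row as a sorted array of column qualifiers (one version each, an assumption made to keep the example small) and models a seek as a binary search. The names `LookAheadSketch`, `advance` and `Stats` are hypothetical; this is not the code in the attached patches.

```java
import java.util.Arrays;

public class LookAheadSketch {
    /** Counters for how the target was reached. */
    static final class Stats {
        int nexts;
        int seeks;
    }

    /**
     * Moves from index pos to the first cell whose qualifier is >= target.
     * Tries up to lookAhead sequential steps first, then falls back to a seek,
     * modeled here as a binary search over the sorted, unique qualifiers.
     */
    static int advance(int[] sortedQualifiers, int pos, int target, int lookAhead, Stats stats) {
        for (int i = 0; i < lookAhead && pos < sortedQualifiers.length
                && sortedQualifiers[pos] < target; i++) {
            pos++;
            stats.nexts++;
        }
        if (pos < sortedQualifiers.length && sortedQualifiers[pos] < target) {
            // Look-ahead budget exhausted without reaching the target: seek.
            stats.seeks++;
            int idx = Arrays.binarySearch(sortedQualifiers, pos, sortedQualifiers.length, target);
            pos = idx >= 0 ? idx : -idx - 1;
        }
        return pos;
    }

    public static void main(String[] args) {
        int[] row = {1, 2, 3, 4, 5};              // five columns, one version each
        Stats near = new Stats();
        int p1 = advance(row, 0, 2, 2, near);     // target is one cell away
        System.out.println("pos=" + p1 + " nexts=" + near.nexts + " seeks=" + near.seeks);
        Stats far = new Stats();
        int p2 = advance(row, 0, 5, 2, far);      // target is four cells away
        System.out.println("pos=" + p2 + " nexts=" + far.nexts + " seeks=" + far.seeks);
    }
}
```

In this model a nearby column is reached with next() calls alone, while a distant column costs the look-ahead steps plus one seek.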


  was:
Introduces a new scan attribute to allow a scan operation with explicit columns 
(Scan.addColumn) to opportunistically look ahead a few KeyValues 
(columns/versions) before scheduling a seek operation.

A seek is efficient when it can seek past 5-10 KeyValue (columns) or 512-1024 
bytes.

API:
{code}
Scan s = new Scan(...);
s.addColumn(...);
s.setAttribute(Scan.HINT_LOOKAHEAD, Bytes.toBytes(2));
{code}



 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94-v7.txt, 9778-0.94-v8.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk-v6.txt, 
 9778-trunk-v7.txt, 9778-trunk-v8.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-07 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v6.txt

Added some tests to 0.94 patch. If approach is cool, I'll make a trunk patch.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 
 9778-trunk-v3.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-07 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v6.txt

Same for trunk.
Used the same attribute-based approach (should probably protobuf it, if we use 
this as official API).
I can also see not changing the Scan API at all, and just allowing that scan 
attribute to be set - maybe that'd be best anyway.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94-v6.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 
 9778-trunk-v3.txt, 9778-trunk-v6.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2014-03-06 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v5.txt

Please have a look.
This is nice in that it lets a user tune the risk of seeking vs. the risk of 
performing too many next() calls followed by a seek.

Somebody *please* come up with a better name than eager next.
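
The trade-off can be made concrete with a rough cost model. This is only a sketch: costs are counted in units of one next() call, and the seek cost of 8 units is an assumption, loosely based on the release note's statement that a seek pays off when it skips 5-10 KeyValues. The class and method names are hypothetical.

```java
public class LookAheadCost {
    /**
     * Cost, in units of one next() call, of reaching a cell that is
     * distance cells ahead when up to lookAhead next() calls are tried
     * before falling back to a seek costing seekCost units.
     */
    static int cost(int distance, int lookAhead, int seekCost) {
        if (distance <= lookAhead) {
            return distance;              // reached by stepping alone
        }
        return lookAhead + seekCost;      // wasted steps plus the seek
    }

    public static void main(String[] args) {
        int seekCost = 8; // assumption: one seek costs about as much as 8 next() calls
        for (int lookAhead : new int[] {0, 2, 5}) {
            for (int distance : new int[] {1, 3, 20}) {
                System.out.printf("lookAhead=%d distance=%d cost=%d%n",
                        lookAhead, distance, cost(distance, lookAhead, seekCost));
            }
        }
    }
}
```

With a look-ahead of 0 every move pays the full seek cost, even for the adjacent cell. A larger look-ahead makes short hops cheap, but a hop just beyond the budget pays for the wasted next() calls and the seek on top.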


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94-v5.txt, 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 
 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-11-27 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Fix Version/s: (was: 0.94.15)
   (was: 0.96.1)
   (was: 0.98.0)

Lemme just unschedule for now.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-11-12 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Fix Version/s: (was: 0.94.14)
   0.94.15

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.96.1, 0.94.15

 Attachments: 9778-0.94-v2.txt, 9778-0.94-v3.txt, 9778-0.94-v4.txt, 
 9778-0.94.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-26 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Fix Version/s: 0.94.14
   0.96.1
   0.98.0

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.96.1, 0.94.14

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-22 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v4.txt

New sample patch. Using NARROW_ROW_HINT to optimize seeking in *both* 
ExplicitColumnTracker and ScanWildcardColumnTracker.

This makes ExplicitColumnTracker go around one more time for a version (like 
ScanWildcardColumnTracker, but then allows it to SKIP the following versions, if any).


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-0.94-v4.txt, 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt


 The issue of slow seeking in ExplicitColumnTracker was brought up by 
 [~vrodionov] on the dev list.
 My idea here is to avoid the seeking if we know that there aren't many 
 versions to skip.
 How do we know? We'll use the column family's VERSIONS setting as a hint. If 
 VERSIONS is set to 1 (or maybe some value  10) we'll avoid the seek and call 
 SKIP repeatedly.
 HBASE-9769 has some initial number for this approach:
 Interestingly it depends on which column(s) is (are) selected.
 Some numbers: 4m rows, 5 cols each, 1 cf, 10 bytes values, VERSIONS=1, 
 everything filtered at the server with a ValueFilter. Everything measured in 
 seconds.
 Without patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.5|14.3|14.6|11.1|20.3|
 With patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.4|8.9|9.9|6.4|10.0|
 Variation here was +- 0.2s.
 So with this patch scanning is 2x faster than without in some cases, and 
 never slower. No special hint needed, beyond declaring VERSIONS correctly.
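
The heuristic in the quoted description could be sketched roughly as below. This is an illustration under stated assumptions, not the code in the attached patches: the class SkipOrSeekSketch, the enum Hint, the method nextColumnHint and the cut-off of 10 are all hypothetical, chosen only for this example. Only the idea of using the column family's VERSIONS setting to choose between repeated SKIPs and a seek comes from the description.

```java
// Illustrative sketch only; the names below are hypothetical and are not
// taken from the HBase source or from the attached patches.
final class SkipOrSeekSketch {

    /** The two ways a scanner could move past the remaining versions of a column. */
    enum Hint { SKIP, SEEK_NEXT_COL }

    // Assumed cut-off. The description only suggests VERSIONS = 1, or possibly
    // some small value; 10 is used here purely for illustration.
    static final int ASSUMED_THRESHOLD = 10;

    /**
     * Chooses between skipping cell by cell and seeking to the next column,
     * using the column family's VERSIONS setting as a hint.
     */
    static Hint nextColumnHint(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1, got " + maxVersions);
        }
        // Few versions per column: a handful of cheap SKIPs is assumed to cost
        // less than one seek, so prefer SKIP below the threshold.
        return maxVersions < ASSUMED_THRESHOLD ? Hint.SKIP : Hint.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(nextColumnHint(1));
        System.out.println(nextColumnHint(100));
    }
}
```

Keeping the decision in one small function makes the cut-off easy to tune against measurements such as the ones in the tables above.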





[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-21 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Open  (was: Patch Available)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Fix Version/s: (was: 0.96.1)
   (was: 0.94.13)
   (was: 0.98.0)

Unscheduling for now.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Open  (was: Patch Available)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Patch Available  (was: Open)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 
 9778-trunk-v2.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v2.txt

Same for trunk.

Looking around more. Seems there are more seeks that can be avoided if we know 
there won't be many versions around.


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 
 9778-trunk-v2.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v2.txt

New 0.94 version.
* fixes TestExplicitColumnTracker
* adds a simple test case, to make sure it works correctly if there are multiple 
versions in the store
* Uses the store's (CF's) MAX_VERSIONS setting as the hint


 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Open  (was: Patch Available)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-trunk.txt, 
 9778-trunk-v2.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94-v3.txt

Arggh... Didn't see TestQueryMatcher before. Fixed that test as well, and added 
a simple test with 2 versions (to verify the existing behavior).

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Patch Available  (was: Open)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk-v3.txt

And trunk

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt, 
 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-0.94.txt

0.94 patch.

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-trunk.txt


 The issue of slow seeking in ExplicitColumnTracker was brought up by 
 [~vrodionov] on the dev list.
 My idea here is to avoid the seeking if we know that there aren't many rows 
 to skip.
 How do we know? We'll use the column family's VERSIONS setting as a hint. If 
 VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
 SKIP repeatedly.
 HBASE-9769 has some initial numbers for this approach:
 Interestingly it depends on which column(s) is (are) selected.
 Some numbers: 4m rows, 5 cols each, 1 cf, 10 bytes values, VERSIONS=1, 
 everything filtered at the server with a ValueFilter. Everything measured in 
 seconds.
 Without patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.5|14.3|14.6|11.1|20.3|
 With patch:
 ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
 |6.4|8.4|8.9|9.9|6.4|10.0|
 Variation here was +- 0.2s.
 So with this patch scanning is 2x faster than without in some cases, and 
 never slower. No special hint needed, beyond declaring VERSIONS correctly.





[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Status: Patch Available  (was: Open)

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-trunk.txt





[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Attachment: 9778-trunk.txt

And a trunk version for HadoopQA

 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-trunk.txt




[jira] [Updated] (HBASE-9778) Avoid seeking to next column in ExplicitColumnTracker when possible

2013-10-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-9778:
-

Description: 
The issue of slow seeking in ExplicitColumnTracker was brought up by 
[~vrodionov] on the dev list.

My idea here is to avoid the seeking if we know that there aren't many versions 
to skip.
How do we know? We'll use the column family's VERSIONS setting as a hint. If 
VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
SKIP repeatedly.

HBASE-9769 has some initial numbers for this approach:
Interestingly it depends on which column(s) is (are) selected.

Some numbers: 4m rows, 5 cols each, 1 cf, 10 bytes values, VERSIONS=1, 
everything filtered at the server with a ValueFilter. Everything measured in 
seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +- 0.2s.

So with this patch scanning is 2x faster than without in some cases, and never 
slower. No special hint needed, beyond declaring VERSIONS correctly.


  was:
The issue of slow seeking in ExplicitColumnTracker was brought up by 
[~vrodionov] on the dev list.
My idea here is to avoid the seeking if we know that there aren't many rows to 
skip.
How do we know? We'll use the column family's VERSIONS setting as a hint. If 
VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call 
SKIP repeatedly.

HBASE-9769 has some initial numbers for this approach:
Interestingly it depends on which column(s) is (are) selected.

Some numbers: 4m rows, 5 cols each, 1 cf, 10 bytes values, VERSIONS=1, 
everything filtered at the server with a ValueFilter. Everything measured in 
seconds.

Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|

With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|

Variation here was +- 0.2s.

So with this patch scanning is 2x faster than without in some cases, and never 
slower. No special hint needed, beyond declaring VERSIONS correctly.
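
To put the "2x faster" statement in concrete terms, the per-column speedup can be worked out from the two tables in the description. The snippet below is a small sketch; the class and method names are made up for this example, and the timings (in seconds) are copied from the tables.

```java
import java.util.Locale;

// Hypothetical helper for this example; only the timings come from the description.
final class SpeedupSketch {

    static final String[] COLUMNS = {"Wildcard", "Col 1", "Col 2", "Col 4", "Col 5", "Col 2+4"};
    // Scan times in seconds, as reported in the "Without patch" and "With patch" tables.
    static final double[] WITHOUT_PATCH = {6.4, 8.5, 14.3, 14.6, 11.1, 20.3};
    static final double[] WITH_PATCH = {6.4, 8.4, 8.9, 9.9, 6.4, 10.0};

    /** Speedup factor for the i-th column selection: old time divided by new time. */
    static double speedup(int i) {
        return WITHOUT_PATCH[i] / WITH_PATCH[i];
    }

    public static void main(String[] args) {
        for (int i = 0; i < COLUMNS.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-8s %.2fx", COLUMNS[i], speedup(i)));
        }
    }
}
```

The largest factor is for Col 2+4 (20.3 / 10.0, about 2.03x), while the Wildcard case is unchanged at 1.0x. No factor falls below 1, which matches the "never slower" statement given the reported variation of +- 0.2s.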



 Avoid seeking to next column in ExplicitColumnTracker when possible
 ---

 Key: HBASE-9778
 URL: https://issues.apache.org/jira/browse/HBASE-9778
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.98.0, 0.94.13, 0.96.1

 Attachments: 9778-0.94.txt, 9778-trunk.txt

