[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014102#comment-15014102
 ] 

Lars Hofhansl commented on HBASE-14822:
---

[~samarthjain] and I took a look. It turns out that Phoenix does some funky 
requests where we have a scan with a filter that indicates "done" _and_ that 
has caching set to 0 - so this is essentially a useless request.
Be that as it may, for this use case this patch changes the behavior. 
As it happens there is a good fix for this. Will upload a patch soon.
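For illustration only - a hypothetical sketch of a request of that shape (not 
Phoenix's actual code; the row keys and page size are made up): a filter that 
can report "done" via filterAllRemaining(), combined with caching set to 0.
{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PageFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class DoneAndZeroCachingScan {
  // Hypothetical example: once a PageFilter has returned its page it reports
  // filterAllRemaining() == true, i.e. the filter indicates "done"; with
  // caching set to 0 the next() RPC carrying this scan asks for no rows.
  public static Scan buildScan() {
    Scan scan = new Scan(Bytes.toBytes("startRow"), Bytes.toBytes("stopRow"));
    scan.setFilter(new PageFilter(1));
    scan.setCaching(0);
    return scan;
  }
}
{code}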

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Attachment: (was: 14822-0.98-v4.txt)

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Attachment: (was: 14822-0.98-v3.txt)

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015331#comment-15015331
 ] 

Lars Hofhansl commented on HBASE-14822:
---

Waiting for a test run.

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Status: Open  (was: Patch Available)

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, 
> 14822-v3-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Attachment: 14822-v3-0.98.txt

Naming the patch so that HadoopQA recognizes it as a 0.98 patch (I think it needs 
the -0.98 suffix to come last).

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, 
> 14822-v3-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-19 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Status: Patch Available  (was: Open)

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98.txt, 
> 14822-v3-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011955#comment-15011955
 ] 

Lars Hofhansl commented on HBASE-14822:
---

Thanks [~samarthjain]. Good point. Although I think we only have to avoid the {{if 
(!moreResults || closeScanner)}} part.
This is a bit fragile now. I'm tempted to remove that feature for now, until 
this is working and not fragile.
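To spell out the concern (a toy model of my reading of that server-side branch, 
not the actual RSRpcServices code):
{code}
import java.util.HashMap;
import java.util.Map;

/** Toy model of the branch discussed above - not HBase code. */
public class ScanCloseDecision {
  static final Map<String, Object> scanners = new HashMap<>();

  static void handleScan(String scannerName, boolean filterDone, boolean closeScanner) {
    boolean moreResults = !filterDone;   // a filter signalling "done" clears moreResults
    if (!moreResults || closeScanner) {
      // This branch removes the scanner entry. A renew-lease request must not
      // end up here, otherwise "renewing" would silently close the scanner.
      scanners.remove(scannerName);
    }
  }

  public static void main(String[] args) {
    scanners.put("s1", new Object());
    handleScan("s1", true, false);
    System.out.println("scanner still registered: " + scanners.containsKey("s1")); // false
  }
}
{code}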

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98-v4.txt, 
> 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012069#comment-15012069
 ] 

Lars Hofhansl commented on HBASE-14822:
---

Looking again, {{moreResults}} is only set to false when we have a filter 
on the request and that filter indicates the scan is done.
So - as also indicated by the test I added - the code looks correct.

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98-v4.txt, 
> 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012050#comment-15012050
 ] 

Lars Hofhansl commented on HBASE-14822:
---

So originally I put this in because I thought it could work without any server 
changes. Turns out I was wrong, my bad. Not sure I like this anymore.

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98-v3.txt, 14822-0.98-v4.txt, 
> 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009826#comment-15009826
 ] 

Lars Hofhansl commented on HBASE-14791:
---

+1, thanks [~alexaraujo]

Going to commit now.

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009839#comment-15009839
 ] 

Lars Hofhansl commented on HBASE-14822:
---

Shame on me for not adding a test to HBASE-1.
Looking.

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14791:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.98 only.
Thanks [~alexaraujo]

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Status: Patch Available  (was: Open)

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) Batch Deletes in MapReduce jobs (0.98)

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009987#comment-15009987
 ] 

Lars Hofhansl commented on HBASE-14791:
---

I do not see any javadoc warnings pertaining to the files changed in this 
patch. Is that a dud?

> Batch Deletes in MapReduce jobs (0.98)
> --
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>  Labels: mapreduce
> Fix For: 0.98.17
>
> Attachments: HBASE-14791-0.98-v1.patch, HBASE-14791-0.98-v2.patch, 
> HBASE-14791-0.98.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Attachment: 14822-0.98.txt

Here's a patch for 0.98 with a test that actually verifies that the lease does not 
time out. Use -1 to indicate a close request; 0 can be used to simply renew the 
scanner.

The only concern I'd have with the test is that it uses wall time, so on 
*really* slow VMs it could fail because of this (the lease could expire before 
we get to renew it).
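In other words, as I read the numberOfRows convention in the request (an 
illustrative toy only, not the actual RPC code):
{code}
/** Toy illustration of the numberOfRows convention described above - not HBase code. */
public class ScanRequestSemantics {
  static String interpret(int numberOfRows) {
    if (numberOfRows == -1) {
      return "close the scanner";
    } else if (numberOfRows == 0) {
      return "renew the scanner lease only, return no rows";
    }
    return "fetch up to " + numberOfRows + " rows";
  }

  public static void main(String[] args) {
    System.out.println(interpret(-1));  // close request
    System.out.println(interpret(0));   // lease renewal only
    System.out.println(interpret(100)); // normal next() call
  }
}
{code}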


> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Fix Version/s: 1.0.4
   1.1.3
   1.3.0
   1.2.0
   2.0.0

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Fix Version/s: 0.98.17

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010320#comment-15010320
 ] 

Lars Hofhansl commented on HBASE-14822:
---

bq. Would it work to use EnvironmentEdgeManager#injectEdge to inject 
ManualEnvironmentEdge for explicitly fiddling with the (virtual) time? 

Only if all relevant parts actually use the env edge :)  Lemme check. Also 
forgot a license header on the new test class.
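For reference, the injection approach mentioned above usually looks roughly like 
this in tests (a sketch; method names as I recall them for 0.98, and it only 
helps for code paths that actually read time through the environment edge):
{code}
import org.apache.hadoop.hbase.util.EnvironmentEdgeManager;
import org.apache.hadoop.hbase.util.ManualEnvironmentEdge;

public class ManualClockSketch {
  public static void main(String[] args) {
    ManualEnvironmentEdge clock = new ManualEnvironmentEdge();
    clock.setValue(0L);
    EnvironmentEdgeManager.injectEdge(clock);
    try {
      // Code that reads time via EnvironmentEdgeManager now sees the manual
      // clock, so a lease "expiry" can be simulated deterministically.
      clock.incValue(120000L); // jump the virtual clock forward two minutes
      System.out.println("virtual now = " + EnvironmentEdgeManager.currentTimeMillis());
    } finally {
      EnvironmentEdgeManager.reset(); // restore the default wall-clock edge
    }
  }
}
{code}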


> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15010347#comment-15010347
 ] 

Lars Hofhansl commented on HBASE-14822:
---

Looked. The lease times out in wall time, unfortunately. Now I could try to mock 
the lease timeout in the region server, but that seems to be overkill. I'll 
instead increase the timeout used by the test such that a VM would need to be 
stalled for a minute to cause a failure.


> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14822) Renewing leases of scanners doesn't work

2015-11-17 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14822:
--
Attachment: 14822-0.98-v2.txt

* fixed license header
* using 1/2 the default lease timeout now. When a VM is so slow that we get 
into this ballpark, many other tests will fail anyway due to vanishing 
scanners. So that should be OK.

This should be good to commit.

> Renewing leases of scanners doesn't work
> 
>
> Key: HBASE-14822
> URL: https://issues.apache.org/jira/browse/HBASE-14822
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14
>Reporter: Samarth Jain
>Assignee: Lars Hofhansl
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.1.3, 0.98.17, 1.0.4
>
> Attachments: 14822-0.98-v2.txt, 14822-0.98.txt
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003336#comment-15003336
 ] 

Lars Hofhansl commented on HBASE-14791:
---

Looks good!

Two questions:
# Would subclassing (as opposed to the delegation used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclassing TableOutputFormat? (That would not be an issue either if we do the 
subclassing from #1.)
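To make the two options concrete (a generic, hypothetical sketch - not the 
patch itself):
{code}
// Hypothetical illustration of subclassing vs. delegation for a writer class.
class BaseWriter {
  void write(byte[] row) { /* send one mutation */ }
  void close() { /* flush and release resources */ }
}

// Option 1: subclassing - override only what changes, inherit the rest,
// which usually means less forwarding code.
class BatchingWriterBySubclass extends BaseWriter {
  @Override
  void write(byte[] row) {
    // buffer the row and flush in batches instead of writing per call
  }
}

// Option 2: delegation - wrap the base writer and forward every call,
// which hides the wrapped type but needs explicit forwarding methods.
class BatchingWriterByDelegation {
  private final BaseWriter delegate = new BaseWriter();
  void write(byte[] row) { delegate.write(row); }
  void close() { delegate.close(); }
}
{code}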


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
> Attachments: HBASE-14791-0.98-v1.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003336#comment-15003336
 ] 

Lars Hofhansl edited comment on HBASE-14791 at 11/13/15 1:42 AM:
-

Looks good! [~alexaraujo]

Two questions:
# Would subclassing (as opposed to the delegation used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclassing TableOutputFormat? (That would not be an issue either if we do the 
subclassing from #1.)



was (Author: lhofhansl):
Looks good!

Two questions:
# Would subclassing (as opposed to delegation as used in the patch) save us a 
bunch of code?
# Would the change from HTable to HTableInterface break compatibility for folks 
subclasses TableOutputFormat? (would not be an issue too if we do the 
subclassing of #1)


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
> Attachments: HBASE-14791-0.98-v1.patch
>
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-11-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003338#comment-15003338
 ] 

Lars Hofhansl commented on HBASE-14221:
---

The LoserTree did not work out in all cases, but in HBASE-9969 Matt has an 
alternate implementation of KeyValueHeap, which I thought was nice for two 
reasons:
# it saves some compares, and
# the implementation is our own, so we can tweak it more later (it has always 
bothered me a bit that _the_ central data structure for HBase's mergesort is 
just the Java standard PriorityQueue :) )
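For context, the heart of such a merge is just a heap of per-source cursors; a 
minimal, generic sketch (plain Java, not the KeyValueHeap implementation):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMerge {
  // Pair of the current head element and the iterator it came from.
  private static final class Entry {
    int head;
    final Iterator<Integer> rest;
    Entry(int head, Iterator<Integer> rest) { this.head = head; this.rest = rest; }
  }

  /** Merge already-sorted lists; each emitted element costs O(log k) comparisons. */
  static List<Integer> merge(List<List<Integer>> sources) {
    PriorityQueue<Entry> heap = new PriorityQueue<>(Comparator.comparingInt((Entry e) -> e.head));
    for (List<Integer> source : sources) {
      Iterator<Integer> it = source.iterator();
      if (it.hasNext()) {
        heap.add(new Entry(it.next(), it));
      }
    }
    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      Entry e = heap.poll();
      merged.add(e.head);
      if (e.rest.hasNext()) {
        e.head = e.rest.next();
        heap.add(e);  // re-inserting is where most of the compares are spent
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(merge(Arrays.asList(
        Arrays.asList(1, 4, 7), Arrays.asList(2, 5, 8), Arrays.asList(3, 6, 9))));
  }
}
{code}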


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-10 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14791:
-

 Summary: [0.98] CopyTable is extremely slow when moving delete 
markers
 Key: HBASE-14791
 URL: https://issues.apache.org/jira/browse/HBASE-14791
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.16
Reporter: Lars Hofhansl


We found that some of our copy table job run for many hours, even when there 
isn't that much data to copy.

[~vik.karma] did his magic and found that the issue with copying delete markers 
(we use raw mode to also move deletes across).
Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) 
are not batched and hence sent to the other side one by one, cause a network 
RTT for each delete marker.

Looks like in trunk it's doing the right thing (using BufferedMutators for all 
mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) 
issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14999064#comment-14999064
 ] 

Lars Hofhansl commented on HBASE-14791:
---

Cool!

> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, cause a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-10 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14791:
--
Description: 
We found that some of our copy table job run for many hours, even when there 
isn't that much data to copy.

[~vik.karma] did his magic and found that the issue is with copying delete 
markers (we use raw mode to also move deletes across).
Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) 
are not batched and hence sent to the other side one by one, causing a network 
RTT for each delete marker.

Looks like in trunk it's doing the right thing (using BufferedMutators for all 
mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) 
issue.

  was:
We found that some of our copy table job run for many hours, even when there 
isn't that much data to copy.

[~vik.karma] did his magic and found that the issue with copying delete markers 
(we use raw mode to also move deletes across).
Looking at the code in 0.98 it's immediately obvious that deletes (unlike puts) 
are not batched and hence sent to the other side one by one, cause a network 
RTT for each delete marker.

Looks like in trunk it's doing the right thing (using BufferedMutators for all 
mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 1.2?) 
issue.


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14791) [0.98] CopyTable is extremely slow when moving delete markers

2015-11-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1443#comment-1443
 ] 

Lars Hofhansl commented on HBASE-14791:
---

One tricky aspect of this is that we generally want to keep the order of the 
deletes w.r.t. puts as much as possible.
If we have one buffering mechanism for puts and another for deletes, that is hard 
to maintain.

For correctness it is enough to ensure that deletes are shipped after the puts; 
not sure that's easy to do, though.
Then again, in cases where we want to ship the deletes, the receiving side had 
better have an appropriate setup that keeps delete markers around; otherwise it 
makes no sense to ship them... so maybe not an issue at all?
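For what it's worth, a minimal sketch of the single-buffer approach (written 
against the 1.0+ client API, since the description says trunk's TableOutputFormat 
already routes all mutations through a BufferedMutator; the table name here is 
made up):
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleBufferSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("copy_target"))) {
      // Routing puts and deletes through the same buffer avoids the
      // two-buffer reordering concern discussed above.
      mutator.mutate(new Put(Bytes.toBytes("row1"))
          .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v")));
      mutator.mutate(new Delete(Bytes.toBytes("row1")));
    } // close() flushes anything still buffered
  }
}
{code}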


> [0.98] CopyTable is extremely slow when moving delete markers
> -
>
> Key: HBASE-14791
> URL: https://issues.apache.org/jira/browse/HBASE-14791
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.16
>Reporter: Lars Hofhansl
>Assignee: Alex Araujo
>
> We found that some of our copy table job run for many hours, even when there 
> isn't that much data to copy.
> [~vik.karma] did his magic and found that the issue is with copying delete 
> markers (we use raw mode to also move deletes across).
> Looking at the code in 0.98 it's immediately obvious that deletes (unlike 
> puts) are not batched and hence sent to the other side one by one, causing a 
> network RTT for each delete marker.
> Looks like in trunk it's doing the right thing (using BufferedMutators for 
> all mutations in TableOutputFormat). So likely only a 0.98 (and 1.0, 1.1, 
> 1.2?) issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin

2015-11-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14995798#comment-14995798
 ] 

Lars Hofhansl commented on HBASE-14511:
---

This is mostly good to go, no? I like how you avoided the iterator creation in 
all the hot loops.


> StoreFile.Writer Meta Plugin
> 
>
> Key: HBASE-14511
> URL: https://issues.apache.org/jira/browse/HBASE-14511
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14511-v3.patch, HBASE-14511.v1.patch, 
> HBASE-14511.v2.patch
>
>
> During my work on a new compaction policies (HBASE-14468, HBASE-14477) I had 
> to modify the existing code of a StoreFile.Writer to add additional meta-info 
> required by these new  policies. I think that it should be done by means of a 
> new Plugin framework, because this seems to be a general capability/feature. 
> As a future enhancement this can become a part of a more general 
> StoreFileWriter/Reader plugin architecture. But I need only Meta section of a 
> store file.
> This could be used, for example, to collect rowkeys distribution information 
> during hfile creation. This info can be used later to find the optimal region 
> split key or to create optimal set of sub-regions for M/R jobs or other jobs 
> which can operate on a sub-region level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-11-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991667#comment-14991667
 ] 

Lars Hofhansl commented on HBASE-13082:
---

bq. The volatile check is still a memory fence barrier similar to the lock 
although much less expensive. 

Right. The locks are not the problem, it's the memory barriers on every call to 
next and peek that cause the performance issue. The locks are (almost) never 
contended. The point was to lock once for the scan, and then use the versioned 
access proxy without locking.
Cool that even with the volatile it's faster, though!


> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: ramkrishna.s.vasudevan
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, HBASE-13082.pdf, HBASE-13082_1.pdf, 
> HBASE-13082_1_WIP.patch, HBASE-13082_2_WIP.patch, HBASE-13082_3.patch, 
> HBASE-13082_4.patch, HBASE-13082_9.patch, HBASE-13082_9.patch, gc.png, 
> gc.png, gc.png, hits.png, next.png, next.png
>
>
> Continuing where HBASE-10015 left of.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to be remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() and 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * possible starving of flushes and compaction with heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able finalize the set of files.
> I'll have a patch soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin

2015-10-27 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977791#comment-14977791
 ] 

Lars Hofhansl commented on HBASE-14511:
---

I can also see a new filter API: as an optional optimization, a filter could be 
passed an HFile meta block (or whatever abstraction is useful) and then decide 
to filter the entire file. I.e. the meta data here is only useful if one can 
act on it in a meaningful way.
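Something along these lines, purely hypothetical (no such API exists today; 
names made up):
{code}
/** Hypothetical filter extension sketched from the comment above - not an existing HBase API. */
public interface FileLevelFilter {
  /**
   * Given the meta information written for a store file, decide whether the
   * whole file can be skipped for the current scan.
   */
  boolean filterWholeFile(byte[] metaBlock);
}
{code}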


> StoreFile.Writer Meta Plugin
> 
>
> Key: HBASE-14511
> URL: https://issues.apache.org/jira/browse/HBASE-14511
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14511-v3.patch, HBASE-14511.v1.patch, 
> HBASE-14511.v2.patch
>
>
> During my work on a new compaction policies (HBASE-14468, HBASE-14477) I had 
> to modify the existing code of a StoreFile.Writer to add additional meta-info 
> required by these new  policies. I think that it should be done by means of a 
> new Plugin framework, because this seems to be a general capability/feature. 
> As a future enhancement this can become a part of a more general 
> StoreFileWriter/Reader plugin architecture. But I need only Meta section of a 
> store file.
> This could be used, for example, to collect rowkeys distribution information 
> during hfile creation. This info can be used later to find the optimal region 
> split key or to create optimal set of sub-regions for M/R jobs or other jobs 
> which can operate on a sub-region level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13318) RpcServer.getListenerAddress should handle when the accept channel is closed

2015-10-24 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14973013#comment-14973013
 ] 

Lars Hofhansl commented on HBASE-13318:
---

Looks good (although I admit I find it hard to convince myself that the behaviour 
has not changed).

> RpcServer.getListenerAddress should handle when the accept channel is closed
> 
>
> Key: HBASE-13318
> URL: https://issues.apache.org/jira/browse/HBASE-13318
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Andrew Purtell
>Priority: Minor
>  Labels: thread-safety
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3, 0.98.16
>
> Attachments: HBASE-13318.patch
>
>
> We just saw exceptions like these:
> {noformat}
> Exception in thread "B.DefaultRpcServer.handler=45,queue=0,port=60020" 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.getAddress(RpcServer.java:753)
>   at 
> org.apache.hadoop.hbase.ipc.RpcServer.getListenerAddress(RpcServer.java:2157)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:146)
>   at 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Looks like RpcServer$Listener.getAddress should be synchronized 
> (acceptChannel is set to null upon exiting the thread under in a synchronized 
> block).
> Should be happening very rarely only.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14657) Remove unneeded API from EncodedSeeker

2015-10-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972387#comment-14972387
 ] 

Lars Hofhansl commented on HBASE-14657:
---

Cool... I was gonna do it. Just need to wait long enough :)
Thanks [~chenheng]!

> Remove unneeded API from EncodedSeeker
> --
>
> Key: HBASE-14657
> URL: https://issues.apache.org/jira/browse/HBASE-14657
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3
>
> Attachments: HBASE-14657-branch-1.patch, HBASE-14657.patch
>
>
> See parent. We do not need getKeyValueBuffer. It's only used for tests, and 
> parent patch fixes all tests to use getKeyValue instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14613) Remove MemStoreChunkPool?

2015-10-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14972390#comment-14972390
 ] 

Lars Hofhansl commented on HBASE-14613:
---

Hmm... Nothing in this thread represented a convincing argument that we should 
keep it. ;)
Reuse is no indication that the GC would not do better (and the increase in young 
collections indicates an issue), and performance was unaffected, as I had 
predicted.

Anyway, we can leave it in as long as it remains default off.

-1 on turning this always on by default, though.


> Remove MemStoreChunkPool?
> -
>
> Key: HBASE-14613
> URL: https://issues.apache.org/jira/browse/HBASE-14613
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 14613-0.98.txt, gc.png, writes.png
>
>
> I just stumbled across MemStoreChunkPool. The idea behind is to reuse chunks 
> of allocations rather than letting the GC handle this.
> Now, it's off by default, and it seems to me to be of dubious value. I'd 
> recommend just removing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965745#comment-14965745
 ] 

Lars Hofhansl commented on HBASE-14628:
---

And agreed on removing getValueShallowCopy. Returning a ByteBuffer from any method 
leaks internal implementation details that we should avoid (even in an internal 
interface).

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-20 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.98. Thanks for looking.

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965840#comment-14965840
 ] 

Lars Hofhansl commented on HBASE-14463:
---

Bit late to the party. We do have the IdLock in order to only lock the block(s) 
in question, and not take a "global" lock in a sense. That probably causes the 
5% degradation. I'd assume that'd be worse if we only hit random blocks _and_ 
we have to load the blocks.

On first blush this does not strike me as the right solution.

In HFileReaderXXX we do double-checked locking in order to avoid taking the 
lock completely when the block is already cached. Can we do something like that 
here?
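For readers following along, the pattern I mean is roughly this (a generic 
sketch, not the HFileReader code):
{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Generic double-checked, per-key-locked cache lookup - not HBase code. */
public class DoubleCheckedCache<K, V> {
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
  private final ConcurrentMap<K, Object> keyLocks = new ConcurrentHashMap<>();

  V get(K key, Function<K, V> loader) {
    V value = cache.get(key);             // first check: no lock taken on a hit
    if (value != null) {
      return value;
    }
    Object lock = keyLocks.computeIfAbsent(key, k -> new Object());
    synchronized (lock) {                 // per-key lock, similar in spirit to IdLock
      value = cache.get(key);             // second check: another thread may have loaded it
      if (value == null) {
        value = loader.apply(key);        // the expensive load happens at most once per key
        cache.put(key, value);
      }
      return value;
    }
  }
}
{code}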

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.16
>
> Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, 
> HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v2.patch, 
> HBASE-14463_v3.patch, HBASE-14463_v4.patch, HBASE-14463_v5.patch, 
> TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In such 
> scenario we will launch hundreds of yarn workers and each worker will read 
> all features of one item(i.e. single rowkey in HBase), so there'll be heavy 
> parallel reading on a single rowkey.
> We were using LruCache but start to try BucketCache recently to resolve gc 
> issue, and just as titled we have observed severe performance downgrade. 
> After some analytics we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
>   try {
> lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
> // ...
> if (bucketEntry.equals(backingMap.get(key))) {
>   // ...
>   int len = bucketEntry.getLength();
>   Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>   bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEnging.read involves array copy, it's much more time-costed than the 
> operation in LruCache. And since we're using synchronized in 
> IdLock#getLockEntry, parallel read dropping on the same bucket would be 
> executed in serial, which causes a really bad performance.
> To resolve the problem, we propose to use ReentranceReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965720#comment-14965720
 ] 

Lars Hofhansl commented on HBASE-14628:
---

Filed sub task (HBASE-14657) to remove the API from EncodedSeeker in 1.0+

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14657) Remove unneeded API from EncodedSeeker

2015-10-20 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14657:
-

 Summary: Remove unneeded API from EncodedSeeker
 Key: HBASE-14657
 URL: https://issues.apache.org/jira/browse/HBASE-14657
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
 Fix For: 2.0.0, 1.2.0, 1.3.0, 1.0.3, 1.1.3


See parent. We do not need getKeyValueBuffer. It's only used for tests, and 
parent patch fixes all tests to use getKeyValue instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965843#comment-14965843
 ] 

Lars Hofhansl commented on HBASE-14463:
---

Never mind. Looked at the patch again - should have taken a closer look before 
I sent the previous. Looks good.

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.16
>
> Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, 
> HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v2.patch, 
> HBASE-14463_v3.patch, HBASE-14463_v4.patch, HBASE-14463_v5.patch, 
> TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In such 
> scenario we will launch hundreds of yarn workers and each worker will read 
> all features of one item(i.e. single rowkey in HBase), so there'll be heavy 
> parallel reading on a single rowkey.
> We were using LruCache but start to try BucketCache recently to resolve gc 
> issue, and just as titled we have observed severe performance downgrade. 
> After some analytics we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
>   try {
> lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
> // ...
> if (bucketEntry.equals(backingMap.get(key))) {
>   // ...
>   int len = bucketEntry.getLength();
>   Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>   bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEnging.read involves array copy, it's much more time-costed than the 
> operation in LruCache. And since we're using synchronized in 
> IdLock#getLockEntry, parallel read dropping on the same bucket would be 
> executed in serial, which causes a really bad performance.
> To resolve the problem, we propose to use ReentranceReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965890#comment-14965890
 ] 

Lars Hofhansl commented on HBASE-14628:
---

[~giacomotaylor], FYI. Shaves another 5-10% off many Phoenix queries that 
traverse a lot of KVs.

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14463) Severe performance downgrade when parallel reading a single key from BucketCache

2015-10-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965848#comment-14965848
 ] 

Lars Hofhansl commented on HBASE-14463:
---

Curious: What's the penalty of not having the lockPool? Past experiences taught 
me that these things will never be sized right. :)

> Severe performance downgrade when parallel reading a single key from 
> BucketCache
> 
>
> Key: HBASE-14463
> URL: https://issues.apache.org/jira/browse/HBASE-14463
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.14, 1.1.2
>Reporter: Yu Li
>Assignee: Yu Li
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.16
>
> Attachments: GC_with_WeakObjectPool.png, HBASE-14463.patch, 
> HBASE-14463_v11.patch, HBASE-14463_v12.patch, HBASE-14463_v2.patch, 
> HBASE-14463_v3.patch, HBASE-14463_v4.patch, HBASE-14463_v5.patch, 
> TestBucketCache-new_with_IdLock.png, 
> TestBucketCache-new_with_IdReadWriteLock.png, 
> TestBucketCache_with_IdLock-latest.png, TestBucketCache_with_IdLock.png, 
> TestBucketCache_with_IdReadWriteLock-latest.png, 
> TestBucketCache_with_IdReadWriteLock-resolveLockLeak.png, 
> TestBucketCache_with_IdReadWriteLock.png
>
>
> We store feature data of online items in HBase, do machine learning on these 
> features, and supply the outputs to our online search engine. In such 
> scenario we will launch hundreds of yarn workers and each worker will read 
> all features of one item(i.e. single rowkey in HBase), so there'll be heavy 
> parallel reading on a single rowkey.
> We were using LruCache but start to try BucketCache recently to resolve gc 
> issue, and just as titled we have observed severe performance downgrade. 
> After some analytics we found the root cause is the lock in 
> BucketCache#getBlock, as shown below
> {code}
>   try {
> lockEntry = offsetLock.getLockEntry(bucketEntry.offset());
> // ...
> if (bucketEntry.equals(backingMap.get(key))) {
>   // ...
>   int len = bucketEntry.getLength();
>   Cacheable cachedBlock = ioEngine.read(bucketEntry.offset(), len,
>   bucketEntry.deserializerReference(this.deserialiserMap));
> {code}
> Since ioEnging.read involves array copy, it's much more time-costed than the 
> operation in LruCache. And since we're using synchronized in 
> IdLock#getLockEntry, parallel read dropping on the same bucket would be 
> executed in serial, which causes a really bad performance.
> To resolve the problem, we propose to use ReentranceReadWriteLock in 
> BucketCache, and introduce a new class called IdReadWriteLock to implement it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14964499#comment-14964499
 ] 

Lars Hofhansl commented on HBASE-14628:
---

Tested the perf improvement again. Found it to be between 6% and 10% really.
Since the ByteBuffer is created for every KV read (regardless of whether it is 
returned or even used in a filter) we save a _lot_ of objects during scans.

Would like to commit this. Can I get a +1 for v2?

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962692#comment-14962692
 ] 

Lars Hofhansl commented on HBASE-14628:
---

Thanks, will commit later tonight.


> Save object creation for scanning with block encodings
> --
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Attachment: 14628-0.98-v2.txt

Patch needed a fix to the EncodedSeeker interface as well (since I wanted to 
remove the unneeded getKeyValueBuffer method - I had already fixed all tests 
not to use it any more).

Will do a few more tests and then commit if everything looks good.

The removal of getKeyValueBuffer should be forward-ported to 1.0+, I think.

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Fix Version/s: 0.98.16

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Summary: [0.98] Save object creation for scanning with block encodings  
(was: Save object creation for scanning with block encodings)

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Assignee: Lars Hofhansl
  Status: Patch Available  (was: Open)

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) [0.98] Save object creation for scanning with block encodings

2015-10-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962844#comment-14962844
 ] 

Lars Hofhansl commented on HBASE-14628:
---

The DataBlockEncoder interface is marked private, so I should be able to simply 
remove the method from the DataBlockEncoder.EncodedSeeker interface. Any 
concerns with that? If so, I can add the method and its implementations back 
(but I'd rather not).

> [0.98] Save object creation for scanning with block encodings
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.98.16
>
> Attachments: 14628-0.98-v2.txt, 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) Save object creation for scanning with block encodings

2015-10-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14962162#comment-14962162
 ] 

Lars Hofhansl commented on HBASE-14628:
---

Yep, only useful in 0.98. Worth committing? 1.0 and later do a shallow copy of 
the seeker state... I could try to backport that, but 0.98 might not be ready 
for this in other aspects.

> Save object creation for scanning with block encodings
> --
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14628) Save object creation for scanning with FAST_DIFF encoding

2015-10-16 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14628:
-

 Summary: Save object creation for scanning with FAST_DIFF encoding
 Key: HBASE-14628
 URL: https://issues.apache.org/jira/browse/HBASE-14628
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl


I noticed that (at least in 0.98 - master is entirely different) we create a 
ByteBuffer just to create a byte[], which is then used to create a KeyValue.

We can save the creation of the ByteBuffer and hence save allocating an extra 
object for each KV we find by creating the byte[] directly.

In a Phoenix count\(*) query that saved about 10% of runtime.
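As a rough sketch of the pattern (hypothetical helper methods, not the actual
0.98 seeker code), the point is simply to skip the intermediate ByteBuffer when
materializing a KeyValue:

{code}
import java.nio.ByteBuffer;
import org.apache.hadoop.hbase.KeyValue;

// Illustrative only: shows the allocation being avoided, not the real code path.
public class KvMaterializeSketch {

  // One extra object (the ByteBuffer) per KeyValue that is materialized.
  static KeyValue viaByteBuffer(byte[] keyValueBytes) {
    ByteBuffer buf = ByteBuffer.allocate(keyValueBytes.length);
    buf.put(keyValueBytes);
    return new KeyValue(buf.array(), 0, buf.limit());
  }

  // Copy into a byte[] directly; the intermediate ByteBuffer is never created.
  static KeyValue direct(byte[] keyValueBytes) {
    byte[] backing = new byte[keyValueBytes.length];
    System.arraycopy(keyValueBytes, 0, backing, 0, backing.length);
    return new KeyValue(backing, 0, backing.length);
  }
}
{code}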




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14628) Save object creation for scanning with FAST_DIFF encoding

2015-10-16 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961732#comment-14961732
 ] 

Lars Hofhansl commented on HBASE-14628:
---

[~anoop.hbase]
Master still creates a ByteBuffer along the way before it creates the Cell. For 
offheap that's unavoidable, but it still does so for onheap if I am not mistaken. 
Lemme take another look; I agree it looks better there.

[~stack] I have a patch, but it fails one of the interesting tests, will post 
as soon as I have that fixed.

> Save object creation for scanning with FAST_DIFF encoding
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) Save object creation for scanning with FAST_DIFF encoding

2015-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Attachment: 14628-0.98.txt

Found the problem. Here's a 0.98 patch.
The actual macro win is 5% (not the 10% I promised above).

Readability is roughly equal, I feel. Have a look and let me know whether it's 
worth the change.

> Save object creation for scanning with FAST_DIFF encoding
> -
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14628) Save object creation for scanning with block encodings

2015-10-16 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14628:
--
Summary: Save object creation for scanning with block encodings  (was: Save 
object creation for scanning with FAST_DIFF encoding)

> Save object creation for scanning with block encodings
> --
>
> Key: HBASE-14628
> URL: https://issues.apache.org/jira/browse/HBASE-14628
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 14628-0.98.txt
>
>
> I noticed that (at least in 0.98 - master is entirely different) we create 
> ByteBuffer just to create a byte[], which is then used to create a KeyValue.
> We can save the creation of the ByteBuffer and hence save allocating an extra 
> object for each KV we find by creating the byte[] directly.
> In a Phoenix count\(*) query that saved from 10% of runtime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14613) Remove MemStoreChunkPool?

2015-10-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14958401#comment-14958401
 ] 

Lars Hofhansl commented on HBASE-14613:
---

More data is always good. :)
I doubt it'll make a difference, but if it does I learned something new.

> Remove MemStoreChunkPool?
> -
>
> Key: HBASE-14613
> URL: https://issues.apache.org/jira/browse/HBASE-14613
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 14613-0.98.txt
>
>
> I just stumbled across MemStoreChunkPool. The idea behind is to reuse chunks 
> of allocations rather than letting the GC handle this.
> Now, it's off by default, and it seems to me to be of dubious value. I'd 
> recommend just removing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14613) Remove MemStoreChunkPool?

2015-10-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960206#comment-14960206
 ] 

Lars Hofhansl commented on HBASE-14613:
---

I am always dubious about trying to be smarter than the GC. Extra management, 
the requirement to get the number of chunks right to be useful but not 
wasteful, etc.


> Remove MemStoreChunkPool?
> -
>
> Key: HBASE-14613
> URL: https://issues.apache.org/jira/browse/HBASE-14613
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 14613-0.98.txt
>
>
> I just stumbled across MemStoreChunkPool. The idea behind is to reuse chunks 
> of allocations rather than letting the GC handle this.
> Now, it's off by default, and it seems to me to be of dubious value. I'd 
> recommend just removing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14613) Remove MemStoreChunkPool?

2015-10-15 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14613:
--
Attachment: 14613-0.98.txt

Trivial patch. Simply remove the chunk pool and all its uses. This also allows a 
slight cleanup of MemStoreLAB, as we no longer need to reference-count chunks by 
how many scanners use them.
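For illustration, a hypothetical sketch (not the real MemStoreLAB or
MemStoreChunkPool code) of the bookkeeping that goes away: with a pool, each
chunk has to count the scanners that still reference it before it can be
recycled; without the pool a chunk is an ordinary byte[] that the GC reclaims
once nothing references it.

{code}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names; only meant to show the reference counting being removed.
class PooledChunk {
  final byte[] data;
  private final AtomicInteger scannerRefs = new AtomicInteger(0);

  PooledChunk(int size) {
    this.data = new byte[size];
  }

  void openScanner() {
    scannerRefs.incrementAndGet();
  }

  // True when the last scanner closed, i.e. the chunk may be handed back to the pool.
  boolean closeScanner() {
    return scannerRefs.decrementAndGet() == 0;
  }
}
{code}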

> Remove MemStoreChunkPool?
> -
>
> Key: HBASE-14613
> URL: https://issues.apache.org/jira/browse/HBASE-14613
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>Priority: Minor
> Attachments: 14613-0.98.txt
>
>
> I just stumbled across MemStoreChunkPool. The idea behind is to reuse chunks 
> of allocations rather than letting the GC handle this.
> Now, it's off by default, and it seems to me to be of dubious value. I'd 
> recommend just removing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14613) Remove MemStoreChunkPool?

2015-10-14 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14613:
-

 Summary: Remove MemStoreChunkPool?
 Key: HBASE-14613
 URL: https://issues.apache.org/jira/browse/HBASE-14613
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl
Priority: Minor


I just stumbled across MemStoreChunkPool. The idea behind it is to reuse chunks 
of allocations rather than letting the GC handle this.

Now, it's off by default, and it seems to me to be of dubious value. I'd 
recommend just removing it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956254#comment-14956254
 ] 

Lars Hofhansl commented on HBASE-14283:
---

Am I understanding correctly that we already incur two reads now, even when 
we're scanning forward? If so, that seems unfortunate.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.
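To make the contiguity assumption concrete, here is a hedged sketch (names are
illustrative, not the actual HFile reader code) of the arithmetic seekBefore()
relies on; it only holds when no inline leaf-index or bloom block sits between
the two data blocks:

{code}
// Illustrative only: the size of the "previous data block" is inferred from the
// gap between offsets, which is wrong once inline blocks occupy part of that gap.
public class SeekBeforeSketch {
  static long assumedPreviousBlockSize(long currentBlockOffset,
                                       long previousDataBlockOffset) {
    // Correct only if the previous data block is the sole occupant of this gap.
    return currentBlockOffset - previousDataBlockOffset;
  }
}
{code}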



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

2015-10-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956254#comment-14956254
 ] 

Lars Hofhansl edited comment on HBASE-14283 at 10/14/15 4:45 AM:
-

Am I understanding correctly that we always incur two reads now, even when 
we're scanning forward? If so, that seems unfortunate.


was (Author: lhofhansl):
Am I understanding correctly that we already incur two reads now, even when 
we're scanning forward? If so, that seems unfortunate.

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --
>
> Key: HBASE-14283
> URL: https://issues.apache.org/jira/browse/HBASE-14283
> Project: HBase
>  Issue Type: Bug
>Reporter: Ben Lau
>Assignee: Ben Lau
> Attachments: HBASE-14283-0.98.patch, HBASE-14283-branch-1.0.patch, 
> HBASE-14283-branch-1.1.patch, HBASE-14283-branch-1.2.patch, 
> HBASE-14283-branch-1.patch, HBASE-14283-master.patch, HBASE-14283-v2.patch, 
> HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf 
> level index blocks.  The reason is because the seekBefore() call calculates 
> the previous data block’s size by assuming data blocks are contiguous which 
> is not the case in HFile V2 and beyond.
> Attached is a first cut patch (targeting 
> bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both 
> inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile 
> version change, but is only performant for 1 and 2-level indexes and not 3+.  
> 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug.  But the fix should be 
> similar to the case of inline index blocks.  The reason I haven’t made the 
> change yet is I want to confirm that you guys would be fine with me revising 
> the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and 
> getDeleteBloomFilterMetadata) need to return the BloomFilter.  Right now the 
> HFileReader class doesn’t have a reference to the bloom filters (and hence 
> their indices) and only constructs the IO streams and hence has no way to 
> know where the bloom blocks are in the HFile.  It seems that the HFile.Reader 
> bloom method comments state that they “know nothing about how that metadata 
> is structured” but I do not know if that is a requirement of the abstraction 
> (why?) or just an incidental current property. 
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and 
> return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ 
> field in the block header in the next HFile version, so that seekBefore() 
> calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956224#comment-14956224
 ] 

Lars Hofhansl edited comment on HBASE-14221 at 10/14/15 4:18 AM:
-

I think [~mcorgan]'s KeyValueScannerHeap is worth exploring still (see later on 
that jira). It beats PriorityQueue in every test, and since it is our 
implementation we can further tweak it down the road. Matt's MIA unfortunately, 
but I plan to test some more with it. 

(And I have some awesome database guys sitting less than 30 feet from me, and 
they came up with a strikingly similar scanner approach for their LSM-based 
database.)



was (Author: lhofhansl):
I think [~mcorgan] KeyValueScannerHeap is worth exploring still (see later on 
that jira). It's beats PriorityQueue in every test, and since it is our 
implementation we can further tweak it down the road. Matt's MIA unfortunately, 
but I plan to test some more with it. 

(And I have some awesome database guys sitting less than 30 feet form me, and 
they come up with a striking similar scanner approach for their LSM based 
database)


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14956224#comment-14956224
 ] 

Lars Hofhansl commented on HBASE-14221:
---

I think [~mcorgan]'s KeyValueScannerHeap is worth exploring still (see later on 
that jira). It beats PriorityQueue in every test, and since it is our 
implementation we can further tweak it down the road. Matt's MIA unfortunately, 
but I plan to test some more with it. 

(And I have some awesome database guys sitting less than 30 feet from me, and 
they came up with a strikingly similar scanner approach for their LSM-based 
database.)


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954366#comment-14954366
 ] 

Lars Hofhansl commented on HBASE-14221:
---

Agreed... I do think most of the compares happen in the heaps (PriorityQueue) 
we're using, though.

> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954370#comment-14954370
 ] 

Lars Hofhansl commented on HBASE-14221:
---

Might be time to look at HBASE-9969 again.

> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7755) Experiment with LAB in BlockEndcoding

2015-10-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14952406#comment-14952406
 ] 

Lars Hofhansl commented on HBASE-7755:
--

Just thought about this again. It's interesting: if I cannot find a good slab 
size here, how come we can find a good one for the memstore? I want to look at 
this again.

> Experiment with LAB in BlockEndcoding
> -
>
> Key: HBASE-7755
> URL: https://issues.apache.org/jira/browse/HBASE-7755
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 7755-0.94-WORK_IN_PROGRESS.txt, 7755-0.94-W_I_P_v1.txt
>
>
> I was looking at and profiling the BlockEncoding code to figure out how to 
> make it faster. One issue that jumped out was we call 
> ByteBuffer.allocate(...) for each single KV.
> As an experiment I tried using the MemStoreLAB code to allocate those buffers.
> Here are some preliminary numbers, all scanning 10m rows (all in cache):
> * no encoding: 5.2s
> * FAST_DIFF without patch: 7.3s
> * FAST_DIFF with patch and small LAB: 4.1s
> * FAST_DIFF with patch and large LAB: 11s
> So this is very sensitive to the right sizing of the LAB.
> Need to do a bit more testing, but it seems that there is a chance to 
> actually make scanning with block encoding faster than without!
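For reference, a minimal sketch of the LAB idea behind the experiment (slab size
and API are assumptions, not the actual MemStoreLAB code): hand out slices of
one shared slab via a bump pointer so that a single allocation serves many
decoded KVs. The numbers above show how sensitive the result is to the slab size.

{code}
import java.nio.ByteBuffer;

// Illustrative bump-pointer allocator; hypothetical names and sizing.
public class DecoderLab {
  private final int slabSize;
  private byte[] slab;
  private int cursor;

  public DecoderLab(int slabSize) {
    this.slabSize = slabSize;
    this.slab = new byte[slabSize];
    this.cursor = 0;
  }

  // Returns a buffer of the requested size backed by the current slab.
  public ByteBuffer allocate(int size) {
    if (size > slabSize) {
      return ByteBuffer.allocate(size);   // too big for the slab, fall back
    }
    if (cursor + size > slabSize) {
      slab = new byte[slabSize];          // start a fresh slab; the old one is
      cursor = 0;                         // GC'd once no KV references it
    }
    ByteBuffer slice = ByteBuffer.wrap(slab, cursor, size).slice();
    cursor += size;
    return slice;
  }
}
{code}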



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-10 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14951645#comment-14951645
 ] 

Lars Hofhansl commented on HBASE-14549:
---

OK... I get it now. In KeyValueHeap we always have the top scanner removed from 
the heap and stored in this.current.
In the case of a scanner stack reset, the scanner that used to be on top may no 
longer be; we would have to put it back on the heap and pull a new top scanner 
out, but we have no way of knowing that.
I.e. the top StoreScanner in the RegionScanner's KeyValueHeap might have 
changed without notice.

This all looks a bit brittle. What if _other_ StoreScanners (other than the top 
scanner) change their peek element? We'd have to remove and re-add them to the 
RegionScanner's heap as well... I think. It seems the correct way would be to 
reset the RegionScanner stack whenever any of the stores have been compacted. 
That would also naturally coarsen the lock to the RegionScanner, but the 
RegionScanner may not have all the information to reset all StoreScanners. 
Tricky...
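A stripped-down sketch of the invariant being discussed (simplified and
hypothetical, not the actual KeyValueHeap): the current top scanner lives
outside the heap, so any reset that changes scanner ordering would require
pushing it back and re-polling, and nothing currently tells the heap to do that.

{code}
import java.util.PriorityQueue;

// Illustrative only; KeyValueHeap holds KeyValueScanners, this just shows the shape.
class ScannerHeapSketch<S extends Comparable<S>> {
  private final PriorityQueue<S> heap = new PriorityQueue<>();
  private S current;              // top element, held outside the heap

  void add(S scanner) {
    heap.add(scanner);
  }

  S top() {
    if (current == null) {
      current = heap.poll();      // pull the new top out of the heap
    }
    return current;
  }

  // What a scanner stack reset would need: put the old top back and re-evaluate.
  void reheap() {
    if (current != null) {
      heap.add(current);
      current = null;
    }
  }
}
{code}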


> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compaction are rare, we should reset the scanner stack in update readers, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-07 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14221:
--
Attachment: 14221-0.98-takeALook.txt

[~ram_krish], take a look at the "-takeALook" sample. That's what I mean.
I let the SQM decide when a new row is found (it's better encapsulation, and 
it's doing the comparison there anyway).

Haven't tested it beyond running TestScanner and TestAtomicOperation, which 
both still pass.


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946327#comment-14946327
 ] 

Lars Hofhansl edited comment on HBASE-14221 at 10/7/15 6:08 AM:


[~ram_krish], take a look at the "-takeALook" sample. That's what I mean.
I let the SQM decide when a new row is found (it's better encapsulation, and 
it's doing the comparison there anyway).

Haven't tested it beyond running TestScanner and TestAtomicOperation, which 
both still pass.

(I am not suggesting we use my patch; it's just easier to explain what I mean 
by having it in a patch rather than describing it in words.)


was (Author: lhofhansl):
[~ram_krish], take a look at the "-takeALook" sample. That's what I mean.
I let the SQM decide when a new row is found (it's better encapsulation, and 
it's doing the comparison there anyway).

Haven't tested in beyond running TestScanner and TestAtomicOperation, which 
both still pass.


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14948057#comment-14948057
 ] 

Lars Hofhansl commented on HBASE-14221:
---

I should make my patch compile fully. :)

The compare in StoreScanner.next() is actually only performed when scanner 
batching is enabled (which in most cases it is not).
If batching is off we know that each time we enter next() it must be a new row.

The complexity and readability of the patch might not be worth the 
improvement...?
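A hedged sketch of that short-circuit (names are hypothetical, not the actual
StoreScanner code): the row comparison is only needed when batching could leave
the scanner parked in the middle of a row.

{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

// Illustrative only: skip the compare entirely when batching is off.
class NewRowCheckSketch {
  static boolean isStartOfNewRow(boolean batchingEnabled, Cell matcherCurCell, Cell peeked) {
    if (!batchingEnabled || matcherCurCell == null) {
      return true;                                         // no compare needed
    }
    return !CellUtil.matchingRow(peeked, matcherCurCell);  // compare only when batching
  }
}
{code}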

> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: 14221-0.98-takeALook.txt, HBASE-14221.patch, 
> HBASE-14221_1.patch, HBASE-14221_1.patch, HBASE-14221_6.patch, 
> withmatchingRowspatch.png, withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946279#comment-14946279
 ] 

Lars Hofhansl commented on HBASE-14549:
---

Looking more at HBASE-5121, I think I do not understand what the issue is. 
We're recreating the scanner heap, so how can there be state left over from the 
prior scan? I think as soon as I understand that, I can fix this one and make 
it simpler.


> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compaction are rare, we should reset the scanner stack in update readers, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9260) Timestamp Compactions

2015-10-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946284#comment-14946284
 ] 

Lars Hofhansl commented on HBASE-9260:
--

Sounds very similar to me.

> Timestamp Compactions
> -
>
> Key: HBASE-9260
> URL: https://issues.apache.org/jira/browse/HBASE-9260
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Affects Versions: 0.94.10
>Reporter: Adrien Mogenet
>Priority: Minor
>  Labels: features, performance
>
> h1.TSCompactions
> h2.The issue
> One of the biggest issue I currently deal with is compacting big
> stores, i.e. when HBase cluster is 80% full on 4 TB nodes (let say
> with a single big table), compactions might take several hours (from
> 15 to 20 in my case).
> In 'time series' workloads, we could avoid compacting everything
> everytime. Think about OpenTSDB-like systems, or write-heavy,
> TTL based workloads where you want to free space everyday, deleting
> oldest data, and you're not concerned about read latency (i.e. read
> into a single bigger StoreFile).
> > Note: in this draft, I currently consider that we get free space from
> > the TTL behavior only, not really from the Delete operations.
> h2.Proposal and benefits
> For such cases, StoreFiles could be organized and managed in a way
> that would compact:
>   * recent StoreFiles with recent data
>   * oldest StoreFiles that are concerned by TTL eviction
> By the way, it would help when scanning with a timestamp criterion.
> h2.Configuration
>   * {{hbase.hstore.compaction.sortByTS}} (boolean, default=false)
> This indicates if new behavior is enabled or not. Set it to
> {{false}} and compactions will remain the same than current ones.
>   * {{hbase.hstore.compaction.ts.bucketSize}} (integer)
> If `sortByTS` is enabled, tells to HBase the target size of
> buckets. The lower, the more StoreFiles you'll get, but you should
> save more IO's. Higher values will generate less StoreFiles, but
> theses will be bigger and thus compactions could generate more
> IO's.
> h2.Examples
> Here is how a common store could look like after some flushes and
> perhaps some minor compactions:
> {noformat}
>,---, ,---,   ,---,
>|   | |   | ,---, |   |
>|   | |   | |   | |   |
>`---' `---' `---' `---'
> SF1   SF2   SF3   SF4
>\__ __/
>   V
>for all of these Storefiles,
>let say minimum TS is 01/01/2013
>and maximum TS is 31/03/2013
> {noformat}
> Set the bucket size to 1 month, and that's what we have after
> compaction:
> {noformat}
> ,---, ,---,
> |   | |   |
>   ,---, |   | |   |
>   |   | |   | |   |
>   `---' `---' `---'
>SF1   SF2   SF3
>,-,
>|  minimum TS  |  maximum TS  |
>  ,---'
>  | SF1 |  03/03/2013  |  31/03/2013  | most recent, growing
>  | SF2 |  31/01/2013  |  02/03/2013  | old data, "sealed"
>  | SF3 |  01/01/2013  |  30/01/2013  | oldest data, "sealed"
>  '---'
> {noformat}
> h2.StoreFile selection
>   * for minor compactions, current algorithm should already do the
> right job. Pick up `n` eldest files that are small enough, and
> write a bigger file. Remember, TSCompaction are designed for time
> series, so this 'minor selection' should leave "sealed" big old
> files as they are.
>   * for major compactions, when all the StoreFiles have been selected,
> apply the TTL first. StoreFiles that are entirely out of time just
> don't need to be rewritten. They'll be deleted in one time,
> avoiding lots of IO's.
> h2.New issues and trade-offs
>   1. In that case ({{bucketSize=1 month}}), after 1+ year, we'll have lots
>   of StoreFiles (and more generally after `n * bucketSize` seconds) if
>   there is no TTL eviction. In any case, a clever threshold should be
>   implemented to limit the maximum number of StoreFiles.
>   2. If we later add old data that matches timerange of a StoreFile
>   which has already been compacted, this could generate lots of IO's
>   to reconstruct a single StoreFile for this time bucket, perhaps just
>   to merge a few lines.
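As a tiny illustration of the bucketing proposed above (nothing here exists in
HBase; names and units are made up), a StoreFile's or cell's timestamp maps to a
time bucket, and only files in the same bucket are compacted together while
fully expired buckets can be dropped without a rewrite:

{code}
// Hypothetical sketch of the bucket assignment; bucket 0 is the newest data.
class TsBucketSketch {
  static long bucketFor(long timestampMs, long nowMs, long bucketSizeMs) {
    return (nowMs - timestampMs) / bucketSizeMs;
  }
}
{code}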



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14221) Reduce the number of time row comparison is done in a Scan

2015-10-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946298#comment-14946298
 ] 

Lars Hofhansl commented on HBASE-14221:
---

Good find.

Although, isn't there a simpler way to do this, without extending 
KeyValueScanner and adding a new enum of return codes, row state to be 
maintained, etc?

I always thought we could get rid of case #2 above by piggy-backing on the 
comparison of case #1 (and then doing the reset there). Even made a patch for 
that at some point; like many things, I didn't finish it.


> Reduce the number of time row comparison is done in a Scan
> --
>
> Key: HBASE-14221
> URL: https://issues.apache.org/jira/browse/HBASE-14221
> Project: HBase
>  Issue Type: Sub-task
>  Components: Scanners
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
> Fix For: 2.0.0
>
> Attachments: HBASE-14221.patch, HBASE-14221_1.patch, 
> HBASE-14221_1.patch, HBASE-14221_6.patch, withmatchingRowspatch.png, 
> withoutmatchingRowspatch.png
>
>
> When we tried to do some profiling with the PE tool found this.
> Currently we do row comparisons in 3 places in a simple Scan case.
> 1) ScanQueryMatcher
> {code}
>int ret = this.rowComparator.compareRows(curCell, cell);
> if (!this.isReversed) {
>   if (ret <= -1) {
> return MatchCode.DONE;
>   } else if (ret >= 1) {
> // could optimize this, if necessary?
> // Could also be called SEEK_TO_CURRENT_ROW, but this
> // should be rare/never happens.
> return MatchCode.SEEK_NEXT_ROW;
>   }
> } else {
>   if (ret <= -1) {
> return MatchCode.SEEK_NEXT_ROW;
>   } else if (ret >= 1) {
> return MatchCode.DONE;
>   }
> }
> {code}
> 2) In StoreScanner next() while starting to scan the row
> {code}
> if (!scannerContext.hasAnyLimit(LimitScope.BETWEEN_CELLS) || 
> matcher.curCell == null ||
> isNewRow || !CellUtil.matchingRow(peeked, matcher.curCell)) {
>   this.countPerRow = 0;
>   matcher.setToNewRow(peeked);
> }
> {code}
> Particularly to see if we are in a new row.
> 3) In HRegion
> {code}
>   scannerContext.setKeepProgress(true);
>   heap.next(results, scannerContext);
>   scannerContext.setKeepProgress(tmpKeepProgress);
>   nextKv = heap.peek();
> moreCellsInRow = moreCellsInRow(nextKv, currentRowCell);
> {code}
> Here again there are cases where we need to careful for a MultiCF case.  Was 
> trying to solve this for the MultiCF case but is having lot of cases to 
> solve. But atleast for a single CF case I think these comparison can be 
> reduced.
> So for a single CF case in the SQM we are able to find if we have crossed a 
> row using the code pasted above in SQM. That comparison is definitely needed.
> Now in case of a single CF the HRegion is going to have only one element in 
> the heap and so the 3rd comparison can surely be avoided if the 
> StoreScanner.next() was over due to MatchCode.DONE caused by SQM.
> Coming to the 2nd compareRows that we do in StoreScanner. next() - even that 
> can be avoided if we know that the previous next() call was over due to a new 
> row. Doing all this I found that the compareRows in the profiler which was 
> 19% got reduced to 13%. Initially we can solve for single CF case which can 
> be extended to MultiCF cases.
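
A toy sketch of the piggy-backing idea from the comment above: do the row comparison once in the matcher and expose the result as a side effect, so callers don't have to compare rows again. PiggyBackMatcher and its MatchCode are stand-ins, not the actual ScanQueryMatcher:

{code}
import java.util.Arrays;

// Illustrative only: a toy "matcher" that does the row comparison once and
// exposes the result, so callers do not need to compare rows a second time.
public class PiggyBackMatcher {

    private byte[] curRow;          // row the scan is currently positioned on
    private boolean newRowStarted;  // set as a side effect of the single compare

    enum MatchCode { INCLUDE, DONE }

    MatchCode match(byte[] cellRow) {
        if (curRow == null) {                 // very first cell of the scan
            curRow = cellRow;
            newRowStarted = true;
            return MatchCode.INCLUDE;
        }
        if (!Arrays.equals(curRow, cellRow)) {
            // One comparison decides both "row boundary crossed" and "reset needed".
            curRow = cellRow;
            newRowStarted = true;
            return MatchCode.DONE;            // caller finishes the previous row
        }
        newRowStarted = false;
        return MatchCode.INCLUDE;
    }

    /** Callers check this instead of re-comparing rows (the comment's case #2). */
    boolean isNewRow() {
        return newRowStarted;
    }

    public static void main(String[] args) {
        PiggyBackMatcher m = new PiggyBackMatcher();
        System.out.println(m.match("row1".getBytes()) + " newRow=" + m.isNewRow());
        System.out.println(m.match("row1".getBytes()) + " newRow=" + m.isNewRow());
        System.out.println(m.match("row2".getBytes()) + " newRow=" + m.isNewRow());
    }
}
{code}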



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942825#comment-14942825
 ] 

Lars Hofhansl commented on HBASE-14549:
---

There is a slight performance improvement too.

> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942865#comment-14942865
 ] 

Lars Hofhansl commented on HBASE-14549:
---

Hmm... Yes, not quite right, although I do not understand why, yet.

> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942867#comment-14942867
 ] 

Lars Hofhansl commented on HBASE-14549:
---

See HBASE-5121 (and my _own_ suggestion to simplify the fix there :) ).

> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942876#comment-14942876
 ] 

Lars Hofhansl commented on HBASE-14549:
---

Thanks for keeping me honest [~stack] :)


> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-04 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942862#comment-14942862
 ] 

Lars Hofhansl commented on HBASE-14549:
---

TestScanner looks very relevant. Looking.

> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14549) Simplify scanner stack reset logic

2015-10-03 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942557#comment-14942557
 ] 

Lars Hofhansl commented on HBASE-14549:
---

TestAtomicOperation passes multiple runs (it mixes flushes and compactions with 
active scanning).

> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14549) Simplify scanner stack reset logic

2015-10-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14549:
--
Attachment: 14549-0.98.txt

Patch for 0.98.

Simply does away with checkReseek and the requirement to call it in scanner 
methods; instead it inlines the logic in updateReaders.
There's no performance benefit from this. It simply places the logic where it 
belongs, improving readability and possibly making it easier to change the logic 
(for example HBASE-13082) in the future, as it is all in one place.
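
A rough sketch of the shape of the change (not the actual patch), with simplified stand-in types; the point is that the stack is rebuilt right where the readers change, so callers need no checkReseek-style preamble:

{code}
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins, not HBase classes: illustrates resetting the scanner
// stack directly in updateReaders() instead of flagging and checking later.
public class InlineResetSketch {

    interface CellScanner { /* seek/next/close omitted */ }

    static class ToyStoreScanner {
        private final List<CellScanner> scanners = new ArrayList<>();

        // Before: updateReaders() only set a flag, and next()/seek()/reseek()
        // all had to call checkReseek() to rebuild the stack lazily.
        // After (this sketch): rebuild the stack right here, under the same lock,
        // and re-seek the new scanners to where the old ones were positioned.
        synchronized void updateReaders(List<CellScanner> newScanners) {
            scanners.clear();             // drop scanners on the old files
            scanners.addAll(newScanners); // open scanners on the compacted files
        }

        // next()/seek() no longer need a checkReseek() preamble.
        synchronized boolean next() {
            return !scanners.isEmpty();
        }
    }

    public static void main(String[] args) {
        ToyStoreScanner s = new ToyStoreScanner();
        s.updateReaders(List.of(new CellScanner() { }));
        System.out.println("has data: " + s.next());
    }
}
{code}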


> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14549) Simplify scanner stack reset logic

2015-10-03 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14549:
-

 Summary: Simplify scanner stack reset logic
 Key: HBASE-14549
 URL: https://issues.apache.org/jira/browse/HBASE-14549
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl


Looking at the code, I find that the logic is unnecessarily complex.
We indicate in updateReaders that the scanner stack needs to be reset. Then 
almost all store scanner (and derived classes) methods need to check and 
actually reset the scanner stack.
Compactions are rare; we should reset the scanner stack in updateReaders, and 
hence avoid needing to check in all methods.

Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14549) Simplify scanner stack reset logic

2015-10-03 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14549:
--
Description: 
Looking at the code, I find that the logic is unnecessarily complex.
We indicate in updateReaders that the scanner stack needs to be reset. Then 
almost all store scanner (and derived classes) methods need to check and 
actually reset the scanner stack.
Compactions are rare; we should reset the scanner stack in updateReaders, and 
hence avoid needing to check in all methods.

Patch forthcoming.

  was:
Looking at the code, I find that the logic is unnecessarily complex.
We indicate in updateReaders that the scanner stack needs to be reset. Than 
almost all store scanner (and derived classes) methods need to check and 
actually reset the scanner stack.
Compaction are rare, we should reset the scanner stack in update readers, and 
hence avoid needing to check in all methods.

Patch forthcoming.


> Simplify scanner stack reset logic
> --
>
> Key: HBASE-14549
> URL: https://issues.apache.org/jira/browse/HBASE-14549
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 14549-0.98.txt
>
>
> Looking at the code, I find that the logic is unnecessarily complex.
> We indicate in updateReaders that the scanner stack needs to be reset. Then 
> almost all store scanner (and derived classes) methods need to check and 
> actually reset the scanner stack.
> Compactions are rare; we should reset the scanner stack in updateReaders, and 
> hence avoid needing to check in all methods.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14539) Slight improvement of StoreScanner.optimize

2015-10-02 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14539:
--
Fix Version/s: (was: 0.98.12)
   (was: 1.1.0)
   (was: 1.0.1)
   1.1.3
   1.0.3
   1.2.1
   0.98.15
   1.3.0

> Slight improvement of StoreScanner.optimize
> ---
>
> Key: HBASE-14539
> URL: https://issues.apache.org/jira/browse/HBASE-14539
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 0.98.15, 1.2.1, 1.0.3, 1.1.3
>
>
> While looking at the code I noticed that StoreScanner.optimize does some 
> unnecessary work. This is a very tight loop and even just looking up a 
> reference can throw off the CPU's cache lines. This does save a few percent of 
> performance (not a lot, though).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14539) Slight improvement of StoreScanner.optimize

2015-10-02 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14539:
-

 Summary: Slight improvement of StoreScanner.optimize
 Key: HBASE-14539
 URL: https://issues.apache.org/jira/browse/HBASE-14539
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor


While looking at the code I noticed that StoreScanner.optimize does some 
unnecessary work. This is a very tight loop and even just looking up a 
reference can throw off the CPU's cache lines. This does save a few percent of 
performance (not a lot, though).




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14539) Slight improvement of StoreScanner.optimize

2015-10-02 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-14539:
--
Attachment: 14539-0.98.txt

Here's a trivial patch. Makes absolutely sure we do no work (other than the 
compares in the switch statements) unless we need to do any.

I measured a 3-5% improvement in some cases.

Trivial patch, no functional change. Will commit tomorrow unless I hear 
objections.
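
For illustration, a hedged sketch of the pattern the patch follows, with made-up names (QCode, Ctx and the comparison are stand-ins, not the real StoreScanner.optimize code): the common INCLUDE/SKIP path returns immediately, and only the seek cases pay for the extra lookups and compares.

{code}
import java.util.Arrays;

// Illustration of the "do no work unless needed" pattern described above;
// QCode, Ctx and the comparison are made-up stand-ins, not the real code.
public class OptimizeSketch {

    enum QCode { INCLUDE, SKIP, SEEK_NEXT_COL, SEEK_NEXT_ROW }

    static final class Ctx {
        byte[] nextIndexedKey;  // first key of the next block, if known
        byte[] seekTarget;      // key a seek would jump to
    }

    static QCode optimize(QCode qcode, Ctx ctx) {
        switch (qcode) {
            case INCLUDE:
            case SKIP:
                return qcode;  // hot path: return immediately, no lookups at all
            case SEEK_NEXT_COL:
            case SEEK_NEXT_ROW:
                // Only the seek cases pay for the reference lookups and compare:
                // if the target is still inside the current block, a cheap SKIP
                // is preferable to an expensive seek.
                if (ctx.nextIndexedKey != null
                        && Arrays.compareUnsigned(ctx.seekTarget, ctx.nextIndexedKey) < 0) {
                    return QCode.SKIP;
                }
                return qcode;
            default:
                return qcode;
        }
    }

    public static void main(String[] args) {
        Ctx ctx = new Ctx();
        ctx.nextIndexedKey = "rowM".getBytes();
        ctx.seekTarget = "rowC".getBytes();
        System.out.println(optimize(QCode.INCLUDE, ctx));        // INCLUDE
        System.out.println(optimize(QCode.SEEK_NEXT_ROW, ctx));  // SKIP
    }
}
{code}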

> Slight improvement of StoreScanner.optimize
> ---
>
> Key: HBASE-14539
> URL: https://issues.apache.org/jira/browse/HBASE-14539
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 0.98.15, 1.2.1, 1.0.3, 1.1.3
>
> Attachments: 14539-0.98.txt
>
>
> While looking at the code I noticed that StoreScanner.optimize does some 
> unnecessary work. This is a very tight loop and even just looking up a 
> reference can throw off the CPU's cache lines. This does save a few percent of 
> performance (not a lot, though).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-14509) Configurable sparse indexes?

2015-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935818#comment-14935818
 ] 

Lars Hofhansl edited comment on HBASE-14509 at 10/2/15 6:16 AM:


[~lhofhansl], FYI

HBASE-14511 - StoreFile.Writer Meta plugin framework. I need only the Meta section 
and only for the Writer. For your sparse indexes, you will need a full Reader/Writer 
plugin (both meta and data blocks). It is just one way of doing indexes, of 
course. 


was (Author: vrodionov):
[~lhofhansl], FYI

https://issues.apache.org/jira/browse/HBASE-14511 - StoreFile.Writer Meta 
plugin framework. I need only Meta section and only for Writer. For your sparse 
indexes, you will need full Reader/Writer plugin (both meta and data blocks). 
It is just a one way of doing indexes, of course. 

> Configurable sparse indexes?
> 
>
> Key: HBASE-14509
> URL: https://issues.apache.org/jira/browse/HBASE-14509
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable 
> range?
> I.e. For any given CQ we record the lowest and highest value for a particular 
> range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index, vs 
> its selectivity.
> For example if we kept it by HFile we can almost instantly decide whether we 
> need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we 
> scan the region the first time. Assuming ranges are large enough we can always 
> keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff 
> we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14511) StoreFile.Writer Meta Plugin

2015-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940808#comment-14940808
 ] 

Lars Hofhansl commented on HBASE-14511:
---

I'd like to use this for Phoenix to store min/max for some column qualifiers in 
the HFile itself. At scan time we can then efficiently rule out entire HFiles 
based on those (similar to how HBase does it with key ranges and timestamps) - 
that would be a cheap local secondary index. [~giacomotaylor], FYI.
Can we make this accessible through coprocessor hooks somehow? (I'd need to 
think about that side, though.)
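
A self-contained sketch of the min/max-per-file idea, deliberately independent of the actual StoreFile.Writer plugin API proposed in HBASE-14511; all names here are hypothetical:

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of keeping per-file min/max values for a column
// qualifier and using them to rule out whole files at scan time. None of these
// classes correspond to real HBase or HBASE-14511 APIs.
public class FileMinMaxSketch {

    /** Collected while a file is written (e.g. by a writer plugin). */
    static class ColumnStats {
        byte[] min, max;
        void update(byte[] value) {
            if (min == null || Arrays.compareUnsigned(value, min) < 0) min = value;
            if (max == null || Arrays.compareUnsigned(value, max) > 0) max = value;
        }
    }

    static class FileMeta {
        final Map<String, ColumnStats> statsByQualifier = new HashMap<>();
    }

    /** True if the file may contain a value in [lo, hi] for the qualifier. */
    static boolean mayContain(FileMeta meta, String qualifier, byte[] lo, byte[] hi) {
        ColumnStats s = meta.statsByQualifier.get(qualifier);
        if (s == null || s.min == null) return true;         // no stats: must scan
        return Arrays.compareUnsigned(hi, s.min) >= 0
            && Arrays.compareUnsigned(lo, s.max) <= 0;        // value ranges overlap
    }

    public static void main(String[] args) {
        FileMeta meta = new FileMeta();
        ColumnStats cs = new ColumnStats();
        cs.update("banana".getBytes());
        cs.update("cherry".getBytes());
        meta.statsByQualifier.put("q1", cs);
        System.out.println(mayContain(meta, "q1", "apple".getBytes(), "avocado".getBytes())); // false: skip file
        System.out.println(mayContain(meta, "q1", "berry".getBytes(), "date".getBytes()));    // true: must scan
    }
}
{code}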

> StoreFile.Writer Meta Plugin
> 
>
> Key: HBASE-14511
> URL: https://issues.apache.org/jira/browse/HBASE-14511
> Project: HBase
>  Issue Type: New Feature
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Attachments: HBASE-14511.v1.patch, HBASE-14511.v2.patch
>
>
> During my work on new compaction policies (HBASE-14468, HBASE-14477) I had 
> to modify the existing code of StoreFile.Writer to add additional meta-info 
> required by these new policies. I think that it should be done by means of a 
> new Plugin framework, because this seems to be a general capability/feature. 
> As a future enhancement this can become a part of a more general 
> StoreFileWriter/Reader plugin architecture. But I need only the Meta section of a 
> store file.
> This could be used, for example, to collect rowkeys distribution information 
> during hfile creation. This info can be used later to find the optimal region 
> split key or to create optimal set of sub-regions for M/R jobs or other jobs 
> which can operate on a sub-region level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14509) Configurable sparse indexes?

2015-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942101#comment-14942101
 ] 

Lars Hofhansl commented on HBASE-14509:
---

The other part that's needed is to actually make use of the index, i.e. filter 
the HFiles that do not contain a range of values requested.
Coprocessors won't work here; they are a level too high for this (at the 
region level).

We could add a method to filter, which is passed an HFile or a FileInfo or 
something, and based on that gets to decide whether to include the HFile or 
not. Thoughts [~apurtell], [~stack]?

The other question is whether HFile is too large of a unit. Assuming CQ values 
are all over the place, storing min/max per HFile would not be very selective 
(i.e. a large HFile will likely contain a very small and a very large value for a 
specific CQ). So maybe we need to record min/max CQ values for a range of keys. 
I.e. we have a mapping from (key1, key2) -> (min CQ, max CQ); then as we scan 
we skip ahead to the next key1 if we find the value range does not contain the 
value we're looking for.

Lastly we need to indicate at compaction time what CQs to keep track of. That 
would be a sly introduction of (some) schema.
We can try to automate that, but we can't keep track of all of them; there might 
be many, or the CQ values might be very large.

Or we punt and just add the building blocks: add the API I mention to Filter, 
and allow coprocessors to record and add stuff to the HFile trailer. Then 
higher level tools like Phoenix can add the appropriate logic.
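
A hypothetical sketch of that (key1, key2) -> (min CQ, max CQ) mapping and the skip-scan decision it enables; nothing here is an actual HBase API:

{code}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a sparse range index: for each row range we keep the
// min/max value observed for one column qualifier, and a scan with a value
// predicate either scans the range or skips straight past it.
public class SparseRangeIndexSketch {

    static class RangeEntry {
        final String startRow, endRow;   // row range covered by this entry
        final byte[] minVal, maxVal;     // min/max value of the indexed qualifier
        RangeEntry(String s, String e, byte[] min, byte[] max) {
            startRow = s; endRow = e; minVal = min; maxVal = max;
        }
        boolean mayContain(byte[] lo, byte[] hi) {
            return Arrays.compareUnsigned(hi, minVal) >= 0
                && Arrays.compareUnsigned(lo, maxVal) <= 0;  // value ranges overlap
        }
    }

    public static void main(String[] args) {
        // Ordered row ranges with the value bounds observed in each.
        Map<String, RangeEntry> index = new LinkedHashMap<>();
        index.put("a", new RangeEntry("a", "f", "10".getBytes(), "20".getBytes()));
        index.put("f", new RangeEntry("f", "m", "50".getBytes(), "90".getBytes()));
        index.put("m", new RangeEntry("m", "z", "15".getBytes(), "30".getBytes()));

        byte[] lo = "25".getBytes(), hi = "40".getBytes();   // predicate: value in [25, 40]
        for (RangeEntry e : index.values()) {
            if (e.mayContain(lo, hi)) {
                System.out.println("scan rows [" + e.startRow + ", " + e.endRow + ")");
            } else {
                System.out.println("skip to " + e.endRow);   // seek past this range
            }
        }
    }
}
{code}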


> Configurable sparse indexes?
> 
>
> Key: HBASE-14509
> URL: https://issues.apache.org/jira/browse/HBASE-14509
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable 
> range?
> I.e. For any given CQ we record the lowest and highest value for a particular 
> range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index, vs 
> its selectivity.
> For example if we kept it by HFile we can almost instantly decide whether we 
> need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we 
> scan the region the first time. Assuming ranges are large enough we can always 
> keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff 
> we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14539) Slight improvement of StoreScanner.optimize

2015-10-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942077#comment-14942077
 ] 

Lars Hofhansl commented on HBASE-14539:
---

I'll commit everywhere now.

> Slight improvement of StoreScanner.optimize
> ---
>
> Key: HBASE-14539
> URL: https://issues.apache.org/jira/browse/HBASE-14539
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 0.98.15, 1.2.1, 1.0.3, 1.1.3
>
> Attachments: 14539-0.98.txt
>
>
> While looking at the code I noticed that StoreScanner.optimize does some 
> unnecessary work. This is a very tight loop and even just looking up a 
> reference can throw off the CPU's cache lines. This does save a few percent of 
> performance (not a lot, though).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14539) Slight improvement of StoreScanner.optimize

2015-10-02 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl resolved HBASE-14539.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to 0.98.x, 1.0.x, 1.1.x, 1.2.x, 1.3, and 2.0.

> Slight improvement of StoreScanner.optimize
> ---
>
> Key: HBASE-14539
> URL: https://issues.apache.org/jira/browse/HBASE-14539
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Minor
> Fix For: 2.0.0, 1.3.0, 1.2.1, 1.0.3, 1.1.3, 0.98.15
>
> Attachments: 14539-0.98.txt
>
>
> While looking at the code I noticed that StoreScanner.optimize does some 
> unnecessary work. This is a very tight loop and even just looking up a 
> reference can throw off the CPU's cache lines. This does save a few percent of 
> performance (not a lot, though).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-10-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14940769#comment-14940769
 ] 

Lars Hofhansl commented on HBASE-14468:
---

HBASE-14677 doesn't exist, though :)

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # use it for very high volume raw data which has low TTL and which is the 
> source of another data (after additional processing). Example: Raw 
> time-series vs. time-based rollup aggregates and compacted time-series. We 
> collect raw time-series and store them into CF with FIFO compaction policy, 
> periodically we run a task which creates rollup aggregates and compacts 
> time-series; the original raw data can be discarded after that.
> # use it for data which can be kept entirely in a block cache (RAM/SSD). 
> Say we have a local SSD (1TB) which we can use as a block cache. No need for 
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and 
> network), and we do not evict hot data from the block cache. The result: improved 
> throughput and latency for both write and read.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939307#comment-14939307
 ] 

Lars Hofhansl commented on HBASE-14468:
---

The other thing I want to do is tiered compactions along timeranges. I.e. instead 
of having a (major) compaction spit out a single file, we can have it write a configurable 
number based on timebands. Then we can (say) query the last week's worth of data 
without touching many of the older files. But that's a different topic.
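
As a sketch only (not an implemented policy), splitting a compaction's output by time band could look roughly like this; Cell here is a stand-in, not the HBase Cell interface:

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the idea above: a (major) compaction writing its output into
// several files split by time band instead of a single file.
public class TimeBandOutputSketch {

    static class Cell {
        final String row;
        final long ts;
        Cell(String row, long ts) { this.row = row; this.ts = ts; }
    }

    /** Each map entry stands for one output StoreFile of the compaction. */
    static Map<Long, List<Cell>> splitByBand(List<Cell> input, long bandMs) {
        Map<Long, List<Cell>> outputs = new TreeMap<>();
        for (Cell c : input) {
            long band = c.ts / bandMs;  // e.g. one output file per week
            outputs.computeIfAbsent(band, k -> new ArrayList<>()).add(c);
        }
        return outputs;
    }

    public static void main(String[] args) {
        long week = 7L * 24 * 3600 * 1000;
        List<Cell> cells = List.of(
            new Cell("r1", 1000L),
            new Cell("r2", week + 5),
            new Cell("r3", week + 9));
        splitByBand(cells, week).forEach((band, cs) ->
            System.out.println("output file for band " + band + ": " + cs.size() + " cell(s)"));
    }
}
{code}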

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch, HBASE-14468-v4.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # use it for very high volume raw data which has low TTL and which is the 
> source of another data (after additional processing). Example: Raw 
> time-series vs. time-based rollup aggregates and compacted time-series. We 
> collect raw time-series and store them into CF with FIFO compaction policy, 
> periodically we run a task which creates rollup aggregates and compacts 
> time-series; the original raw data can be discarded after that.
> # use it for data which can be kept entirely in a block cache (RAM/SSD). 
> Say we have a local SSD (1TB) which we can use as a block cache. No need for 
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and 
> network), and we do not evict hot data from the block cache. The result: improved 
> throughput and latency for both write and read.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9260) Timestamp Compactions

2015-09-30 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939311#comment-14939311
 ] 

Lars Hofhansl commented on HBASE-9260:
--

Warming this up again. I called this time-tiered compactions before. The work I 
mentioned above got abandoned.

I think there is a lot of value in this. What I had in mind earlier was something 
along the lines of "keep the last week separate from older stuff". That clearly does 
not work, since "the last week" is a moving target. What is described here is much 
better: we just group data by a fixed timerange (like every month, every year, 
every week, or every 10 days, or the like); that can work since it's not a moving target. 
We could also do policies that group by week and eventually by month, etc., although 
then we'd need to force major compactions just to shift the files around. The 
value of data decays with age, and somehow we should capture that.


> Timestamp Compactions
> -
>
> Key: HBASE-9260
> URL: https://issues.apache.org/jira/browse/HBASE-9260
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Affects Versions: 0.94.10
>Reporter: Adrien Mogenet
>Priority: Minor
>  Labels: features, performance
>
> h1.TSCompactions
> h2.The issue
> One of the biggest issue I currently deal with is compacting big
> stores, i.e. when HBase cluster is 80% full on 4 TB nodes (let say
> with a single big table), compactions might take several hours (from
> 15 to 20 in my case).
> In 'time series' workloads, we could avoid compacting everything
> everytime. Think about OpenTSDB-like systems, or write-heavy,
> TTL based workloads where you want to free space everyday, deleting
> oldest data, and you're not concerned about read latency (i.e. read
> into a single bigger StoreFile).
> > Note: in this draft, I currently consider that we get free space from
> > the TTL behavior only, not really from the Delete operations.
> h2.Proposal and benefits
> For such cases, StoreFiles could be organized and managed in a way
> that would compact:
>   * recent StoreFiles with recent data
>   * oldest StoreFiles that are concerned by TTL eviction
> By the way, it would help when scanning with a timestamp criterion.
> h2.Configuration
>   * {{hbase.hstore.compaction.sortByTS}} (boolean, default=false)
> This indicates whether the new behavior is enabled or not. Set it to
> {{false}} and compactions will remain the same as current ones.
>   * {{hbase.hstore.compaction.ts.bucketSize}} (integer)
> If `sortByTS` is enabled, tells HBase the target size of
> buckets. The lower, the more StoreFiles you'll get, but you should
> save more IO's. Higher values will generate fewer StoreFiles, but
> these will be bigger and thus compactions could generate more
> IO's.
> h2.Examples
> Here is what a common store could look like after some flushes and
> perhaps some minor compactions:
> {noformat}
>,---, ,---,   ,---,
>|   | |   | ,---, |   |
>|   | |   | |   | |   |
>`---' `---' `---' `---'
> SF1   SF2   SF3   SF4
>\__ __/
>   V
>for all of these Storefiles,
>let say minimum TS is 01/01/2013
>and maximum TS is 31/03/2013
> {noformat}
> Set the bucket size to 1 month, and that's what we have after
> compaction:
> {noformat}
> ,---, ,---,
> |   | |   |
>   ,---, |   | |   |
>   |   | |   | |   |
>   `---' `---' `---'
>SF1   SF2   SF3
>,-,
>|  minimum TS  |  maximum TS  |
>  ,---'
>  | SF1 |  03/03/2013  |  31/03/2013  | most recent, growing
>  | SF2 |  31/01/2013  |  02/03/2013  | old data, "sealed"
>  | SF3 |  01/01/2013  |  30/01/2013  | oldest data, "sealed"
>  '---'
> {noformat}
> h2.StoreFile selection
>   * for minor compactions, current algorithm should already do the
> right job. Pick up `n` eldest files that are small enough, and
> write a bigger file. Remember, TSCompaction are designed for time
> series, so this 'minor selection' should leave "sealed" big old
> files as they are.
>   * for major compactions, when all the StoreFiles have been selected,
> apply the TTL first. StoreFiles that are entirely out of time just
> don't need to be rewritten. They'll be deleted at once,
> avoiding lots of IO's.
> h2.New issues and trade-offs
>   1. In that case ({{bucketSize=1 month}}), after 1+ year, we'll have lots
>   of StoreFiles (and more generally after `n * bucketSize` seconds) if
>   there is no TTL eviction. In any case, a clever threshold should be
>   implemented to limit the maximum number of StoreFiles.
>   2. If we later add old data that matches timerange of a 

[jira] [Commented] (HBASE-13082) Coarsen StoreScanner locks to RegionScanner

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934759#comment-14934759
 ] 

Lars Hofhansl commented on HBASE-13082:
---

Feel free [~ram_krish]. I just happened to look at the code again a few days 
ago; we need to fix this.

> Coarsen StoreScanner locks to RegionScanner
> ---
>
> Key: HBASE-13082
> URL: https://issues.apache.org/jira/browse/HBASE-13082
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Attachments: 13082-test.txt, 13082-v2.txt, 13082-v3.txt, 
> 13082-v4.txt, 13082.txt, 13082.txt, gc.png, gc.png, gc.png, hits.png, 
> next.png, next.png
>
>
> Continuing where HBASE-10015 left of.
> We can avoid locking (and memory fencing) inside StoreScanner by deferring to 
> the lock already held by the RegionScanner.
> In tests this shows quite a scan improvement and reduced CPU (the fences make 
> the cores wait for memory fetches).
> There are some drawbacks too:
> * All calls to RegionScanner need to remain synchronized
> * Implementors of coprocessors need to be diligent in following the locking 
> contract. For example Phoenix does not lock RegionScanner.nextRaw() as 
> required in the documentation (not picking on Phoenix, this one is my fault 
> as I told them it's OK)
> * possible starving of flushes and compactions with heavy read load. 
> RegionScanner operations would keep getting the locks and the 
> flushes/compactions would not be able to finalize the set of files.
> I'll have a patch soon.
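
A toy illustration of the coarsened-locking idea described above: the region-level scanner is the only synchronization point, and the store-level scanner assumes the caller already holds that lock. These are stand-in classes, not the HBase implementation:

{code}
import java.util.ArrayList;
import java.util.List;

// Toy illustration of coarse locking: the region scanner's synchronized
// entry points are the single lock, and the store scanner relies on the
// caller already holding that lock (no fences of its own).
public class CoarseLockSketch {

    /** No internal locking: callers must hold the ToyRegionScanner lock. */
    static class ToyStoreScanner {
        private int pos;
        private final List<String> cells = List.of("c1", "c2", "c3");
        String next() {
            return pos < cells.size() ? cells.get(pos++) : null;
        }
    }

    static class ToyRegionScanner {
        private final ToyStoreScanner store = new ToyStoreScanner();

        // The coarse lock: every external entry point is synchronized, so the
        // unsynchronized store scanner above is never accessed concurrently.
        synchronized boolean nextRaw(List<String> out) {
            String c = store.next();
            if (c != null) out.add(c);
            return c != null;
        }
    }

    public static void main(String[] args) {
        ToyRegionScanner rs = new ToyRegionScanner();
        List<String> out = new ArrayList<>();
        while (rs.nextRaw(out)) { /* keep pulling */ }
        System.out.println(out);
    }
}
{code}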



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14509) Configurable sparse indexes?

2015-09-29 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-14509:
-

 Summary: Configurable sparse indexes?
 Key: HBASE-14509
 URL: https://issues.apache.org/jira/browse/HBASE-14509
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


This idea just popped up today and I wanted to record it for discussion:
What if we kept sparse column indexes per region or HFile or per configurable 
range?

I.e. For any given CQ we record the lowest and highest value for a particular 
range (HFile, Region, or a custom range like the Phoenix guide post).

By tweaking the size of these ranges we can control the size of the index, vs 
its selectivity.

For example if we kept it by HFile we can almost instantly decide whether we 
need to scan a particular HFile at all to find a particular value in a Cell.

We can also collect min/max values for each n MB of data, for example when we 
scan the region the first time. Assuming ranges are large enough we can always 
keep the index in memory together with the region.

Kind of a sparse local index. Might be much easier than the buddy region stuff 
we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934776#comment-14934776
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Haven't looked at the patch, but the idea sounds great!
Does that mean we will essentially stay at memstore sized HFiles until they are 
collected?


> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. One of the use cases for this policy 
> is when we need to store raw data which will be post-processed later and 
> discarded completely after a quite short period of time. Raw time-series vs. 
> time-based rollup aggregates and compacted time-series. We collect raw 
> time-series and store them into a CF with FIFO compaction policy; periodically 
> we run a task which creates rollup aggregates and compacts time-series, and the 
> original raw data can be discarded after that.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14468) Compaction improvements: FIFO compaction policy

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935830#comment-14935830
 ] 

Lars Hofhansl commented on HBASE-14468:
---

Thanks for the background. We have some use cases with TTL, but the TTL is measured 
in months or years; I wonder if we can combine this with another compactor and/or 
have a policy that under some conditions compacts anyway, even when there are 
unexpired rows.

> Compaction improvements: FIFO compaction policy
> ---
>
> Key: HBASE-14468
> URL: https://issues.apache.org/jira/browse/HBASE-14468
> Project: HBase
>  Issue Type: Improvement
>Reporter: Vladimir Rodionov
>Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
> Attachments: HBASE-14468-v1.patch, HBASE-14468-v2.patch, 
> HBASE-14468-v3.patch
>
>
> h2. FIFO Compaction
> h3. Introduction
> FIFO compaction policy selects only files which have all cells expired. The 
> column family MUST have non-default TTL. 
> Essentially, FIFO compactor does only one job: collects expired store files. 
> I see many applications for this policy:
> # use it for very high volume raw data which has low TTL and which is the 
> source of another data (after additional processing). Example: Raw 
> time-series vs. time-based rollup aggregates and compacted time-series. We 
> collect raw time-series and store them into CF with FIFO compaction policy, 
> periodically we run a task which creates rollup aggregates and compacts 
> time-series; the original raw data can be discarded after that.
> # use it for data which can be kept entirely in a block cache (RAM/SSD). 
> Say we have a local SSD (1TB) which we can use as a block cache. No need for 
> compaction of the raw data at all.
> Because we do not do any real compaction, we do not use CPU and IO (disk and 
> network), and we do not evict hot data from the block cache. The result: improved 
> throughput and latency for both write and read.
> See: https://github.com/facebook/rocksdb/wiki/FIFO-compaction-style
> h3. To enable FIFO compaction policy
> For table:
> {code}
> HTableDescriptor desc = new HTableDescriptor(tableName);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code} 
> For CF:
> {code}
> HColumnDescriptor desc = new HColumnDescriptor(family);
> 
> desc.setConfiguration(DefaultStoreEngine.DEFAULT_COMPACTION_POLICY_CLASS_KEY, 
>   FIFOCompactionPolicy.class.getName());
> {code}
> h3. Limitations
> Do not use FIFO compaction if :
> * Table/CF has MIN_VERSION > 0
> * Table/CF has TTL = FOREVER (HColumnDescriptor.DEFAULT_TTL)
> * Table/CF is MOB 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14509) Configurable sparse indexes?

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935826#comment-14935826
 ] 

Lars Hofhansl commented on HBASE-14509:
---

That's cool! Thanks [~vrodionov], I'll keep an eye on that one.

> Configurable sparse indexes?
> 
>
> Key: HBASE-14509
> URL: https://issues.apache.org/jira/browse/HBASE-14509
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable 
> range?
> I.e. For any given CQ we record the lowest and highest value for a particular 
> range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index, vs 
> its selectivity.
> For example if we kept it by HFile we can almost instantly decide whether we 
> need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we 
> scan the region the first time. Assuming ranges are large enough we can always 
> keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff 
> we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14509) Configurable sparse indexes?

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936182#comment-14936182
 ] 

Lars Hofhansl commented on HBASE-14509:
---

Actually someone pointed out to me that one can use a BF for range scans when it 
is used with _prefixes_ of keys.
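
For illustration, a small sketch of the prefix trick using Guava's BloomFilter (assumed on the classpath) rather than HBase's own bloom support: index a fixed-length prefix of each row key, then a scan constrained to one prefix can be ruled out with a single membership check.

{code}
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

// Illustration of using a bloom filter for range scans via key prefixes;
// Guava's BloomFilter stands in for HBase's bloom implementation here.
public class PrefixBloomSketch {

    static final int PREFIX_LEN = 4;

    public static void main(String[] args) {
        BloomFilter<String> prefixes =
            BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000, 0.01);

        // While writing, insert the prefix of every row key.
        for (String row : new String[] {"user123#2015", "user123#2016", "user456#2015"}) {
            prefixes.put(row.substring(0, PREFIX_LEN));
        }

        // A range scan constrained to one prefix can consult the filter first;
        // "orde" was never written, so a file with this filter can be skipped.
        System.out.println(prefixes.mightContain("user"));  // true: must scan
        System.out.println(prefixes.mightContain("orde"));  // false (with high probability): skip
    }
}
{code}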

> Configurable sparse indexes?
> 
>
> Key: HBASE-14509
> URL: https://issues.apache.org/jira/browse/HBASE-14509
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable 
> range?
> I.e. For any given CQ we record the lowest and highest value for a particular 
> range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index, vs 
> its selectivity.
> For example if we kept it by HFile we can almost instantly decide whether we 
> need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we 
> scan the region the first time. Assuming ranges are large enough we can always 
> keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff 
> we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14509) Configurable sparse indexes?

2015-09-29 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935539#comment-14935539
 ] 

Lars Hofhansl commented on HBASE-14509:
---

Bloom filters are good for point lookups, not for range scans. But BFs on 
column values are a great idea!
For scans with filters we need to be able to rule out large chunks of the data.

Schema in HBase... You're talking heresy :)

> Configurable sparse indexes?
> 
>
> Key: HBASE-14509
> URL: https://issues.apache.org/jira/browse/HBASE-14509
> Project: HBase
>  Issue Type: Brainstorming
>Reporter: Lars Hofhansl
>
> This idea just popped up today and I wanted to record it for discussion:
> What if we kept sparse column indexes per region or HFile or per configurable 
> range?
> I.e. For any given CQ we record the lowest and highest value for a particular 
> range (HFile, Region, or a custom range like the Phoenix guide post).
> By tweaking the size of these ranges we can control the size of the index, vs 
> its selectivity.
> For example if we kept it by HFile we can almost instantly decide whether we 
> need to scan a particular HFile at all to find a particular value in a Cell.
> We can also collect min/max values for each n MB of data, for example when we 
> scan the region the first time. Assuming ranges are large enough we can always 
> keep the index in memory together with the region.
> Kind of a sparse local index. Might be much easier than the buddy region stuff 
> we've been discussing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14489) postScannerFilterRow consumes a lot of CPU

2015-09-26 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909118#comment-14909118
 ] 

Lars Hofhansl commented on HBASE-14489:
---

Great. Thanks. I'll commit some time tomorrow.

> postScannerFilterRow consumes a lot of CPU
> --
>
> Key: HBASE-14489
> URL: https://issues.apache.org/jira/browse/HBASE-14489
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>  Labels: performance
> Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3
>
> Attachments: 14489-0.98.txt
>
>
> During an unrelated test I found that when scanning a tall table with CQ only 
> and filtering most results at the server, 50%(!) of the time is spent in 
> postScannerFilterRow, even though the coprocessor does nothing in that hook.
> We need to find a way not to call this hook when not needed, or to question 
> why we have this hook at all.
> I think [~ram_krish] added the hook (or maybe [~anoop.hbase]). I am also not 
> sure whether Phoenix uses this hook ([~giacomotaylor]?)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

