[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
[ https://issues.apache.org/jira/browse/HBASE-15716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-15716:
--
Attachment: 15716.prune.synchronizations.v4.patch

> HRegion#RegionScannerImpl scannerReadPoints synchronization costs
> -----------------------------------------------------------------
>
> Key: HBASE-15716
> URL: https://issues.apache.org/jira/browse/HBASE-15716
> Project: HBase
> Issue Type: Bug
> Components: Performance
> Reporter: stack
> Assignee: stack
> Attachments: 15716.prune.synchronizations.patch, 15716.prune.synchronizations.v3.patch, 15716.prune.synchronizations.v4.patch, 15716.prune.synchronizations.v4.patch, Screen Shot 2016-04-26 at 2.05.45 PM.png, Screen Shot 2016-04-26 at 2.06.14 PM.png, Screen Shot 2016-04-26 at 2.07.06 PM.png, Screen Shot 2016-04-26 at 2.25.26 PM.png, Screen Shot 2016-04-26 at 6.02.29 PM.png, Screen Shot 2016-04-27 at 9.49.35 AM.png, current-branch-1.vs.NoSynchronization.vs.Patch.png, hits.png, remove_cslm.patch
>
>
> Here is a [~lhofhansl] special.
> When we construct the region scanner, we get our read point and then store it with the scanner instance in a region-scoped CSLM (ConcurrentSkipListMap). This is done inside a synchronized block on the CSLM.
> This synchronization on a region-scoped Map during region scanner creation is the outstanding point of lock contention according to Java Flight Recorder (my workload is YCSB workload c, random reads).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
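A minimal Java sketch of the registration pattern described in the issue, to make it concrete. It is not the HBase source: `LockedScannerRegistry`, the `long` scanner ids, and the `AtomicLong` standing in for the region's MVCC read point are hypothetical simplifications, and `getSmallestReadPoint` is modeled on the method of that name discussed elsewhere in this thread. Every scanner construction enters the same region-wide monitor, which is why it surfaces as the hot lock under a pure random-read workload.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, simplified model of the contended pattern: each new scanner
 * records its read point in a region-scoped map while holding the map's monitor.
 */
class LockedScannerRegistry {
  // Stand-in for the region's MVCC read point (assumption, not the HBase class).
  private final AtomicLong mvccReadPoint;
  // Region-scoped map: scanner id -> read point the scanner was opened at.
  private final Map<Long, Long> scannerReadPoints = new ConcurrentSkipListMap<>();
  private final AtomicLong nextScannerId = new AtomicLong();

  LockedScannerRegistry(AtomicLong mvccReadPoint) {
    this.mvccReadPoint = mvccReadPoint;
  }

  /** Runs on every scanner construction, so every read serializes on this monitor. */
  long openScanner() {
    long id = nextScannerId.incrementAndGet();
    synchronized (scannerReadPoints) {
      scannerReadPoints.put(id, mvccReadPoint.get());
    }
    return id;
  }

  void closeScanner(long id) {
    scannerReadPoints.remove(id);
  }

  /** Oldest read point any open scanner may still need; taken under the same lock. */
  long getSmallestReadPoint() {
    synchronized (scannerReadPoints) {
      long min = mvccReadPoint.get();
      for (long readPoint : scannerReadPoints.values()) {
        min = Math.min(min, readPoint);
      }
      return min;
    }
  }
}
```

Opening a scanner at read point 5 and then advancing the MVCC read point to 9 leaves `getSmallestReadPoint()` at 5 until the scanner is closed, after which it returns 9.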
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Assignee: stack
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: current-branch-1.vs.NoSynchronization.vs.Patch.png

Here are some runs comparing current branch-1, branch-1 with all synchronization removed, and the v4 patch. I do not see the 30% quoted earlier; it is more like 10%. When the server is overrun, though, we seem to be able to do more work... 15%?
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: 15716.prune.synchronizations.v4.patch
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: Screen Shot 2016-04-27 at 9.49.35 AM.png, hits.png, 15716.prune.synchronizations.v3.patch

This patch plugs the 'hole' identified in the scenario discussed earlier: during scanner creation we get the mvcc read point p1, but before we can add ourselves to the region's scannerReadPoints map, the read point moves forward to p2; a call to getSmallestReadPoint then comes in, and Cells between p1 and p2 are purged, corrupting our scan's 'view'.

We plug the hole with a check-and-put: scanner creation does not proceed until we are sure that what is registered in scannerReadPoints is the current read point. If it is not, we go around again until what is in scannerReadPoints matches the current mvcc read point. In place of the lock, we do two reads of an atomic long (mvcc#getReadPoint), one on each side of the scannerReadPoints.put, to synchronize the read point read with the Map update.

The difference in throughput is pretty dramatic: 220k ops/second vs 290k ops/second (roughly 30%?). See the attached hits.png. I also include the Flight Recorder recording, which shows the lock incidence is gone. Let me check my work by doing a few more runs. [~lhofhansl] what do you think of the latest patch? Can you find a hole in it?
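The check-and-put described above might look roughly like the following Java sketch. It is one reading of the description, not the v3 patch itself: `RetryingScannerRegistry`, the `long` scanner ids, and the `AtomicLong` standing in for mvcc#getReadPoint are hypothetical. The read point is read, published to the map, then read again; if it moved in between, the scanner republishes the newer value and retries. Because the read point only moves forward, a matching second read is meant to guarantee that a concurrent getSmallestReadPoint either observed a read point no larger than ours or will find our entry in the map (this sketch reads the MVCC point before scanning the map, which that argument relies on).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical lock-free registration using a read / put / re-read loop. */
class RetryingScannerRegistry {
  // Stand-in for mvcc#getReadPoint; assumed to only ever move forward.
  private final AtomicLong mvccReadPoint;
  // scanner id -> read point the scanner registered (no monitor taken).
  private final Map<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
  private final AtomicLong nextScannerId = new AtomicLong();

  RetryingScannerRegistry(AtomicLong mvccReadPoint) {
    this.mvccReadPoint = mvccReadPoint;
  }

  long openScanner() {
    long id = nextScannerId.incrementAndGet();
    while (true) {
      long readPoint = mvccReadPoint.get();      // first read
      scannerReadPoints.put(id, readPoint);      // publish
      if (mvccReadPoint.get() == readPoint) {    // second read: unchanged?
        return id;                               // registered value is current
      }
      // The read point advanced between the two reads; go around again.
    }
  }

  long readPointOf(long id) {
    return scannerReadPoints.get(id);
  }

  void closeScanner(long id) {
    scannerReadPoints.remove(id);
  }

  /** Reads the MVCC point first, then takes the minimum over registered scanners. */
  long getSmallestReadPoint() {
    long min = mvccReadPoint.get();
    for (long readPoint : scannerReadPoints.values()) {
      min = Math.min(min, readPoint);
    }
    return min;
  }
}
```

In a single-threaded run the loop exits on its first pass; the retry branch matters only when a writer advances the read point between the two reads.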
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: Screen Shot 2016-04-26 at 6.02.29 PM.png, 15716.prune.synchronizations.patch

This patch works. Could make it dumber still by using a Set and just keeping the read point Longs. Thanks [~lhofhansl].
Also, I think the bit where we clean up memstore versions is going to be subsumed by the segment work [~eshcar] is doing.
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: Screen Shot 2016-04-26 at 2.25.26 PM.png, remove_cslm.patch

This patch replaces the CSLM with a HashMap, since we don't need a CSLM, especially given we are synchronizing anyway... only, it doesn't help. Flight Recorder still flags the synchronized blocks (and throughput is the same as with the CSLM).

So, [~lhofhansl]... do we need to synchronize here? We lock because we want the smallest read point, i.e. that of the oldest outstanding scanner. Does the lock help? The lock ensures we see newer scanners, but we don't care about newer scanners, only the one with the oldest read point. Can we just remove this synchronization and pivot on the content of the scannerReadPoints CSLM? Let me put up a patch.
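To see what the lock is protecting, here is a single-threaded Java replay of one interleaving that becomes possible if the synchronization is simply removed. The class and method names are hypothetical and the read point values arbitrary; what matters is the order of the four steps. The scanner reads read point 10, a write moves the read point to 15, a smallest-read-point computation runs before the scanner has registered and returns 15, and only then does the scanner register 10. A cleanup keyed off 15 can then discard versions the scanner at 10 still needs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical replay of the unsynchronized register-vs-smallest-read-point race. */
class UnlockedRaceReplay {
  /** Unlocked version: MVCC read point, then minimum over registered scanners. */
  static long smallestReadPoint(AtomicLong mvcc, Map<Long, Long> scannerReadPoints) {
    long min = mvcc.get();
    for (long readPoint : scannerReadPoints.values()) {
      min = Math.min(min, readPoint);
    }
    return min;
  }

  /** Returns {scanner's read point, smallest read point seen by cleanup}. */
  static long[] replay() {
    AtomicLong mvcc = new AtomicLong(10);
    Map<Long, Long> scannerReadPoints = new ConcurrentHashMap<>();
    long scannerId = 1L;

    // 1. Scanner creation reads its read point but has not registered it yet.
    long p1 = mvcc.get();
    // 2. A write completes and the read point moves forward.
    mvcc.set(15);
    // 3. Cleanup asks for the smallest read point; the scanner is invisible to it.
    long smallest = smallestReadPoint(mvcc, scannerReadPoints);
    // 4. The scanner finally registers its now-stale read point.
    scannerReadPoints.put(scannerId, p1);

    return new long[] {p1, smallest};
  }
}
```

Holding one monitor across steps 1 and 4 and across step 3 rules this ordering out: step 3 then runs either before step 1 (returning a read point no larger than 10) or after step 4 (seeing the registered 10).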
[jira] [Updated] (HBASE-15716) HRegion#RegionScannerImpl scannerReadPoints synchronization costs
stack updated HBASE-15716:
--
Attachment: Screen Shot 2016-04-26 at 2.07.06 PM.png, Screen Shot 2016-04-26 at 2.05.45 PM.png, Screen Shot 2016-04-26 at 2.06.14 PM.png

Here are Flight Recorder recordings from before and after. Before is the current state of branch-1. The workload is YCSB c (pure random reads) and the hit rate is about 220k/second. After is with the synchronization elided. See how our lock instances drop radically, from 200k per thread per minute of my sample down to next to nothing (what remains is miscellaneous locking when allocating ByteBuffers out of the BucketCache). The speedup is not that much but nothing to sneer at: from 220k to 240k.