[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hiroshi Ikeda updated HBASE-14708:
--
Attachment: anotherbench3.zip

I created a revised hybrid implementation and attached it with its benchmark. I hope there is no race condition this time and that it makes up for my earlier mistake.

I changed the benchmark code to use System.nanoTime() instead of System.currentTimeMillis() because the resolution of the latter seems to be 15 ms in my environment.

I ran the benchmark with 10k initial entries. For adding/removing elements, the hybrid implementation still has 10% overhead compared to ConcurrentSkipListMap, but the copy-on-write array implementation is 30 times slower than the hybrid implementation. For reading elements, the hybrid implementation seems to be almost always a bit faster than the copy-on-write array implementation. I think that is because the hybrid implementation doesn't create an entry object for each search.

FYI

> Use copy on write Map for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Affects Versions: 1.1.2
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v10.patch, HBASE-14708-v11.patch, HBASE-14708-v12.patch, HBASE-14708-v13.patch, HBASE-14708-v15.patch, HBASE-14708-v16.patch, HBASE-14708-v17.patch, HBASE-14708-v2.patch, HBASE-14708-v3.patch, HBASE-14708-v4.patch, HBASE-14708-v5.patch, HBASE-14708-v6.patch, HBASE-14708-v7.patch, HBASE-14708-v8.patch, HBASE-14708-v9.patch, HBASE-14708.patch, anotherbench.zip, anotherbench2.zip, anotherbench3.zip, location_cache_times.pdf, result.csv
>
>
> Internally a co-worker profiled their application that was talking to HBase.
>
> 60% of the time was spent in locating a region. This was while the cluster was stable and no regions were moving.
> To figure out if there was a faster way to cache region locations I wrote up a benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache.
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It's got equal speed for reading and writing.
> However most operations will not need to remove or add a region location.
> There will be potentially several orders of magnitude more reads for cached locations than there will be on clearing the cache.
> So I propose a copy on write tree map.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
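The comparison above is between a hybrid structure, ConcurrentSkipListMap, and a copy-on-write array. The following is a minimal sketch of the copy-on-write array idea only; it is not the hybrid implementation in anotherbench3.zip, whose code is not reproduced here. It assumes String keys in place of HBase's byte[] region start keys, and the class name CowSortedArrayMap, the row keys, and the region names are all hypothetical. Lookups binary-search the key array directly, so no entry object is allocated per search, while every add or remove copies both arrays. The demo times lookups with System.nanoTime(), as the revised benchmark does.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical copy-on-write sorted-array map (illustration only, not the
 * hybrid implementation from anotherbench3.zip). String keys stand in for
 * HBase's byte[] region start keys.
 */
final class CowSortedArrayMap<V> {
    // Immutable snapshot: keys sorted ascending, values[i] belongs to keys[i].
    private static final class Snapshot {
        final String[] keys;
        final Object[] values;
        Snapshot(String[] keys, Object[] values) {
            this.keys = keys;
            this.values = values;
        }
    }

    // Readers only ever see a fully built snapshot published through this field.
    private volatile Snapshot snapshot = new Snapshot(new String[0], new Object[0]);

    /** Value for the greatest key <= key, or null; allocates no search entry. */
    @SuppressWarnings("unchecked")
    V floorValue(String key) {
        Snapshot s = snapshot;                  // one volatile read per lookup
        int i = Arrays.binarySearch(s.keys, key);
        if (i < 0) i = -i - 2;                  // (insertion point) - 1
        return i >= 0 ? (V) s.values[i] : null;
    }

    /** Inserts or replaces a mapping by copying the arrays: O(n) per write. */
    synchronized void put(String key, V value) {
        Snapshot s = snapshot;
        int i = Arrays.binarySearch(s.keys, key);
        if (i >= 0) {                           // existing key: copy values only
            Object[] vals = s.values.clone();
            vals[i] = value;
            snapshot = new Snapshot(s.keys, vals);
            return;
        }
        int ins = -i - 1, n = s.keys.length;
        String[] keys = new String[n + 1];
        Object[] vals = new Object[n + 1];
        System.arraycopy(s.keys, 0, keys, 0, ins);
        System.arraycopy(s.values, 0, vals, 0, ins);
        keys[ins] = key;
        vals[ins] = value;
        System.arraycopy(s.keys, ins, keys, ins + 1, n - ins);
        System.arraycopy(s.values, ins, vals, ins + 1, n - ins);
        snapshot = new Snapshot(keys, vals);
    }

    /** Removes a mapping if present, again by copying the arrays. */
    synchronized void remove(String key) {
        Snapshot s = snapshot;
        int i = Arrays.binarySearch(s.keys, key);
        if (i < 0) return;                      // absent: nothing to publish
        int n = s.keys.length;
        String[] keys = new String[n - 1];
        Object[] vals = new Object[n - 1];
        System.arraycopy(s.keys, 0, keys, 0, i);
        System.arraycopy(s.values, 0, vals, 0, i);
        System.arraycopy(s.keys, i + 1, keys, i, n - i - 1);
        System.arraycopy(s.values, i + 1, vals, i, n - i - 1);
        snapshot = new Snapshot(keys, vals);
    }

    public static void main(String[] args) {
        CowSortedArrayMap<String> cow = new CowSortedArrayMap<>();
        ConcurrentSkipListMap<String, String> csl = new ConcurrentSkipListMap<>();
        for (int i = 0; i < 10_000; i++) {      // 10k initial entries
            String startKey = String.format("row%05d", i * 10);
            cow.put(startKey, "region-" + i);
            csl.put(startKey, "region-" + i);
        }
        String probe = "row00015";
        int lookups = 1_000_000;
        long sink = 0;                          // consume results so the loops do real work
        long t0 = System.nanoTime();            // nanoTime: currentTimeMillis may be too coarse
        for (int i = 0; i < lookups; i++) sink += cow.floorValue(probe).length();
        long t1 = System.nanoTime();
        for (int i = 0; i < lookups; i++) sink += csl.floorEntry(probe).getValue().length();
        long t2 = System.nanoTime();
        System.out.println(cow.floorValue(probe)); // prints region-1
        System.out.printf("cow=%d ns, skiplist=%d ns, sink=%d%n", t1 - t0, t2 - t1, sink);
    }
}
```

The two timings printed are only indicative: a single loop in main is sensitive to JIT warm-up, so a harness such as JMH is better suited to real measurements, and the figures quoted in the comment come from the attached benchmark, not from this sketch.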
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Resolution: Fixed
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v15.patch

Removed TreeMap since it seems to be the one causing concern.
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v16.patch
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v17.patch

Removed the unused tailmapable since there is no longer a COWTreemap.
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hiroshi Ikeda updated HBASE-14708:
--
Attachment: anotherbench2.zip

I added a hybrid implementation to my previous benchmark code, along with its output, FYI. The logic is not trivial, and I'm not sure how to make it support the ConcurrentNavigableMap interface.
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v13.patch
[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache
[ https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Elliott Clark updated HBASE-14708:
--
Summary: Use copy on write Map for region location cache (was: Use copy on write TreeMap for region location cache)
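The summary change above widens the proposal from a copy-on-write TreeMap to a copy-on-write Map. Below is a minimal sketch of the TreeMap variant the issue description originally proposed. It assumes a hypothetical class CopyOnWriteTreeMap and Comparable keys (String would stand in for HBase's byte[] row keys), and it is not code from any of the attached patches. Readers call floorEntry on an immutable snapshot without locking; writers copy the map, modify the copy, and publish it through a volatile field.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical copy-on-write TreeMap (illustration only, not code from the
 * attached patches). Reads use an immutable snapshot; writes replace it.
 */
final class CopyOnWriteTreeMap<K extends Comparable<K>, V> {
    // Never mutated after it is published, so concurrent readers need no lock.
    private volatile NavigableMap<K, V> snapshot = new TreeMap<>();

    /** Lock-free read: entry with the greatest key <= key, or null. */
    Map.Entry<K, V> floorEntry(K key) {
        return snapshot.floorEntry(key);
    }

    /** Copies the whole map (linear time), mutates the copy, then publishes it. */
    synchronized void put(K key, V value) {
        TreeMap<K, V> copy = new TreeMap<>(snapshot);
        copy.put(key, value);
        snapshot = copy;
    }

    /** Same copy-then-publish pattern; skips the copy if the key is absent. */
    synchronized void remove(K key) {
        if (!snapshot.containsKey(key)) return;
        TreeMap<K, V> copy = new TreeMap<>(snapshot);
        copy.remove(key);
        snapshot = copy;
    }
}
```

Reads cost one TreeMap lookup on the current snapshot, while each add or remove pays for a full copy. That is the trade-off the description argues for, given that cached locations are read far more often than the cache is cleared.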