[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-19 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated HBASE-14708:
--
Attachment: anotherbench3.zip

I created a revised hybrid implementation and attached it along with its benchmark. 
I hope it has no race conditions and makes up for my earlier mistake.

I changed the benchmark code to use System.nanoTime() instead of 
System.currentTimeMillis(), because the resolution of the latter appears to be 
about 15 ms in my environment.
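A small standalone sketch of why the timer change matters: one method spins until System.currentTimeMillis() ticks over and reports the step size, and the other times a task with System.nanoTime(). The class and method names are hypothetical and are not taken from the attached benchmark.

```java
import java.util.concurrent.TimeUnit;

public class ClockResolution {
    /**
     * Spins until System.currentTimeMillis() changes and returns the size of that step in ms.
     * A result such as 15 or 16 means millisecond timestamps are too coarse for short runs.
     */
    static long millisTickSize() {
        long t0 = System.currentTimeMillis();
        long t1;
        while ((t1 = System.currentTimeMillis()) == t0) {
            // busy-wait until the millisecond clock advances
        }
        return t1 - t0;
    }

    /** Runs task `iterations` times and returns the elapsed time in ns via System.nanoTime(). */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        // nanoTime() has an arbitrary origin, so only differences between calls are meaningful.
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        System.out.println("currentTimeMillis() step: " + millisTickSize() + " ms");
        long elapsed = timeNanos(() -> Math.sqrt(42.0), 1_000_000);
        System.out.println("1M sqrt calls: " + TimeUnit.NANOSECONDS.toMicros(elapsed) + " us");
    }
}
```

On a platform whose millisecond clock steps in ~15 ms increments, any benchmark phase shorter than that would read as either 0 ms or one full tick, which is what nanoTime() avoids.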

I ran the benchmark with 10k initial entries. For adding/removing elements, the 
hybrid implementation still has about 10% overhead compared to 
ConcurrentSkipListMap, but the copy-on-write array implementation is 30 times 
slower than the hybrid implementation. For reading elements, the hybrid 
implementation seems almost always slightly faster than the copy-on-write array 
implementation. I think that is because the hybrid implementation doesn't 
create an entry object for each search.
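To illustrate the last point, here is a hypothetical sketch of a copy-on-write sorted-array map whose lookup binary-searches a plain key array, so a floor query allocates no search entry object. It is not the hybrid implementation from anotherbench3.zip (that code is not reproduced here); the class name, the parallel-array layout, and the unsigned byte[] ordering are assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical copy-on-write sorted map from region start key (byte[]) to a value. */
public final class CowSortedArrayMap<V> {
    // Unsigned lexicographic byte[] order; a shorter prefix sorts first.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    // Immutable snapshot: readers see either the old or the new one, never a partial update.
    private static final class Snapshot {
        final byte[][] keys;
        final Object[] values;
        Snapshot(byte[][] keys, Object[] values) { this.keys = keys; this.values = values; }
    }

    private volatile Snapshot snap = new Snapshot(new byte[0][], new Object[0]);

    /** Value for the greatest key <= row, or null. Lock-free and allocation-free. */
    @SuppressWarnings("unchecked")
    public V floor(byte[] row) {
        Snapshot s = snap; // single volatile read
        int i = Arrays.binarySearch(s.keys, row, UNSIGNED);
        // On a miss binarySearch returns -(insertionPoint) - 1; the floor is insertionPoint - 1.
        int idx = i >= 0 ? i : -i - 2;
        return idx >= 0 ? (V) s.values[idx] : null;
    }

    /** Inserts or replaces by copying the arrays (O(n)) under a lock. */
    public synchronized void put(byte[] key, V value) {
        Snapshot s = snap;
        int i = Arrays.binarySearch(s.keys, key, UNSIGNED);
        if (i >= 0) { // existing key: only the value array changes
            Object[] vals = s.values.clone();
            vals[i] = value;
            snap = new Snapshot(s.keys, vals);
            return;
        }
        int ins = -i - 1, n = s.keys.length;
        byte[][] keys = new byte[n + 1][];
        Object[] vals = new Object[n + 1];
        System.arraycopy(s.keys, 0, keys, 0, ins);
        System.arraycopy(s.values, 0, vals, 0, ins);
        keys[ins] = key;
        vals[ins] = value;
        System.arraycopy(s.keys, ins, keys, ins + 1, n - ins);
        System.arraycopy(s.values, ins, vals, ins + 1, n - ins);
        snap = new Snapshot(keys, vals);
    }

    /** Removes the key if present, again by full copy under the lock. */
    public synchronized void remove(byte[] key) {
        Snapshot s = snap;
        int i = Arrays.binarySearch(s.keys, key, UNSIGNED);
        if (i < 0) return;
        int n = s.keys.length;
        byte[][] keys = new byte[n - 1][];
        Object[] vals = new Object[n - 1];
        System.arraycopy(s.keys, 0, keys, 0, i);
        System.arraycopy(s.values, 0, vals, 0, i);
        System.arraycopy(s.keys, i + 1, keys, i, n - i - 1);
        System.arraycopy(s.values, i + 1, vals, i, n - i - 1);
        snap = new Snapshot(keys, vals);
    }
}
```

The O(n) copy on every write is consistent with the reported result that a pure copy-on-write array is much slower than the alternatives for adds and removes, while reads stay cheap.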

FYI

> Use copy on write Map for region location cache
> ---
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.1.2
>Reporter: Elliott Clark
>Assignee: Elliott Clark
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v10.patch, HBASE-14708-v11.patch, 
> HBASE-14708-v12.patch, HBASE-14708-v13.patch, HBASE-14708-v15.patch, 
> HBASE-14708-v16.patch, HBASE-14708-v17.patch, HBASE-14708-v2.patch, 
> HBASE-14708-v3.patch, HBASE-14708-v4.patch, HBASE-14708-v5.patch, 
> HBASE-14708-v6.patch, HBASE-14708-v7.patch, HBASE-14708-v8.patch, 
> HBASE-14708-v9.patch, HBASE-14708.patch, anotherbench.zip, anotherbench2.zip, 
> anotherbench3.zip, location_cache_times.pdf, result.csv
>
>
> Internally, a co-worker profiled their application that was talking to HBase. 
> More than 60% of the time was spent locating regions. This was while the 
> cluster was stable and no regions were moving.
> To figure out whether there was a faster way to cache region locations, I wrote 
> a benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> It tries to simulate a heavy load on the location cache:
> * 24 different threads.
> * 2 deleting location data.
> * 2 adding location data.
> * Using floor lookups to get the result.
> To repeat my work, just run ./run.sh; it should produce a result.csv.
> Results:
> ConcurrentSkipListMap is a good middle ground: it has equal speed for 
> reading and writing.
> However, most operations will not need to remove or add a region location. 
> There will potentially be several orders of magnitude more reads of cached 
> locations than there will be clears of the cache.
> So I propose a copy-on-write tree map.
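A minimal sketch of the copy-on-write tree map idea described above, assuming Comparable keys (real region start keys are byte[] and would need a comparator). Readers do one volatile read and then an O(log n) floorEntry() on a snapshot that is never mutated after publication; writers copy the whole TreeMap under a lock. The class name is hypothetical and this is not the committed patch; note that the v15 patch revision removed the TreeMap variant.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical copy-on-write TreeMap for a read-mostly location cache. */
public final class CopyOnWriteTreeMap<K extends Comparable<? super K>, V> {
    // Published snapshots are never modified, so concurrent reads need no locking.
    private volatile TreeMap<K, V> snapshot = new TreeMap<>();

    /** Entry with the greatest key <= key (the region containing a row), or null. */
    public Map.Entry<K, V> floorEntry(K key) {
        return snapshot.floorEntry(key);
    }

    /** O(n) copy, then publish through the volatile write. */
    public synchronized void put(K key, V value) {
        TreeMap<K, V> copy = new TreeMap<>(snapshot);
        copy.put(key, value);
        snapshot = copy;
    }

    /** Skips the copy entirely when the key is absent. */
    public synchronized void remove(K key) {
        if (!snapshot.containsKey(key)) return;
        TreeMap<K, V> copy = new TreeMap<>(snapshot);
        copy.remove(key);
        snapshot = copy;
    }
}
```

The trade-off matches the reasoning in the description: each write costs a full copy, which only pays off if cache updates are orders of magnitude rarer than lookups.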



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-17 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v15.patch

Removed TreeMap since it seems to be the one causing concern.



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v16.patch



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-16 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v17.patch

Remove the unused tailmapable, since there is no COWTreemap.



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-16 Thread Hiroshi Ikeda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroshi Ikeda updated HBASE-14708:
--
Attachment: anotherbench2.zip

I added a hybrid implementation to my previous benchmark code and attached it 
along with its output, FYI. The logic is not trivial, and I'm not sure how to 
handle it for ConcurrentNavigableMap.



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-05 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Attachment: HBASE-14708-v13.patch



[jira] [Updated] (HBASE-14708) Use copy on write Map for region location cache

2015-11-04 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-14708:
--
Summary: Use copy on write Map for region location cache  (was: Use copy on 
write TreeMap for region location cache)
