[jira] [Resolved] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
[ https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chia-Ping Tsai resolved HBASE-7326.
-----------------------------------
    Resolution: Duplicate

> SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-7326
>                 URL: https://issues.apache.org/jira/browse/HBASE-7326
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>            Reporter: Gary Helmling
>             Fix For: 2.0.0
>
>
> The SortedCopyOnWriteSet implementation uses an internal TreeSet that is
> copied and replaced on mutation operations. However, in a few areas,
> SortedCopyOnWriteSet leaks references to the underlying TreeSet
> implementations, allowing for unsafe usage:
> * iterator()
> * subSet()
> * headSet()
> * tailSet()
> For Iterator.remove(), we can wrap the iterator in an implementation that
> throws UnsupportedOperationException. For the sub-set methods, we could
> return new SortedCopyOnWriteSet instances (which would not modify the
> parent set), or wrap them in a new sub-set implementation that safely
> allows modification of the parent set.
> To be clear, the current usage of SortedCopyOnWriteSet does not make use
> of any of these non-thread-safe methods, but the implementation should be
> fixed to be completely thread safe and prevent any new issues.
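As an illustration of the first fix proposed above, a minimal sketch (hypothetical names, not the committed HBase patch) of an iterator wrapper that delegates traversal but refuses remove():

import java.util.Iterator;

// Wraps the iterator of the internal TreeSet so callers can traverse a
// snapshot but can no longer mutate the shared backing set through it.
final class UnmodifiableIterator<E> implements Iterator<E> {
  private final Iterator<E> delegate;

  UnmodifiableIterator(Iterator<E> delegate) {
    this.delegate = delegate;
  }

  @Override
  public boolean hasNext() {
    return delegate.hasNext();
  }

  @Override
  public E next() {
    return delegate.next();
  }

  @Override
  public void remove() {
    // Mutation must go through the copy-on-write path of the owning set.
    throw new UnsupportedOperationException(
        "remove() is not supported on a copy-on-write iterator");
  }
}

SortedCopyOnWriteSet.iterator() would then hand out new UnmodifiableIterator<>(internalSet.iterator()) rather than the raw TreeSet iterator.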
Re: hbase does not seem to handle mixed workloads well
bq. increasing the number of threads seems to be an incomplete solution

I agree - since the number of concurrent I/O-intensive client requests is
unpredictable. One possible solution, in the case of a cache miss on reads, is a
mechanism similar to Java 8's CompletableFuture. See this post:
https://www.infoq.com/articles/Functional-Style-Callbacks-Using-CompletableFuture

The result of an HDFS read, when ready, can be channeled back through functions
passed to thenApply / thenAccept, so that the handler can be non-blocking. A
sketch follows this message.

Cheers

On Sat, Apr 1, 2017 at 5:46 PM, 杨苏立 Yang Su Li wrote:

> Yes, that is indeed the problem. It is caused by
>
> 1) HBase has a fixed number (by default 30) of RPC handlers (a reasonable
> design choice)
> 2) RPC handlers block on HDFS reads (also a reasonable design choice)
>
> As the system takes on a higher load of I/O-intensive work, all RPC
> handlers become blocked and no progress can be made for requests that do
> not require I/O.
>
> However, increasing the number of threads seems to be an incomplete
> solution -- you just run into the same problem at a higher load of
> I/O-intensive work...
>
> On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar wrote:
>
> > I think the problem is that you ONLY have 30 "handler" threads
> > (hbase.regionserver.handler.count). Handlers are the main thread pool
> > that executes the RPC requests. When you do IO-bound requests, very
> > likely all of the 30 threads are just blocked by the disk access, so
> > the total throughput drops.
> >
> > It is typical to run with 100-300 threads on the regionserver side,
> > depending on your settings. You can use the "Debug dump" from the
> > regionserver web UI, or jstack, to inspect what the "handler" threads
> > are doing.
> >
> > Enis
> >
> > On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li wrote:
> >
> > > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu wrote:
> > >
> > > > Can you tell us which release of hbase you used ?
> > >
> > > 2.0.0 Snapshot
> > >
> > > > Please describe values for the config parameters in hbase-site.xml
> > >
> > > The content of hbase-site.xml is shown below, but indeed this problem
> > > is not sensitive to configuration -- we can reproduce the same problem
> > > with different configurations, and across different hbase versions.
> > >
> > > > Do you have SSD(s) in your cluster ?
> > > > If so and the mixed workload involves writes, have you taken a look
> > > > at HBASE-12848?
> > >
> > > No, we don't use SSD (for hbase). And the workload does not involve
> > > writes (though a workload with writes shows similar behavior). I
> > > stated that both clients are doing 1KB Gets.
> > >
> > > hbase-master = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> > > hbase.rootdir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> > > hbase.fs.tmp.dir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> > > hbase.cluster.distributed = true
> > > hbase.zookeeper.property.dataDir = /tmp/zookeeper
> > > hbase.zookeeper.property.clientPort = 2181
> > > hbase.zookeeper.quorum = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> > > hbase.ipc.server.read.threadpool.size = 10
> > > hbase.regionserver.handler.count = 30
> > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We found that when there is a mix of CPU-intensive and
> > > > > I/O-intensive workloads, HBase seems to slow everything down to
> > > > > the disk-throughput level.
> > > > >
> > > > > This is shown in the performance graph at
> > > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1
> > > > > and client-2 are issuing 1KB Gets. From second 0, both repeatedly
> > > > > access a small set of data that is cacheable, and both get high
> > > > > throughput (~45K ops/s). At second 60, client-1 switches to an
> > > > > I/O-intensive workload and begins to randomly access a large set
> > > > > of data (which does not fit in cache). *Both* client-1's and
> > > > > client-2's throughput drops to ~0.5K ops/s.
> > > > >
> > > > > Is this acceptable behavior for HBase, or is it considered a bug
> > > > > or a performance drawback?
> > > > > I can find an old JIRA entry about a similar problem
> > > > > (https://issues.apache.org/jira/browse/HBASE-8836), but that was
> > > > > never resolved.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Suli
> > > > >
> > > > > --
> > > > > Suli Yang
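To make the CompletableFuture idea above concrete, a minimal sketch (hypothetical helper names, not actual HBase code): the blocking HDFS read is shifted to a dedicated I/O pool, and the handler thread only attaches callbacks, so it never blocks.

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class NonBlockingReadSketch {
  // Dedicated pool for blocking HDFS reads; RPC handler threads never enter it.
  private static final ExecutorService ioPool = Executors.newFixedThreadPool(8);

  // Stand-in for a blocking HDFS read on a cache miss (hypothetical).
  static byte[] readBlockFromHdfs(String blockId) {
    return new byte[1024]; // real code would block on disk/network here
  }

  // Called by an RPC handler; returns immediately. The response is sent
  // from a callback once the read completes.
  static void handleGetOnCacheMiss(String blockId) {
    CompletableFuture
        .supplyAsync(() -> readBlockFromHdfs(blockId), ioPool) // runs on ioPool
        .thenApply(NonBlockingReadSketch::decodeRow)           // transform result
        .thenAccept(NonBlockingReadSketch::sendResponse);      // complete the RPC
  }

  static String decodeRow(byte[] block) { return "decoded-row"; }

  static void sendResponse(String row) { /* write the RPC response */ }
}

With this shape, a handler thread is occupied only for the microseconds it takes to set up the chain, however long the disk read takes.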
Re: hbase does not seem to handle mixed workloads well
Yes, that is indeed the problem. It is caused by

1) HBase has a fixed number (by default 30) of RPC handlers (a reasonable
design choice)
2) RPC handlers block on HDFS reads (also a reasonable design choice)

As the system takes on a higher load of I/O-intensive work, all RPC handlers
become blocked and no progress can be made for requests that do not require
I/O.

However, increasing the number of threads seems to be an incomplete solution --
you just run into the same problem at a higher load of I/O-intensive work...

On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar wrote:

> I think the problem is that you ONLY have 30 "handler" threads
> (hbase.regionserver.handler.count). Handlers are the main thread pool that
> executes the RPC requests. When you do IO-bound requests, very likely all
> of the 30 threads are just blocked by the disk access, so the total
> throughput drops.
>
> It is typical to run with 100-300 threads on the regionserver side,
> depending on your settings. You can use the "Debug dump" from the
> regionserver web UI, or jstack, to inspect what the "handler" threads are
> doing.
>
> Enis
>
> On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li wrote:
>
> > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu wrote:
> >
> > > Can you tell us which release of hbase you used ?
> >
> > 2.0.0 Snapshot
> >
> > > Please describe values for the config parameters in hbase-site.xml
> >
> > The content of hbase-site.xml is shown below, but indeed this problem is
> > not sensitive to configuration -- we can reproduce the same problem with
> > different configurations, and across different hbase versions.
> >
> > > Do you have SSD(s) in your cluster ?
> > > If so and the mixed workload involves writes, have you taken a look at
> > > HBASE-12848?
> >
> > No, we don't use SSD (for hbase). And the workload does not involve
> > writes (though a workload with writes shows similar behavior). I stated
> > that both clients are doing 1KB Gets.
> >
> > hbase-master = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> > hbase.rootdir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> > hbase.fs.tmp.dir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> > hbase.cluster.distributed = true
> > hbase.zookeeper.property.dataDir = /tmp/zookeeper
> > hbase.zookeeper.property.clientPort = 2181
> > hbase.zookeeper.quorum = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> > hbase.ipc.server.read.threadpool.size = 10
> > hbase.regionserver.handler.count = 30
> >
> > > Cheers
> > >
> > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li wrote:
> > >
> > > > Hi,
> > > >
> > > > We found that when there is a mix of CPU-intensive and I/O-intensive
> > > > workloads, HBase seems to slow everything down to the disk-throughput
> > > > level.
> > > >
> > > > This is shown in the performance graph at
> > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access
> > > > a small set of data that is cacheable, and both get high throughput
> > > > (~45K ops/s). At second 60, client-1 switches to an I/O-intensive
> > > > workload and begins to randomly access a large set of data (which
> > > > does not fit in cache). *Both* client-1's and client-2's throughput
> > > > drops to ~0.5K ops/s.
> > > >
> > > > Is this acceptable behavior for HBase, or is it considered a bug or
> > > > a performance drawback?
> > > > I can find an old JIRA entry about a similar problem
> > > > (https://issues.apache.org/jira/browse/HBASE-8836), but that was
> > > > never resolved.
> > > >
> > > > Thanks.
> > > >
> > > > Suli
> > > >
> > > > --
> > > > Suli Yang
> > > >
> > > > Department of Physics
> > > > University of Wisconsin Madison
> > > >
> > > > 4257 Chamberlin Hall
> > > > Madison WI 53703
> >
> > --
> > Suli Yang
> >
> > Department of Physics
> > University of Wisconsin Madison
> >
> > 4257 Chamberlin Hall
> > Madison WI 53703

--
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703
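The starvation effect described in this thread is easy to reproduce outside HBase. A self-contained sketch (assumptions: a 30-thread pool stands in for the RPC handlers, Thread.sleep stands in for a blocking cold read):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

public class HandlerStarvationDemo {
  public static void main(String[] args) throws InterruptedException {
    // 30 threads, mirroring the default hbase.regionserver.handler.count.
    ExecutorService handlers = Executors.newFixedThreadPool(30);
    AtomicLong fastCompleted = new AtomicLong();

    // 30 "I/O-bound" requests: each parks its handler thread for 5 seconds,
    // the way a handler blocks on an uncached HDFS read.
    for (int i = 0; i < 30; i++) {
      handlers.submit(() -> {
        try { Thread.sleep(5_000); } catch (InterruptedException ignored) { }
      });
    }

    // 1000 "cached" requests that need no I/O at all.
    for (int i = 0; i < 1_000; i++) {
      handlers.submit(() -> { fastCompleted.incrementAndGet(); });
    }

    Thread.sleep(1_000);
    // Prints 0: every handler is blocked, so the cheap requests sit in the
    // queue even though they need no disk access at all.
    System.out.println("fast requests done after 1s: " + fastCompleted.get());
    handlers.shutdownNow();
  }
}

Adding threads only raises the number of slow requests needed to hit the same wall, which is the "incomplete solution" point made above.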
Re: Successful: HBase Generate Website
+1 to that :D

Stack wrote:
> Hot Dog!
>
> On Fri, Mar 31, 2017 at 3:03 PM, Misty Stanley-Jones wrote:
>
> > FYI, the linked Jenkins job now automatically updates the site! No more
> > need to manually push. Merry Christmas! :)
> >
> > ----- Original message -----
> > From: Apache Jenkins Server
> > To: dev@hbase.apache.org
> > Subject: Successful: HBase Generate Website
> > Date: Fri, 31 Mar 2017 21:32:17 +0000 (UTC)
> >
> > Build status: Successful
> >
> > If successful, the website and docs have been generated and the site has
> > been updated automatically.
> > If failed, see
> > https://builds.apache.org/job/hbase_generate_website/561/console
> >
> > YOU DO NOT NEED TO DO THE FOLLOWING ANYMORE! It is here for
> > informational purposes and shows what the Jenkins job does to push the
> > site.
> >
> > git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
> > cd hbase-site
> > wget -O- https://builds.apache.org/job/hbase_generate_website/561/artifact/website.patch.zip | funzip > 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
> > git fetch
> > git checkout -b asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7 origin/asf-site
> > git am --whitespace=fix 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
> > git push origin asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7:asf-site
> > git commit --allow-empty -m "INFRA-10751 Empty commit"
> > git push origin asf-site
> > git checkout asf-site
> > git branch -D asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7
Re: hbase does not seem to handle mixed workloads well
I think the problem is that you ONLY have 30 "handler" threads
(hbase.regionserver.handler.count). Handlers are the main thread pool that
executes the RPC requests. When you do IO-bound requests, very likely all of
the 30 threads are just blocked by the disk access, so the total throughput
drops.

It is typical to run with 100-300 threads on the regionserver side, depending
on your settings. You can use the "Debug dump" from the regionserver web UI,
or jstack, to inspect what the "handler" threads are doing.

Enis

On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li wrote:

> On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu wrote:
>
> > Can you tell us which release of hbase you used ?
>
> 2.0.0 Snapshot
>
> > Please describe values for the config parameters in hbase-site.xml
>
> The content of hbase-site.xml is shown below, but indeed this problem is
> not sensitive to configuration -- we can reproduce the same problem with
> different configurations, and across different hbase versions.
>
> > Do you have SSD(s) in your cluster ?
> > If so and the mixed workload involves writes, have you taken a look at
> > HBASE-12848?
>
> No, we don't use SSD (for hbase). And the workload does not involve writes
> (though a workload with writes shows similar behavior). I stated that both
> clients are doing 1KB Gets.
>
> hbase-master = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> hbase.rootdir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> hbase.fs.tmp.dir = hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> hbase.cluster.distributed = true
> hbase.zookeeper.property.dataDir = /tmp/zookeeper
> hbase.zookeeper.property.clientPort = 2181
> hbase.zookeeper.quorum = node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> hbase.ipc.server.read.threadpool.size = 10
> hbase.regionserver.handler.count = 30
>
> > Cheers
> >
> > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li wrote:
> >
> > > Hi,
> > >
> > > We found that when there is a mix of CPU-intensive and I/O-intensive
> > > workloads, HBase seems to slow everything down to the disk-throughput
> > > level.
> > >
> > > This is shown in the performance graph at
> > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access a
> > > small set of data that is cacheable, and both get high throughput
> > > (~45K ops/s). At second 60, client-1 switches to an I/O-intensive
> > > workload and begins to randomly access a large set of data (which does
> > > not fit in cache). *Both* client-1's and client-2's throughput drops
> > > to ~0.5K ops/s.
> > >
> > > Is this acceptable behavior for HBase, or is it considered a bug or a
> > > performance drawback?
> > > I can find an old JIRA entry about a similar problem
> > > (https://issues.apache.org/jira/browse/HBASE-8836), but that was never
> > > resolved.
> > >
> > > Thanks.
> > >
> > > Suli
> > >
> > > --
> > > Suli Yang
> > >
> > > Department of Physics
> > > University of Wisconsin Madison
> > >
> > > 4257 Chamberlin Hall
> > > Madison WI 53703
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
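For reference, the handler-count change suggested above is a single property in hbase-site.xml. A sketch (the value 100 is only an illustrative point inside the 100-300 range mentioned, not a recommendation from this thread):

<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- Default is 30; sized up so IO-bound requests leave some headroom. -->
  <value>100</value>
</property>

As the rest of the thread notes, though, any fixed count can still be saturated by enough concurrent IO-bound requests.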
Re: Updated Code of Conduct
On Fri, Mar 31, 2017 at 3:11 PM, Misty Stanley-Jones wrote:

> All,
>
> We have updated the Code of Conduct to be a little more explicit about
> how much we value diversity in the HBase project, and to ask for your
> feedback on how we can improve. Your feedback is ALWAYS welcome, whether
> on one of the public mailing lists or privately to a committer or a PMC
> member on the project. On behalf of the entire PMC, thank you for being
> part of the HBase project, and for all of the amazing work you all do!
>
> See the changes at http://hbase.apache.org/coc.html.
>
> Thanks,
> Misty

+1
St.Ack
Re: Successful: HBase Generate Website
Hot Dog!

On Fri, Mar 31, 2017 at 3:03 PM, Misty Stanley-Jones wrote:

> FYI, the linked Jenkins job now automatically updates the site! No more
> need to manually push. Merry Christmas! :)
>
> ----- Original message -----
> From: Apache Jenkins Server
> To: dev@hbase.apache.org
> Subject: Successful: HBase Generate Website
> Date: Fri, 31 Mar 2017 21:32:17 +0000 (UTC)
>
> Build status: Successful
>
> If successful, the website and docs have been generated and the site has
> been updated automatically.
> If failed, see
> https://builds.apache.org/job/hbase_generate_website/561/console
>
> YOU DO NOT NEED TO DO THE FOLLOWING ANYMORE! It is here for
> informational purposes and shows what the Jenkins job does to push the
> site.
>
> git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
> cd hbase-site
> wget -O- https://builds.apache.org/job/hbase_generate_website/561/artifact/website.patch.zip | funzip > 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
> git fetch
> git checkout -b asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7 origin/asf-site
> git am --whitespace=fix 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
> git push origin asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7:asf-site
> git commit --allow-empty -m "INFRA-10751 Empty commit"
> git push origin asf-site
> git checkout asf-site
> git branch -D asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7
[jira] [Created] (HBASE-17867) Implement async procedure RPC API(list/exec/abort/isFinished)
Zheng Hu created HBASE-17867:
---------------------------------

             Summary: Implement async procedure RPC API(list/exec/abort/isFinished)
                 Key: HBASE-17867
                 URL: https://issues.apache.org/jira/browse/HBASE-17867
             Project: HBase
          Issue Type: Sub-task
            Reporter: Zheng Hu
[jira] [Created] (HBASE-17866) Implement async setQuota/getQuota methods.
Zheng Hu created HBASE-17866:
---------------------------------

             Summary: Implement async setQuota/getQuota methods.
                 Key: HBASE-17866
                 URL: https://issues.apache.org/jira/browse/HBASE-17866
             Project: HBase
          Issue Type: Sub-task
            Reporter: Zheng Hu
            Assignee: Zheng Hu
[jira] [Created] (HBASE-17865) Implement async listSnapshot/deleteSnapshot methods.
Zheng Hu created HBASE-17865:
---------------------------------

             Summary: Implement async listSnapshot/deleteSnapshot methods.
                 Key: HBASE-17865
                 URL: https://issues.apache.org/jira/browse/HBASE-17865
             Project: HBase
          Issue Type: Sub-task
            Reporter: Zheng Hu
            Assignee: Zheng Hu
[jira] [Created] (HBASE-17864) Implement async snapshot/cloneSnapshot/restoreSnapshot methods
Zheng Hu created HBASE-17864:
---------------------------------

             Summary: Implement async snapshot/cloneSnapshot/restoreSnapshot methods
                 Key: HBASE-17864
                 URL: https://issues.apache.org/jira/browse/HBASE-17864
             Project: HBase
          Issue Type: Sub-task
            Reporter: Zheng Hu
            Assignee: Zheng Hu
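Taken together, these sub-tasks sketch out a CompletableFuture-based admin API. A hypothetical usage sketch of the snapshot-related calls from HBASE-17864/HBASE-17865 (interface and method shapes are assumed from the issue titles, not taken from a released API):

import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.TableName;

// Assumed shape implied by the issue titles; the real AsyncAdmin API is
// exactly what these sub-tasks are defining.
interface AsyncSnapshotAdmin {
  CompletableFuture<Void> snapshot(String snapshotName, TableName tableName);
  CompletableFuture<Void> deleteSnapshot(String snapshotName);
}

class SnapshotRotationExample {
  // Take a fresh snapshot, then delete the old one, without ever blocking
  // the calling thread; errors propagate through the returned future.
  static CompletableFuture<Void> rotate(AsyncSnapshotAdmin admin, TableName table) {
    return admin.snapshot("backup-new", table)
        .thenCompose(done -> admin.deleteSnapshot("backup-old"));
  }
}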