[jira] [Resolved] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations

2017-04-01 Thread Chia-Ping Tsai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved HBASE-7326.
---
Resolution: Duplicate

> SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
> ------------------------------------------------------------------------------
>
> Key: HBASE-7326
> URL: https://issues.apache.org/jira/browse/HBASE-7326
> Project: HBase
>  Issue Type: Bug
>  Components: util
>Reporter: Gary Helmling
> Fix For: 2.0.0
>
>
> The SortedCopyOnWriteSet implementation uses an internal TreeSet that is 
> copied and replaced on mutation operations.  However, in a few areas, 
> SortedCopyOnWriteSet leaks references to the underlying TreeSet 
> implementations, allowing for unsafe usage:
> * iterator()
> * subSet()
> * headSet()
> * tailSet()
> For Iterator.remove(), we can wrap the iterator in an implementation that
> throws UnsupportedOperationException.  For the subset methods, we could
> return new SortedCopyOnWriteSet instances (which would not modify the parent
> set), or wrap them in a new subset implementation that safely allows
> modification of the parent set.
> To be clear, the current usage of SortedCopyOnWriteSet does not make use of 
> any of these non-thread-safe methods, but the implementation should be fixed 
> to be completely thread safe and prevent any new issues.
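
A minimal sketch of the Iterator.remove() wrapping mentioned above (class and
field names here are illustrative, not the actual patch):

  import java.util.Iterator;

  // Delegates to the snapshot's iterator but rejects remove(), so the
  // underlying TreeSet cannot be mutated through a leaked reference.
  final class UnmodifiableIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    UnmodifiableIterator(Iterator<E> delegate) {
      this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
      return delegate.hasNext();
    }

    @Override
    public E next() {
      return delegate.next();
    }

    @Override
    public void remove() {
      throw new UnsupportedOperationException(
          "remove() is not supported on a copy-on-write snapshot");
    }
  }

iterator() would then return new UnmodifiableIterator<>(internalSet.iterator())
rather than the raw TreeSet iterator (internalSet being a hypothetical name for
the wrapped TreeSet field).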



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: hbase does not seem to handle mixed workloads well

2017-04-01 Thread Ted Yu
bq.  increasing number of threads seems to be an incomplete solution

I agree, since the number of concurrent I/O-intensive client requests is
unpredictable.

One possible solution, in the case of a cache miss for reads, is through a
mechanism similar to Java 8's CompletableFuture.

See this post:
https://www.infoq.com/articles/Functional-Style-Callbacks-Using-CompletableFuture

The result of an HDFS read, when ready, can be channeled back through
functions passed to the thenApply / thenAccept methods, so that the handler
can be non-blocking.
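
A rough sketch of the idea (readBlockFromHdfs and the I/O pool are
hypothetical placeholders, not actual HBase code):

  import java.util.concurrent.CompletableFuture;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class AsyncReadSketch {
    // Stand-in for a blocking HDFS read.
    static byte[] readBlockFromHdfs(String path) {
      return new byte[1024]; // pretend we read 1KB at 'path'
    }

    public static void main(String[] args) {
      // Run the blocking read on a dedicated I/O pool so handler threads stay free.
      ExecutorService ioPool = Executors.newFixedThreadPool(8);

      CompletableFuture
          .supplyAsync(() -> readBlockFromHdfs("/hbase/data/some-block"), ioPool)
          .thenApply(bytes -> bytes.length)            // transform the result when it is ready
          .thenAccept(len ->                           // non-blocking callback on completion
              System.out.println("read " + len + " bytes"));

      ioPool.shutdown();
    }
  }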

Cheers

On Sat, Apr 1, 2017 at 5:46 PM, 杨苏立 Yang Su Li  wrote:

> Yes, that is indeed the problem. It is caused by
>
> 1) HBase has a fixed number (by default 30) of RPC handlers (a reasonable
> design choice)
> 2) RPC handlers block on HDFS reads (also a reasonable design choice)
>
> As the system comes under a higher load of I/O-intensive work, all RPC
> handlers become blocked and no progress can be made for requests that do
> not require I/O.
>
> However, increasing the number of threads seems to be an incomplete solution --
> you run into the same problem under a higher load of I/O-intensive
> workloads...
>
>
>
> On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar  wrote:
>
> > I think the problem is that you ONLY have 30 "handler" threads (
> > hbase.regionserver.handler.count). Handlers are the main thread pool
> that
> > executes the RPC requests. When you issue I/O-bound requests, very likely
> > all of the 30 threads are just blocked on disk access, so the
> > total throughput drops.
> >
> > It is typical to run with 100-300 threads on the regionserver side,
> > depending on your settings. You can use the "Debug dump" from the
> > regionserver web UI or jstack to inspect what the "handler" threads are
> > doing.
> >
> > Enis
> >
> > On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li 
> > wrote:
> >
> > > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu  wrote:
> > >
> > > > Can you tell us which release of hbase you used ?
> > > >
> > >
> > > 2.0.0 Snapshot
> > >
> > > >
> > > > Please describe values for the config parameters in hbase-site.xml
> > > >
> > > > The content of hbase-site.xml is shown below, but indeed this problem
> > is
> > > not sensitive to configuration -- we can reproduce the same problem
> with
> > > different configurations, and across different hbase versions.
> > >
> > >
> > > > Do you have SSD(s) in your cluster ?
> > > > If so and the mixed workload involves writes, have you taken a look
> at
> > > > HBASE-12848
> > > > ?
> > > >
> > > No, we don't use SSD (for hbase). And the workload does not involve
> > writes
> > > (even though workloads with writes show similar behavior). I stated that
> > > both clients are doing 1KB Gets.
> > >
> > > 
> > >
> > > 
> > > hbase-master
> > > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> > 
> > > 
> > >
> > > 
> > > hbase.rootdir
> > > hdfs://
> > > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> 
> > > 
> > >
> > > 
> > > hbase.fs.tmp.dir
> > > hdfs://
> > > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.
> us:9000/hbase-staging
> > > 
> > > 
> > >
> > > 
> > > hbase.cluster.distributed
> > > true
> > > 
> > >
> > > 
> > > hbase.zookeeper.property.dataDir
> > > /tmp/zookeeper
> > > 
> > >
> > > 
> > > hbase.zookeeper.property.clientPort
> > > 2181
> > > 
> > >
> > > 
> > > hbase.zookeeper.quorum
> > > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> > > 
> > >
> > > 
> > > hbase.ipc.server.read.threadpool.size
> > > 10
> > > 
> > >
> > > 
> > > hbase.regionserver.handler.count
> > > 30
> > > 
> > >
> > > 
> > >
> > >
> > >
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We found that when there is a mix of CPU-intensive and I/O
> intensive
> > > > > workload, HBase seems to slow everything down to the disk
> throughput
> > > > level.
> > > > >
> > > > > This is shown in the performance graph at
> > > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1
> and
> > > > > client-2 are issuing 1KB Gets. From second 0, both repeatedly
> > access a
> > > > > small set of data that is cacheable and both get high throughput
> (~45k
> > > > > ops/s). At second 60, client-1 switches to an I/O-intensive workload
> > and
> > > > > begins to randomly access a large set of data (does not fit in
> > cache).
> > > > > *Both* client-1 and client-2's throughput drops to ~0.5K ops/s.
> > > > >
> > > > > Is this acceptable behavior for HBase or is it considered a bug or
> > > > > performance drawback?
> > > > > I can find an old JIRA entry about similar problems (
> > > > > https://issues.apache.org/jira/browse/HBASE-8836), but that was
> > never
> > > > > resolved.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Suli
> > > > >
> > > > > --
> > > > > Suli Yang
> > > > >
> > > > > 

Re: hbase does not seem to handle mixed workloads well

2017-04-01 Thread 杨苏立 Yang Su Li
Yes, that is indeed the problem. It is caused by

1) HBase has a fixed number (by default 30) of RPC handlers (a reasonable
design choice)
2) RPC handlers block on HDFS reads (also a reasonable design choice)

As the system comes under a higher load of I/O-intensive work, all RPC
handlers become blocked and no progress can be made for requests that do
not require I/O.

However, increasing the number of threads seems to be an incomplete solution --
you run into the same problem under a higher load of I/O-intensive workloads...
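
A toy illustration of the effect with plain java.util.concurrent (not HBase
code; the numbers are only for demonstration):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;

  public class HandlerSaturationDemo {
    public static void main(String[] args) {
      // A fixed pool stands in for the RPC handlers (default 30).
      ExecutorService handlers = Executors.newFixedThreadPool(30);

      // 30 I/O-bound requests park every handler thread on a "disk read".
      for (int i = 0; i < 30; i++) {
        handlers.submit(() -> {
          try {
            Thread.sleep(10_000); // stand-in for a slow HDFS read
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
        });
      }

      // A cache-hit request that needs no I/O still waits for a free handler.
      handlers.submit(() -> System.out.println("cached get finally served"));
      handlers.shutdown();
    }
  }

Raising the pool size only moves the saturation point; a heavy enough
I/O-bound load will eventually fill any fixed number of handlers.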



On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar  wrote:

> I think the problem is that you ONLY have 30 "handler" threads (
> hbase.regionserver.handler.count). Handlers are the main thread pool that
> executes the RPC requests. When you issue I/O-bound requests, very likely
> all of the 30 threads are just blocked on disk access, so the
> total throughput drops.
>
> It is typical to run with 100-300 threads on the regionserver side,
> depending on your settings. You can use the "Debug dump" from the
> regionserver web UI or jstack to inspect what the "handler" threads are
> doing.
>
> Enis
>
> On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li 
> wrote:
>
> > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu  wrote:
> >
> > > Can you tell us which release of hbase you used ?
> > >
> >
> > 2.0.0 Snapshot
> >
> > >
> > > Please describe values for the config parameters in hbase-site.xml
> > >
> > > The content of hbase-site.xml is shown below, but indeed this problem
> is
> > not sensitive to configuration -- we can reproduce the same problem with
> > different configurations, and across different hbase versions.
> >
> >
> > > Do you have SSD(s) in your cluster ?
> > > If so and the mixed workload involves writes, have you taken a look at
> > > HBASE-12848
> > > ?
> > >
> > No, we don't use SSD (for hbase). And the workload does not involve
> writes
> > (even though workloads with writes show similar behavior). I stated that
> > both clients are doing 1KB Gets.
> >
> > 
> >
> > 
> > hbase-master
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> 
> > 
> >
> > 
> > hbase.rootdir
> > hdfs://
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> > 
> >
> > 
> > hbase.fs.tmp.dir
> > hdfs://
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> > 
> > 
> >
> > 
> > hbase.cluster.distributed
> > true
> > 
> >
> > 
> > hbase.zookeeper.property.dataDir
> > /tmp/zookeeper
> > 
> >
> > 
> > hbase.zookeeper.property.clientPort
> > 2181
> > 
> >
> > 
> > hbase.zookeeper.quorum
> > node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> > 
> >
> > 
> > hbase.ipc.server.read.threadpool.size
> > 10
> > 
> >
> > 
> > hbase.regionserver.handler.count
> > 30
> > 
> >
> > 
> >
> >
> >
> > >
> > > Cheers
> > >
> > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > We found that when there is a mix of CPU-intensive and I/O intensive
> > > > workload, HBase seems to slow everything down to the disk throughput
> > > level.
> > > >
> > > > This is shown in the performance graph at
> > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > > client-2 are issuing 1KB Gets. From second 0, both repeatedly
> access a
> > > > small set of data that is cacheable and both get high throughput (~45k
> > > > ops/s). At second 60, client-1 switches to an I/O-intensive workload
> and
> > > > begins to randomly access a large set of data (does not fit in
> cache).
> > > > *Both* client-1 and client-2's throughput drops to ~0.5K ops/s.
> > > >
> > > > Is this acceptable behavior for HBase or is it considered a bug or
> > > > performance drawback?
> > > > I can find an old JIRA entry about similar problems (
> > > > https://issues.apache.org/jira/browse/HBASE-8836), but that was
> never
> > > > resolved.
> > > >
> > > > Thanks.
> > > >
> > > > Suli
> > > >
> > > > --
> > > > Suli Yang
> > > >
> > > > Department of Physics
> > > > University of Wisconsin Madison
> > > >
> > > > 4257 Chamberlin Hall
> > > > Madison WI 53703
> > > >
> > >
> >
> >
> >
> > --
> > Suli Yang
> >
> > Department of Physics
> > University of Wisconsin Madison
> >
> > 4257 Chamberlin Hall
> > Madison WI 53703
> >
>



-- 
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703


Re: Successful: HBase Generate Website

2017-04-01 Thread Josh Elser

+1 to that :D

Stack wrote:

Hot Dog!

On Fri, Mar 31, 2017 at 3:03 PM, Misty Stanley-Jones
wrote:


FYI, the linked Jenkins job now automatically updates the site! No more
need to manually push. Merry Christmas! :)

- Original message -
From: Apache Jenkins Server
To: dev@hbase.apache.org
Subject: Successful: HBase Generate Website
Date: Fri, 31 Mar 2017 21:32:17 + (UTC)

Build status: Successful

If successful, the website and docs have been generated and the site has
been updated automatically.
If failed, see
https://builds.apache.org/job/hbase_generate_website/561/console

YOU DO NOT NEED TO DO THE FOLLOWING ANYMORE! It is here for
informational purposes and shows what the Jenkins job does to push the
site.

   git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
   cd hbase-site
   wget -O-
   https://builds.apache.org/job/hbase_generate_website/561/
artifact/website.patch.zip
   | funzip > 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
   git fetch
   git checkout -b asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7
   origin/asf-site
   git am --whitespace=fix 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
   git push origin
   asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7:asf-site
   git commit --allow-empty -m "INFRA-10751 Empty commit"
   git push origin asf-site
   git checkout asf-site
   git branch -D asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7








Re: hbase does not seem to handle mixed workloads well

2017-04-01 Thread Enis Söztutar
I think the problem is that you ONLY have 30 "handler" threads (
hbase.regionserver.handler.count). Handlers are the main thread pool that
executes the RPC requests. When you issue I/O-bound requests, very likely
all of the 30 threads are just blocked on disk access, so the
total throughput drops.

It is typical to run with 100-300 threads on the regionserver side,
depending on your settings. You can use the "Debug dump" from the
regionserver web UI or jstack to inspect what the "handler" threads are
doing.
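
For example, the handler count is raised in hbase-site.xml like so (100 is
just an illustrative value within that 100-300 range):

  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>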

Enis

On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li  wrote:

> On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu  wrote:
>
> > Can you tell us which release of hbase you used ?
> >
>
> 2.0.0 Snapshot
>
> >
> > Please describe values for the config parameters in hbase-site.xml
> >
> > The content of hbase-site.xml is shown below, but indeed this problem is
> not sensitive to configuration -- we can reproduce the same problem with
> different configurations, and across different hbase versions.
>
>
> > Do you have SSD(s) in your cluster ?
> > If so and the mixed workload involves writes, have you taken a look at
> > HBASE-12848
> > ?
> >
> No, we don't use SSD (for hbase). And the workload does not involve writes
> (even though workloads with writes show similar behavior). I stated that
> both clients are doing 1KB Gets.
>
> 
>
> 
> hbase-master
> node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:6
> 
>
> 
> hbase.rootdir
> hdfs://
> node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase
> 
>
> 
> hbase.fs.tmp.dir
> hdfs://
> node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging
> 
> 
>
> 
> hbase.cluster.distributed
> true
> 
>
> 
> hbase.zookeeper.property.dataDir
> /tmp/zookeeper
> 
>
> 
> hbase.zookeeper.property.clientPort
> 2181
> 
>
> 
> hbase.zookeeper.quorum
> node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us
> 
>
> 
> hbase.ipc.server.read.threadpool.size
> 10
> 
>
> 
> hbase.regionserver.handler.count
> 30
> 
>
> 
>
>
>
> >
> > Cheers
> >
> > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li 
> > wrote:
> >
> > > Hi,
> > >
> > > We found that when there is a mix of CPU-intensive and I/O intensive
> > > workload, HBase seems to slow everything down to the disk throughput
> > level.
> > >
> > > This is shown in the performance graph at
> > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access a
> > > small set of data that is cacheable and both get high throughput (~45k
> > > ops/s). At second 60, client-1 switches to an I/O-intensive workload and
> > > begins to randomly access a large set of data (does not fit in cache).
> > > *Both* client-1 and client-2's throughput drops to ~0.5K ops/s.
> > >
> > > Is this acceptable behavior for HBase or is it considered a bug or
> > > performance drawback?
> > > I can find an old JIRA entry about similar problems (
> > > https://issues.apache.org/jira/browse/HBASE-8836), but that was never
> > > resolved.
> > >
> > > Thanks.
> > >
> > > Suli
> > >
> > > --
> > > Suli Yang
> > >
> > > Department of Physics
> > > University of Wisconsin Madison
> > >
> > > 4257 Chamberlin Hall
> > > Madison WI 53703
> > >
> >
>
>
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
>


Re: Updated Code of Conduct

2017-04-01 Thread Stack
On Fri, Mar 31, 2017 at 3:11 PM, Misty Stanley-Jones 
wrote:

> All,
>
> We have updated the Code of Conduct to be a little more explicit about
> how much we value diversity in the HBase project, and to ask for your
> feedback on how we can improve. Your feedback is ALWAYS welcome, whether
> on one of the public mailing lists or privately to a committer or a PMC
> member on the project. On behalf of the entire PMC, thank you for being
> part of the HBase project, and for all of the amazing work you all do!
>
> See the changes at http://hbase.apache.org/coc.html.
>
> Thanks,
> Misty
>

+1
St.Ack


Re: Successful: HBase Generate Website

2017-04-01 Thread Stack
Hot Dog!

On Fri, Mar 31, 2017 at 3:03 PM, Misty Stanley-Jones 
wrote:

> FYI, the linked Jenkins job now automatically updates the site! No more
> need to manually push. Merry Christmas! :)
>
> - Original message -
> From: Apache Jenkins Server 
> To: dev@hbase.apache.org
> Subject: Successful: HBase Generate Website
> Date: Fri, 31 Mar 2017 21:32:17 + (UTC)
>
> Build status: Successful
>
> If successful, the website and docs have been generated and the site has
> been updated automatically.
> If failed, see
> https://builds.apache.org/job/hbase_generate_website/561/console
>
> YOU DO NOT NEED TO DO THE FOLLOWING ANYMORE! It is here for
> informational purposes and shows what the Jenkins job does to push the
> site.
>
>   git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
>   cd hbase-site
>   wget -O-
>   https://builds.apache.org/job/hbase_generate_website/561/
> artifact/website.patch.zip
>   | funzip > 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
>   git fetch
>   git checkout -b asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7
>   origin/asf-site
>   git am --whitespace=fix 1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7.patch
>   git push origin
>   asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7:asf-site
>   git commit --allow-empty -m "INFRA-10751 Empty commit"
>   git push origin asf-site
>   git checkout asf-site
>   git branch -D asf-site-1c4d9c8965952cbd17f0afdacbb0c0ac1e5bd1d7
>
>
>
>


[jira] [Created] (HBASE-17867) Implement async procedure RPC API(list/exec/abort/isFinished)

2017-04-01 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-17867:


 Summary: Implement async procedure RPC 
API(list/exec/abort/isFinished)
 Key: HBASE-17867
 URL: https://issues.apache.org/jira/browse/HBASE-17867
 Project: HBase
  Issue Type: Sub-task
Reporter: Zheng Hu






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17866) Implement async setQuota/getQuota methods.

2017-04-01 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-17866:


 Summary: Implement async setQuota/getQuota methods.
 Key: HBASE-17866
 URL: https://issues.apache.org/jira/browse/HBASE-17866
 Project: HBase
  Issue Type: Sub-task
Reporter: Zheng Hu
Assignee: Zheng Hu






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17865) Implement async listSnapshot/deleteSnapshot methods.

2017-04-01 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-17865:


 Summary: Implement async listSnapshot/deleteSnapshot methods.
 Key: HBASE-17865
 URL: https://issues.apache.org/jira/browse/HBASE-17865
 Project: HBase
  Issue Type: Sub-task
Reporter: Zheng Hu
Assignee: Zheng Hu






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HBASE-17864) Implement async snapshot/cloneSnapshot/restoreSnapshot methods

2017-04-01 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-17864:


 Summary: Implement async snapshot/cloneSnapshot/restoreSnapshot 
methods
 Key: HBASE-17864
 URL: https://issues.apache.org/jira/browse/HBASE-17864
 Project: HBase
  Issue Type: Sub-task
Reporter: Zheng Hu
Assignee: Zheng Hu






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)