bq. increasing the number of threads seems to be an incomplete solution

I agree - the number of concurrent I/O-intensive client requests is
unpredictable.

One possible solution, for reads that miss the cache, is a mechanism
similar to Java 8's CompletableFuture.

See this post:
https://www.infoq.com/articles/Functional-Style-Callbacks-Using-CompletableFuture

The result of an HDFS read, when ready, can be channeled back through the
functions passed to thenApply / thenAccept, so that the handler does not
block.

Cheers

On Sat, Apr 1, 2017 at 5:46 PM, 杨苏立 Yang Su Li <yangs...@gmail.com> wrote:

> Yes, that is indeed the problem. It is caused by
>
> 1) HBase has a fixed number (by default 30) of RPC handlers (a reasonable
> design choice)
> 2) RPC handlers block on HDFS reads (also a reasonable design choice)
>
> As the system comes under a higher load of I/O-intensive workloads, all RPC
> handlers become blocked and no progress can be made for requests that do
> not require I/O.
>
> However, increasing the number of threads seems to be an incomplete solution --
> you run into the same problem under a higher load of I/O-intensive
> workloads...
>
>
>
> On Sat, Apr 1, 2017 at 3:47 PM, Enis Söztutar <enis....@gmail.com> wrote:
>
> > I think the problem is that you ONLY have 30 "handler" threads (
> > hbase.regionserver.handler.count). Handlers are the main thread pool
> that
> > executes the RPC requests. When you do IO-bound requests, very likely
> > all of the 30 threads are just blocked on disk access, so that the
> > total throughput drops.
> >
> > It is typical to run with 100-300 threads on the regionserver side,
> > depending on your settings. You can use the "Debug dump" from the
> > regionserver web UI or jstack to inspect what the "handler" threads are
> > doing.
> >
> > Enis
> >
> > On Fri, Mar 31, 2017 at 7:57 PM, 杨苏立 Yang Su Li <yangs...@gmail.com>
> > wrote:
> >
> > > On Fri, Mar 31, 2017 at 9:39 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Can you tell us which release of hbase you used ?
> > > >
> > >
> > > 2.0.0 Snapshot
> > >
> > > >
> > > > Please describe values for the config parameters in hbase-site.xml
> > > >
> > > The content of hbase-site.xml is shown below, but indeed this problem is
> > > not sensitive to configuration -- we can reproduce the same problem with
> > > different configurations, and across different HBase versions.
> > >
> > >
> > > > Do you have SSD(s) in your cluster?
> > > > If so and the mixed workload involves writes, have you taken a look at
> > > > HBASE-12848?
> > > >
> > > >
> > > No, we don't use SSDs (for HBase). And the workload does not involve writes
> > > (even though workloads with writes show similar behavior). As I stated,
> > > both clients are doing 1KB Gets.
> > >
> > > <configuration>
> > >
> > > <property>
> > > <name>hbase-master</name>
> > > <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:60000</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.rootdir</name>
> > > <value>hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.fs.tmp.dir</name>
> > > <value>hdfs://node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us:9000/hbase-staging</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.cluster.distributed</name>
> > > <value>true</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.zookeeper.property.dataDir</name>
> > > <value>/tmp/zookeeper</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.zookeeper.property.clientPort</name>
> > > <value>2181</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.zookeeper.quorum</name>
> > > <value>node0.orighbasecluster.distsched-pg0.wisc.cloudlab.us</value>
> > > </property>
> > >
> > > <property>
> > >     <name>hbase.ipc.server.read.threadpool.size</name>
> > >     <value>10</value>
> > > </property>
> > >
> > > <property>
> > > <name>hbase.regionserver.handler.count</name>
> > > <value>30</value>
> > > </property>
> > >
> > > </configuration>
> > >
> > >
> > >
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 31, 2017 at 7:29 PM, 杨苏立 Yang Su Li <yangs...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We found that when there is a mix of CPU-intensive and I/O-intensive
> > > > > workloads, HBase seems to slow everything down to the disk throughput
> > > > > level.
> > > > >
> > > > > This is shown in the performance graph at
> > > > > http://pages.cs.wisc.edu/~suli/blocking-orig.pdf : both client-1 and
> > > > > client-2 are issuing 1KB Gets. From second 0, both repeatedly access a
> > > > > small set of data that is cacheable, and both get high throughput (~45K
> > > > > ops/s). At second 60, client-1 switches to an I/O-intensive workload and
> > > > > begins to randomly access a large set of data (which does not fit in the
> > > > > cache). *Both* client-1's and client-2's throughput drops to ~0.5K ops/s.
> > > > >
> > > > > Is this acceptable behavior for HBase, or is it considered a bug or a
> > > > > performance drawback?
> > > > > I found an old JIRA entry about a similar problem
> > > > > (https://issues.apache.org/jira/browse/HBASE-8836), but it was never
> > > > > resolved.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > Suli
> > > > >
> > > > > --
> > > > > Suli Yang
> > > > >
> > > > > Department of Physics
> > > > > University of Wisconsin Madison
> > > > >
> > > > > 4257 Chamberlin Hall
> > > > > Madison WI 53703
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Suli Yang
> > >
> > > Department of Physics
> > > University of Wisconsin Madison
> > >
> > > 4257 Chamberlin Hall
> > > Madison WI 53703
> > >
> >
>
>
>
> --
> Suli Yang
>
> Department of Physics
> University of Wisconsin Madison
>
> 4257 Chamberlin Hall
> Madison WI 53703
>
