Hi HBasers,


I am experimenting with HBASE-9272 (a parallel unordered scanner) to give
Trafodion a primitive for scanning regions as fast as possible when we
don't care about order.

However, while HBASE-9272 has been optimized to round-robin across region
servers, the primitive I am trying to implement is supposed to scan, most
of the time, in parallel against a single region server: the one where our
ESP (Executor Server Process) is located. It should therefore be able to
use a short-circuit HConnection.



Today, Trafodion can perform this parallel scan by launching multiple ESPs
per region server. However, since ESPs are processes, this costs more
resources than thread-level parallelism would.



After adapting HBASE-9272 to the HBase 1.0 scanner and working around the
interference of the replica feature (I actually back-ported the HBase 0.98
ClientScanner so I could use HBASE-9272 with HBase 1.0), I got things
working, but I am getting unexpected results:



Running on a single-node HBase (the development environment), I get the
expected performance gain when I use ESP parallelism, but when I use
HBASE-9272 I see no benefit compared to a single regular scan.



So I am wondering whether, with HBASE-9272, I am bottlenecking on a shared
connection that prevents concurrency when all scans run against the same
region server. Or what else could it be?

In my testing, I run a full scan on a table with 10 regions and limit the
degree of parallelism to 2, just to be able to compare ESP parallelism
with thread-level parallelism.
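
To make "thread-level parallelism" concrete, here is a minimal sketch of
the shape of that test: 10 per-region scan tasks on a pool of 2 threads.
It is not the HBASE-9272 code and makes no HBase calls; scanRegion is a
hypothetical placeholder for one region's scan, and the row count it
returns is invented for illustration.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ParallelScanSketch {
    // Hypothetical stand-in for scanning one region (assumption: not an
    // HBase API). Pretends every region holds 100 rows.
    static long scanRegion(int regionIndex) {
        return 100L;
    }

    // Runs one task per region on a pool of `parallelism` threads and
    // sums the row counts in completion order (i.e. unordered).
    static long scanAll(int regionCount, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            CompletionService<Long> done = new ExecutorCompletionService<>(pool);
            for (int r = 0; r < regionCount; r++) {
                final int region = r;
                done.submit(() -> scanRegion(region));
            }
            long rows = 0;
            for (int i = 0; i < regionCount; i++) {
                rows += done.take().get(); // next finished region, any order
            }
            return rows;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 10 regions, degree of parallelism 2, as in the test above.
        System.out.println(scanAll(10, 2)); // prints 1000
    }
}
```

If every task ends up going through one shared connection to the same
region server, the 2 threads could still be serialized there, which is
the bottleneck I am asking about.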



Thanks in advance for the help,

Eric Owhadi
