Hi HBasers,

I am experimenting with HBASE-9272 (a parallel unordered scanner) to give Trafodion a primitive for cases where we need to scan regions as fast as possible and don't care about order.

HBASE-9272 has been optimized to round-robin across region servers. The primitive I am trying to implement, however, is supposed to "most often" perform a parallel scan against a single region server, the one where our ESP (Executor Server Process) is located, so that it can use a short-circuit HConnection. Today, Trafodion can perform this parallel scan by launching multiple ESPs per region server. However, since ESPs are processes, this costs more resources than thread-level parallelism would.

I adapted HBASE-9272 to work with the HBase 1.0 scanner and to cope with the interference of the replica feature (I actually back-ported the ClientScanner from HBase 0.98 to use HBASE-9272 with HBase 1.0). I got things working, but I am getting unexpected results. Running on a single-node HBase (the development environment), I get the expected performance gain with ESP parallelism, but with HBASE-9272 I see no benefit compared to a single regular scan.

So I am wondering whether I am bottlenecking on a shared connection when using HBASE-9272, which would prevent any concurrency when running against the same region server. Or what else could it be?

In my testing, I run a full scan on a table with 10 regions and limit the degree of parallelism to 2, just to be able to compare ESP parallelism with thread-level parallelism.

Thanks in advance for the help,
Eric Owhadi