Hi Nick,
Yes, I did find the root cause: I was running the Trafodion code in debug
mode. As a consequence, the Trafodion code was the bottleneck, not the HBase
layer, which was running in release mode. With ESP parallelism I could see
the performance increase, because the bottleneck code itself was running in
parallel. With the parallel scanner, the bottleneck code was not
parallelized, and feeding it rows faster made no difference, by definition
of a bottleneck.

Now, after correcting this by running in release mode, and using a simple
Java program that counts rows with the parallel scanner (to assess the share
of work done in the Trafodion stack), I found the following results:
- When no predicate pushdown is involved, the work done by the Trafodion
stack (get the row, convert it to Trafodion format, apply the predicate,
return the row to the node) consumes significant resources/time and does
benefit from parallelization. Therefore ESP parallelism (which parallelizes
this work) yields better results than scanner-level parallelism.
In my test on a single region server, counting a table of 10,000,000 rows
split into 10 regions, with the degree of parallelism set to 2, running 10
times and averaging, I get:
Parallel scan, 2 threads, HBase 0.98 scanner with HBase-9272: 39.9 s
2 ESPs running the HBase 1.0 scanner:                        30 s
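For readers who want to reproduce the client-side measurement, a parallel unordered count along these lines might look like the sketch below. Assumptions: each region's scanner is replaced by an in-memory Iterable<Long> of row keys (a real version would open one HBase scanner per region), and ParallelCount, count and split are hypothetical names. This is neither the HBase-9272 API nor the exact program used for the numbers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;
import java.util.stream.LongStream;

// Sketch only: in-memory stand-ins replace real HBase region scanners.
public class ParallelCount {

    /** Counts matching rows across all regions with a fixed pool of `parallelism` threads. */
    static long count(List<Iterable<Long>> regions, int parallelism, Predicate<Long> filter)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (Iterable<Long> region : regions) {
                // One task drains one region; completion order does not matter (unordered scan).
                partials.add(pool.submit(() -> {
                    long n = 0;
                    for (Long row : region) {
                        if (filter.test(row)) n++; // client-side predicate, no pushdown
                    }
                    return n;
                }));
            }
            long total = 0;
            for (Future<Long> f : partials) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Builds `regions` contiguous key ranges covering [0, rows), standing in for regions. */
    static List<Iterable<Long>> split(long rows, int regions) {
        List<Iterable<Long>> out = new ArrayList<>();
        long step = rows / regions;
        for (int i = 0; i < regions; i++) {
            long start = i * step;
            long end = (i == regions - 1) ? rows : start + step;
            out.add(() -> LongStream.range(start, end).boxed().iterator());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // 10 regions and a degree of parallelism of 2, mirroring the test setup.
        List<Iterable<Long>> regions = split(100_000, 10);
        System.out.println(count(regions, 2, row -> true));         // 100000
        System.out.println(count(regions, 2, row -> row % 2 == 0)); // 50000
    }
}
```

Because the per-row predicate runs inside the counting threads here, this sketch parallelizes both the scan and the filtering; in the setup described above, the Trafodion-side work was not part of the scanner threads.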

So ESP parallelism has roughly a 25% advantage (30 s vs 39.9 s elapsed).
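The 25% figure is the reduction in elapsed time; the same two measurements can also be read as a throughput ratio. A trivial sketch of both calculations (ScanTimings and its method names are illustrative only):

```java
// Illustrative arithmetic on the two averaged timings reported above.
public class ScanTimings {

    /** Fraction of elapsed time saved when going from `before` to `after` seconds. */
    static double timeReduction(double before, double after) {
        return (before - after) / before;
    }

    /** How many times faster `after` is than `before` (throughput ratio). */
    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double parallelScan = 39.9; // s, 2-thread parallel scanner
        double twoEsp = 30.0;       // s, 2 ESPs
        System.out.printf("time saved: %.1f%%%n", 100 * timeReduction(parallelScan, twoEsp)); // 24.8%
        System.out.printf("speedup:    %.2fx%n", speedup(parallelScan, twoEsp));              // 1.33x
    }
}
```

So "25% less time" and "about 1.33x the throughput" describe the same result.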

Now, you may wonder how much of this 25% comes from parallelizing the
Trafodion work versus other factors (the HBase 0.98 scanner vs the HBase 1.0
scanner, synchronization in HBase-9272).

Another test, comparing the Trafodion stack against the simple Java counter,
shows that the Trafodion work adds about 16% for a count job on a single
thread.

A third test, comparing the Java parallel scan with 2 threads against 2 ESPs
using the HBase 1.0 client scanner (30.5 s vs 30 s), indicates that the
parallel scanner should have room for some more optimization, although that
may be due to the low degree of parallelism I picked.

So, in conclusion:
- Until Trafodion has more cases where predicate pushdown can be used, ESP
parallelism is the way to go.
- When we reach the point where predicate pushdown covers more cases, I
suspect that HBase-9272 could use another optimization pass, as alluded to
in the JIRA comments about it probably still being conservative on
synchronization...

I am therefore putting this work to "sleep" and will focus on increasing the
cases where pushdown is possible in Trafodion. From a performance
standpoint, this will yield a better ROI than providing another parallelism
option. Once pushdown is prevalent, I will resume and see whether we can get
to a point where scanner-level parallelism makes sense vs ESP.
Another important point is that choosing scanner parallelism over ESP is
about resource utilization. But with today's constraints on RS heap size due
to stop-the-world GC pauses, at least on physical cluster implementations,
we have a lot more memory than we can use. So ESP resource utilization is
less of an issue... The story is different on cloud VM-based
implementations, but it is too early to know where we should focus...

Note: these tests were done forcing Trafodion to bypass its default count
path, which actually uses coprocessor aggregation and is a lot faster than
client-side counting...

Hope this makes sense,
Eric




-----Original Message-----
From: Nick Dimiduk [mailto:ndimi...@apache.org]
Sent: Thursday, October 29, 2015 12:09 AM
To: hbase-dev <dev@hbase.apache.org>
Subject: Re: trying HBase-9272 style parallel scanner for Trafodion-1421

Hi Eric,

Did you ever get to the bottom of this? Maybe you can pastebin some jstacks
taken while your scanners are running in the different modes?

-n

On Thu, Oct 15, 2015 at 7:58 AM, Eric Owhadi <eric.owh...@esgyn.com> wrote:
> Hi Hbasers,
>
> I am experimenting with HBase-9272 (a parallel unordered scanner) to
> provide Trafodion a primitive that can be used when we need to scan
> regions as fast as possible and don’t care about order.
>
> However, while HBase-9272 has been optimized to round-robin across region
> servers, the primitive I am trying to implement is supposed to “most
> often” perform parallel scans from a single region server, where our ESP
> (Executor Server Process) is located -> so it should be able to use a
> short-circuit HConnection.
>
> Today, Trafodion can perform this parallel scanning task by launching
> multiple ESPs per region server. However, this has a higher resource cost
> than thread-level parallelism would, as ESPs are processes.
>
> After adapting HBase-9272 to work with the HBase 1.0 scanner and the
> interference of the replica feature (I actually back-ported the
> ClientScanner of HBase 0.98 to use HBase-9272 with HBase 1.0), I got
> things working, but I am getting unexpected results:
>
> Running on a single-node HBase (the development environment), I get the
> expected performance gain when I use ESP parallelism, but when using
> HBase-9272 I am not seeing any benefit compared to a single regular scan.
>
> So I am wondering whether, when using HBase-9272, I am bottlenecking on a
> shared connection that would prevent concurrency when running against the
> same region server. Or what else could it be?
>
> In my testing, I use a full scan on a table with 10 regions, and limit
> the degree of parallelism to 2, just to be able to compare ESP
> parallelism with thread-level parallelism.
>
> Thanks in advance for the help,
>
> Eric Owhadi
