That's a tricky one. I have a couple of ideas, but they are difficult to confirm because the profile isn't really designed to answer questions like this. Note that ProbeTime measures wall-clock time rather than time actually spent executing on the CPU.
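To illustrate the wall-clock vs. CPU-time distinction, here is a minimal Python sketch. It is not Impala code and says nothing about how ProbeTime is implemented; the `measure` and `descheduled` helpers are hypothetical names made up for this example. The sleep stands in for a thread that is waiting (for instance, descheduled while another thread runs) rather than executing.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent in one call to fn."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time consumed by this process
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def descheduled():
    # Stand-in for a thread that is not running on a CPU:
    # wall-clock time advances, CPU time barely moves.
    time.sleep(0.2)


wall, cpu = measure(descheduled)
print(f"wall-clock: {wall:.3f}s, cpu: {cpu:.3f}s")
```

A wall-clock counter keeps growing while the thread is waiting, so a wall-clock timer around the probe can grow without the probe itself doing more work.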
My first guess is that the Kudu scan is using more CPU than the Parquet scan, so the thread doing the scan is competing more with the join thread for resources (either hyperthreads competing for resources on the same physical core, or threads competing for time on logical processors). Since ProbeTime is wall-clock time, that kind of contention would show up as a larger ProbeTime. You could compare the User CPU time for the fragment instances containing the joins and scans to see whether there is also a discrepancy in CPU time. That won't answer the question directly, but it might provide some clues.

My second guess is that there's some kind of subtle memory locality or scheduling effect.

On Tue, Jan 2, 2018 at 11:40 PM, helifu <[email protected]> wrote:

> Hi everybody,
>
> Recently I ran a simple PHJ on ‘parquet’ and ‘kudu’ with this SQL
> independently:
>
> select count(*) from lineitem as l, orders as o where l.l_orderkey =
> o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> '1995-01-01';
>
> And I found that the ‘ProbeTime’ on ‘kudu’ is much larger than on
> ‘parquet’!!
>
> Below are the plans and profiles:
>
> Thanks in advance.
>
> 何李夫
> 2017-04-10 16:06:24
