My 2 cents:

From the Apache point of view, it is OK to do a release even if unit tests do not pass at all or a large number of regressions have been introduced. An Apache release is a source release, and as long as it compiles and does not have license issues, it is up to the community (PMC) to decide on any other criteria for a release.

The issue in DRILL-6453 is not limited to a large number of hash joins. It should be possible to reproduce it even with a single hash join, as long as the left and right sides are getting batches from one-to-many or many-to-many exchanges (broadcast or hash partitioner senders).

Thank you,

Vlad

On 7/13/18 08:41, Aman Sinha wrote:
I would say we have to take a measured approach to this and decide on a
case-by-case basis which issue is a show stopper.
While of course we have to make every effort to avoid regressions, we cannot
claim that a particular release will not cause any regression.
I believe there are 10000+ passing tests, so that should provide a level
of confidence. The TPC-DS query 72 is a 10-table join, which in the Hadoop
world of denormalized schemas is relatively uncommon. The main question is:
does the issue reproduce with fewer joins having the same type of
distribution plan?


Aman

On Fri, Jul 13, 2018 at 7:36 AM Arina Yelchiyeva <[email protected]> wrote:

We cannot release with existing regressions, especially taking into account
that these are not minor issues.
As far as I understand, reverting is not an option since the hash join spill
feature is spread across several commits plus subsequent fixes.
I guess we need to consider postponing the release until the issues are
resolved.

Kind regards,
Arina

On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi <[email protected]> wrote:

(Guessing ...) It is possible that the root cause of DRILL-6606 is
similar to that of DRILL-6453, namely the new "early sniffing" in the
Hash-Join, which repeatedly invokes next() on the two "children" of the
join *during schema discovery* until non-empty data is returned (or NONE,
STOP, etc). Last night Salim, Vlad and I briefly discussed alternatives,
like postponing the "sniffing" to a later time (beginning of the build for
the right child, and beginning of the probe for the left child).
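
For readers unfamiliar with the "sniffing" loop mentioned above, here is a minimal sketch in Python. It is not Drill's actual Java code. The Outcome enum, the sniff function, and the (outcome, row_count) pairs are hypothetical names invented for illustration; only the repeated next() calls and the NONE/STOP outcomes come from the description above.

```python
from enum import Enum, auto


class Outcome(Enum):
    # Simplified, hypothetical stand-ins for the outcomes named in the message.
    OK_NEW_SCHEMA = auto()
    OK = auto()
    NONE = auto()
    STOP = auto()


TERMINAL = {Outcome.NONE, Outcome.STOP}


def sniff(child):
    """Pull batches from `child` until one has rows or the stream terminates.

    `child` is an iterator of (outcome, row_count) pairs (an assumption made
    for this sketch). Returns (outcome, row_count, number_of_next_calls).
    """
    calls = 0
    while True:
        # An exhausted iterator is treated as NONE (end of data).
        outcome, rows = next(child, (Outcome.NONE, 0))
        calls += 1
        if outcome in TERMINAL or rows > 0:
            return outcome, rows, calls


# Left child: a schema-only batch, one empty batch, then 5 rows.
left = iter([(Outcome.OK_NEW_SCHEMA, 0), (Outcome.OK, 0), (Outcome.OK, 5)])
# Right child: a schema-only batch, then end of data.
right = iter([(Outcome.OK_NEW_SCHEMA, 0), (Outcome.NONE, 0)])

left_result = sniff(left)
right_result = sniff(right)
print(left_result)
print(right_result)
```

In this toy run the left child needs three next() calls and the right child two before the loop returns. If the loop runs during schema discovery, both children are pulled on up front; postponing it would mean calling the same loop at the start of the build (right child) and at the start of the probe (left child) instead.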

However, this would require some work time. So what should we do about
1.14?

   Thanks,

           Boaz

On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva <[email protected]> wrote:

While implementing the late limit 0 optimization, Bohdan found one more
regression after the Hash Join spill-to-disk changes.
https://issues.apache.org/jira/browse/DRILL-6606
Boaz, please take a look.

Kind regards,
Arina


