My 2 cents:
From the Apache point of view, it is OK to do a release even if unit
tests do not pass at all or a large number of regressions have been
introduced. An Apache release is a source release, and as long as it
compiles and does not have license issues, it is up to the community
(PMC) to decide on any other criteria for a release.
The issue in DRILL-6453 is not limited to a large number of hash joins.
It should be possible to reproduce it even with a single hash join, as
long as the left and right sides are getting batches from one-to-many
or many-to-many exchanges (broadcast or hash partitioner senders).
Thank you,
Vlad
On 7/13/18 08:41, Aman Sinha wrote:
I would say we have to take a measured approach to this and decide on a
case-by-case basis which issue is a show stopper.
While of course we have to make every effort to avoid regressions, we
cannot claim that a particular release will not cause any regression.
I believe there are 10000+ passing tests, so that should provide a level
of confidence. The TPC-DS 72 is a 10-table join, which in the Hadoop
world of denormalized schemas is relatively uncommon. The main question
is: does the issue reproduce with fewer joins having the same type of
distribution plan?
Aman
On Fri, Jul 13, 2018 at 7:36 AM Arina Yelchiyeva wrote:
We cannot release with existing regressions, especially considering
that these are not minor issues.
As far as I understand, reverting is not an option, since the hash join
spill feature is spread across several commits plus subsequent fixes.
I guess we need to consider postponing the release until the issues are
resolved.
Kind regards,
Arina
On Fri, Jul 13, 2018 at 5:14 PM Boaz Ben-Zvi wrote:
(Guessing ...) It is possible that the root cause for DRILL-6606 is
similar to that in DRILL-6453 -- that is, the new "early sniffing" in
the Hash-Join, which repeatedly invokes next() on the two "children" of
the join *during schema discovery* until non-empty data is returned (or
NONE, STOP, etc). Last night Salim, Vlad and I briefly discussed
alternatives, like postponing the "sniffing" to a later time (beginning
of the build for the right child, and beginning of the probe for the
left child).
However, this would require some work time. So what should we do about
1.14?
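
To make the "early sniffing" idea concrete, here is a rough sketch of
the loop as described above. It is a simplified model in Python, not
Drill's actual code: the names Outcome, FakeChild and sniff are
hypothetical, and only the outcomes mentioned in this thread are
modeled.

```python
from enum import Enum, auto


class Outcome(Enum):
    # Simplified stand-ins for the outcomes a child can return from next().
    OK_NEW_SCHEMA = auto()
    OK = auto()
    NONE = auto()
    STOP = auto()


class FakeChild:
    """Hypothetical upstream operator replaying (outcome, row_count) pairs."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self.calls = 0  # how many times next() was invoked

    def next(self):
        self.calls += 1
        # Once the scripted batches run out, report end of data.
        return next(self._batches, (Outcome.NONE, 0))


def sniff(child):
    """Keep calling next() until a non-empty batch or a terminal outcome."""
    while True:
        outcome, rows = child.next()
        if outcome in (Outcome.NONE, Outcome.STOP) or rows > 0:
            return outcome, rows


# Assumed scenario: the left side sends two empty batches before real data,
# the right side sends only a schema and then ends.
left = FakeChild([(Outcome.OK_NEW_SCHEMA, 0), (Outcome.OK, 0), (Outcome.OK, 5)])
right = FakeChild([(Outcome.OK_NEW_SCHEMA, 0)])

left_first = sniff(left)
right_first = sniff(right)
```

In this model the join pulls from each child several times before any
build or probe work starts. Postponing the sniffing, as discussed, would
move each of these loops to the start of the build (right child) and the
start of the probe (left child).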
Thanks,
Boaz
On Fri, Jul 13, 2018 at 3:46 AM, Arina Yelchiyeva wrote:
While implementing the late limit 0 optimization, Bohdan found one more
regression following the Hash Join spill-to-disk changes.
https://issues.apache.org/jira/browse/DRILL-6606
Boaz please take a look.
Kind regards,
Arina