Re: Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Sungwoo Park

Hello Alessandro,

Thank you for the comment. HIVE-27006 makes sense only after HIVE-26986 is 
merged, so the failing tests can be taken care of later. HIVE-27006 does not 
affect the results of TPC-DS queries, so Seonggon (who created the JIRAs) can
focus on HIVE-26968 first.


To reproduce the bugs, one should build TPC-DS datasets (using Iceberg 
for HIVE-26968) and execute queries. Checking the output plan using a Metastore 
image is not enough.


Best regards,

--- Sungwoo



Re: Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Alessandro Solimando
Hi Sungwoo,
thanks for bringing this up. IMO correctness issues should be set to
"Blocker" level in Jira, so 4.0.0 should not be released before the
aforementioned tickets are fixed.

The patches seem well thought out and solid from a cursory look, but they
fall outside my area of expertise. I don't have time to review them right
now because I would first need to understand Shared Work Optimizer, which
is non-trivial.

I have nonetheless approved the blocked workflows (for first-time
contributors, some need a committer to run them). I also noticed that
HIVE-27006 has failing tests, so in the meantime those failures could
be addressed.

Another action that will probably get you closer to having the PRs in is to
address (some of) the code smells/issues that Sonar has identified (from a
cursory look there were some unused imports etc.). The neater the PR, the
less time a reviewer will need and the higher the chances it gets
reviewed.

Best regards,
Alessandro



Asking for code review: HIVE-26968, HIVE-26986, HIVE-27006

2023-02-14 Thread Sungwoo Park
Seonggon created three JIRAs a while ago which affect the result of TPC-DS queries, 
and I wonder if anyone would have time for reviewing the pull requests.


HIVE-26968: SharedWorkOptimizer merges TableScan operators that have different 
DPP parents
HIVE-26986: A DAG created by OperatorGraph is not equal to the Tez DAG.
HIVE-27006: ParallelEdgeFixer inserts misconfigured operator and does not 
connect it in Tez DAG

In the current build, TPC-DS query 64 returns wrong results (no rows) on 
Iceberg tables.
This is fixed in HIVE-26968.

TPC-DS query 71 fails with an error ("cannot find _col0 from []").
This is fixed in HIVE-26986.

HIVE-27006 fixes a bug which we found while testing with TPC-DS queries.
(It depends on HIVE-26986.)

I hope these JIRAs are merged to the master branch before the release of Hive 
4.0.0.
Considering the maturity of Hive and the impending release of Hive 4.0.0,
it does not seem like a good plan to release a Hive 4.0.0 that fails on some
TPC-DS queries.

Thanks!

Sungwoo Park