zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3065427664
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java:
##########
@@ -1977,8 +1977,9 @@ private void removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
LOG.debug("Old stats for {}: {}", roi.filterOperator, roi.filterStats);
LOG.debug("Number of rows reduction: {}/{}", newNumRows, roi.filterStats.getNumRows());
}
+ boolean useColStats = roi.filterStats.getColumnStats() != null;
StatsUtils.updateStats(roi.filterStats, newNumRows,
- true, roi.filterOperator, roi.colNames);
+ useColStats, roi.filterOperator, roi.colNames);
Review Comment:
The modifications to the test pushed the query closer to the failure,
but they were not enough to trigger the NPE and reach the problematic code.
I experimented with the code and managed to trigger the NPE using the test
added in
https://github.com/apache/hive/pull/6382/commits/7328e0f5a0a5cc1157433ce5ca23956904ae5270
In addition, I removed various redundant properties and renamed the tables
slightly to make the test more readable.
With these changes the PR should be ready to merge.
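
For readers following the thread, here is a minimal, self-contained sketch of the guard pattern used in the diff above. It does not use Hive's real classes: `Stats`, `updateStats`, and `updateSafely` are hypothetical stand-ins, and the assumption that the update dereferences column statistics only when the flag is `true` is mine, inferred from the NPE discussed here.

```java
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT Hive's
// Statistics or StatsUtils classes.
public class ColStatsGuardSketch {

    /** Simplified statistics holder; columnStats is null when none were collected. */
    static final class Stats {
        long numRows;
        List<String> columnStats;

        Stats(long numRows, List<String> columnStats) {
            this.numRows = numRows;
            this.columnStats = columnStats;
        }
    }

    /**
     * Assumed behaviour: column statistics are touched only when
     * useColStats is true, so passing true with null column stats
     * throws a NullPointerException.
     */
    static void updateStats(Stats stats, long newNumRows, boolean useColStats) {
        if (useColStats) {
            int visited = 0;
            // Iterating a null list throws NullPointerException.
            for (String col : stats.columnStats) {
                visited++; // placeholder for per-column adjustments
            }
        }
        stats.numRows = newNumRows;
    }

    /** Mirrors the guard in the diff: derive the flag from the presence of column stats. */
    static void updateSafely(Stats stats, long newNumRows) {
        boolean useColStats = stats.columnStats != null;
        updateStats(stats, newNumRows, useColStats);
    }

    public static void main(String[] args) {
        Stats missingColStats = new Stats(1000L, null);
        updateSafely(missingColStats, 100L);
        System.out.println("numRows after guarded update: " + missingColStats.numRows);
    }
}
```

In this sketch, calling `updateStats(stats, n, true)` on a `Stats` object with null column statistics fails, while `updateSafely` still updates the row count.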
##########
ql/src/test/queries/clientpositive/semijoin_stats_missing_colstats.q:
##########
@@ -0,0 +1,45 @@
+-- HIVE-29516: Test that semijoin optimization handles missing column statistics gracefully
Review Comment:
Fixed by
https://github.com/apache/hive/pull/6382/commits/7328e0f5a0a5cc1157433ce5ca23956904ae5270
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]