ngsg commented on PR #4239:
URL: https://github.com/apache/hive/pull/4239#issuecomment-2577511386

   Thank you to @abstractdog and @deniskuzZ for reviewing the patch. After 
studying the issue again, I have concluded that the proposed patch is 
insufficient to fully address the issue, so I have decided to close this PR.
   
   We reviewed the issue and determined that this patch could be helpful for 
datasets whose null distribution is similar to that of TPC-DS data. Specifically, 
if `numNulls` is closely tied to the default partition (the null partition), the 
patch improves the accuracy of the `numNulls` returned by `AggregateStatsCache`. 
However, not all datasets have a null distribution similar to TPC-DS, so 
this patch isn't universally applicable.
   
   Additionally, it appears that the non-deterministic behavior we encountered 
is unrelated to `numNulls`. We observed that `AggregateStatsCache` makes 
SemiJoin branch removal non-deterministic, but that optimization uses `numRows` 
and `NDV`, not `numNulls`. Therefore, I think this non-deterministic behavior 
is not suitable for verifying the patch.
   
   While some optimizers may depend on `numNulls`, I have not yet been able to 
write a qfile to properly verify this patch. As a result, I believe it is best 
to close the PR for now and reopen it once I have either a more effective patch 
or a feasible qfile to verify this patch.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

