[
https://issues.apache.org/jira/browse/HIVE-29646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089401#comment-18089401
]
Stamatis Zampetakis commented on HIVE-29646:
--------------------------------------------
{quote}Ideally, statistics should be kept up to date, either through
incremental maintenance or via scheduled recomputation. If we start from the
assumption that statistics on external tables are inherently untrustworthy,
then I think a large class of cost-based optimizations becomes questionable,
not just runtime filtering.
In Iceberg, table-level basic statistics are derived directly from the current
snapshot metadata and are therefore accurate by construction. The concern is
more around partition-level/column statistics, which can become stale when data
is modified.
{quote}
I fully agree. I don't know why runtime filtering became a special case, but
the overall reasoning makes sense to me.
{quote}The property can still be used to guard operations where stale metadata
may affect query correctness, while allowing statistics-based planning
optimizations.
{quote}
I don't understand how the property can act as a guard given the current state
of the code. Are you proposing to repurpose the property for other needs, or
are you referring to the actual state of the code in the repo/PR?
Currently, the property guards against SMB/bucket join and runtime filtering
transformations on external tables. It has no effect on query answering based
on stats. Stats-based optimizations on external tables are explicitly
prohibited, and this is not configurable at the moment.
At this point, I am convinced by the proposal to lift the restriction on
performing these optimizations on external tables. The discussion above is
mainly to clarify the new purpose and effects of the existing property,
assuming that we leave it in place.
> Enable semijoin reduction and map-join conversion on external tables with
> accurate statistics
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-29646
> URL: https://issues.apache.org/jira/browse/HIVE-29646
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
>