Zhenhua Wang created SPARK-22310:
------------------------------------

             Summary: Refactor join estimation to incorporate estimation logic 
for different kinds of statistics
                 Key: SPARK-22310
                 URL: https://issues.apache.org/jira/browse/SPARK-22310
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Zhenhua Wang


The current join estimation logic is only based on basic column statistics 
(such as ndv, etc). If we want to add estimation for other kinds of statistics 
(such as histograms), it's not easy to incorporate into the current algorithm:
1. When we have multiple pairs of join keys, the current algorithm computes 
cardinality in a single formula. But if different join keys have different 
kinds of stats, the computation logic for each pair of join keys become 
different, so the previous formula does not apply.
2. Currently it computes cardinality and updates join keys' column stats 
separately. It's better to do these two steps together, since both computation 
and update logic are different for different kinds of stats.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to