Ioana Delaney created SPARK-23757:
-------------------------------------

             Summary: [Performance] Star schema detection improvements
                 Key: SPARK-23757
                 URL: https://issues.apache.org/jira/browse/SPARK-23757
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Ioana Delaney


A star schema consists of one or more fact tables referencing a number of
dimension tables. Queries against a star schema are expected to run fast because
of the established referential integrity (RI) constraints among the tables. In
general, star schema joins are detected using the following conditions:

1. RI constraints (reliable detection)
* The dimension table contains a primary key that is joined to the fact table.
* The fact table contains foreign keys referencing multiple dimension tables.

2. Cardinality based heuristics
* Usually, the table with the highest cardinality is the fact table.


The existing implementation from SPARK-17791 uses a combination of the above two
conditions to detect and optimize star joins. With support for informational RI
constraints, the algorithm in SPARK-17791 can be improved to use reliable
RI-based detection.
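A minimal Python sketch of the two detection conditions, assuming a hypothetical
`Table` record that carries informational primary/foreign key metadata and
equi-join predicates given as plain tuples. None of these names (`Table`,
`fact_by_ri`, `fact_by_cardinality`, `find_fact_table`) are Spark APIs, and this
is not the SPARK-17791 code; it only illustrates trying the RI condition first
and falling back to the cardinality heuristic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table metadata for illustration; not a Spark API."""
    name: str
    row_count: int
    primary_key: str | None = None
    # Informational foreign keys: local column -> (referenced table, referenced column).
    foreign_keys: dict[str, tuple[str, str]] = field(default_factory=dict)


# Hypothetical equi-join predicate: (left table, left column, right table, right column).
Join = tuple[str, str, str, str]


def fact_by_ri(tables: dict[str, Table], joins: list[Join]) -> str | None:
    """Condition 1: a table whose FK columns join to the PKs of >= 2 dimensions."""
    referenced: dict[str, set[str]] = {name: set() for name in tables}
    for lt, lc, rt, rc in joins:
        # The predicate may be written in either order, so check both orientations.
        for fact, fcol, dim, dcol in ((lt, lc, rt, rc), (rt, rc, lt, lc)):
            fk = tables[fact].foreign_keys.get(fcol)
            if fk == (dim, dcol) and tables[dim].primary_key == dcol:
                referenced[fact].add(dim)
    candidates = [name for name, dims in referenced.items() if len(dims) >= 2]
    # Absent or ambiguous RI evidence: report no reliably detected fact table.
    return candidates[0] if len(candidates) == 1 else None


def fact_by_cardinality(tables: dict[str, Table], joins: list[Join]) -> str:
    """Condition 2 (heuristic): pick the joined table with the most rows."""
    joined = {t for lt, _, rt, _ in joins for t in (lt, rt)}
    return max(joined, key=lambda name: tables[name].row_count)


def find_fact_table(tables: dict[str, Table], joins: list[Join]) -> str:
    """Prefer reliable RI detection; fall back to the cardinality heuristic."""
    return fact_by_ri(tables, joins) or fact_by_cardinality(tables, joins)


# Usage: a small fact table joined to a dimension that has more rows than it.
tables = {
    "sales": Table("sales", 1_000, foreign_keys={
        "cust_id": ("customer", "id"),
        "item_id": ("item", "id"),
    }),
    "customer": Table("customer", 5_000, primary_key="id"),
    "item": Table("item", 200, primary_key="id"),
}
joins = [("sales", "cust_id", "customer", "id"), ("item", "id", "sales", "item_id")]

print(fact_by_cardinality(tables, joins))  # customer
print(find_fact_table(tables, joins))      # sales
```

In this example the cardinality heuristic picks `customer` because it has the most
rows, while the RI check identifies `sales` as the table whose foreign keys
reference the primary keys of two dimensions.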




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
