[jira] [Updated] (SPARK-23757) [Performance] Star schema detection improvements

Dongjoon Hyun (Jira) Sat, 28 Mar 2020 19:38:09 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Dongjoon Hyun updated SPARK-23757:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> [Performance] Star schema detection improvements
> ------------------------------------------------
>
>                 Key: SPARK-23757
>                 URL: https://issues.apache.org/jira/browse/SPARK-23757
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Ioana Delaney
>            Priority: Major
>
> A star schema consists of one or more fact tables referencing a number of 
> dimension tables. Queries against a star schema are expected to run fast 
> because of the established referential integrity (RI) constraints among the 
> tables. In general, star schema joins are detected using the following 
> conditions:
> 1. RI constraints (reliable detection)
> * Each dimension table contains a primary key that is joined to the fact table.
> * The fact table contains foreign keys referencing multiple dimension tables.
> 2. Cardinality-based heuristics
> * Usually, the table with the highest cardinality is the fact table.
> The existing star join detection from SPARK-17791 uses a combination of the 
> above two conditions to detect and optimize star joins. With support for 
> informational RI constraints, the algorithm in SPARK-17791 can be improved 
> by using reliable RI-based detection.
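
The two detection conditions in the description can be illustrated with a minimal sketch. This is not Spark's implementation or the SPARK-17791 algorithm; the `Table` class, the `detect_star` function, and the table names are hypothetical, chosen only to show RI-based detection being preferred over the cardinality fallback.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Table:
    """Hypothetical table metadata (not a Spark API)."""
    name: str
    row_count: int
    primary_key: Optional[str] = None
    # Maps a foreign-key column to the (table, column) it references.
    foreign_keys: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def dimensions_by_ri(candidate: Table, tables: List[Table]) -> List[str]:
    """Names of tables whose primary key is referenced by a foreign key of candidate."""
    by_name = {t.name: t for t in tables}
    dims = set()
    for ref_table, ref_col in candidate.foreign_keys.values():
        target = by_name.get(ref_table)
        if target is not None and target.primary_key == ref_col:
            dims.add(ref_table)
    return sorted(dims)


def detect_star(tables: List[Table]) -> Optional[Tuple[str, List[str], str]]:
    """Return (fact table, dimension tables, method), or None if there are no tables."""
    if not tables:
        return None

    # 1. RI constraints: the fact table references the primary keys of
    #    multiple (here: at least two) dimension tables.
    best: Optional[Tuple[str, List[str]]] = None
    for t in tables:
        dims = dimensions_by_ri(t, tables)
        if len(dims) >= 2 and (best is None or len(dims) > len(best[1])):
            best = (t.name, dims)
    if best is not None:
        return best[0], best[1], "ri"

    # 2. Cardinality heuristic: assume the largest table is the fact table.
    fact = max(tables, key=lambda t: t.row_count)
    dims = sorted(t.name for t in tables if t is not fact)
    return fact.name, dims, "cardinality"


# Example with made-up tables and row counts.
example_tables = [
    Table("date_dim", 3_650, primary_key="d_id"),
    Table("store", 500, primary_key="s_id"),
    Table(
        "sales",
        1_000_000,
        foreign_keys={
            "sale_date_id": ("date_dim", "d_id"),
            "store_id": ("store", "s_id"),
        },
    ),
]
print(detect_star(example_tables))  # ('sales', ['date_dim', 'store'], 'ri')
```

When no constraints are declared, the same call falls back to the row-count heuristic and reports `"cardinality"` as the method, which mirrors the less reliable path described above.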



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
