[ https://issues.apache.org/jira/browse/SPARK-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-23757:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> [Performance] Star schema detection improvements
> ------------------------------------------------
>
>                 Key: SPARK-23757
>                 URL: https://issues.apache.org/jira/browse/SPARK-23757
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Ioana Delaney
>            Priority: Major
>
> A star schema consists of one or more fact tables referencing a number of
> dimension tables. Queries against a star schema are expected to run fast
> because of the established RI constraints among the tables. In general, star
> schema joins are detected using the following conditions:
> 1. RI constraints (reliable detection)
> * The dimension table contains a primary key that is joined to the fact table.
> * The fact table contains foreign keys referencing multiple dimension tables.
> 2. Cardinality-based heuristics
> * Usually, the table with the highest cardinality is the fact table.
> The existing SPARK-17791 uses a combination of the above two conditions to
> detect and optimize star joins. With support for informational RI
> constraints, the algorithm in SPARK-17791 can be improved with reliable RI
> detection.
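
The two detection conditions in the description can be illustrated with a small standalone sketch. This is not Spark's implementation and does not use any Spark API: the `Table` class, the `detect_star` function, the `min_dims` threshold, and the TPC-DS-style table names are all hypothetical, chosen only to show RI-based detection being tried first with a fallback to the cardinality heuristic. The fallback here is deliberately cruder than whatever SPARK-17791 actually does.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Table:
    """Hypothetical catalog entry; not a Spark class."""
    name: str
    row_count: int
    primary_key: Optional[str] = None
    # Maps a foreign-key column to the name of the table it references.
    foreign_keys: Dict[str, str] = field(default_factory=dict)


def detect_star(tables: List[Table], min_dims: int = 2) -> Tuple[str, List[str], str]:
    """Return (fact table, dimension tables, detection method)."""
    if not tables:
        raise ValueError("at least one table is required")
    by_name = {t.name: t for t in tables}

    # 1. RI constraints: a fact table has foreign keys that reference
    #    several other tables, each of which declares a primary key.
    best = None
    for t in tables:
        dims = sorted({
            ref for ref in t.foreign_keys.values()
            if ref in by_name and ref != t.name
            and by_name[ref].primary_key is not None
        })
        if len(dims) >= min_dims:
            rank = (len(dims), t.row_count)
            if best is None or rank > best[0]:
                best = (rank, t.name, dims)
    if best is not None:
        return best[1], best[2], "ri"

    # 2. Cardinality heuristic (simplified assumption): with no usable
    #    constraints, treat the largest table as the fact table and
    #    every other table as a dimension.
    fact = max(tables, key=lambda t: t.row_count)
    dims = sorted(t.name for t in tables if t.name != fact.name)
    return fact.name, dims, "cardinality"


# Example with declared constraints, then the same tables without them.
with_ri = [
    Table("store_sales", 1_000_000,
          foreign_keys={"ss_item_sk": "item", "ss_sold_date_sk": "date_dim"}),
    Table("item", 1_000, primary_key="i_item_sk"),
    Table("date_dim", 365, primary_key="d_date_sk"),
]
without_ri = [Table(t.name, t.row_count) for t in with_ri]

print(detect_star(with_ri))
print(detect_star(without_ri))
```

Both calls pick `store_sales` as the fact table in this toy input, but only the first does so from declared constraints; the second would be wrong whenever a dimension table happens to be larger than the fact table, which is the weakness that reliable RI detection is meant to address.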