[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028234#comment-17028234 ]
Dongjoon Hyun commented on SPARK-19842: --------------------------------------- I removed the `Target Version : 3.0.0` because we created `branch-3.0` and entered `Feature Freeze` phase. > Informational Referential Integrity Constraints Support in Spark > ---------------------------------------------------------------- > > Key: SPARK-19842 > URL: https://issues.apache.org/jira/browse/SPARK-19842 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.2.0 > Reporter: Ioana Delaney > Priority: Major > Attachments: InformationalRIConstraints.doc > > > *Informational Referential Integrity Constraints Support in Spark* > This work proposes support for _informational primary key_ and _foreign key > (referential integrity) constraints_ in Spark. The main purpose is to open up > an area of query optimization techniques that rely on referential integrity > constraints semantics. > An _informational_ or _statistical constraint_ is a constraint such as a > _unique_, _primary key_, _foreign key_, or _check constraint_, that can be > used by Spark to improve query performance. Informational constraints are not > enforced by the Spark SQL engine; rather, they are used by Catalyst to > optimize the query processing. They provide semantics information that allows > Catalyst to rewrite queries to eliminate joins, push down aggregates, remove > unnecessary Distinct operations, and perform a number of other optimizations. > Informational constraints are primarily targeted to applications that load > and analyze data that originated from a data warehouse. For such > applications, the conditions for a given constraint are known to be true, so > the constraint does not need to be enforced during data load operations. > The attached document covers constraint definition, metastore storage, > constraint validation, and maintenance. The document shows many examples of > query performance improvements that utilize referential integrity constraints > and can be implemented in Spark. > Link to the google doc: > [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org