[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark

Ioana Delaney (JIRA) Thu, 20 Jul 2017 04:24:02 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094547#comment-16094547
 ]


Ioana Delaney commented on SPARK-19842:
---------------------------------------

Yes, we've been working to productize our code. Our progress was slowed down 
due to some other ongoing projects, but we are planning to open the first round 
of PRs in the next few weeks if the community agrees with this feature. 


> Informational Referential Integrity Constraints Support in Spark
> ----------------------------------------------------------------
>
>                 Key: SPARK-19842
>                 URL: https://issues.apache.org/jira/browse/SPARK-19842
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Ioana Delaney
>         Attachments: InformationalRIConstraints.doc
>
>
> *Informational Referential Integrity Constraints Support in Spark*
> This work proposes support for _informational primary key_ and _foreign key 
> (referential integrity) constraints_ in Spark. The main purpose is to open up 
> an area of query optimization techniques that rely on referential integrity 
> constraints semantics. 
> An _informational_ or _statistical constraint_ is a constraint such as a 
> _unique_, _primary key_, _foreign key_, or _check constraint_, that can be 
> used by Spark to improve query performance. Informational constraints are not 
> enforced by the Spark SQL engine; rather, they are used by Catalyst to 
> optimize the query processing. They provide semantics information that allows 
> Catalyst to rewrite queries to eliminate joins, push down aggregates, remove 
> unnecessary Distinct operations, and perform a number of other optimizations. 
> Informational constraints are primarily targeted to applications that load 
> and analyze data that originated from a data warehouse. For such 
> applications, the conditions for a given constraint are known to be true, so 
> the constraint does not need to be enforced during data load operations. 
> The attached document covers constraint definition, metastore storage, 
> constraint validation, and maintenance. The document shows many examples of 
> query performance improvements that utilize referential integrity constraints 
> and can be implemented in Spark.
> Link to the google doc: 
> [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark

Reply via email to