[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857006#comment-16857006 ]

Parshuram V Patki commented on SPARK-24130:
-------------------------------------------

Do we have any traction on this?

> Data Source V2: Join Push Down
> ------------------------------
>
>                 Key: SPARK-24130
>                 URL: https://issues.apache.org/jira/browse/SPARK-24130
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Jia Li
>            Priority: Major
>         Attachments: Data Source V2 Join Push Down.pdf
>
>
> Spark applications often directly query external data sources such as
> relational databases, or files. Spark provides Data Sources APIs for
> accessing structured data through Spark SQL. Data Sources APIs in both V1 and
> V2 support optimizations such as Filter push down and Column pruning which
> are a subset of the functionality that can be pushed down to some data sources.
> We’re proposing to extend Data Sources APIs with join push down (JPD). Join
> push down significantly improves query performance by reducing the amount of
> data transfer and exploiting the capabilities of the data sources such as
> index access.
> Join push down design document is available
> [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
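The data-transfer claim in the issue description can be illustrated with a small, self-contained sketch. This is not Spark code and not the API proposed in the design document: an in-memory SQLite database stands in for the external relational source, and the table names (`customers`, `orders`) and function names are hypothetical, chosen only for illustration. The sketch compares fetching both tables and joining them on the client side against sending the join to the source and fetching only the joined rows.

```python
import sqlite3


def build_source():
    """Create a stand-in external source (assumption: SQLite in memory)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        -- An index the source can exploit when it evaluates the join itself.
        CREATE INDEX idx_orders_customer ON orders (customer_id);
        """
    )
    # 10 customers; 1000 orders spread evenly over customer ids 1..100,
    # so only 100 orders reference a customer that actually exists.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "EU" if i % 2 == 0 else "US") for i in range(1, 11)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 100) + 1, float(i)) for i in range(1, 1001)],
    )
    conn.commit()
    return conn


def join_without_pushdown(conn):
    """Fetch both tables in full, then join them on the client side."""
    customers = conn.execute("SELECT id, region FROM customers").fetchall()
    orders = conn.execute(
        "SELECT id, customer_id, amount FROM orders"
    ).fetchall()
    transferred = len(customers) + len(orders)  # rows moved out of the source
    region_by_id = dict(customers)
    joined = [
        (order_id, region_by_id[customer_id], amount)
        for order_id, customer_id, amount in orders
        if customer_id in region_by_id
    ]
    return sorted(joined), transferred


def join_with_pushdown(conn):
    """Let the source evaluate the join and return only the joined rows."""
    rows = conn.execute(
        """
        SELECT o.id, c.region, o.amount
        FROM orders AS o
        JOIN customers AS c ON o.customer_id = c.id
        """
    ).fetchall()
    return sorted(rows), len(rows)


if __name__ == "__main__":
    source = build_source()
    local_rows, local_transferred = join_without_pushdown(source)
    pushed_rows, pushed_transferred = join_with_pushdown(source)
    print("same result:", local_rows == pushed_rows)
    print("rows transferred without push down:", local_transferred)
    print("rows transferred with push down:", pushed_transferred)
```

With this toy data both strategies return the same 100 joined rows, but the client-side join moves 1010 rows out of the source while the pushed-down join moves only 100. How large the saving is in practice depends on the selectivity of the join and on what the source can do with its indexes.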
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782237#comment-16782237 ]

William Wong commented on SPARK-24130:
--------------------------------------

https://github.com/apache/spark/pull/22547

It seems that the PR was closed already.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780735#comment-16780735 ]

William Wong commented on SPARK-24130:
--------------------------------------

Hi [~smilegator],

Yes. It will be a very valuable enhancement. I would appreciate it if you could let us know the progress. Many thanks.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645584#comment-16645584 ]

Dilip Biswal commented on SPARK-24130:
--------------------------------------

[~smilegator] Thank you.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645565#comment-16645565 ]

Xiao Li commented on SPARK-24130:
---------------------------------

Any data source migration work is being blocked by https://github.com/apache/spark/pull/22547
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645337#comment-16645337 ]

Dilip Biswal commented on SPARK-24130:
--------------------------------------

[~smilegator] Would you know whether the V2 implementation of the JDBC data source is in the works, and when it may be available? Thanks a lot.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520587#comment-16520587 ]

Dilip Biswal commented on SPARK-24130:
--------------------------------------

[~Shurap1] We are currently waiting for feedback from the community on how to proceed. I think we need a V2 implementation of JDBC datasource before we can proceed on the pushdown.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520581#comment-16520581 ]

Dilip Biswal commented on SPARK-24130:
--------------------------------------

[~Shurap1] We are currently waiting for feedback from the community on how to proceed. I think we need a V2 implementation of JDBC datasource before we can proceed on the pushdown.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520461#comment-16520461 ]

vaquar khan commented on SPARK-24130:
-------------------------------------

Could you please put the doc in Jira instead of a Google doc?
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520386#comment-16520386 ]

Parshuram V Patki commented on SPARK-24130:
-------------------------------------------

[~jliwork] Do you think this improvement will make it into Spark 2.4.0?
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473765#comment-16473765 ]

Jia Li commented on SPARK-24130:
--------------------------------

Thank you for letting me know about the permission issue, Ryan Blue. I have fixed it. Please feel free to let me know of any further issues or comments.
[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down
[ https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473594#comment-16473594 ]

Ryan Blue commented on SPARK-24130:
-----------------------------------

[~jliwork] could you please open up access to allow open commenting on the design doc? I'd like to add some questions and comments while I review it. Thank you!