[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2019-06-05 Thread Parshuram V Patki (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857006#comment-16857006
 ] 

Parshuram V Patki commented on SPARK-24130:
---

Do we have any traction on this?

> Data Source V2: Join Push Down
> --
>
> Key: SPARK-24130
> URL: https://issues.apache.org/jira/browse/SPARK-24130
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Jia Li
> Priority: Major
> Attachments: Data Source V2 Join Push Down.pdf
>
>
> Spark applications often directly query external data sources such as 
> relational databases or files. Spark provides Data Sources APIs for 
> accessing structured data through Spark SQL. Data Sources APIs in both V1 and 
> V2 support optimizations such as filter push down and column pruning, which 
> are a subset of the functionality that can be pushed down to some data sources. 
> We’re proposing to extend Data Sources APIs with join push down (JPD). Join 
> push down significantly improves query performance by reducing the amount of 
> data transfer and exploiting the capabilities of the data sources such as 
> index access.
> Join push down design document is available 
> [here|https://docs.google.com/document/d/1k-kRadTcUbxVfUQwqBbIXs_yPZMxh18-e-cz77O_TaE/edit?usp=sharing].
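
To illustrate the data-transfer argument in the description above, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for an external relational source. It is not Spark code and does not use the Data Source V2 API or the design proposed in the attached document. The table names (`customers`, `orders`), the sample data, and the helper functions are hypothetical. It compares how many rows leave the source when the join runs on the client with how many leave it when the join runs inside the source.

```python
import sqlite3


def build_source() -> sqlite3.Connection:
    """Create an in-memory database standing in for an external data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    # Hypothetical sample data: 100 customers (every tenth one in 'DE'), 1000 orders.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(i, "DE" if i % 10 == 0 else "US") for i in range(100)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, i % 100, float(i)) for i in range(1000)],
    )
    return conn


def join_on_client(conn: sqlite3.Connection):
    """Filter is pushed down, but the join is not: both inputs are fetched."""
    customers = conn.execute(
        "SELECT id FROM customers WHERE country = 'DE'"
    ).fetchall()
    orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
    wanted = {cid for (cid,) in customers}
    joined = [(oid, cid) for (oid, cid) in orders if cid in wanted]
    return sorted(joined), len(customers) + len(orders)


def join_in_source(conn: sqlite3.Connection):
    """The join is pushed down: only the joined rows are fetched."""
    rows = conn.execute(
        "SELECT o.id, o.customer_id "
        "FROM orders o JOIN customers c ON o.customer_id = c.id "
        "WHERE c.country = 'DE'"
    ).fetchall()
    return sorted(rows), len(rows)


if __name__ == "__main__":
    source = build_source()
    client_rows, client_fetched = join_on_client(source)
    pushed_rows, pushed_fetched = join_in_source(source)
    print("same result:", client_rows == pushed_rows)
    print("rows fetched, join on client:", client_fetched)
    print("rows fetched, join in source:", pushed_fetched)
```

With this sample data, the client-side join fetches 1010 rows and the pushed-down join fetches 100, and both produce the same result. The size of the gap depends entirely on join selectivity, so this shows the mechanism, not a performance measurement.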



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2019-03-01 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782237#comment-16782237
 ] 

William Wong commented on SPARK-24130:
--

https://github.com/apache/spark/pull/22547 
It seems that the PR has already been closed. 




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2019-02-28 Thread William Wong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780735#comment-16780735
 ] 

William Wong commented on SPARK-24130:
--

Hi [~smilegator], yes, this would be a very valuable enhancement. We would 
appreciate it if you could let us know the progress. Many thanks. 




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-10-10 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645584#comment-16645584
 ] 

Dilip Biswal commented on SPARK-24130:
--

[~smilegator] Thank you.




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-10-10 Thread Xiao Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645565#comment-16645565
 ] 

Xiao Li commented on SPARK-24130:
-

Any data source migration work is being blocked by 
https://github.com/apache/spark/pull/22547




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-10-10 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16645337#comment-16645337
 ] 

Dilip Biswal commented on SPARK-24130:
--

[~smilegator] Do you know whether the V2 implementation of the JDBC datasource 
is in the works and when it may be available? Thanks a lot.




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-06-22 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520587#comment-16520587
 ] 

Dilip Biswal commented on SPARK-24130:
--

[~Shurap1] We are currently waiting for feedback from the community on how to 
proceed. I think we need a V2 implementation of JDBC datasource before we can 
proceed on the pushdown.




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-06-22 Thread Dilip Biswal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520581#comment-16520581
 ] 

Dilip Biswal commented on SPARK-24130:
--

[~Shurap1] We are currently waiting for feedback from the community on how to 
proceed. I think we need a V2 implementation of JDBC datasource before we can 
proceed on the pushdown.




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-06-22 Thread vaquar khan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520461#comment-16520461
 ] 

vaquar khan commented on SPARK-24130:
-

Could you please update the doc in Jira instead of the Google Doc?




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-06-22 Thread Parshuram V Patki (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16520386#comment-16520386
 ] 

Parshuram V Patki commented on SPARK-24130:
---

[~jliwork] Do you think this improvement will make it into Spark 2.4.0?




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-05-13 Thread Jia Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473765#comment-16473765
 ] 

Jia Li commented on SPARK-24130:


Thank you for letting me know about the permission issue, Ryan Blue. I have 
fixed it. Please feel free to let me know of any further issues or comments. 




[jira] [Commented] (SPARK-24130) Data Source V2: Join Push Down

2018-05-13 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16473594#comment-16473594
 ] 

Ryan Blue commented on SPARK-24130:
---

[~jliwork] could you please open up access to allow open commenting on the 
design doc? I'd like to add some questions and comments while I review it. 
Thank you!
