[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151339#comment-15151339 ]
Stephan Kessler commented on SPARK-12449:
-----------------------------------------

[~maxseiden] Good idea! To simplify things even further, we could drop the partitioned and holistic approach (at least in a first iteration), since we are targeting databases as data sources. What do you think about keeping the ability to ask the data source whether it supports the pushdown of a well-defined operation? This would simplify both the implementation of the data source and the Strategy for the planner.

[~velvia] I am currently working heavily on the pushdown of partial aggregates in combination with Tungsten, so I am happy to contribute in that direction. Should I try to draft a new, simplified design doc that covers the gradual approach? I am also very happy to help with the PR and the definition of tasks.

> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>
>                 Key: SPARK-12449
>                 URL: https://issues.apache.org/jira/browse/SPARK-12449
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Stephan Kessler
>         Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows
> pushing down filters and projections, pruning unnecessary fields and rows
> directly in the data source.
> However, data sources such as SQL engines are capable of doing even more
> preprocessing, e.g., evaluating aggregates. This is beneficial because it
> would reduce the amount of data transferred from the source to Spark. The
> existing interfaces do not allow this kind of processing in the source.
> We propose to add a new interface {{CatalystSource}} that allows deferring
> the processing of arbitrary logical plans to the data source.
> We have already shown the details at the Spark Summit 2015 Europe
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
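
The capability check suggested in the comment (asking the data source whether it supports the pushdown of a well-defined operation) could be sketched roughly as follows. This is only a minimal, language-neutral illustration written in Python, not Spark code: `Operation`, `Source`, `supports`, `SqlEngineSource`, `FileSource`, and `split_plan` are all hypothetical names and are not part of the DataSource API or of the proposed {{CatalystSource}} interface. It assumes a plan is a simple list of operations ordered from the scan upward.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Operation:
    """One step of a (linearized) logical plan. Hypothetical, for illustration."""
    kind: str    # e.g. "filter", "project", "aggregate"
    detail: str  # free-form description such as "x > 1"


class Source:
    """Hypothetical data source that can be asked what it is able to push down."""
    supported = frozenset()

    def supports(self, op: Operation) -> bool:
        return op.kind in self.supported


class SqlEngineSource(Source):
    # Assumption: a SQL engine can also evaluate aggregates itself.
    supported = frozenset({"filter", "project", "aggregate"})


class FileSource(Source):
    # Assumption: a simple source only prunes rows and columns.
    supported = frozenset({"filter", "project"})


def split_plan(source: Source, ops: List[Operation]) -> Tuple[List[Operation], List[Operation]]:
    """Split ops (ordered from the scan upward) into (pushed_down, remaining).

    Only the leading run of supported operations is pushed down: once one
    operation has to stay in the engine, everything above it stays as well.
    """
    pushed: List[Operation] = []
    for i, op in enumerate(ops):
        if not source.supports(op):
            return pushed, ops[i:]
        pushed.append(op)
    return pushed, []
```

With such a check the planner side stays small: it walks the plan once, hands the supported prefix to the source, and keeps the rest for regular execution, which is the gradual approach discussed above.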