[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16561145#comment-16561145 ]
Ryan Blue commented on SPARK-24882:
-----------------------------------

[~cloud_fan], thanks for making those changes. I'll have a look at the updated doc.

For scan configuration, I think this builder pattern would work. The builder's super-class would be provided by Spark. That way, the methods for pushing always work. Similarly, the ScanConfig interface would be provided with default implementations, so Spark can always get the scan configuration.

When a source supports push-down, it would override {{pushPredicates}} and return the predicates that were pushed in the ScanConfig ({{pushedPredicates}}). Then Spark can remove those pushed predicates. If the source doesn't support push-down, it needs to implement nothing at all: the default {{pushPredicates}} implementation on the builder is a no-op, and the default {{pushedPredicates}} implementation returns {{new Expression[0]}} to indicate that nothing was pushed.

The feedback that Spark needs comes from the final ScanConfig, so there's no need for {{instanceof}} checks for interfaces. Spark's code always makes the push-down calls, and a source implementation can easily ignore them.

> separate responsibilities of the data source v2 read API
> --------------------------------------------------------
>
>                 Key: SPARK-24882
>                 URL: https://issues.apache.org/jira/browse/SPARK-24882
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>
> Data source V2 has been out for a while; see the SPIP
> [here|https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit?usp=sharing].
> We have already migrated most of the built-in streaming data sources to the
> V2 API, and the file source migration is in progress. During the migration,
> we found several problems and want to address them before we stabilize the V2
> API.
> To solve these problems, we need to separate responsibilities in the data
> source v2 read API. For details, please see the attached Google doc:
> https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing
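To make the builder proposal in the comment concrete, here is a minimal Java sketch. It assumes hypothetical names that are for illustration only and are not Spark's actual Data Source V2 API: {{ScanConfigBuilder}} (the Spark-provided super-class), {{ScanConfig}} (with a default {{pushedPredicates}}), a toy {{Expression}} class, and {{planScan}} standing in for Spark's planning step. A source without push-down support relies entirely on the defaults, while a source with support overrides {{pushPredicates}}. Spark always makes the push call and reads the feedback only from the final ScanConfig, with no {{instanceof}} checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PushDownSketch {

    // Hypothetical stand-in for Spark's expression type (compared by identity).
    static final class Expression {
        final String sql;
        Expression(String sql) { this.sql = sql; }
        @Override public String toString() { return sql; }
    }

    // Hypothetical ScanConfig: by default, reports that nothing was pushed.
    interface ScanConfig {
        default Expression[] pushedPredicates() {
            return new Expression[0];
        }
    }

    // Hypothetical builder super-class that Spark would provide.
    abstract static class ScanConfigBuilder {
        // Default is a no-op, so Spark can always call it safely.
        public void pushPredicates(Expression[] predicates) {}

        public abstract ScanConfig build();
    }

    // A source without push-down support: only build() is implemented.
    static final class SimpleSourceBuilder extends ScanConfigBuilder {
        @Override public ScanConfig build() {
            return new ScanConfig() {};
        }
    }

    // A source with push-down support. Toy rule (an assumption): it accepts
    // predicates whose SQL text starts with "id".
    static final class FilteringSourceBuilder extends ScanConfigBuilder {
        private final List<Expression> accepted = new ArrayList<>();

        @Override public void pushPredicates(Expression[] predicates) {
            for (Expression p : predicates) {
                if (p.sql.startsWith("id")) {
                    accepted.add(p);
                }
            }
        }

        @Override public ScanConfig build() {
            Expression[] pushed = accepted.toArray(new Expression[0]);
            return new ScanConfig() {
                @Override public Expression[] pushedPredicates() { return pushed; }
            };
        }
    }

    // Spark side (hypothetical): always push, build, then remove whatever the
    // final ScanConfig reports as pushed. Returns the predicates Spark must
    // still evaluate itself.
    static Expression[] planScan(ScanConfigBuilder builder, Expression[] filters) {
        builder.pushPredicates(filters);
        ScanConfig config = builder.build();
        List<Expression> remaining = new ArrayList<>(Arrays.asList(filters));
        remaining.removeAll(Arrays.asList(config.pushedPredicates()));
        return remaining.toArray(new Expression[0]);
    }

    public static void main(String[] args) {
        Expression[] filters = { new Expression("id > 10"), new Expression("name = 'a'") };
        // prints "[id > 10, name = 'a']": nothing was pushed
        System.out.println(Arrays.toString(planScan(new SimpleSourceBuilder(), filters)));
        // prints "[name = 'a']": the "id" predicate was pushed to the source
        System.out.println(Arrays.toString(planScan(new FilteringSourceBuilder(), filters)));
    }
}
```

The design point is that the optional behavior lives in overridable defaults rather than in optional mix-in interfaces. The planner's code path is therefore identical for every source, and a source that ignores push-down still produces a correct ScanConfig.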