[ https://issues.apache.org/jira/browse/SPARK-24882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564362#comment-16564362 ]
Ryan Blue (rdblue) edited comment on SPARK-24882 at 7/31/18 9:02 PM:
------------------------------------------------------------

{quote}the problem is then we need to make `CatalogSupport` a must-have for data sources instead of an optional plugin{quote}

Data sources are read and write implementations. Catalog support should be a layer above the read/write implementation, used to provide CTAS and other table-level support.

If you're interested in the anonymous table use case from the email discussion, I posted a suggestion there to add an {{anonymousTable}} function to {{DataSourceV2}}. That would allow a source instantiated directly through v1-style reflection to provide a {{Table}} based on an options map. That table would then implement {{ReadSupport}} and {{WriteSupport}}, as I've suggested in this thread. This preserves the ability to instantiate a source directly and use it, while centering the API on a {{Table}} that implements the read and write traits.

An alternative to the {{anonymousTable}} method is what I did in the WIP pull request for CTAS. In that PR, I created two ways to work with {{DataSourceV2}}: through the existing {{DataSourceV2Relation}} and through a new {{TableV2Relation}}. The former is for {{DataSourceV2}} instances that implement the read and write traits; the latter is for {{Table}} objects that implement them. Either way works, though it would be cleaner to use only {{Table}}.

Thanks for the builder update! Immutability is the most important part, but I'd still prefer a builder interface with default methods instead of the mix-in traits.

> data source v2 API improvement
> ------------------------------
>
> Key: SPARK-24882
> URL: https://issues.apache.org/jira/browse/SPARK-24882
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Wenchen Fan
> Assignee: Wenchen Fan
> Priority: Major
>
> Data source V2 has been out for a while; see the SPIP
> [here|https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit?usp=sharing].
> We have already migrated most of the built-in streaming data sources to the
> V2 API, and the file source migration is in progress. During the migration,
> we found several problems and want to address them before we stabilize the V2
> API.
>
> To solve these problems, we need to separate responsibilities in the data
> source v2 API, isolate the stateful part of the API, and think of better names
> for some interfaces. For details, see the attached Google doc:
> https://docs.google.com/document/d/1DDXCTCrup4bKWByTalkXWgavcPdvur8a4eEu8x1BzPM/edit?usp=sharing
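The {{anonymousTable}} proposal in the comment above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and is not Spark's API. {{Table}}, {{ReadSupport}}, {{WriteSupport}} and {{DataSourceV2}} here are simplified, hypothetical stand-ins for the interfaces under discussion. The methods {{readAll}} and {{append}}, the {{path}} option, and the {{InMemory*}} classes are invented for the example.

The sketch shows three steps:
* The source is instantiated reflectively through its no-arg constructor, as v1 sources are.
* It returns a {{Table}} built from an options map.
* The caller dispatches on the traits the table implements, not on the source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the names used in the comment;
// these are NOT Spark's actual DataSourceV2 signatures.
interface Table {
    String name();
}

// Read trait: a Table that can be scanned implements this.
interface ReadSupport {
    List<String> readAll();
}

// Write trait: a Table that accepts writes implements this.
interface WriteSupport {
    void append(String row);
}

// The source no longer implements read/write itself; it only produces Tables.
interface DataSourceV2 {
    // Proposed hook (hypothetical signature): build a Table from an options map
    // when the source is instantiated directly rather than through a catalog.
    Table anonymousTable(Map<String, String> options);
}

// Example table implementing both traits, backed by an in-memory list.
final class InMemoryTable implements Table, ReadSupport, WriteSupport {
    private final String name;
    private final List<String> rows = new ArrayList<>();

    InMemoryTable(String name) {
        this.name = name;
    }

    @Override public String name() { return name; }

    @Override public List<String> readAll() {
        // Return a read-only snapshot so callers cannot mutate the table.
        return Collections.unmodifiableList(new ArrayList<>(rows));
    }

    @Override public void append(String row) { rows.add(row); }
}

final class InMemorySource implements DataSourceV2 {
    public InMemorySource() {}

    @Override
    public Table anonymousTable(Map<String, String> options) {
        // "path" is an illustrative option key, not a documented Spark option.
        return new InMemoryTable(options.getOrDefault("path", "anonymous"));
    }
}

public class AnonymousTableSketch {
    // Dispatch on the traits the Table implements, not on the source.
    static List<String> scan(Table table) {
        if (table instanceof ReadSupport) {
            return ((ReadSupport) table).readAll();
        }
        throw new UnsupportedOperationException(table.name() + " does not support reads");
    }

    static void write(Table table, String row) {
        if (table instanceof WriteSupport) {
            ((WriteSupport) table).append(row);
            return;
        }
        throw new UnsupportedOperationException(table.name() + " does not support writes");
    }

    public static void main(String[] args) throws Exception {
        // v1-style: instantiate the source reflectively via its no-arg constructor.
        DataSourceV2 source = InMemorySource.class.getDeclaredConstructor().newInstance();
        Table table = source.anonymousTable(Collections.singletonMap("path", "/tmp/events"));
        write(table, "a");
        write(table, "b");
        System.out.println(table.name() + " -> " + scan(table));
    }
}
```

Running {{main}} prints {{/tmp/events -> [a, b]}}. Because the read/write traits live on {{Table}} in this sketch, the same dispatch would apply whether the table came from {{anonymousTable}} or from a catalog.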
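The preference in the comment for a builder interface with default methods over mix-in traits can be illustrated with a similar hypothetical sketch. {{ScanBuilder}}, {{Scan}}, {{pushFilters}}, {{pruneColumns}} and the string-typed filters are simplified assumptions made for illustration. They are not the interfaces in the design doc or in Spark.

The sketch shows how the default-method style behaves:
* Each optional capability has a no-op default, so a source overrides only what it supports.
* The planner calls every method without {{instanceof}} checks.
* The {{Scan}} it builds is immutable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT Spark's actual scan-building API.
// The Scan produced by the builder is immutable once built.
final class Scan {
    final List<String> pushedFilters;
    final List<String> columns;

    Scan(List<String> pushedFilters, List<String> columns) {
        this.pushedFilters = Collections.unmodifiableList(new ArrayList<>(pushedFilters));
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    }
}

// Builder-with-default-methods style: every optional capability has a no-op
// default, so a source overrides only what it supports.
interface ScanBuilder {
    // Returns the filters the source could not handle; the default pushes none.
    default List<String> pushFilters(List<String> filters) {
        return filters;
    }

    // The default ignores pruning, i.e. the source reads every column.
    default void pruneColumns(List<String> requiredColumns) {}

    Scan build();
}

// A source that supports filter pushdown but not column pruning.
final class FilteringBuilder implements ScanBuilder {
    private final List<String> pushed = new ArrayList<>();

    @Override
    public List<String> pushFilters(List<String> filters) {
        List<String> remaining = new ArrayList<>();
        for (String f : filters) {
            // Illustrative rule: this source can evaluate only equality filters.
            if (f.contains(" = ")) {
                pushed.add(f);
            } else {
                remaining.add(f);
            }
        }
        return remaining;
    }

    @Override
    public Scan build() {
        return new Scan(pushed, Arrays.asList("*"));
    }
}

public class BuilderSketch {
    // The planner calls every capability unconditionally; no instanceof checks
    // are needed, unlike a design where each capability is a separate mix-in.
    static Scan plan(ScanBuilder builder, List<String> filters, List<String> columns) {
        List<String> postScanFilters = builder.pushFilters(filters);
        builder.pruneColumns(columns);
        Scan scan = builder.build();
        System.out.println("pushed=" + scan.pushedFilters + " postScan=" + postScanFilters);
        return scan;
    }

    public static void main(String[] args) {
        plan(new FilteringBuilder(), Arrays.asList("id = 1", "age > 30"), Arrays.asList("id"));
    }
}
```

Running {{main}} prints {{pushed=[id = 1] postScan=[age > 30]}}. With mix-in traits, the planner would instead check, for each capability, whether the builder also implements the corresponding trait before calling it.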