Re: DSv2 & DataSourceRegister

2020-04-07 Thread Andrew Melo
Hello, On Tue, Apr 7, 2020 at 23:16 Wenchen Fan wrote: > Are you going to provide a single artifact for Spark 2.4 and 3.0? I'm not > sure this is possible as the DS V2 API is very different in 3.0, e.g. there > is no `DataSourceV2` anymore, and you should implement `TableProvider` (if > you

Re: DSv2 & DataSourceRegister

2020-04-07 Thread Wenchen Fan
Are you going to provide a single artifact for Spark 2.4 and 3.0? I'm not sure this is possible as the DS V2 API is very different in 3.0, e.g. there is no `DataSourceV2` anymore, and you should implement `TableProvider` (if you don't have database/table). On Wed, Apr 8, 2020 at 6:58 AM Andrew

Re: DSv2 & DataSourceRegister

2020-04-07 Thread Andrew Melo
Hi Ryan, On Tue, Apr 7, 2020 at 5:21 PM Ryan Blue wrote: > > Hi Andrew, > > With DataSourceV2, I recommend plugging in a catalog instead of using > DataSource. As you've noticed, the way that you plug in data sources isn't > very flexible. That's one of the reasons why we changed the plugin

Re: DSv2 & DataSourceRegister

2020-04-07 Thread Ryan Blue
Hi Andrew, With DataSourceV2, I recommend plugging in a catalog instead of using DataSource. As you've noticed, the way that you plug in data sources isn't very flexible. That's one of the reasons why we changed the plugin system and made it possible to use named catalogs that load

Re: spark lacks fault tolerance with dynamic partition overwrite

2020-04-07 Thread Koert Kuipers
ah ok i was not aware of that jira issue. i will follow the progress there. thanks for letting me know. On Tue, Apr 7, 2020 at 11:20 AM wuyi wrote: > Hi, Koert, > > The community has recently come back to this issue and there's already a fix > https://github.com/apache/spark/pull/26339 for it. >

DSv2 & DataSourceRegister

2020-04-07 Thread Andrew Melo
Hi all, I posted an improvement ticket in JIRA and Hyukjin Kwon requested I send an email to the dev list for discussion. As the DSv2 API evolves, some breaking changes are occasionally made to the API. It's possible to split a plugin into a "common" part and multiple version-specific parts and

Re: spark lacks fault tolerance with dynamic partition overwrite

2020-04-07 Thread wuyi
Hi, Koert, The community has recently come back to this issue and there's already a fix https://github.com/apache/spark/pull/26339 for it. You can track and review it there. Best, Yi Wu

Re: RDD order guarantees

2020-04-07 Thread Antonin Delpeuch
Hi, Sorry to dig up this thread, but this bug is still present. The fix proposed in this thread (creating a new FileSystem implementation which sorts listed files) was rejected, with the suggestion that it is the FileInputFormat's responsibility to sort the file names if preserving partition
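A minimal Python sketch of the idea discussed above: sort the listed file names before building partitions, so that partition order no longer depends on the order in which the FileSystem happens to list files. It assumes one file per partition (a real FileInputFormat may produce several splits per file), and the function names are hypothetical, not Hadoop or Spark APIs.

```python
def sorted_input_paths(listed_paths):
    """Return input paths in a deterministic (lexicographic) order.

    A FileSystem listing may come back in any order, so without sorting
    the partition built from a given file can change between runs.
    """
    return sorted(listed_paths)


def assign_partitions(listed_paths):
    """Map partition index -> file path, assuming one file per partition."""
    return {i: path for i, path in enumerate(sorted_input_paths(listed_paths))}


# Two listings of the same directory, returned in different orders.
listing_a = ["data/part-00002", "data/part-00000", "data/part-00001"]
listing_b = ["data/part-00001", "data/part-00002", "data/part-00000"]

# After sorting, both listings yield the same partition -> file mapping.
print(assign_partitions(listing_a) == assign_partitions(listing_b))
```

Lexicographic order is only "natural" order here because the example names are zero-padded; files named without padding would need a different sort key.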