Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-06 Thread Ryan Blue
Instead of exploring possible operations ourselves, I think we should follow the SQL standard. Most of these do. We should make conscious decisions with the standard in mind for the SQL API. But we also have the Scala API (and versions of it in other languages) and need to consider how these

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-05 Thread Wenchen Fan
I think many advanced Spark users already have custom Catalyst rules to deal with the query plan directly, so it makes a lot of sense to standardize the logical plan. However, instead of exploring possible operations ourselves, I think we should follow the SQL standard. ReplaceTable, RTAS:
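As context for the kind of rule Wenchen mentions, here is a minimal sketch of a user-supplied Catalyst rule, assuming the Spark 2.x API that was current for this thread; the rule name RemoveTrueFilters and its rewrite are hypothetical, not something from the discussion:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.catalyst.expressions.Literal
    import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan}
    import org.apache.spark.sql.catalyst.rules.Rule

    // Hypothetical user rule: drop filters whose condition is literally `true`.
    // Rules like this pattern-match on logical plan nodes, which is why a
    // standardized set of nodes matters to the users who write them.
    object RemoveTrueFilters extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case Filter(Literal(true, _), child) => child
      }
    }

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    // inject the rule through the experimental extension point in Spark 2.x
    spark.experimental.extraOptimizations ++= Seq(RemoveTrueFilters)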

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-05 Thread Ryan Blue
Thanks for responding! I’ve been coming up with a list of the high-level operations that are needed. I think all of them come down to 5 questions about what’s happening:
- Does the target table exist?
- If it does exist, should it be dropped?
- If not, should it get created?
- Should
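One way to read those questions is as a decision table over a small, fixed set of operations. The sketch below is illustrative only, not Spark code; the operation names and the mapping are assumptions based on the operations named elsewhere in the thread (insert, overwrite, CTAS, RTAS):

    // Hypothetical: the answers to the questions above select one standard
    // operation. These names are illustrative, not Spark API.
    sealed trait TableOperation
    case object Insert          extends TableOperation // exists, append data
    case object InsertOverwrite extends TableOperation // exists, replace data
    case object CTAS            extends TableOperation // missing, create from query
    case object RTAS            extends TableOperation // exists, drop and recreate

    def chooseOperation(exists: Boolean, drop: Boolean, create: Boolean,
                        replaceData: Boolean): TableOperation =
      (exists, drop, create, replaceData) match {
        case (true, true, _, _)      => RTAS            // drop, then create as select
        case (false, _, true, _)     => CTAS            // create table as select
        case (true, false, _, true)  => InsertOverwrite
        case (true, false, _, false) => Insert
        case _ =>
          throw new IllegalArgumentException("table does not exist and create=false")
      }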

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-02 Thread Michael Armbrust
So here are my recommendations for moving forward, with DataSourceV2 as a starting point:
1. Use well-defined logical plan nodes for all high-level operations: insert, create, CTAS, overwrite table, etc.
2. Use rules that match on these high-level plan nodes, so that it
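For concreteness, here is a sketch of what recommendations 1 and 2 could look like, assuming the Spark 2.x Catalyst API contemporary with this thread; CreateTableAsSelect and CheckCTAS below are hypothetical stand-ins, not the nodes or rules Spark actually defines:

    import org.apache.spark.sql.catalyst.expressions.Attribute
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.catalyst.rules.Rule

    // (1) A well-defined logical node: the query is a child of the node,
    // so the analyzer and optimizer see it like any other part of the plan.
    case class CreateTableAsSelect(
        table: String,
        query: LogicalPlan,
        ifNotExists: Boolean) extends LogicalPlan {
      override def children: Seq[LogicalPlan] = Seq(query)
      override def output: Seq[Attribute] = Nil
    }

    // (2) A rule that matches the high-level node rather than a
    // source-specific one, so it applies to every implementation.
    object CheckCTAS extends Rule[LogicalPlan] {
      override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case ctas @ CreateTableAsSelect(_, query, _) if query.resolved =>
          // validation or rewriting for any CTAS, regardless of data source
          ctas
      }
    }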

SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-01 Thread Ryan Blue
Over the last couple years, I’ve noticed a trend toward specialized logical plans and increasing use of RunnableCommand nodes. DataSourceV2 is currently on the same path, and I’d like to make the case that we should avoid these practices. I think it’s helpful to consider an example I’ve been
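For contrast with the standardized-plan sketches above, here is what the RunnableCommand pattern the message refers to tends to look like, again assuming the Spark 2.x API; MyWriteCommand is hypothetical:

    import org.apache.spark.sql.{Row, SparkSession}
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.execution.command.RunnableCommand

    // Hypothetical command in the style being argued against: the command is
    // a leaf in the logical plan, so `query` is not a child of the plan and
    // analyzer/optimizer rules never see it; the logic is opaque inside run().
    case class MyWriteCommand(path: String, query: LogicalPlan)
        extends RunnableCommand {
      override def run(sparkSession: SparkSession): Seq[Row] = {
        // execute `query` and write the result to `path` here, as a side
        // effect invisible to the planner
        Seq.empty
      }
    }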