Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-06 Thread Ryan Blue
> Instead of exploring possible operations ourselves, I think we should follow the SQL standard.

Most of these do. We should make conscious decisions with the standard in mind for the SQL API. But we also have the Scala API (and versions of it in other languages) and need to consider how these

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-05 Thread Wenchen Fan
I think many advanced Spark users already have custom Catalyst rules to deal with the query plan directly, so it makes a lot of sense to standardize the logical plan. However, instead of exploring possible operations ourselves, I think we should follow the SQL standard. ReplaceTable, RTAS:

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-05 Thread Ryan Blue
Thanks for responding! I’ve been coming up with a list of the high-level operations that are needed. I think all of them come down to 5 questions about what’s happening:
- Does the target table exist?
- If it does exist, should it be dropped?
- If not, should it get created?
- Should
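Editor's sketch, not part of the original message: the three questions that survive in the snippet above (exists? drop? create?) can be read as inputs to a small decision function. The archive truncates the remaining questions, so they are not modeled here. The function name `plan_table_steps`, the step labels, and the final "write" step are all assumptions made for illustration; this is plain Python, not Spark's API.

```python
from typing import List


def plan_table_steps(exists: bool, drop_if_exists: bool, create_if_missing: bool) -> List[str]:
    """Map the three visible questions to an ordered list of steps.

    Hypothetical helper: names and step labels are illustrative only.
    """
    steps: List[str] = []
    # Q1 + Q2: the table exists and the operation asks for it to be dropped.
    if exists and drop_if_exists:
        steps.append("drop")
        exists = False
    # Q3: the table is absent (never existed, or was just dropped) and may be created.
    if not exists and create_if_missing:
        steps.append("create")
        exists = True
    # Assumption: every operation ends by writing to a table that must exist.
    if not exists:
        raise ValueError("target table does not exist and will not be created")
    steps.append("write")
    return steps
```

Under these assumptions, a replace-style operation on an existing table (`exists=True, drop_if_exists=True, create_if_missing=True`) yields drop, create, write, while a plain write to an existing table yields only the write step.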

Re: SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-02 Thread Michael Armbrust
> So here are my recommendations for moving forward, with DataSourceV2 as a starting point:
>
> 1. Use well-defined logical plan nodes for all high-level operations: insert, create, CTAS, overwrite table, etc.
> 2. Use rules that match on these high-level plan nodes, so that it
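Editor's sketch, not part of the original message: the quoted recommendations (one well-defined node per high-level operation, and rules that match on those nodes) can be illustrated in a few lines. The class names, fields, and the `describe` function are hypothetical and written in plain Python; they are not Catalyst's actual plan classes, and reading "insert" as an append is an assumption.

```python
from dataclasses import dataclass


# One small, explicit node type per high-level operation named in the quote.
@dataclass(frozen=True)
class Insert:
    table: str


@dataclass(frozen=True)
class CreateTable:
    table: str


@dataclass(frozen=True)
class CreateTableAsSelect:
    table: str
    query: str


@dataclass(frozen=True)
class OverwriteTable:
    table: str


def describe(node) -> str:
    """A toy rule: choose behavior by matching on the node's type."""
    if isinstance(node, CreateTableAsSelect):
        return f"create {node.table}, then write rows from: {node.query}"
    if isinstance(node, CreateTable):
        return f"create {node.table}"
    if isinstance(node, OverwriteTable):
        return f"replace the contents of {node.table}"
    if isinstance(node, Insert):
        # Assumption: "insert" is treated as an append here.
        return f"append rows to {node.table}"
    raise TypeError(f"no rule for {type(node).__name__}")
```

Because each operation has its own node type, a rule like `describe` never has to inspect flags or modes to work out which operation it is looking at; an unknown node fails loudly with a `TypeError`.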

Re: data source v2 online meetup

2018-02-02 Thread Jacek Laskowski
Hi Reynold, In general, that is a very good idea for getting the community engaged (even if most people would just listen / hide in the dark like myself). I know of no other open source project, at the ASF or elsewhere, where such an initiative has even been tried. Kudos for the idea! Regards, Jacek Laskowski

SQL logical plans and DataSourceV2 (was: data source v2 online meetup)

2018-02-01 Thread Ryan Blue
Over the last couple years, I’ve noticed a trend toward specialized logical plans and increasing use of RunnableCommand nodes. DataSourceV2 is currently on the same path, and I’d like to make the case that we should avoid these practices. I think it’s helpful to consider an example I’ve been

Re: data source v2 online meetup

2018-02-01 Thread Reynold Xin
+1 hangout
>>>
>>> --
>>> *From:* Xiao Li <gatorsm...@gmail.com>
>>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>>> *To:* Ryan Blue
>>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>>> *Subjec

Re: data source v2 online meetup

2018-02-01 Thread Russell Spitzer
--
>> *From:* Xiao Li <gatorsm...@gmail.com>
>> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
>> *To:* Ryan Blue
>> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell Spitzer
>> *Subject:* Re: data source v2 online meetup
>>
>> Hi, Ryan,

Re: data source v2 online meetup

2018-02-01 Thread Ryan Blue
t 9:10 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> +1 hangout
>
> --
> *From:* Xiao Li <gatorsm...@gmail.com>
> *Sent:* Wednesday, January 31, 2018 10:46:26 PM
> *To:* Ryan Blue
> *Cc:* Reynold Xin; dev; Wenchen Fan; Russell

Re: data source v2 online meetup

2018-02-01 Thread Felix Cheung
+1 hangout

From: Xiao Li <gatorsm...@gmail.com>
Sent: Wednesday, January 31, 2018 10:46:26 PM
To: Ryan Blue
Cc: Reynold Xin; dev; Wenchen Fan; Russell Spitzer
Subject: Re: data source v2 online meetup

Hi, Ryan, wow, your Iceberg already used data source

Re: data source v2 online meetup

2018-01-31 Thread Xiao Li
Hi, Ryan, wow, your Iceberg already uses the data source V2 API! That is pretty cool! I am just afraid these new APIs are not stable. We might deprecate or change some data source v2 APIs in the next version (2.4). Sorry for the inconvenience this might cause. Thanks for your feedback always,

Re: data source v2 online meetup

2018-01-31 Thread Ryan Blue
Thanks for suggesting this, I think it's a great idea. I'll definitely attend and can talk about the changes that we've made to DataSourceV2 to enable our new table format, Iceberg. On Wed, Jan 31, 2018 at 2:35 PM, Reynold Xin

data source v2 online meetup

2018-01-31 Thread Reynold Xin
The data source v2 API is one of the larger changes in Spark 2.3, and what has already been committed is only the first version; we'd need more work post-2.3 to improve and stabilize it. I think at this point we should stop making changes to it in branch-2.3, and instead focus on