Re: data source api v2 refactoring

2018-10-21 Thread JackyLee
I have pushed a patch for SQLStreaming, which resolves the problem just discussed. The Jira: https://issues.apache.org/jira/browse/SPARK-24630 The patch: https://github.com/apache/spark/pull/22575 SQLStreaming just defines the table API for Structured Streaming, and the Table APIs for

RE: data source api v2 refactoring

2018-10-18 Thread Mendelson, Assaf
). Thanks, Assaf From: Wenchen Fan [mailto:cloud0...@gmail.com] Sent: Thursday, October 18, 2018 5:26 PM To: Reynold Xin Cc: Ryan Blue; Hyukjin Kwon; Spark dev list Subject: Re: data source api v2 refactoring

Re: data source api v2 refactoring

2018-10-18 Thread Wenchen Fan
" > *Cc: *Wenchen Fan, Hyukjin Kwon, > Spark Dev List > *Subject: *Re: data source api v2 refactoring > > > > Hi Jayesh, > > > > The existing sources haven't been ported to v2 yet. That is going to be > tricky because the existing sources implement behav

Re: data source api v2 refactoring

2018-09-19 Thread Thakrar, Jayesh
Thanks for the info, Ryan – very helpful! From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Wednesday, September 19, 2018 at 3:17 PM To: "Thakrar, Jayesh" Cc: Wenchen Fan, Hyukjin Kwon, Spark Dev List Subject: Re: data source api v2 refactoring Hi Jayesh, The exis

Re: data source api v2 refactoring

2018-09-19 Thread Ryan Blue
*From: *Ryan Blue > *Reply-To: * > *Date: *Friday, September 7, 2018 at 2:19 PM > *To: *Wenchen Fan > *Cc: *Hyukjin Kwon, Spark Dev List <dev@spark.apache.org> > *Subject: *Re: data source api v2 refactoring > > > > There are a few v2-related changes that we can w

Re: data source api v2 refactoring

2018-09-07 Thread Thakrar, Jayesh
To: Wenchen Fan Cc: Hyukjin Kwon, Spark Dev List Subject: Re: data source api v2 refactoring There are a few v2-related changes that we can work on in parallel, at least for reviews: * SPARK-25006, #21978 <https://github.com/apache/spark/pull/21978>: Add catalog to TableIdentifier - this propos

Re: data source api v2 refactoring

2018-09-07 Thread Ryan Blue
>>>> } >>>> >>>> Without WriteConfig, the API looks like >>>> trait Table { >>>> LogicalWrite newAppendWrite(); >>>> >>>> LogicalWrite newDeleteWrite(deleteExprs); >>>> } >>>> >>>> >>>> I

Re: data source api v2 refactoring

2018-09-07 Thread Wenchen Fan
newDeleteWrite(deleteExprs); >>> } >>> >>> >>> It looks to me that the API is simpler without WriteConfig, what do you >>> think? >>> >>> Thanks, >>> Wenchen >>> >>> On Wed, Sep 5, 2018 at 4:24 AM Ry

Re: data source api v2 refactoring

2018-09-07 Thread Hyukjin Kwon
>>> mode, a physical scan outputs data for one epoch, but it's not true for >>> continuous mode. >>> >>> I'm not sure if it's necessary to include streaming epoch in the API >>> abstraction, for features like metrics reporting. >>> >>> On Sun, S

Re: data source api v2 refactoring

2018-09-06 Thread Ryan Blue
> >> Latest from Wenchen in case it was dropped. >> >> -- Forwarded message - >> From: Wenchen Fan >> Date: Mon, Sep 3, 2018 at 6:16 AM >> Subject: Re: data source api v2 refactoring >> To: >> Cc: Ryan Blue, Reynold Xin, <

Re: data source api v2 refactoring

2018-09-04 Thread Wenchen Fan
case it was dropped. > > -- Forwarded message - > From: Wenchen Fan > Date: Mon, Sep 3, 2018 at 6:16 AM > Subject: Re: data source api v2 refactoring > To: > Cc: Ryan Blue, Reynold Xin, <dev@spark.apache.org> > > > Hi Mridul, > >

Fwd: data source api v2 refactoring

2018-09-04 Thread Ryan Blue
Latest from Wenchen in case it was dropped. -- Forwarded message - From: Wenchen Fan Date: Mon, Sep 3, 2018 at 6:16 AM Subject: Re: data source api v2 refactoring To: Cc: Ryan Blue, Reynold Xin, <dev@spark.apache.org> Hi Mridul, I'm not sure what's going on, my

Re: data source api v2 refactoring

2018-09-04 Thread Marcelo Vanzin
archives ... [1] > Wondering which other senders were getting dropped (if yes). > > Regards > Mridul > > [1] > http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html > > > On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue wrote: >>

Re: data source api v2 refactoring

2018-09-01 Thread Mridul Muralidharan
-source-api-v2-refactoring-td24848.html On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue wrote: > Thanks for clarifying, Wenchen. I think that's what I expected. > > As for the abstraction, here's the way that I think about it: there are > two important parts of a scan: the definition of what

Re: data source api v2 refactoring

2018-09-01 Thread Ryan Blue
Thanks for clarifying, Wenchen. I think that's what I expected. As for the abstraction, here's the way that I think about it: there are two important parts of a scan: the definition of what will be read, and task sets that actually perform the read. In batch, there's one definition of the scan

Re: data source api v2 refactoring

2018-08-31 Thread Jungtaek Lim
Nice suggestion, Reynold, and great news that Wenchen succeeded in prototyping! One thing I would like to make sure of is how continuous mode works with such an abstraction. Would continuous mode also be abstracted with Stream, and would createScan provide an unbounded Scan? Thanks, Jungtaek Lim

Re: data source api v2 refactoring

2018-08-31 Thread Ryan Blue
Thanks, Reynold! I think your API sketch looks great. I appreciate having the Table level in the abstraction to plug into as well. I think this makes it clear what everything does, particularly having the Stream level that represents a configured (by ScanConfig) streaming read and can act as a

data source api v2 refactoring

2018-08-31 Thread Reynold Xin
I spent some time last week looking at the current data source v2 apis, and I thought we should be a bit more buttoned up in terms of the abstractions and the guarantees Spark provides. In particular, I feel we need the following levels of "abstractions", to fit the use cases in Spark, from batch,