Re: DataSourceV2 capability API

2018-11-12 Thread JackyLee
I don't know if it is right to shape the table API as ContinuousScanBuilder -> ContinuousScan -> ContinuousBatch; it makes batch/micro-batch/continuous too different from each other. In my opinion, these are basically similar at the table level. So is it possible to design an API like this? (A sketch of the contrast follows below.)
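
For context, a rough Scala sketch of the mode-specific chain being questioned versus a single table-level entry point; all names here are illustrative assumptions, not the actual proposal:

  // Hypothetical sketch, not the actual API under discussion: one interface
  // chain per execution mode, so the modes diverge even though they start
  // from the same table.
  trait ContinuousBatch
  trait ContinuousScan        { def toContinuousBatch(): ContinuousBatch }
  trait ContinuousScanBuilder { def build(): ContinuousScan }

  // The alternative intuition: batch, micro-batch, and continuous are
  // basically similar at the table level, so one entry point could front
  // all three modes.
  trait Scan
  trait ScanBuilder { def newScan(): Scan }
  trait Table       { def newScanBuilder(): ScanBuilder }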

Re: DataSourceV2 capability API

2018-11-12 Thread Wenchen Fan
On Fri, Nov 9, 2018 at 9:11 AM Ryan Blue wrote: "I'd have two places. First, a class that defines properties supported and identified by Spark, like the SQLConf definitions. Second, in documentation for the v2 table API."

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
On Fri, Nov 9, 2018 at 9:11 AM Ryan Blue wrote: "I'd have two places. First, a class that defines properties supported and identified by Spark, like the SQLConf definitions. Second, in documentation for the v2 table API."
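
A minimal Scala sketch of what "a class that defines properties supported and identified by Spark, like the SQLConf definitions" might look like; the object name and the second constant are assumptions for illustration, not the actual API:

  // Hypothetical holder for the capability strings Spark recognizes,
  // analogous to how SQLConf centralizes config key definitions.
  object TableCapabilities {
    // From the original proposal at the bottom of this thread:
    // the table supports continuous streaming reads.
    val CONTINUOUS_STREAMING = "continuous-streaming"
    // Assumed example: the table tolerates reads that reference
    // columns missing from its underlying data.
    val READ_MISSING_COLUMNS = "read-missing-columns"
  }

  // Both Spark and sources would reference the constants, not raw strings:
  //   table.isSupported(TableCapabilities.CONTINUOUS_STREAMING)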

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
"…Second, in documentation for the v2 table API." On Fri, Nov 9, 2018 at 9:00 AM Felix Cheung wrote: "One question is where will the list of capability strings be defined?"

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
"…the v2 table API." On Fri, Nov 9, 2018 at 9:00 AM Felix Cheung wrote: "One question is where will the list of capability strings be defined?"

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
Sent: Thursday, November 8, 2018 2:09 PM; To: Reynold Xin; Cc: Spark Dev List; Subject: Re: DataSourceV2 capability API. "Yes, we currently use traits that have methods. Something like 'supports reading missing columns'…"

Re: DataSourceV2 capability API

2018-11-09 Thread Reynold Xin
Felix Cheung wrote: "One question is where will the list of capability strings be defined?" From: Ryan Blue; Sent: Thursday, November 8, 2018 2:09 PM; To: Reynold Xin; Cc: Spark Dev List…

Re: DataSourceV2 capability API

2018-11-09 Thread Ryan Blue
"…defined?" From: Ryan Blue; Sent: Thursday, November 8, 2018 2:09 PM; To: Reynold Xin; Cc: Spark Dev List; Subject: Re: DataSourceV2 capability API. "Yes, we currently use traits that have methods. Something like…"

Re: DataSourceV2 capability API

2018-11-09 Thread Felix Cheung
One question is where will the list of capability strings be defined? [Quoting Ryan Blue's message of Thursday, November 8, 2018 2:09 PM:] "Yes, we currently use traits that have methods. Something…"

Re: DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Yes, we currently use traits that have methods. Something like "supports reading missing columns" doesn't need to deliver methods. The other example is where we don't have an object to test for a trait (scan.isInstanceOf[SupportsBatch]) until we have a Scan with pushdown done. That could be…
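
A short Scala sketch of the problem described here; aside from SupportsBatch, the trait and method names are assumptions for illustration:

  trait Batch
  trait Scan
  trait SupportsBatch extends Scan { def toBatch(): Batch }
  trait ScanBuilder { def build(): Scan }

  // The capability trait lives on the Scan, so the check can only run after
  // pushdown has produced a Scan instance.
  def planBatchRead(builder: ScanBuilder): Batch =
    builder.build() match {              // pushdown already finished here
      case s: SupportsBatch => s.toBatch()
      case _ =>
        // With trait-based checks, an unsupported mode surfaces only now,
        // as a late runtime failure rather than an up-front capability check.
        throw new UnsupportedOperationException("Table does not support batch scans")
    }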

Re: DataSourceV2 capability API

2018-11-08 Thread Reynold Xin
This is currently accomplished by having traits that data sources can extend, as well as runtime exceptions, right? It's hard to argue one way vs. another without knowing how things will evolve (e.g., how many different capabilities there will be).

DataSourceV2 capability API

2018-11-08 Thread Ryan Blue
Hi everyone, I'd like to propose an addition to DataSourceV2 tables: a capability API. This API would allow Spark to query a table to determine whether it supports a capability or not:

  val table = catalog.load(identifier)
  val supportsContinuous = table.isSupported("continuous-streaming")

There…
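
A minimal Scala sketch of how the proposed check might work end to end; only isSupported and the "continuous-streaming" string come from the message above, everything else is an assumption for illustration:

  // Hypothetical shapes sketching the proposal, not a committed API.
  trait Table {
    // True if the table implements the named capability.
    def isSupported(capability: String): Boolean
  }
  trait Catalog {
    def load(identifier: String): Table
  }

  // Spark could then fail fast during analysis instead of deep in planning:
  def loadForContinuous(catalog: Catalog, identifier: String): Table = {
    val table = catalog.load(identifier)
    require(table.isSupported("continuous-streaming"),
      s"Table $identifier does not support continuous streaming")
    table
  }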