I have pushed a patch for SQLStreaming which resolves the problem just discussed.
The Jira:
https://issues.apache.org/jira/browse/SPARK-24630
The Patch:
https://github.com/apache/spark/pull/22575
SQLStreaming just defines the table API for Structured Streaming, and the Table APIs for …
Thanks,
Assaf
From: Wenchen Fan [mailto:cloud0...@gmail.com]
Sent: Thursday, October 18, 2018 5:26 PM
To: Reynold Xin
Cc: Ryan Blue; Hyukjin Kwon; Spark dev list
Subject: Re: data source api v2 refactoring
> *Cc: *Wenchen Fan, Hyukjin Kwon,
> Spark Dev List
> *Subject: *Re: data source api v2 refactoring
>
> Hi Jayesh,
>
> The existing sources haven't been ported to v2 yet. That is going to be
> tricky because the existing sources implement behavior …
Thanks for the info Ryan – very helpful!
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Wednesday, September 19, 2018 at 3:17 PM
To: "Thakrar, Jayesh"
Cc: Wenchen Fan , Hyukjin Kwon ,
Spark Dev List
Subject: Re: data source api v2 refactoring
Hi Jayesh,
The existing sources haven't been ported to v2 yet …
> *From: *Ryan Blue
> *Reply-To: *
> *Date: *Friday, September 7, 2018 at 2:19 PM
> *To: *Wenchen Fan
> *Cc: *Hyukjin Kwon , Spark Dev List <
> dev@spark.apache.org>
> *Subject: *Re: data source api v2 refactoring
>
>
>
> There are a few v2-related changes that we can work in parallel, at least
> for reviews …
To: Wenchen Fan
Cc: Hyukjin Kwon , Spark Dev List
Subject: Re: data source api v2 refactoring
There are a few v2-related changes that we can work in parallel, at least for
reviews:
* SPARK-25006, #21978 <https://github.com/apache/spark/pull/21978>: Add catalog
to TableIdentifier - this proposes …
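The SPARK-25006 idea above (letting a table identifier carry a catalog name in addition to the database and table name) can be sketched roughly as below. This is a hypothetical illustration, not the actual Spark class; the field layout and `quotedString` helper are invented.

```java
import java.util.Optional;

// Hypothetical sketch of a table identifier that can carry a catalog name,
// in the spirit of SPARK-25006; not the real Spark implementation.
final class TableIdentifier {
    final Optional<String> catalog;  // empty means "use the session catalog"
    final Optional<String> database; // empty means "use the current database"
    final String table;

    TableIdentifier(String catalog, String database, String table) {
        this.catalog = Optional.ofNullable(catalog);
        this.database = Optional.ofNullable(database);
        this.table = table;
    }

    // Render as a dotted multi-part name: catalog.database.table,
    // omitting whichever parts are absent.
    String quotedString() {
        StringBuilder sb = new StringBuilder();
        catalog.ifPresent(c -> sb.append(c).append('.'));
        database.ifPresent(d -> sb.append(d).append('.'));
        return sb.append(table).toString();
    }
}
```

With a catalog part present, `new TableIdentifier("cat", "db", "t").quotedString()` yields the three-part name `cat.db.t`; with no catalog it degrades to the familiar two-part `db.t`.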
>>>> }
>>>>
>>>> Without WriteConfig, the API looks like
>>>> trait Table {
>>>> LogicalWrite newAppendWrite();
>>>>
>>>> LogicalWrite newDeleteWrite(deleteExprs);
>>>> }
>>>>
>>>>
>>>> I
>>> LogicalWrite newDeleteWrite(deleteExprs);
>>> }
>>>
>>>
>>> It looks to me that the API is simpler without WriteConfig, what do you
>>> think?
>>>
>>> Thanks,
>>> Wenchen
>>>
>>> On Wed, Sep 5, 2018 at 4:24 AM Ry
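The shape Wenchen sketches above (no `WriteConfig` level: the `Table` hands out a `LogicalWrite` per operation directly) can be made concrete as follows. Only `Table`, `LogicalWrite`, `newAppendWrite`, and `newDeleteWrite` come from the thread; the method signatures, the row type, and the in-memory table are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the write side of the proposed API; the real interface's
// commit protocol is more involved, this is only a sketch.
interface LogicalWrite {
    void commit(List<String> rows);
}

// "Without WriteConfig" shape from the thread: the Table itself
// creates one LogicalWrite per operation.
interface Table {
    LogicalWrite newAppendWrite();
    LogicalWrite newDeleteWrite(String deleteExpr);
}

// Toy in-memory table so the sketch can be exercised.
class InMemoryTable implements Table {
    final List<String> data = new ArrayList<>();

    public LogicalWrite newAppendWrite() {
        return rows -> data.addAll(rows);
    }

    public LogicalWrite newDeleteWrite(String deleteExpr) {
        // For the demo, interpret the "expression" as an exact-match value.
        return rows -> data.removeIf(r -> r.equals(deleteExpr));
    }
}
```

The appeal of this shape is visible in the call site: `table.newAppendWrite().commit(rows)` needs no intermediate config object, which is exactly the simplification being weighed against whatever per-write configuration a `WriteConfig` level would have carried.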
>>> mode, a physical scan outputs data for one epoch, but it's not true for
>>> continuous mode.
>>>
>>> I'm not sure if it's necessary to include streaming epoch in the API
>>> abstraction, for features like metrics reporting.
>>>
>>> On Sun, S
Latest from Wenchen in case it was dropped.
-- Forwarded message -
From: Wenchen Fan
Date: Mon, Sep 3, 2018 at 6:16 AM
Subject: Re: data source api v2 refactoring
To:
Cc: Ryan Blue, Reynold Xin,
dev@spark.apache.org
Hi Mridul,
I'm not sure what's going on, my
archives ... [1]
> Wondering which other senders were getting dropped (if yes).
>
> Regards
> Mridul
>
> [1]
> http://apache-spark-developers-list.1001551.n3.nabble.com/data-source-api-v2-refactoring-td24848.html
>
>
> On Sat, Sep 1, 2018 at 8:58 PM Ryan Blue wrote:
Thanks for clarifying, Wenchen. I think that's what I expected.
As for the abstraction, here's the way that I think about it: there are two
important parts of a scan: the definition of what will be read, and task
sets that actually perform the read. In batch, there's one definition of
the scan
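Ryan's split above (one definition of what will be read, plus task sets that actually perform the read) can be sketched like this. `ScanConfig` is a name used elsewhere in the thread; `ScanTask`, `ScanPlanner`, and all fields are hypothetical stand-ins.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// The "definition" side of a scan: what will be read, fixed up front.
final class ScanConfig {
    final List<String> columns; // projected columns
    final String filter;        // pushed-down filter, as a string for the demo
    ScanConfig(List<String> columns, String filter) {
        this.columns = columns;
        this.filter = filter;
    }
}

// The "execution" side: one task per input split performs the read.
final class ScanTask {
    final ScanConfig config;
    final int split;
    ScanTask(ScanConfig config, int split) {
        this.config = config;
        this.split = split;
    }
}

final class ScanPlanner {
    // In batch there is one definition and one task set. A streaming source
    // would instead derive a fresh task set per micro-batch from the same
    // unchanged ScanConfig, which is the point of separating the two.
    static List<ScanTask> planTasks(ScanConfig config, int numSplits) {
        return IntStream.range(0, numSplits)
                .mapToObj(i -> new ScanTask(config, i))
                .collect(Collectors.toList());
    }
}
```

Every task shares the same immutable `ScanConfig`, so re-planning tasks (as a streaming query must do repeatedly) never re-negotiates what is being read.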
Nice suggestion Reynold and great news to see that Wenchen succeeded
prototyping!
One thing I would like to make sure is, how continuous mode works with such
abstraction. Would continuous mode be also abstracted with Stream, and
createScan would provide unbounded Scan?
Thanks,
Jungtaek Lim
Thanks, Reynold!
I think your API sketch looks great. I appreciate having the Table level in
the abstraction to plug into as well. I think this makes it clear what
everything does, particularly having the Stream level that represents a
configured (by ScanConfig) streaming read and can act as a
I spent some time last week looking at the current data source v2 apis, and
I thought we should be a bit more buttoned up in terms of the abstractions
and the guarantees Spark provides. In particular, I feel we need the
following levels of "abstractions", to fit the use cases in Spark, from
batch,
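Piecing the fragments above together (a Table at the top, a Stream representing a read configured by ScanConfig, and per-epoch Scans underneath, with Jungtaek's question of an unbounded Scan for continuous mode), one possible reading of the layering is the following. The method names here are guesses reconstructed from the thread, not the final API.

```java
// Hypothetical layering reconstructed from the thread's fragments:
//   Table  -> the logical entity a read is configured against
//   Stream -> a configured streaming read, stable across epochs
//   Scan   -> one epoch's worth of data in micro-batch mode
//             (continuous mode would instead need a single unbounded Scan)
interface Scan {
    java.util.List<String> rows();
}

interface Stream {
    Scan createScan(long epoch); // micro-batch: one Scan per epoch
}

interface Table {
    // The String here stands in for a real ScanConfig object.
    Stream newStream(String scanConfig);
}

// Toy implementation so the layering can be exercised end to end.
class DemoTable implements Table {
    public Stream newStream(String scanConfig) {
        return epoch -> () -> java.util.List.of(scanConfig + "@epoch" + epoch);
    }
}
```

The value of the middle `Stream` level, as discussed above, is that per-query configuration happens once, while `createScan` can be called once per epoch without touching that configuration.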