Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-06 Thread Michael Armbrust
+1

On Sat, Nov 4, 2017 at 11:02 AM, Xiao Li  wrote:

> +1
>
> 2017-11-04 11:00 GMT-07:00 Burak Yavuz :
>
>> +1
>>
>> On Fri, Nov 3, 2017 at 10:02 PM, vaquar khan 
>> wrote:
>>
>>> +1
>>>
>>> On Fri, Nov 3, 2017 at 8:14 PM, Weichen Xu 
>>> wrote:
>>>
 +1.

 On Sat, Nov 4, 2017 at 8:04 AM, Matei Zaharia 
 wrote:

> +1 from me too.
>
> Matei
>
> > On Nov 3, 2017, at 4:59 PM, Wenchen Fan  wrote:
> >
> > +1.
> >
> > I think this architecture makes a lot of sense: letting executors talk
> > to the source/sink directly brings very low latency.
> >
> > On Thu, Nov 2, 2017 at 9:01 AM, Sean Owen 
> wrote:
> > +0 simply because I don't feel I know enough to have an opinion. I
> have no reason to doubt the change though, from a skim through the doc.
> >
> >
> > On Wed, Nov 1, 2017 at 3:37 PM Reynold Xin 
> wrote:
> > Earlier I sent out a discussion thread for CP in Structured
> Streaming:
> >
> > https://issues.apache.org/jira/browse/SPARK-20928
> >
> > It is meant to be a very small, surgical change to Structured
> Streaming to enable ultra-low latency. This is great timing because we are
> also designing and implementing data source API v2. If designed properly,
> we can have the same data source API working for both streaming and batch.
> >
> >
> > Following the SPIP process, I'm putting this SPIP up for a vote.
> >
> > +1: Let's go ahead and design / implement the SPIP.
> > +0: Don't really care.
> > -1: I do not think this is a good idea for the following reasons.
> >
> >
> >
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>

>>>
>>>
>>> --
>>> Regards,
>>> Vaquar Khan
>>> +1 -224-436-0783
>>> Greater Chicago
>>>
>>
>>
>
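
A rough, user-facing sketch of the "same data source API working for both streaming and batch" point above. This uses the existing Kafka connector (spark-sql-kafka-0-10) purely as an illustration, with placeholder broker and topic names; the SPIP itself concerns the internal provider API being designed as data source v2:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cp-sketch").getOrCreate()

// Batch read: same format and options as the streaming read below.
val batchDf = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")   // placeholder broker
  .option("subscribe", "events")                     // placeholder topic
  .load()

// Streaming read from the same source; only the entry point differs.
val streamDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "events")
  .load()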


Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-06 Thread Reynold Xin
Thanks Tom. I'd imagine more details belong either in a full design doc, or
a PR description. Might make sense to do an additional design doc, if there
is enough delta from the current sketch doc.


On Mon, Nov 6, 2017 at 7:29 AM, Tom Graves  wrote:

> +1 for the idea and feature, but I think the design is definitely lacking
> detail on the internal changes needed, how the execution pieces work, and
> the communication. Are you planning on posting more of those details, or
> were you just planning on discussing them in the PR?
>
> Tom
>
> On Wednesday, November 1, 2017, 11:29:21 AM CDT, Debasish Das <
> debasish.da...@gmail.com> wrote:
>
>
> +1
>
> Is there any design doc related to API/internal changes? Will CP be the
> default in Structured Streaming, or a mode that coexists with the existing
> behavior?
>
> Thanks.
> Deb
>
> On Nov 1, 2017 8:37 AM, "Reynold Xin"  wrote:
>
> Earlier I sent out a discussion thread for CP in Structured Streaming:
>
> https://issues.apache.org/jira/browse/SPARK-20928
>
> It is meant to be a very small, surgical change to Structured Streaming to
> enable ultra-low latency. This is great timing because we are also
> designing and implementing data source API v2. If designed properly, we can
> have the same data source API working for both streaming and batch.
>
>
> Following the SPIP process, I'm putting this SPIP up for a vote.
>
> +1: Let's go ahead and design / implement the SPIP.
> +0: Don't really care.
> -1: I do not think this is a good idea for the following reasons.
>
>
>
>


Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-06 Thread Tom Graves
+1 for the idea and feature, but I think the design is definitely lacking
detail on the internal changes needed, how the execution pieces work, and the
communication. Are you planning on posting more of those details, or were you
just planning on discussing them in the PR?
Tom
On Wednesday, November 1, 2017, 11:29:21 AM CDT, Debasish Das wrote:
+1

Is there any design doc related to API/internal changes? Will CP be the default
in Structured Streaming, or a mode that coexists with the existing behavior?
Thanks.
Deb
On Nov 1, 2017 8:37 AM, "Reynold Xin"  wrote:

Earlier I sent out a discussion thread for CP in Structured Streaming:
https://issues.apache.org/jira/browse/SPARK-20928
It is meant to be a very small, surgical change to Structured Streaming to 
enable ultra-low latency. This is great timing because we are also designing 
and implementing data source API v2. If designed properly, we can have the same 
data source API working for both streaming and batch.

Following the SPIP process, I'm putting this SPIP up for a vote.
+1: Let's go ahead and design / implement the SPIP.
+0: Don't really care.
-1: I do not think this is a good idea for the following reasons.
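
To make Deb's question above concrete: one plausible shape is an opt-in trigger, with today's micro-batch execution remaining the default. This is only a sketch of a possible user-facing API; the Trigger.Continuous name and its semantics here are illustrative assumptions, not a statement of the final design:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("cp-trigger-sketch").getOrCreate()

// A toy streaming source; the built-in "rate" source just generates rows.
val stream = spark.readStream.format("rate").option("rowsPerSecond", "10").load()

// Existing default: micro-batch execution with a 1-second batch interval.
val microBatch = stream.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("1 second"))
  .start()

// Hypothetical opt-in continuous mode: the interval would be a checkpoint
// interval rather than a batch interval, since records flow from source to
// sink continuously instead of being collected into batches.
val continuous = stream.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()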



  

[ML] Migrating transformers from mllib to ml

2017-11-06 Thread Marco Gaido
Hello,

I saw that there are several TODOs to migrate some transformers (like
HashingTF and IDF) to use only ml.Vector in order to avoid the overhead of
converting them to the mllib ones and back.

Is there any reason why this has not been done so far? Is it to avoid code
duplication? If so, is that still a concern given that mllib is slated for
deprecation starting with 2.3 (at least that is what I read in the Spark docs)?
If not, I can work on this.

Thanks,
Marco
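
For context, the round-trip Marco is describing looks roughly like the following. This is an illustrative sketch of the pattern, not the actual HashingTF source: the ml wrapper delegates to the mllib implementation, gets back an mllib.linalg.Vector, and converts it to an ml.linalg.Vector per row via asML; hashing natively against ml.Vector would drop that extra copy.

import org.apache.spark.ml.linalg.{Vector => NewVector}
import org.apache.spark.mllib.feature.{HashingTF => OldHashingTF}
import org.apache.spark.mllib.linalg.{Vector => OldVector}

// The old-API implementation hashes the terms into an mllib.linalg.Vector...
val oldImpl = new OldHashingTF(1 << 18)   // 2^18 hash buckets
val terms: Seq[String] = Seq("spark", "structured", "streaming")
val oldVec: OldVector = oldImpl.transform(terms)

// ...which then has to be converted back to an ml.linalg.Vector for the
// DataFrame-based API, allocating and copying the row a second time.
val newVec: NewVector = oldVec.asML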