Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-13 Thread Noman Khan
+1 (non-binding) Regards, Noman From: Xiao Li <gatorsm...@gmail.com> Sent: Tuesday, September 12, 2017 2:44:26 AM To: Matei Zaharia; Hyukjin Kwon Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python +1 Xiao On Mon, 11 Sep 2017 at 6…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-12 Thread Takuya UESHIN
…ne in the PR review though. > On Sat, Sep 2, 2017 at 2:07 AM, Felix Cheung <felixcheung_m@…> wrote: +1 on this and like the suggestion of type in string form. Would it be co…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Liang-Chi Hsieh
> …d like the suggestion of type in string form. Would it be correct to assume there will be data type check, for example the returned pandas data frame column data types match what are specified. We have seen quite a bit of issues/confus…
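
The quoted question asks whether the columns returned by a vectorized UDF will be checked against the declared types. A minimal sketch of such a check, assuming a returned batch is modeled as a dict of column name to list of values (a stand-in for a pandas DataFrame) and the declared schema as a dict of column name to Python type; `check_result_types` is hypothetical and not part of PySpark.

```python
from typing import Dict, List


def check_result_types(result: Dict[str, List[object]],
                       declared: Dict[str, type]) -> List[str]:
    """Return mismatch messages; an empty list means the result conforms."""
    problems: List[str] = []
    # Column names must match the declared schema exactly.
    for name in sorted(declared.keys() - result.keys()):
        problems.append(f"missing column {name!r}")
    for name in sorted(result.keys() - declared.keys()):
        problems.append(f"unexpected column {name!r}")
    for name, expected in declared.items():
        for i, value in enumerate(result.get(name, [])):
            # bool subclasses int in Python, so treat it as a mismatch for int columns.
            wrong = not isinstance(value, expected) or (
                expected is int and isinstance(value, bool))
            if wrong:
                problems.append(
                    f"{name}[{i}]: expected {expected.__name__}, "
                    f"got {type(value).__name__}")
                break  # report only the first bad value per column
    return problems


declared = {"id": int, "score": float}
print(check_result_types({"id": [1, 2], "score": [0.5, 1.5]}, declared))
# []
print(check_result_types({"id": [1, "2"], "score": [0.5, 1.5]}, declared))
# ['id[1]: expected int, got str']
```

Failing fast with a message that names the column and position is one way to avoid the silent mismatches the thread mentions seeing in R.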

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Xiao Li
> …specified. We have seen quite a bit of issues/confusions with that in R. Would it make sense to have a more generic decorator name so that it could also be usable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format s…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Matei Zaharia
> …specific and will have more in the future? From: Reynold Xin <r...@databricks.com> Sent: Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks.

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Hyukjin Kwon
> …assume there will be data type check, for example the returned pandas data frame column data types match what are specified. We have seen quite a bit of issues/confusions with that in R. Would it mak…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Yin Huai
> …o assume there will be data type check, for example the returned pandas data frame column data types match what are specified. We have seen quite a bit of issues/confusions with that in R. Would it make sense to ha…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-11 Thread Sameer Agarwal
> Would it make sense to have a more generic decorator name so that it could also be usable for other efficient vectorized format in the future? Or do we anticipate the decorator to be format specific and will have more in the future?

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-07 Thread Bryan Cutler
> …ure? From: Reynold Xin <r...@databricks.com> Sent: Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vecto…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-06 Thread Takuya UESHIN
> …databricks.com> Sent: Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks. +1 on the SPIP for scope etc…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-05 Thread Wenchen Fan
> …Sent: Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks. +1 on the SPIP for scope etc On API details (will deal with in code reviews as we…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Felix Cheung
Friday, September 1, 2017 5:16:11 AM To: Takuya UESHIN Cc: spark-dev Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python Ok, thanks. +1 on the SPIP for scope etc On API details (will deal with in code reviews as well but leaving a note here in case I forget) 1. I would suggest

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Reynold Xin
Ok, thanks. +1 on the SPIP for scope etc. On API details (will deal with in code reviews as well, but leaving a note here in case I forget): 1. I would suggest having the API also accept data type specification in string form. It is usually simpler to say "long" than "LongType()". 2. Think about…
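
Point 1 above asks that the return type be accepted either as a short string or as a type object. A minimal sketch of that normalization, assuming hypothetical stand-in classes (named after, but not the same as, `pyspark.sql.types`), a hypothetical short-name table, and a hypothetical `vectorized_udf` decorator that only records the resolved type.

```python
class DataType:
    """Stand-in base class: two instances are equal if they are the same type."""

    def __eq__(self, other):
        return type(self) is type(other)

    def __hash__(self):
        return hash(type(self))

    def __repr__(self):
        return f"{type(self).__name__}()"


class LongType(DataType):
    pass


class DoubleType(DataType):
    pass


class StringType(DataType):
    pass


# Hypothetical table of accepted short names (case-insensitive).
_SHORT_NAMES = {
    "long": LongType,
    "bigint": LongType,
    "double": DoubleType,
    "string": StringType,
}


def resolve_type(spec):
    """Accept a DataType instance or a short type-name string."""
    if isinstance(spec, DataType):
        return spec
    if isinstance(spec, str):
        try:
            return _SHORT_NAMES[spec.strip().lower()]()
        except KeyError:
            raise ValueError(f"unknown type string: {spec!r}") from None
    raise TypeError(f"expected DataType or str, got {type(spec).__name__}")


def vectorized_udf(return_type):
    """Hypothetical decorator: stores the resolved return type on the function."""
    resolved = resolve_type(return_type)

    def wrap(func):
        func.return_type = resolved
        return func

    return wrap


@vectorized_udf("long")  # equivalent to @vectorized_udf(LongType())
def plus_one(values):
    return [v + 1 for v in values]


print(plus_one.return_type)  # LongType()
```

Resolving the string once, when the decorator is applied, means a typo such as "lnog" fails at definition time rather than when the first batch runs.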

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Takuya UESHIN
Yes, aggregation is out of scope for now. I think we should continue discussing aggregation on JIRA, and we will add it later separately. Thanks. On Fri, Sep 1, 2017 at 6:52 PM, Reynold Xin wrote: > Is the idea aggregate is out of scope for the current…

Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Reynold Xin
Is the idea aggregate is out of scope for the current effort and we will be adding those later? On Fri, Sep 1, 2017 at 8:01 AM Takuya UESHIN wrote: > Hi all, We've been discussing to support vectorized UDFs in Python and we almost got a consensus about the APIs, so…

[VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python

2017-09-01 Thread Takuya UESHIN
Hi all, We've been discussing to support vectorized UDFs in Python and we almost got a consensus about the APIs, so I'd like to summarize and call for a vote. Note that this vote should focus on APIs for vectorized UDFs, not APIs for vectorized UDAFs or Window operations.
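
The vote concerns UDFs that operate on whole batches of column values instead of one row at a time. A minimal pure-Python sketch of that difference, assuming plain lists stand in for the pandas batches discussed in this thread and a hypothetical fixed batch size; `run_row_udf`, `run_vectorized_udf`, and `batches` are illustrative names, not PySpark APIs.

```python
from typing import Callable, Iterable, List, Sequence


def batches(values: Sequence[int], batch_size: int) -> Iterable[Sequence[int]]:
    """Split a column into fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def run_row_udf(func: Callable[[int], int], column: Sequence[int]) -> List[int]:
    """Row-at-a-time: one Python call per value."""
    return [func(v) for v in column]


def run_vectorized_udf(func: Callable[[Sequence[int]], List[int]],
                       column: Sequence[int], batch_size: int = 4) -> List[int]:
    """Batch-at-a-time: one Python call per chunk of values."""
    out: List[int] = []
    for chunk in batches(column, batch_size):
        result = func(chunk)
        # A scalar vectorized UDF must return one output per input value.
        if len(result) != len(chunk):
            raise ValueError(f"expected {len(chunk)} results, got {len(result)}")
        out.extend(result)
    return out


calls = {"row": 0, "vec": 0}


def plus_one_row(v: int) -> int:
    calls["row"] += 1
    return v + 1


def plus_one_vec(vs: Sequence[int]) -> List[int]:
    calls["vec"] += 1
    return [v + 1 for v in vs]


column = list(range(10))
assert run_row_udf(plus_one_row, column) == run_vectorized_udf(plus_one_vec, column)
print(calls)  # {'row': 10, 'vec': 3}
```

Both paths produce the same values; the call counts show that the per-call overhead is paid once per batch rather than once per row.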