Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-30 Thread Bobby Evans
> /DL systems that can benefit from columnar format are mostly in Python.
> 3. Simple operations, though they benefit from vectorization, might not be worth the data exchange overhead.
>
> So would an improved Pandas UDF API be g

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-23 Thread Matei Zaharia
> 2. ML/DL systems that can benefit from columnar format are mostly in Python.
> 3. Simple operations, though they benefit from vectorization, might not be worth the data exchange overhead.
>
> So would an improved Pandas UDF API be good enough?

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
amji
Sent: Friday, April 19, 2019 12:21 PM
To: Bryan Cutler
Cc: Dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support
+ (non-binding)
Sent from my iPhone
Pardon the dumb thumb typos :)
On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote:
> +1 (n

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
> RA. Main concerns here:
>
> 1. Exposing third-party Java APIs in Spark is risky. Arrow might have a 1.0 release someday.
> 2. ML/DL systems that can benefit from columnar format are mostly in Python.
> 3. Simple opera

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
the SPIP JIRA.
> On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng <men...@gmail.com>

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
> hat I should have joined the discussion earlier! Hope it is not too late :)
>
> On Fri, Apr 19, 2019 at 1:20 PM wrote:
> +1 (non-binding) for better columnar data processing support.

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Xiangrui Meng
> mnar format are mostly in Python.
> 3. Simple operations, though they benefit from vectorization, might not be worth the data exchange overhead.
>
> So would an improved Pandas UDF API be good enough? For example, SPARK-26412 (UDF that takes an ite

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
> oved Pandas UDF API be good enough? For example, SPARK-26412 (UDF that takes an iterator of Arrow batches).
>
> Sorry, I should have joined the discussion earlier! Hope it is not too late :)
>
> On Fri, Apr 19, 2019 at 1:20 PM

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
> row might have a 1.0 release someday.
> 2. ML/DL systems that can benefit from columnar format are mostly in Python.
> 3. Simple operations, though they benefit from vectorization, might not be worth the data exchange overhead.

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bryan Cutler
> of Arrow batches).
>
> Sorry, I should have joined the discussion earlier! Hope it is not too late :)
>
> On Fri, Apr 19, 2019 at 1:20 PM wrote:
> +1 (non-binding) for better columnar data processing support.