Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-30 Thread Bobby Evans
> > > > /DL systems that can benefits from columnar format are mostly in Python.
> > > > 3. Simple operations, though benefits vectorization, might not be worth the data exchange overhead.
> > > >
> > > > So would an improved Pandas UDF API would be g

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-23 Thread Matei Zaharia
> > > 2. ML/DL systems that can benefits from columnar format are mostly in Python.
> > > 3. Simple operations, though benefits vectorization, might not be worth the data exchange overhead.
> > >
> > > So would an improved Pandas UDF API would be good enough?

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
amji
Sent: Friday, April 19, 2019 12:21 PM
To: Bryan Cutler
Cc: Dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

+ (non-binding)

Sent from my iPhone
Pardon the dumb thumb typos :)

On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote:
+1 (n

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
>> rays directly and work on those if the API is not guaranteed to stay stable (that is, we’d still use our own classes to manipulate the data internally, and end users could use the Arrow library if they want it).
>>
>> Matei
>>
>> On Apr 20, 2019, at 8:38

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Reynold Xin
>>> patible: just document the format and extend it only in ways that don’t break the meaning of old data (for example, add new version numbers or field types that are read in a different way). It’s a bit harder for a Ja

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
>> > > >> > > I think you misunderstood the point of this SPIP. I responded to your >> comments in the SPIP JIRA. >> > > >> > > On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng >> wrote: >> > > I posted my comment in the JIRA. Main concerns her

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Xiangrui Meng
> > > mnar format are mostly in Python.
> > > 3. Simple operations, though benefits vectorization, might not be worth the data exchange overhead.
> > >
> > > So would an improved Pandas UDF API would be good enough? For example, SPARK-26412 (UDF that takes an ite

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Tom Graves
> > oved Pandas UDF API would be good enough? For example, SPARK-26412 (UDF that takes an iterator of of Arrow batches).
> >
> > Sorry that I should join the discussion earlier! Hope it is not too late:)
> >
> > On Fri, Apr 19, 2019 at 1:20 PM

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-22 Thread Bobby Evans
>> > > row might have 1.0 release someday.
>> > > 2. ML/DL systems that can benefits from columnar format are mostly in Python.
>> > > 3. Simple operations, though benefits vectorization, might not be worth the data exchange overhead.
>> > >

Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-04-20 Thread Bryan Cutler
> > > of Arrow batches).
> > >
> > > Sorry that I should join the discussion earlier! Hope it is not too late:)
> > >
> > > On Fri, Apr 19, 2019 at 1:20 PM wrote:
> > > +1 (non-binding) for better columnar data processing support.
> > >