2. ML/DL systems that can benefit from a columnar format are mostly in
> > > Python.
> > > 3. Simple operations, though they benefit from vectorization, might not be worth
> > > the data exchange overhead.
> > >
> > > So would an improved Pandas UDF API be good enough?
amji
Sent: Friday, April 19, 2019 12:21 PM
To: Bryan Cutler
Cc: Dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended
Columnar Processing Support
+ (non-binding)
On Apr 19, 2019, at 10:30 AM, Bryan Cutler wrote:
+1 (n
rays directly and work on those if the API is not
>> guaranteed to stay stable (that is, we’d still use our own classes to
>> manipulate the data internally, and end users could use the Arrow library
>> if they want it).
>>
>> Matei
>>
>> On Apr 20, 2019, at 8:38
patible: just document the format and
>>> extend it only in ways that don’t break the meaning of old data (for
>>> example, add new version numbers or field types that are read in a
>>> different way). It’s a bit harder for a Ja
>> > >
>> > > I think you misunderstood the point of this SPIP. I responded to your
>> comments in the SPIP JIRA.
>> > >
>> > > On Sat, Apr 20, 2019 at 12:52 AM Xiangrui Meng
>> wrote:
>> > > I posted my comment in the JIRA. Main concerns her
mnar format are mostly in
> Python.
> > > 3. Simple operations, though they benefit from vectorization, might not be
> worth the data exchange overhead.
> > >
> > > So would an improved Pandas UDF API be good enough? For example,
> SPARK-26412 (UDF that takes an ite
oved Pandas UDF API would be good enough? For example,
> > SPARK-26412 (UDF that takes an iterator of Arrow batches).
> >
> > Sorry that I did not join the discussion earlier! Hope it is not too late :)
> >
> > On Fri, Apr 19, 2019 at 1:20 PM
row might have
>> 1.0 release someday.
>> > > 2. ML/DL systems that can benefit from a columnar format are mostly in
>> Python.
>> > > 3. Simple operations, though they benefit from vectorization, might not be
>> worth the data exchange overhead.
>> > >
>>
of Arrow batches).
> > >
> > > Sorry that I did not join the discussion earlier! Hope it is not too
> late :)
> > >
> > > On Fri, Apr 19, 2019 at 1:20 PM wrote:
> > > +1 (non-binding) for better columnar data processing support.
> > >