Re: Vectorized R gapply[Collect]() implementation

2019-02-14 Thread Hyukjin Kwon
Awesome! From: Shivaram Venkataraman Sent: Saturday, February 9, 2019 8:33 AM To: Hyukjin Kwon Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman Subject: Re: Vectorized R gapply[Collect]()

Re: Vectorized R gapply[Collect]() implementation

2019-02-10 Thread Felix Cheung
This is super awesome! From: Shivaram Venkataraman Sent: Saturday, February 9, 2019 8:33 AM To: Hyukjin Kwon Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman Subject: Re: Vectorized R gapply[Collect]() implementation Those speedups

Re: Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Shivaram Venkataraman
Those speedups look awesome! Great work, Hyukjin! Thanks, Shivaram. On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon wrote: > Guys, as continuation of Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment like

Vectorized R gapply[Collect]() implementation

2019-02-09 Thread Hyukjin Kwon
Guys, as a continuation of the Arrow optimization for R DataFrame to Spark DataFrame, I am trying to make a vectorized gapply[Collect] implementation as an experiment, like vectorized Pandas UDFs. It brought an 820%+ performance improvement. See https://github.com/apache/spark/pull/23746 Please come and