Re: Arrow and R benchmark

Jonathan Chiang Sat, 24 Nov 2018 16:20:02 -0800

Hi Wes and Romain,

I wrote a preliminary benchmark for reading and writing different file
types from R into Arrow, borrowing some code from Hadley. I would like some
feedback to improve it, and then possibly push it to an r/benchmarks folder.
I am willing to dedicate most of next week to this project: I am taking a
vacation from work and would like to use it to contribute to Arrow and R.

To Romain: What is the difference in R between using a tibble and reading
from Arrow?
Is the general advantage that you can serialize the data to Arrow when
saving it, and then load it in Python with Arrow and convert it to pandas?

General roadmap question for Wes and Romain:
My vision for the future of data science is the ability to serialize data
and pass both data and models securely, with some form of authentication,
between IDEs over secure ports. This idea would build on something similar
to gRPC, with more security designed around sharing data. I noticed Flight
and its use of gRPC.

Also, I am interested in whether there is any momentum in the R community to
serialize models, similar to the work of ONNX, into a unified model storage
system. The idea is to have a secure, reproducible environment in which R
and Python developer groups can readily share models and data, with the
caveat that data sent also has added security and possibly a history
associated with it for auditing. This piece of work is something I am
passionate about seeing come to fruition, and I would like to explore
options for making it happen.

My background motivation is to enable healthcare teams to share medical data
securely among different analytics teams. The security provisions would
enable more robust cloud-based storage and computation in a secure fashion.

Thanks,
Jonathan

Side Note:
Building Arrow for R on Linux was a big hassle relative to Mac. I was unable
to get it to build on Linux.

On Thu, Nov 15, 2018 at 7:50 PM Jonathan Chiang <chiang...@gmail.com> wrote:

> I'll go through that python repo and see what I can do.
>
> Thanks,
> Jonathan
>
> On Thu, Nov 15, 2018 at 1:55 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> I would suggest starting an r/benchmarks directory like we have in
>> Python (https://github.com/apache/arrow/tree/master/python/benchmarks)
>> and documenting the process for running all the benchmarks.
>> On Thu, Nov 15, 2018 at 4:52 PM Romain François <rom...@purrple.cat>
>> wrote:
>> >
>> > Right now, most of the code examples are in the unit tests, but these do
>> > not measure performance or stress the code. Perhaps you can start from
>> > there?
>> >
>> > Romain
>> >
>> > > On 15 Nov 2018, at 22:16, Wes McKinney <wesmck...@gmail.com> wrote:
>> > >
>> > > Adding dev@arrow.apache.org
>> > >> On Thu, Nov 15, 2018 at 4:13 PM Jonathan Chiang <chiang...@gmail.com> wrote:
>> > >>
>> > >> Hi,
>> > >>
>> > >> I would like to contribute to developing benchmark suites for R and
>> > >> Arrow. What would be the best way to start?
>> > >>
>> > >> Thanks,
>> > >> Jonathan
>> >
>>
>
