Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter

2019-07-18 Thread Ji Liu
Thanks a lot to Wes and Liya for the feedback.

I agree that CSV parsing performance is important. I just found a benchmark
of Java CSV libraries [1][2] which shows that FastCSV has a clear advantage.
Either way, I will test it myself.


Thanks,
Ji Liu

[1] https://raw.githubusercontent.com/osiegmar/FastCSV/master/benchmark.png
[2] https://github.com/osiegmar/FastCSV




Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter

2019-07-18 Thread Fan Liya
Hi Ji,

Thanks for proposing this. A CSV adapter sounds like a useful feature.

Best,
Liya Fan



Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter

2019-07-18 Thread Wes McKinney
We wrote a custom reader in C++ because CSV parsing performance matters a
lot; we also wanted to run the conversion steps on multiple threads. I don't
know how commons-csv performs, but it might be worth running some benchmarks
to find out.
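
To illustrate the idea of multi-threaded conversion steps, here is a small, stdlib-only Java sketch. It assumes a simplified two-phase design: the CSV text is first split into per-column string lists on one thread, and then each column's string-to-number conversion runs as an independent task on a thread pool. The class and method names (`ParallelCsvColumns`, `splitColumns`, `convertColumns`) are hypothetical. This is not a description of how the Arrow C++ reader is implemented, and it is not an Arrow Java API. The naive comma split ignores quoting and escapes, so a real adapter would use a proper parser (such as commons-csv) for tokenizing and would write into Arrow vectors instead of `long[]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch: tokenize CSV rows into per-column string lists,
 * then convert each column to long[] in parallel. Assumes a header row,
 * integer-only data, no quoted fields, and the same field count on every row.
 */
public class ParallelCsvColumns {

    /** Phase 1 (sequential): split rows into columns of raw strings, skipping the header. */
    static List<List<String>> splitColumns(String csv) {
        String[] lines = csv.split("\n");
        int numCols = lines[0].split(",", -1).length;
        List<List<String>> columns = new ArrayList<>();
        for (int c = 0; c < numCols; c++) {
            columns.add(new ArrayList<>());
        }
        for (int i = 1; i < lines.length; i++) { // line 0 is the header
            String[] fields = lines[i].split(",", -1);
            for (int c = 0; c < numCols; c++) {
                columns.get(c).add(fields[c]);
            }
        }
        return columns;
    }

    /** Phase 2 (parallel): one conversion task per column. */
    static List<long[]> convertColumns(List<List<String>> columns, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (List<String> column : columns) {
                futures.add(pool.submit(() -> {
                    long[] values = new long[column.size()];
                    for (int r = 0; r < values.length; r++) {
                        values[r] = Long.parseLong(column.get(r).trim());
                    }
                    return values;
                }));
            }
            List<long[]> result = new ArrayList<>();
            for (Future<long[]> f : futures) {
                result.add(f.get()); // waits for each task; keeps original column order
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String csv = "a,b\n1,10\n2,20\n3,30";
        List<long[]> cols = convertColumns(splitColumns(csv), 2);
        System.out.println(cols.get(0)[2] + cols.get(1)[2]); // prints 33
    }
}
```

Because each column is converted independently, the conversion phase parallelizes without locking. Tokenizing stays sequential in this sketch because row boundaries have to be found first.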



[DISCUSS][JAVA] Implement a CSV to Arrow adapter

2019-07-18 Thread Ji Liu
Hi all,

It seems there is no adapter on the Java side for converting CSV data to
Arrow data, although C++ has one. We already have JDBC and ORC adapters, and
an Avro adapter is in progress, so I think a CSV adapter would also be nice.
After a brief discussion with @Micah Kornfield, Apache commons-csv [1] seems
like an efficient CSV parser that we could potentially leverage, but I don't
know whether there are better options. Any input and comments would be
appreciated.

Thanks,
Ji Liu

[1] https://commons.apache.org/proper/commons-csv/