Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter
Thanks a lot to Wes and Liya for the feedback. Agreed that CSV parsing performance is important. I just found a benchmark of Java CSV libraries [1][2] in which FastCSV shows a clear advantage. Anyway, I will test it myself.

Thanks,
Ji Liu

[1] https://raw.githubusercontent.com/osiegmar/FastCSV/master/benchmark.png
[2] https://github.com/osiegmar/FastCSV

--
From: Fan Liya
Send Time: Friday, July 19, 2019, 10:14
To: dev
Cc: Ji Liu; Micah Kornfield
Subject: Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter

Hi Ji,

Thanks for proposing this. CSV adapter sounds like a useful feature.

Best,
Liya Fan

On Fri, Jul 19, 2019 at 12:31 AM Wes McKinney wrote:

We wrote a custom reader in C++ since performance of parsing CSV files matters a lot -- we wanted to do multi-threaded execution of conversion steps, also. I don't know what the performance of commons-csv is but it might be worth doing some benchmarks to see.

On Thu, Jul 18, 2019 at 4:35 AM Ji Liu wrote:
>
> Hi all,
>
> It seems there is no adapter on the Java side to convert CSV data to
> Arrow data, which C++ has. We already have a JDBC adapter, an ORC
> adapter and an Avro adapter (in progress), so I think an adapter for
> CSV would probably also be nice.
> After a brief discussion with @Micah Kornfield, Apache commons-csv [1]
> seems an efficient CSV parser that we could potentially leverage, but
> I don't know if there are other, better options. Any input and
> comments would be appreciated.
>
> Thanks,
> Ji Liu
>
> [1] https://commons.apache.org/proper/commons-csv/
Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter
Hi Ji,

Thanks for proposing this. CSV adapter sounds like a useful feature.

Best,
Liya Fan

On Fri, Jul 19, 2019 at 12:31 AM Wes McKinney wrote:

> We wrote a custom reader in C++ since performance of parsing CSV files
> matters a lot -- we wanted to do multi-threaded execution of
> conversion steps, also. I don't know what the performance of
> commons-csv is but it might be worth doing some benchmarks to see.
>
> On Thu, Jul 18, 2019 at 4:35 AM Ji Liu wrote:
> >
> > Hi all,
> >
> > It seems there is no adapter on the Java side to convert CSV data to
> > Arrow data, which C++ has. We already have a JDBC adapter, an ORC
> > adapter and an Avro adapter (in progress), so I think an adapter for
> > CSV would probably also be nice.
> > After a brief discussion with @Micah Kornfield, Apache commons-csv [1]
> > seems an efficient CSV parser that we could potentially leverage, but
> > I don't know if there are other, better options. Any input and
> > comments would be appreciated.
> >
> > Thanks,
> > Ji Liu
> >
> > [1] https://commons.apache.org/proper/commons-csv/
Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter
We wrote a custom reader in C++ since performance of parsing CSV files matters a lot -- we wanted to do multi-threaded execution of conversion steps, also. I don't know what the performance of commons-csv is but it might be worth doing some benchmarks to see.

On Thu, Jul 18, 2019 at 4:35 AM Ji Liu wrote:
>
> Hi all,
>
> It seems there is no adapter on the Java side to convert CSV data to
> Arrow data, which C++ has. We already have a JDBC adapter, an ORC
> adapter and an Avro adapter (in progress), so I think an adapter for
> CSV would probably also be nice.
> After a brief discussion with @Micah Kornfield, Apache commons-csv [1]
> seems an efficient CSV parser that we could potentially leverage, but
> I don't know if there are other, better options. Any input and
> comments would be appreciated.
>
> Thanks,
> Ji Liu
>
> [1] https://commons.apache.org/proper/commons-csv/
[DISCUSS][JAVA] Implement a CSV to Arrow adapter
Hi all,

It seems there is no adapter on the Java side to convert CSV data to Arrow data, which C++ has. We already have a JDBC adapter, an ORC adapter and an Avro adapter (in progress), so I think an adapter for CSV would probably also be nice.

After a brief discussion with @Micah Kornfield, Apache commons-csv [1] seems an efficient CSV parser that we could potentially leverage, but I don't know if there are other, better options. Any input and comments would be appreciated.

Thanks,
Ji Liu

[1] https://commons.apache.org/proper/commons-csv/
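
To make the idea concrete, here is a rough sketch of the core step such an adapter would need: pivoting row-oriented CSV text into per-column value lists, which could later be copied into Arrow vectors. This is only an illustration under stated assumptions. It uses the JDK alone, not commons-csv or the Arrow Java API; the class name `CsvColumnarSketch` and method `toColumns` are hypothetical; and it assumes a header row with unique names and no quoted fields, embedded commas, or embedded newlines. A real adapter would delegate parsing to a proper CSV library and handle types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not Arrow's API and not commons-csv.
final class CsvColumnarSketch {

    // Pivots CSV text (first line is the header) into one list of raw
    // string values per column, preserving the header's column order.
    // Assumptions: unique header names, no quoting, no embedded commas
    // or newlines inside fields.
    static Map<String, List<String>> toColumns(String csv) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        if (csv.isEmpty()) {
            return columns; // empty input: no columns
        }
        String[] lines = csv.split("\r?\n");
        // limit -1 keeps trailing empty fields such as "a,b,"
        String[] names = lines[0].split(",", -1);
        for (String name : names) {
            columns.put(name, new ArrayList<>());
        }
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = lines[row].split(",", -1);
            if (fields.length != names.length) {
                throw new IllegalArgumentException(
                    "Row " + row + ": expected " + names.length
                        + " fields but got " + fields.length);
            }
            for (int i = 0; i < names.length; i++) {
                columns.get(names[i]).add(fields[i]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Prints the column-oriented view of a two-row sample.
        System.out.println(toColumns("id,name\n1,alice\n2,bob\n"));
    }
}
```

The naive `split` is exactly the part where parser choice and performance would matter, so in practice it would be replaced by whichever library benchmarks best.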