Thanks a lot to Wes and Liya for the feedback. I agree that CSV parsing performance is important. I just found a benchmark of Java CSV libraries [1][2] which shows that FastCSV has a clear advantage. In any case, I will run the benchmarks myself.

Thanks,
Ji Liu

[1] https://raw.githubusercontent.com/osiegmar/FastCSV/master/benchmark.png
[2] https://github.com/osiegmar/FastCSV

------------------------------------------------------------------
From: Fan Liya <[email protected]>
Sent: Friday, July 19, 2019 10:14
To: dev <[email protected]>
Cc: Ji Liu <[email protected]>; Micah Kornfield <[email protected]>
Subject: Re: [DISCUSS][JAVA] Implement a CSV to Arrow adapter

Hi Ji,

Thanks for proposing this. A CSV adapter sounds like a useful feature.

Best,
Liya Fan

On Fri, Jul 19, 2019 at 12:31 AM Wes McKinney <[email protected]> wrote:

We wrote a custom reader in C++ since the performance of parsing CSV files matters a lot -- we also wanted to do multi-threaded execution of the conversion steps. I don't know what the performance of commons-csv is, but it might be worth doing some benchmarks to see.

On Thu, Jul 18, 2019 at 4:35 AM Ji Liu <[email protected]> wrote:
>
> Hi all,
>
> It seems there is no adapter to convert CSV data to Arrow data on the Java
> side, which C++ has. We already have a JDBC adapter, an ORC adapter, and an
> Avro adapter (in progress), so I think an adapter for CSV would probably
> also be nice.
> After a brief discussion with @Micah Kornfield, Apache commons-csv [1] seems
> to be an efficient CSV parser that we could potentially leverage, but I don't
> know if there are better options. Any input and comments would be appreciated.
>
> Thanks,
> Ji Liu
>
> [1] https://commons.apache.org/proper/commons-csv/
