Re: [csv] Performance comparison

Gary Gregory Mon, 12 Mar 2012 17:41:36 -0700

On Mar 12, 2012, at 20:25, sebb <seb...@gmail.com> wrote:

> On 13 March 2012 00:12, Emmanuel Bourg <ebo...@apache.org> wrote:
>> I kept tinkering with ExtendedBufferedReader and I have some interesting results.
>>
>> First I tried to simplify it by extending java.io.LineNumberReader instead
>> of BufferedReader. The performance decreased by 20%, probably because the
>> class is synchronized internally.
>>
>> But wait, isn't BufferedReader also synchronized? I copied the code of
>> BufferedReader and removed the synchronized blocks. Now the time to parse
>> the file is down to 2652 ms, 28% faster than previously!
>>
>> Of course the code of BufferedReader can't be copied from the JDK due to the
>> license mismatch, so I took the version from Harmony. On my test it is about
>> 4% faster than the JDK counterpart, and the parsing time is now around 2553
>> ms.
>
> I'm concerned that the CSV code may grow and grow with private
> versions of code that could be provided by the JDK.
>
> By all means make sure the code is efficient in the way it uses the
> JDK classes, but I don't think we should be recoding standard classes.


+1

Gary
>
>> Now Commons CSV can start claiming to be the fastest CSV parser around :)
>>
>> Emmanuel Bourg
>>
>>
>> On 12/03/2012 11:31, Emmanuel Bourg wrote:
>>
>>> I have identified the performance killer: it's the
>>> ExtendedBufferedReader. It implements complex logic to fetch one
>>> character ahead, but this extra character is rarely used. I have
>>> implemented a simpler look-ahead using mark/reset, as suggested by Bob
>>> Smith in CSV-42, and the performance improved by 30%.
>>>
>>> Now the parsing is down to 3406 ms, and that's almost without touching
>>> the parser yet.
>>>
>>> Emmanuel Bourg
>>>
>>>
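
A minimal sketch of the mark/reset look-ahead described in the message above,
assuming a Reader that supports mark() (BufferedReader does). The class and
method names are hypothetical; this is an illustration of the technique, not
the code committed to Commons CSV.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class LookAheadSketch {

    /** Returns the next character without consuming it, or -1 at end of stream. */
    public static int lookAhead(Reader in) throws IOException {
        in.mark(1);         // remember the current position; one char of read-ahead is enough
        int c = in.read();  // read the next character (-1 at end of stream)
        in.reset();         // rewind to the marked position
        return c;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a,b"));
        int peeked = lookAhead(in); // does not advance the reader
        int read = in.read();       // returns the same character
        System.out.println((char) peeked + " " + (char) read);
    }
}
```

The look-ahead cost is only paid when the parser actually needs to peek,
rather than on every read.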
>>> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>>>>
>>>> Hi,
>>>>
>>>> I compared the performance of Commons CSV with the other CSV parsers
>>>> available. I took the world cities file from Maxmind as a test file [1];
>>>> it's a big 130 MB file with 2.8 million records.
>>>>
>>>> Here are the results obtained on a Core 2 Duo E8400 after several
>>>> iterations to let the JIT compiler kick in:
>>>>
>>>> Parser          Time      vs. Java CSV
>>>> Direct read      750 ms
>>>> Java CSV        3328 ms
>>>> Super CSV       3562 ms   (+7%)
>>>> OpenCSV         3609 ms   (+8.4%)
>>>> GenJava CSV     3844 ms   (+15.5%)
>>>> Commons CSV     4656 ms   (+39.9%)
>>>> Skife CSV       4813 ms   (+44.6%)
>>>>
>>>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how
>>>> to use them.
>>>>
>>>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>>>> room for improvement. The memory usage will have to be compared too;
>>>> I'm looking for a way to measure it.
>>>>
>>>>
>>>> Emmanuel Bourg
>>>>
>>>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>>>
>>>
>>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
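
The timings in the first message of the thread were taken "after several
iterations to let the JIT compiler kick in". A rough sketch of that kind of
measurement follows, under stated assumptions: the data is generated in
memory rather than read from the Maxmind file, a plain line count stands in
for the parser under test, and the best measured run is reported (the thread
does not say whether best or average times were used). All names are
hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class WarmUpTimingSketch {

    /** Reads every line and returns the line count; stands in for a parser under test. */
    public static int countLines(String data) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    /** Runs warmUp untimed-in-effect iterations, then returns the best of the measured ones in ms. */
    public static long bestTimeMillis(String data, int warmUp, int measured) throws IOException {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < warmUp + measured; i++) {
            long start = System.nanoTime();
            countLines(data);
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            if (i >= warmUp) { // ignore the warm-up iterations
                best = Math.min(best, elapsed);
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic records (assumption: any CSV-like text is good enough for the sketch).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("ad,city").append(i).append(",42.5,1.5\n");
        }
        System.out.println("best: " + bestTimeMillis(sb.toString(), 5, 5) + " ms");
    }
}
```

For numbers meant to be published, a dedicated benchmarking harness would
be more reliable than a hand-rolled loop like this one.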
