On 13 March 2012 00:12, Emmanuel Bourg <ebo...@apache.org> wrote:
> I kept tickling ExtendedBufferedReader and I have some interesting results.
>
> First I tried to simplify it by extending java.io.LineNumberReader instead
> of BufferedReader. The performance decreased by 20%, probably because the
> class is synchronized internally.
>
> But wait, isn't BufferedReader also synchronized? I copied the code of
> BufferedReader and removed the synchronized blocks. Now the time to parse
> the file is down to 2652 ms, 28% faster than previously!
>
> Of course the code of BufferedReader can't be copied from the JDK due to the
> license mismatch, so I took the version from Harmony. On my test it is about
> 4% faster than the JDK counterpart, and the parsing time is now around 2553
> ms.

I'm concerned that the CSV code may grow and grow with private
versions of code that could be provided by the JDK.

By all means make sure the code is efficient in the way it uses the
JDK classes, but I don't think we should be recoding standard classes.
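
To illustrate what using the JDK classes directly might look like, here is a minimal sketch of the mark/reset look-ahead mentioned further down in this thread, built on a plain java.io.BufferedReader. The class name LookAheadSketch and the helper peek() are hypothetical names chosen for this example; this is not the actual Commons CSV code, only one way the idea could be expressed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical illustration, not part of Commons CSV.
class LookAheadSketch {

    /**
     * Returns the next character without consuming it, or -1 at end of stream.
     * Assumes the reader supports mark/reset (BufferedReader does).
     */
    static int peek(Reader in) throws IOException {
        in.mark(1);         // remember the position; one char of read-ahead is enough
        int c = in.read();  // read a single character, or -1 at end of stream
        in.reset();         // rewind to the marked position
        return c;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a,b\r\nc"));
        int c;
        while ((c = in.read()) != -1) {
            // Look ahead only where it matters, e.g. to treat "\r\n" as one line break.
            if (c == '\r' && peek(in) == '\n') {
                in.read(); // consume the '\n' that was peeked
                System.out.println("<line break>");
            } else {
                System.out.println((char) c);
            }
        }
    }
}
```

The look-ahead is only paid for when a character actually needs it, which matches the observation that the extra character is rarely used.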

> Now Commons CSV can start claiming to be the fastest CSV parser around :)
>
> Emmanuel Bourg
>
>
> On 12/03/2012 11:31, Emmanuel Bourg wrote:
>
>> I have identified the performance killer: it's the
>> ExtendedBufferedReader. It implements complex logic to fetch one
>> character ahead, but this extra character is rarely used. I have
>> implemented a simpler look-ahead using mark/reset, as suggested by Bob
>> Smith in CSV-42, and the performance improved by 30%.
>>
>> Now the parsing is down to 3406 ms, and that's almost without touching
>> the parser yet.
>>
>> Emmanuel Bourg
>>
>>
>> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>>>
>>> Hi,
>>>
>>> I compared the performance of Commons CSV with the other CSV parsers
>>> available. I took the world cities file from Maxmind as a test file [1];
>>> it's a big file of 130 MB with 2.8 million records.
>>>
>>> Here are the results obtained on a Core 2 Duo E8400 after several
>>> iterations to let the JIT compiler kick in:
>>>
>>> Direct read      750 ms
>>> Java CSV        3328 ms
>>> Super CSV       3562 ms  (+7%)
>>> OpenCSV         3609 ms  (+8.4%)
>>> GenJava CSV     3844 ms  (+15.5%)
>>> Commons CSV     4656 ms  (+39.9%)
>>> Skife CSV       4813 ms  (+44.6%)
>>>
>>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to
>>> use them.
>>>
>>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>>> room for improvement. The memory usage will have to be compared too;
>>> I'm looking for a way to measure it.
>>>
>>>
>>> Emmanuel Bourg
>>>
>>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>>
>>
>>
>
>
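
For anyone wanting to reproduce a "direct read" style baseline like the one quoted above, the following is a rough sketch of timing a plain BufferedReader pass after a few warm-up iterations so the JIT compiler has a chance to kick in. Everything here is an assumption made for illustration: the class name WarmUpTimingSketch, the helpers countLines() and sampleData(), the in-memory sample data (standing in for the Maxmind file) and the iteration counts. It is not the benchmark that produced the figures in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical illustration, not the benchmark used in this thread.
class WarmUpTimingSketch {

    /** Reads every line from the reader and returns how many lines were seen. */
    static int countLines(BufferedReader in) throws IOException {
        int count = 0;
        while (in.readLine() != null) {
            count++;
        }
        return count;
    }

    /** Builds an in-memory stand-in for a CSV file with the given number of records. */
    static String sampleData(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("us,city").append(i).append(',').append(i).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String data = sampleData(100_000);
        int warmUp = 5;    // iterations discarded while the JIT compiles the hot loop
        int measured = 5;  // iterations whose timings are kept
        long bestNanos = Long.MAX_VALUE;

        for (int i = 0; i < warmUp + measured; i++) {
            long start = System.nanoTime();
            int lines = countLines(new BufferedReader(new StringReader(data)));
            long elapsed = System.nanoTime() - start;
            if (i >= warmUp) {
                bestNanos = Math.min(bestNanos, elapsed);
            }
            if (lines != 100_000) {
                throw new IllegalStateException("unexpected line count: " + lines);
            }
        }
        System.out.println("best measured run: " + bestNanos / 1_000_000 + " ms");
    }
}
```

Swapping countLines() for a call into a CSV parser would give a comparable figure for that parser, provided the same input and iteration counts are used.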
