On Mar 12, 2012, at 20:25, sebb <seb...@gmail.com> wrote:

> On 13 March 2012 00:12, Emmanuel Bourg <ebo...@apache.org> wrote:
>> I kept tickling ExtendedBufferedReader and I have some interesting results.
>>
>> First I tried to simplify it by extending java.io.LineNumberReader instead
>> of BufferedReader. The performance decreased by 20%, probably because the
>> class is synchronized internally.
>>
>> But wait, isn't BufferedReader also synchronized? I copied the code of
>> BufferedReader and removed the synchronized blocks. Now the time to parse
>> the file is down to 2652 ms, 28% faster than previously!
>>
>> Of course the code of BufferedReader can't be copied from the JDK due to the
>> license mismatch, so I took the version from Harmony. On my test it is about
>> 4% faster than the JDK counterpart, and the parsing time is now around 2553
>> ms.
>
> I'm concerned that the CSV code may grow and grow with private
> versions of code that could be provided by the JDK.
>
> By all means make sure the code is efficient in the way it uses the
> JDK classes, but I don't think we should be recoding standard classes.

+1

Gary

>
>> Now Commons CSV can start claiming to be the fastest CSV parser around :)
>>
>> Emmanuel Bourg
>>
>>
>> On 12/03/2012 11:31, Emmanuel Bourg wrote:
>>
>>> I have identified the performance killer, it's the
>>> ExtendedBufferedReader. It implements a complex logic to fetch one
>>> character ahead, but this extra character is rarely used. I have
>>> implemented a simpler look ahead using mark/reset as suggested by Bob
>>> Smith in CSV-42 and the performance improved by 30%.
>>>
>>> Now the parsing is down to 3406 ms, and that's almost without touching
>>> the parser yet.
>>>
>>> Emmanuel Bourg
>>>
>>>
>>> On 11/03/2012 15:05, Emmanuel Bourg wrote:
>>>>
>>>> Hi,
>>>>
>>>> I compared the performance of Commons CSV with the other CSV parsers
>>>> available. I took the world cities file from Maxmind as a test file [1],
>>>> it's a big file of 130M with 2.8 million records.
>>>>
>>>> Here are the results obtained on a Core 2 Duo E8400 after several
>>>> iterations to let the JIT compiler kick in:
>>>>
>>>> Direct read       750 ms
>>>> Java CSV         3328 ms
>>>> Super CSV        3562 ms   (+7%)
>>>> OpenCSV          3609 ms   (+8.4%)
>>>> GenJava CSV      3844 ms   (+15.5%)
>>>> Commons CSV      4656 ms   (+39.9%)
>>>> Skife CSV        4813 ms   (+44.6%)
>>>>
>>>> I also tried Nuiton CSV and Esperio CSV but I couldn't figure out how to
>>>> use them.
>>>>
>>>> I haven't analyzed why Commons CSV is slower yet, but it seems there is
>>>> room for improvements. The memory usage will have to be compared too,
>>>> I'm looking for a way to measure it.
>>>>
>>>>
>>>> Emmanuel Bourg
>>>>
>>>> [1] http://www.maxmind.com/download/worldcities/worldcitiespop.txt.gz
>>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
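
For readers of the archive, the mark/reset look-ahead mentioned in the thread (the approach credited to CSV-42) can be sketched roughly as follows. This is only an illustration under assumptions, not the code that went into Commons CSV: the class name LookAheadSketch and the method lookAhead are made up for this note, and the sketch relies only on the standard java.io.Reader mark/reset contract as implemented by BufferedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, named for this note only; not part of Commons CSV.
final class LookAheadSketch {

    private LookAheadSketch() {
    }

    /**
     * Returns the next character without consuming it, or -1 at end of stream.
     * The reader must support mark/reset (BufferedReader does).
     */
    static int lookAhead(Reader in) throws IOException {
        in.mark(1);        // remember the current position; one char of read-ahead suffices
        int c = in.read(); // read the next character (or -1 at end of stream)
        in.reset();        // rewind to the marked position so the character is not consumed
        return c;
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new StringReader("a,b"))) {
            int current;
            while ((current = in.read()) != -1) {
                int next = lookAhead(in);
                System.out.println("read '" + (char) current + "', next is "
                        + (next == -1 ? "end of stream" : "'" + (char) next + "'"));
            }
        }
    }
}
```

The point of the design is that the reader keeps no extra one-character state of its own; the peek costs a mark, a read and a reset only at the places where the parser actually needs to look ahead.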