On 12/03/2012 17:03, Benedikt Ritter wrote:

The whole logic behind CSVLexer.nextToken() is very hard to read
(IMHO). Maybe some refactoring would help to make it easier to
identify bottlenecks?

Yes, I have started investigating in this direction. I filed a few bugs about the escaping behavior, aimed at clarifying how the parser is supposed to work.

I think the nextToken() method should be broken into smaller methods to help the JIT compiler, which inlines small methods much more readily than large ones.
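A minimal sketch of that kind of split, assuming a simplified single-line format (comma delimiter, double-quote encapsulation, no record separators, no escape character). The class SmallMethodLexer and its methods are hypothetical and are not the Commons CSV CSVLexer; the point is only the shape: nextToken() does nothing but dispatch, and each kind of field is read by its own small method.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical single-line CSV lexer, NOT the Commons CSV CSVLexer:
 * nextToken() only dispatches, and each kind of field is read by its own
 * small method so the JIT can compile and inline each piece separately.
 */
final class SmallMethodLexer {
    private final String input;
    private final char delimiter;
    private final char quote;
    private int pos = 0;

    SmallMethodLexer(String input, char delimiter, char quote) {
        this.input = input;
        this.delimiter = delimiter;
        this.quote = quote;
    }

    /** Returns the next field, or null once every field has been read. */
    String nextToken() {
        if (pos > input.length()) {
            return null; // moved past the end: no fields left
        }
        if (pos < input.length() && input.charAt(pos) == quote) {
            return encapsulatedToken();
        }
        return simpleToken();
    }

    /** Unquoted field: everything up to the next delimiter or end of input. */
    private String simpleToken() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != delimiter) {
            pos++;
        }
        String token = input.substring(start, pos);
        pos++; // step over the delimiter (or past the end of input)
        return token;
    }

    /** Quoted field: a doubled quote inside the field stands for one quote. */
    private String encapsulatedToken() {
        StringBuilder sb = new StringBuilder();
        pos++; // skip the opening quote
        while (pos < input.length()) {
            char c = input.charAt(pos++);
            if (c != quote) {
                sb.append(c);
            } else if (pos < input.length() && input.charAt(pos) == quote) {
                sb.append(quote); // doubled quote -> one literal quote
                pos++;
            } else {
                break; // closing quote
            }
        }
        skipPastDelimiter();
        return sb.toString();
    }

    /** Ignores anything between a closing quote and the next delimiter. */
    private void skipPastDelimiter() {
        while (pos < input.length() && input.charAt(pos) != delimiter) {
            pos++;
        }
        pos++;
    }

    /** Convenience helper: all fields of one comma-separated, "-quoted line. */
    static List<String> tokenize(String line) {
        SmallMethodLexer lexer = new SmallMethodLexer(line, ',', '"');
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a,\"b,\"\"c\"\"\",d")); // prints [a, b,"c", d]
    }
}
```

Whether a split like this actually speeds up the real lexer would still have to be measured; what it does guarantee is that each method stays small enough to be read, tested and compiled on its own.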

The JIT does some surprising things: I found that even unused code branches can affect performance. For example, if simpleTokenLexer() is changed to not support escaped characters, performance improves by 10% (the input contains no escaped characters). And that's not merely because an if statement was removed: if I add a System.out.println() call inside that if block, which is never executed, performance improves as well.
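For anyone wanting to reproduce this kind of comparison, here is a crude sketch of a hand-rolled timing harness. The class EscapeBranchBench and both scanning methods are hypothetical stand-ins for the two variants described above, not code taken from simpleTokenLexer(); for numbers you can trust, a dedicated harness such as JMH is the usual choice.

```java
import java.util.Collections;
import java.util.function.ToIntFunction;

/**
 * Crude, hypothetical harness comparing two variants of a field-counting
 * scan: with and without support for a backslash escape. Hand-rolled
 * System.nanoTime() loops are easily skewed by JIT warm-up, inlining and
 * dead-code elimination, so treat the numbers as a hint only.
 */
final class EscapeBranchBench {
    private static final char DELIMITER = ',';
    private static final char ESCAPE = '\\';

    /** Written after each run so the JIT cannot discard the work. */
    static volatile long blackhole;

    /** Counts fields; a backslash makes the following character literal. */
    static int countFieldsWithEscape(String s) {
        int fields = 1;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ESCAPE) {
                i++; // skip the escaped character
            } else if (c == DELIMITER) {
                fields++;
            }
        }
        return fields;
    }

    /** Same scan with the escape branch removed. */
    static int countFieldsNoEscape(String s) {
        int fields = 1;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == DELIMITER) {
                fields++;
            }
        }
        return fields;
    }

    /** Runs f over input reps times and returns the elapsed nanoseconds. */
    static long timeNanos(ToIntFunction<String> f, String input, int reps) {
        long sum = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sum += f.applyAsInt(input);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sum;
        return elapsed;
    }

    public static void main(String[] args) {
        // 10,000 fields and no escape characters at all
        String line = String.join(",", Collections.nCopies(10_000, "abc"));
        for (int round = 0; round < 5; round++) { // warm-up so both get JIT-compiled
            timeNanos(EscapeBranchBench::countFieldsWithEscape, line, 500);
            timeNanos(EscapeBranchBench::countFieldsNoEscape, line, 500);
        }
        long with = timeNanos(EscapeBranchBench::countFieldsWithEscape, line, 500);
        long without = timeNanos(EscapeBranchBench::countFieldsNoEscape, line, 500);
        System.out.printf("with escape: %.2f ms, without escape: %.2f ms%n",
                with / 1e6, without / 1e6);
    }
}
```

Both methods return the same count on escape-free input, so any timing difference should come from the extra branch alone; given how sensitive the JIT is here, the gap may well change between JVM versions or even between runs.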

So any change to the parser will have to be carefully benchmarked. Even innocent-looking changes can have a significant impact.


Emmanuel Bourg

