Would one of the parser libraries not work here?

On Mar 12, 2012 12:22 PM, "Emmanuel Bourg" <ebo...@apache.org> wrote:
> On 12/03/2012 17:03, Benedikt Ritter wrote:
>
>> The whole logic behind CSVLexer.nextToken() is very hard to read
>> (IMHO). Maybe some refactoring would help to make it easier to
>> identify bottlenecks?
>
> Yes, I started investigating in this direction. I filed a few bugs
> regarding the behavior of the escaping that aim at clarifying the parser.
>
> I think the nextToken() method should be broken into smaller methods to
> help the JIT compiler.
>
> The JIT does some surprising things. I found that even unused code
> branches can have an impact on performance. For example, if
> simpleTokenLexer() is changed to not support escaped characters,
> performance improves by 10% (the input has no escaped characters). And
> that's not merely because an if statement was removed: if I add a
> System.out.println() call inside that if block, which is never executed,
> performance improves as well.
>
> So any change to the parser will have to be carefully tested. Innocent
> changes can have a significant impact.
>
> Emmanuel Bourg
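
To make the "smaller methods" idea concrete, here is a rough sketch of how a token reader could be split so that the common case stays in a short method and the rarely used escape handling lives in a method of its own. This is not the Commons CSV code: the class SplitLexerSketch and all of its method names are made up for illustration, it ignores doubled quotes and line endings, and it makes no claim about the actual speed-up. Anything like this would still need to be benchmarked, as noted above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; this is NOT the Commons CSV CSVLexer. */
final class SplitLexerSketch {
    private final String input;
    private int pos;

    SplitLexerSketch(String input) {
        this.input = input;
    }

    /** Small dispatcher: picks the specialised method for the next token. */
    String nextToken() {
        if (pos >= input.length()) {
            return null; // end of input
        }
        return input.charAt(pos) == '"' ? quotedToken() : simpleToken();
    }

    /** Fast path: unquoted token, no escapes expected. */
    private String simpleToken() {
        int start = pos;
        while (pos < input.length() && input.charAt(pos) != ',') {
            if (input.charAt(pos) == '\\') {
                pos = start;           // rare case: restart on the slow path
                return escapedToken();
            }
            pos++;
        }
        String token = input.substring(start, pos);
        skipDelimiter();
        return token;
    }

    /** Slow path: unquoted token containing backslash escapes. */
    private String escapedToken() {
        StringBuilder sb = new StringBuilder();
        while (pos < input.length() && input.charAt(pos) != ',') {
            if (input.charAt(pos) == '\\' && pos + 1 < input.length()) {
                pos++;                 // drop the backslash, keep the next char
            }
            sb.append(input.charAt(pos++));
        }
        skipDelimiter();
        return sb.toString();
    }

    /** Quoted token; doubled quotes are deliberately not handled here. */
    private String quotedToken() {
        pos++;                         // skip the opening quote
        StringBuilder sb = new StringBuilder();
        while (pos < input.length() && input.charAt(pos) != '"') {
            sb.append(input.charAt(pos++));
        }
        if (pos < input.length()) {
            pos++;                     // skip the closing quote
        }
        skipDelimiter();
        return sb.toString();
    }

    private void skipDelimiter() {
        if (pos < input.length() && input.charAt(pos) == ',') {
            pos++;
        }
    }

    /** Convenience helper: reads every token of a single line. */
    static List<String> tokenize(String line) {
        SplitLexerSketch lexer = new SplitLexerSketch(line);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            tokens.add(t);
        }
        return tokens;
    }
}
```

Calling SplitLexerSketch.tokenize("a,,b") yields the three tokens "a", "" and "b". The point of the layout is only that simpleToken() stays short and free of StringBuilder work, which is the kind of method the JIT tends to handle well. Whether it actually helps here is exactly what would have to be measured.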