On 12 March 2012 at 18:24, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 12/03/2012 at 18:17, Benedikt Ritter wrote:
>
>> this method assumes that a line separator will always be "\r" or
>> "\r\n". This is true for the pre-configured CSVFormats EXCEL, TDF and
>> MYSQL. I'm not a pro when it comes to file encoding, but isn't there
>> the possibility that new encodings will have different line
>> separators?
>
> Indeed, there are Unicode line separators, see:
>
> https://issues.apache.org/jira/browse/CSV-51
>
>> If that is the case, isEndOfLine() should somehow use
>> format.getLineSeparator().
>> For example, the lookAhead only has to be made if
>> lineSeparator.length() > 1. This may have a positive impact on the
>> performance of parsing files with an encoding whose line separator is
>> only one char long.
>
> CSVFormat defines a line separator, but it's only used by CSVPrinter. I'm
> not sure if we should restrict to this separator when parsing.
>

I'm not sure I understood you correctly. You have to pass a CSVFormat to construct a CSVLexer, so the lexer could use its own internal CSVFormat.

> Emmanuel Bourg
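
To make the suggestion concrete, here is a minimal sketch of an end-of-line check driven by a configured line separator, where the look-ahead only happens if the separator is longer than one char. This is not the actual CSVLexer code: the class LineSeparatorSketch, its constructor, and the use of PushbackReader are assumptions for illustration, and the separator string stands in for whatever format.getLineSeparator() would return.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

/** Hypothetical sketch, not the real CSVLexer: end-of-line check driven by a configured separator. */
public class LineSeparatorSketch {

    // Assumed to come from the format, e.g. "\n" or "\r\n".
    private final String lineSeparator;
    // One char of push-back is enough for a two-char separator.
    private final PushbackReader in;

    public LineSeparatorSketch(String lineSeparator, String input) {
        if (lineSeparator.isEmpty() || lineSeparator.length() > 2) {
            throw new IllegalArgumentException("sketch supports separators of 1 or 2 chars only");
        }
        this.lineSeparator = lineSeparator;
        this.in = new PushbackReader(new StringReader(input), 1);
    }

    /** Reads the next char, or -1 at end of input. */
    public int read() throws IOException {
        return in.read();
    }

    /**
     * Returns true if c, the char just read, starts the configured separator.
     * For a two-char separator the second char is consumed on a match and
     * pushed back otherwise.
     */
    public boolean isEndOfLine(int c) throws IOException {
        if (c != lineSeparator.charAt(0)) {
            return false;
        }
        if (lineSeparator.length() == 1) {
            return true; // single-char separator: no look-ahead needed
        }
        int next = in.read();
        if (next == lineSeparator.charAt(1)) {
            return true; // second char of the separator consumed
        }
        if (next != -1) {
            in.unread(next); // not a separator: put the char back
        }
        return false;
    }
}
```

Note that this variant only recognizes the configured separator, which is exactly the restriction Emmanuel is unsure about: a file using a different line ending would no longer be split into lines.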