Re: [cvs] CSVLexer.isEndOfLine(int c) makes assumptions on the line separator of a CSVFormat

Benedikt Ritter Mon, 12 Mar 2012 10:32:01 -0700

On 12 March 2012 18:24, Emmanuel Bourg <ebo...@apache.org> wrote:
> On 12/03/2012 18:17, Benedikt Ritter wrote:
>
>
>> This method assumes that a line separator will always be "\r" or
>> "\r\n". This is true for the pre-configured CSVFormats EXCEL, TDF and
>> MYSQL. I'm not a pro when it comes to file encoding, but isn't there
>> the possibility that new encodings will have different line
>> separators?
>
>
> Indeed, there are unicode line separators, see:
>
> https://issues.apache.org/jira/browse/CSV-51
>
>
>> If that is the case, isEndOfLine() should somehow use
>> format.getLineSeparator().
>> For example, the lookAhead is only needed if
>> lineSeparator.length() > 1. This may have a positive impact on the
>> performance of parsing files with an encoding whose line separator is
>> only one char long.
>
>
> CSVFormat defines a line separator, but it's only used by CSVPrinter. I'm
> not sure if we should restrict to this separator when parsing.
>


I'm not sure I understood you correctly. You have to pass a CSVFormat
to construct a CSVLexer, so we could use the lexer's internal
CSVFormat.
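
To make the idea concrete, here is a rough sketch of a separator-aware
check. It is not the current CSVLexer code: the class SeparatorAwareEol
and the helper countLineEnds are hypothetical names, a plain
java.io.PushbackReader stands in for the lexer's own reader, and the
separator string is assumed to be whatever format.getLineSeparator()
returns. The sketch only handles separators of one or two chars.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the lexer's end-of-line check (not the real CSVLexer).
final class SeparatorAwareEol {

    // Assumed to be the value of format.getLineSeparator().
    private final String lineSeparator;

    SeparatorAwareEol(String lineSeparator) {
        if (lineSeparator == null || lineSeparator.isEmpty() || lineSeparator.length() > 2) {
            throw new IllegalArgumentException("sketch supports separators of 1 or 2 chars");
        }
        this.lineSeparator = lineSeparator;
    }

    /**
     * Returns true if c (already read from in) begins the configured separator.
     * For a two-char separator the second char is consumed on a match and
     * pushed back otherwise; a one-char separator needs no look-ahead at all.
     */
    boolean isEndOfLine(int c, PushbackReader in) throws IOException {
        if (c != lineSeparator.charAt(0)) {
            return false;
        }
        if (lineSeparator.length() == 1) {
            return true; // single-char separator: skip the look-ahead
        }
        int next = in.read();
        if (next == lineSeparator.charAt(1)) {
            return true;
        }
        if (next != -1) {
            in.unread(next); // not a separator after all, put the char back
        }
        return false;
    }

    /** Counts separator occurrences in text; exists only to exercise isEndOfLine. */
    static int countLineEnds(String text, String lineSeparator) {
        SeparatorAwareEol eol = new SeparatorAwareEol(lineSeparator);
        try (PushbackReader in = new PushbackReader(new StringReader(text), 1)) {
            int count = 0;
            int c;
            while ((c = in.read()) != -1) {
                if (eol.isEndOfLine(c, in)) {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With "\r\n" configured, a lone "\r" in the input is no longer treated
as a line end, which is exactly the restriction Emmanuel is unsure
about. A parser that should stay lenient would have to keep accepting
the other newline forms as well.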

> Emmanuel Bourg
>
