On 13 October 2012 14:31, sebb <[email protected]> wrote:
> On 13 October 2012 12:55, sebb <[email protected]> wrote:
>> Before r1397883, the Lexer operated only on char fields; it's now been
>> converted to use Character, which means that unboxing is needed.
>>
>> Also, the Character fields need to be checked for null before use.
>>
>> It has just occurred to me that there is a genuine illegal char value
>> for everything except the delimiter - that is, the delimiter itself.
>> It does not make sense for there to be no delimiter, nor does it make
>> sense for any other meta-character to be the same as the delimiter.
>>
>> So rather than having
>>
>>     Character escape;
>> ...
>>     boolean isEscape(final int c) {
>>         return escape != null && c == escape.charValue();
>>     }
>>
>> one could use the simpler (and more efficient)
>>
>>     char escape;
>> ...
>>     boolean isEscape(final int c) { // similarly for isEncapsulator etc.
>>         return escape != delimiter;
>>     }
>
> Sorry, that's rubbish; it needs to be:
>
>      boolean isEscape(final int c) { // similarly for isEncapsulator etc.
>          return escape != delimiter && c == escape;
>      }
>
> which is hardly better than before.
>
> However, if the Lexer ctor ensures that escape (etc) can never be the
> same as the delimiter, then the check can be simplified to:
>
>      boolean isEscape(final int c) { // similarly for isEncapsulator etc.
>          return c == escape;
>      }

Rats! (I've not had coffee yet today!)

That last simplification does not work; the check needs to be:

     boolean isEscape(final int c) { // similarly for isEncapsulator etc.
         return c != delimiter && c == escape;
     }

The check for c != delimiter could be eliminated by setting the escape
char to DISABLED if its Character is null.

That would effectively revert back to the original code, except that
the DISABLED char would only be used in the Lexer.

See patch attached to https://issues.apache.org/jira/browse/CSV-94

This passes the unit tests.

>
>> This would have the added bonus of automatically disallowing delimiter
>> as the escape (or encapsulator etc.) because they would not be
>> recognised.
>> [At present the code does not check this]
>>
>> The Lexer ctor would need to be changed to convert a null escape
>> Character (comment etc) to the delimiter.
>
> and it would need to throw IAE or similar if any of the meta chars
> match the delimiter.

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to