Re: [CSV] The Feature Multiple-Character Delimiter

2020-05-13 Thread Chen Guoping1
At 2020-05-13 22:29:20, "Gary Gregory"  wrote:
>On Wed, May 13, 2020 at 6:48 AM sebb  wrote:
>
>
>Chen,
>
>Are you talking about record separators, field separators, or both?
>
>Gary
>


Hi all,

Sorry, I meant field separators.
This is the problem described in
[CSV-206](https://issues.apache.org/jira/projects/CSV/issues/CSV-206).


Chen

At 2020-05-13 22:29:20, "Gary Gregory"  wrote:
>On Wed, May 13, 2020 at 6:48 AM sebb  wrote:
>
>> On Wed, 13 May 2020 at 00:27, Gary Gregory  wrote:
>> >
>> > Hi,
>> >
>> > Can you give an example where more than one character is used as a
>> > separator? Is there a database or known tool out there that uses such a
>> > format?
>>
>> The IBAN Registry (TXT) located at:
>> https://www.swift.com/standards/data-standards/iban
>> uses \r\n as EOL.
>>
>> Some of the fields include \n within quoted values.
>>
>
>Chen,
>
>Are you talking about record separators, field separators, or both?
>
>Gary
>
>
>>
>> > WRT escaping I would think that \ escapes the one character that follows
>> > only. It is up to the reader to decide what to do with an escape
>> > sequence. Anyone else?
>> >
>> > Gary
>> >
>> > On Tue, May 12, 2020 at 7:42 AM Chen Guoping1 
>> > wrote:
>> >
>> > > Hi all,
>> > >
>> > > In CSV parsing, there are many scenarios where multiple characters are
>> > > used as a separator.
>> > >
>> > > To support this feature, we should change the type of the delimiter
>> > > from char to String. This will lead to API changes, and existing code
>> > > that uses the old API may need to be modified to keep working.
>> > >
>> > > When parsing, we can read the upcoming characters in advance through
>> > > lookAhead(int n) in ExtendedBufferedReader to determine whether they
>> > > form a delimiter:
>> > >
>> > >     // Peek at the next n characters without consuming them.
>> > >     char[] lookAhead(int n) throws IOException {
>> > >         char[] buf = new char[n];
>> > >         super.mark(n);
>> > >         super.read(buf, 0, n);
>> > >         super.reset();
>> > >         return buf;
>> > >     }
>> > >
>> > > I have one small question to confirm. The escape character is '\'.
>> > > When the delimiter is the single char ',', printWithEscape prints
>> > > '\,'. So when the delimiter is the multi-character string "[|]",
>> > > should printWithEscape print "\[\|\]" or "\[|]"? I'd prefer "\[\|\]".
>> > > Are there any other suggestions about this feature?
>> > >
>> > > --
>> > > Chen Guoping
>> > >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
>>


Re: [CSV] The Feature Multiple-Character Delimiter

2020-05-13 Thread Gary Gregory
On Wed, May 13, 2020 at 6:48 AM sebb  wrote:

> On Wed, 13 May 2020 at 00:27, Gary Gregory  wrote:
> >
> > Hi,
> >
> > Can you give an example where more than one character is used as a
> > separator? Is there a database or known tool out there that uses such a
> > format?
>
> The IBAN Registry (TXT) located at:
> https://www.swift.com/standards/data-standards/iban
> uses \r\n as EOL.
>
> Some of the fields include \n within quoted values.
>

Chen,

Are you talking about record separators, field separators, or both?

Gary


>
> > WRT escaping I would think that \ escapes the one character that follows
> > only. It is up to the reader to decide what to do with an escape
> > sequence. Anyone else?
> >
> > Gary
> >
> > On Tue, May 12, 2020 at 7:42 AM Chen Guoping1 
> > wrote:
> >
> > > Hi all,
> > >
> > > In CSV parsing, there are many scenarios where multiple characters are
> > > used as a separator.
> > >
> > > To support this feature, we should change the type of the delimiter
> > > from char to String. This will lead to API changes, and existing code
> > > that uses the old API may need to be modified to keep working.
> > >
> > > When parsing, we can read the upcoming characters in advance through
> > > lookAhead(int n) in ExtendedBufferedReader to determine whether they
> > > form a delimiter:
> > >
> > >     // Peek at the next n characters without consuming them.
> > >     char[] lookAhead(int n) throws IOException {
> > >         char[] buf = new char[n];
> > >         super.mark(n);
> > >         super.read(buf, 0, n);
> > >         super.reset();
> > >         return buf;
> > >     }
> > >
> > > I have one small question to confirm. The escape character is '\'.
> > > When the delimiter is the single char ',', printWithEscape prints
> > > '\,'. So when the delimiter is the multi-character string "[|]",
> > > should printWithEscape print "\[\|\]" or "\[|]"? I'd prefer "\[\|\]".
> > > Are there any other suggestions about this feature?
> > >
> > > --
> > > Chen Guoping
> > >


Re: [CSV] The Feature Multiple-Character Delimiter

2020-05-13 Thread sebb
On Wed, 13 May 2020 at 00:27, Gary Gregory  wrote:
>
> Hi,
>
> Can you give an example where more than one character is used as a
> separator? Is there a database or known tool out there that uses such a
> format?

The IBAN Registry (TXT) located at:
https://www.swift.com/standards/data-standards/iban
uses \r\n as EOL.

Some of the fields include \n within quoted values.

> WRT escaping I would think that \ escapes the one character that follows
> only. It is up to the reader to decide what to do with an escape sequence.
> Anyone else?
>
> Gary
>
> On Tue, May 12, 2020 at 7:42 AM Chen Guoping1 
> wrote:
>
> > Hi all,
> >
> > In CSV parsing, there are many scenarios where multiple characters are
> > used as a separator.
> >
> > To support this feature, we should change the type of the delimiter
> > from char to String. This will lead to API changes, and existing code
> > that uses the old API may need to be modified to keep working.
> >
> > When parsing, we can read the upcoming characters in advance through
> > lookAhead(int n) in ExtendedBufferedReader to determine whether they
> > form a delimiter:
> >
> >     // Peek at the next n characters without consuming them.
> >     char[] lookAhead(int n) throws IOException {
> >         char[] buf = new char[n];
> >         super.mark(n);
> >         super.read(buf, 0, n);
> >         super.reset();
> >         return buf;
> >     }
> >
> > I have one small question to confirm. The escape character is '\'.
> > When the delimiter is the single char ',', printWithEscape prints
> > '\,'. So when the delimiter is the multi-character string "[|]",
> > should printWithEscape print "\[\|\]" or "\[|]"? I'd prefer "\[\|\]".
> > Are there any other suggestions about this feature?
> >
> > --
> > Chen Guoping
> >


Re: [CSV] The Feature Multiple-Character Delimiter

2020-05-12 Thread Gary Gregory
Hi,

Can you give an example where more than one character is used as a
separator? Is there a database or known tool out there that uses such a
format?

WRT escaping I would think that \ escapes the one character that follows
only. It is up to the reader to decide what to do with an escape sequence.
Anyone else?

Gary
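
To make the two escaping options from the quoted question concrete, here is a minimal sketch in plain Java. It is not Commons CSV code: the class DelimiterEscapeSketch and both method names are hypothetical, the escape character is assumed to be \, and escaping of the escape character itself is left out.

```java
public class DelimiterEscapeSketch {

    /** Option 1: put the escape character before every character of the delimiter. */
    static String escapeEachChar(String value, String delimiter, char escape) {
        StringBuilder escaped = new StringBuilder();
        for (char c : delimiter.toCharArray()) {
            escaped.append(escape).append(c);
        }
        // String.replace does a literal (non-regex) replacement.
        return value.replace(delimiter, escaped.toString());
    }

    /** Option 2: put the escape character only before the first character of the delimiter. */
    static String escapeFirstChar(String value, String delimiter, char escape) {
        return value.replace(delimiter, escape + delimiter);
    }

    public static void main(String[] args) {
        String value = "a[|]b";
        System.out.println(escapeEachChar(value, "[|]", '\\'));  // a\[\|\]b
        System.out.println(escapeFirstChar(value, "[|]", '\\')); // a\[|]b
    }
}
```

If \ escapes only the one character that follows, both outputs would seem to decode back to the original text, because escaping the first character already keeps the delimiter from matching. That is an observation about this sketch, not a statement about how Commons CSV behaves.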

On Tue, May 12, 2020 at 7:42 AM Chen Guoping1 
wrote:

> Hi all,
>
> In CSV parsing, there are many scenarios where multiple characters are
> used as a separator.
>
> To support this feature, we should change the type of the delimiter
> from char to String. This will lead to API changes, and existing code
> that uses the old API may need to be modified to keep working.
>
> When parsing, we can read the upcoming characters in advance through
> lookAhead(int n) in ExtendedBufferedReader to determine whether they
> form a delimiter:
>
>     // Peek at the next n characters without consuming them.
>     char[] lookAhead(int n) throws IOException {
>         char[] buf = new char[n];
>         super.mark(n);
>         super.read(buf, 0, n);
>         super.reset();
>         return buf;
>     }
>
> I have one small question to confirm. The escape character is '\'.
> When the delimiter is the single char ',', printWithEscape prints
> '\,'. So when the delimiter is the multi-character string "[|]",
> should printWithEscape print "\[\|\]" or "\[|]"? I'd prefer "\[\|\]".
> Are there any other suggestions about this feature?
>
> --
> Chen Guoping
>
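
The lookAhead idea in the quoted proposal can be tried outside Commons CSV with a plain java.io.BufferedReader. The sketch below is only an illustration under stated assumptions: LookAheadSketch, peek, isDelimiterNext and splitFields are hypothetical names rather than library API, and quoting and escaping are ignored. Unlike the quoted snippet, peek loops until n characters or end of input, because a single Reader.read call may return fewer characters than requested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LookAheadSketch {

    /** Returns up to n upcoming chars without consuming them (fewer at end of input). */
    static char[] peek(BufferedReader in, int n) throws IOException {
        char[] buf = new char[n];
        in.mark(n);
        int total = 0;
        while (total < n) {
            int read = in.read(buf, total, n - total);
            if (read < 0) {
                break; // end of input
            }
            total += read;
        }
        in.reset();
        return Arrays.copyOf(buf, total);
    }

    /** True if the next chars in the reader are exactly the delimiter. */
    static boolean isDelimiterNext(BufferedReader in, String delimiter) throws IOException {
        return Arrays.equals(peek(in, delimiter.length()), delimiter.toCharArray());
    }

    /** Splits one line into fields on a non-empty, multi-character delimiter. */
    static List<String> splitFields(String line, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(line))) {
            while (true) {
                if (isDelimiterNext(in, delimiter)) {
                    in.skip(delimiter.length()); // consume the delimiter
                    fields.add(field.toString());
                    field.setLength(0);
                    continue;
                }
                int c = in.read();
                if (c < 0) {
                    break; // end of input
                }
                field.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitFields("a[|]b|c[|]d", "[|]")); // [a, b|c, d]
    }
}
```

Peeking before every character keeps the sketch short but is wasteful. A real parser would more likely peek only when the current character equals the first character of the delimiter.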