Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread Bruno P. Kinoshita
Hi,


Will try to look at the code and give a better answer during the weekend. But
risking a silly question, would it mean that users are not able to parse a CSV
unless each CSV row is separated by LF or CRLF? I remember getting a CSV from a
government website some time ago that was formatted in a very strange way. If I
remember correctly, it was a small file without LF or CRLF; I think it was
using | to separate the rows and , for columns.


A quick search turned up at least one other person with a similar issue:
https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator


Not sure if I understood the problem well, but in case it makes sense... my 
suggestion would be to perhaps confirm if we could change 
CSVPrinter.printComment to accept other characters for line ending? 


Thanks!

Bruno



From: Benedikt Ritter 
To: Commons Developers List  
Sent: Tuesday, 21 August 2018 7:13 PM
Subject: [CSV] Inconsistent record separator behavior



Hi,


we have this strange handling of record separator / line endings in CSV:

Users can use whatever character sequence they like as a record separator.
I could for example use the ! character to mark the end of a record.
Then we have CSVPrinter.printComment(String). This inserts comments into a
CSV output. It detects CRLF and calls println() on the CSVFormat, which in
turn uses the record separator to indicate a new record...

So now I'm thinking: Does it make sense to use anything else but LF or CRLF
as record separator? Maybe we should deprecate
CSVFormat.recordSeparator(String) and introduce a LineEnding enum where
users can choose between LF and CRLF. This way we can make the behavior
between parsing and printing consistent.
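
To make the proposal concrete, here is a minimal sketch of what such an enum might look like. Everything in it is hypothetical: the names LineEnding, chars() and joinRecords are made up for illustration, and nothing here is confirmed to exist in Commons CSV.

```java
// Hypothetical sketch of the proposed enum; not part of Commons CSV.
class LineEndingDemo {

    /** The two line endings the proposal would allow. */
    enum LineEnding {
        LF("\n"),
        CRLF("\r\n");

        private final String chars;

        LineEnding(String chars) {
            this.chars = chars;
        }

        /** The characters written after each record. */
        String chars() {
            return chars;
        }
    }

    /** Terminates every record with the chosen line ending. */
    static String joinRecords(LineEnding ending, String... records) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append(ending.chars());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinRecords(LineEnding.CRLF, "a,b", "c,d"));
    }
}
```

With an enum like this, a caller could no longer pick an arbitrary separator such as "!", which is exactly the restriction being discussed.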


Thoughts?

Benedikt

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread Benedikt Ritter
Hi Bruno,

On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita wrote:

> Hi,
>
>
> Will try to look at the code and give a better answer during the weekend.
> But risking a silly question, would it mean that users are not able to
> parse a CSV unless each CSV row is separated by LF or CRLF?


Yes.


> I remember getting a CSV from a government website some time ago that was
> formatted in a very strange way. If I remember correctly, it was a small
> file without LF or CRLF; I think it was using | to separate the rows
> and , for columns.
>

I didn't know that there are formats that don't use a new line as line
separator.


>
>
> A quick search turned up at least one other person with a similar issue:
> https://stackoverflow.com/questions/29903202/how-to-read-csv-on-python-with-newline-separator
>
>
> Not sure if I understood the problem well, but in case it makes sense...
> my suggestion would be to perhaps confirm if we could change
> CSVPrinter.printComment to accept other characters for line ending?
>

The inconsistency I'm seeing is that we, on the one hand, accept any
character sequence as a record separator. Comments are, in a way, like special
records to me. But our implementation seems to put them on a new "line"
using the println() method. The println() method in turn uses the record
separator to start a new record, so it's not necessarily a new line.
Nevertheless, while processing a comment, we look out for CR and LF and then
we call println() again. Maybe I'm just not getting it, but it feels pretty
messed up :-)
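
The behavior described above can be modeled in a few lines. This is a simplified, hypothetical stand-in written from the description in this thread, not the actual Commons CSV code: the class MiniPrinter and its methods are assumptions made for illustration only.

```java
// Simplified model of the behavior described in the thread.
// MiniPrinter is hypothetical; it is NOT the Commons CSV implementation.
class CommentSeparatorDemo {

    static final class MiniPrinter {
        private final StringBuilder out = new StringBuilder();
        private final String recordSeparator;
        private final char commentMarker;

        MiniPrinter(String recordSeparator, char commentMarker) {
            this.recordSeparator = recordSeparator;
            this.commentMarker = commentMarker;
        }

        /** Ends the current record with the configured separator, which need not be a line break. */
        void println() {
            out.append(recordSeparator);
        }

        void printRecord(String... values) {
            out.append(String.join(",", values));
            println();
        }

        /** Scans the comment for CR, LF and CRLF, but breaks the output with println(). */
        void printComment(String comment) {
            out.append(commentMarker).append(' ');
            for (int i = 0; i < comment.length(); i++) {
                char c = comment.charAt(i);
                if (c == '\r' || c == '\n') {
                    // Treat CRLF as a single line break.
                    if (c == '\r' && i + 1 < comment.length() && comment.charAt(i + 1) == '\n') {
                        i++;
                    }
                    println();
                    out.append(commentMarker).append(' ');
                } else {
                    out.append(c);
                }
            }
            println();
        }

        String output() {
            return out.toString();
        }
    }

    static String demo() {
        MiniPrinter printer = new MiniPrinter("!", '#');
        printer.printComment("first\nsecond");
        printer.printRecord("a", "b");
        return printer.output();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "# first!# second!a,b!"
    }
}
```

In this model the line break inside the comment is detected as LF but written out as "!", which is the mismatch between what is scanned for and what is emitted.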

Regards,
Benedikt




Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread Bruno P. Kinoshita


>Maybe I'm just not getting it, but it feels pretty messed up :-)


Mutual feeling, and +1 for consistency. From what I understood, users should be
able to parse these crazy CSVs, but if they tried to re-create them, with
comments, then they wouldn't be able to avoid the println/newline (so it
wouldn't be parseable later with the same reader).


We probably need a ticket for it to aggregate the discussion and maybe a 
possible solution.

Cheers





Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread sebb
On 23 August 2018 at 00:01, Bruno P. Kinoshita
 wrote:
>
>>Maybe I'm just not getting it, but it feels pretty messed up :-)
>
>
> Mutual feeling, and +1 for consistency. From what I understood, users should
> be able to parse these crazy CSVs, but if they tried to re-create them, with
> comments, then they wouldn't be able to avoid the println/newline (so it
> wouldn't be parseable later with the same reader).
>
>
> We probably need a ticket for it to aggregate the discussion and maybe a 
> possible solution.

I'm wondering whether we need to be as flexible when *creating* the CSV files.

"Be liberal in what you accept, and conservative in what you send" (Jon Postel)

In this case send == create, as it might be sent to other less liberal readers.

I don't have a problem with the output being less flexible, so long as
it is sufficiently flexible (which I think it likely is already).

I don't think consistency is necessary - or even desirable - here.




Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread Bruno P. Kinoshita
Very good arguments (as always) Sebb. I'd also be OK with leaving as is, until 
we have a user with a good reason for changing the send/create. 


And thanks for including the author of the quote. Going through his Wikipedia 
page, lots of things to read later.

Bruno




Re: [CSV] Inconsistent record separator behavior

2018-08-22 Thread Benedikt Ritter
Hey sebb,

On Thu, 23 Aug 2018 at 01:23, sebb wrote:

> On 23 August 2018 at 00:01, Bruno P. Kinoshita wrote:
> >
> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >
> >
> > Mutual feeling, and +1 for consistency. From what I understood, users
> > should be able to parse these crazy CSVs, but if they tried to re-create
> > them, with comments, then they wouldn't be able to avoid the
> > println/newline (so it wouldn't be parseable later with the same reader).
> >
> >
> > We probably need a ticket for it to aggregate the discussion and maybe a
> > possible solution.
>
> I'm wondering whether we need to be as flexible when *creating* the CSV
> files.
>
> "Be liberal in what you accept, and conservative in what you send" (Jon
> Postel)
>
> In this case send == create, as it might be sent to other less liberal
> readers.
>
> I don't have a problem with the output being less flexible, so long as
> it is sufficiently flexible (which I think it likely is already).
>
> I don't think consistency is necessary - or even desirable - here.
>

okay, but wouldn't you expect that you can use a CSVFormat instance to read
a file that you created with it? This is currently not the case.
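
A minimal sketch of that round-trip problem, under the assumption stated earlier in this thread that parsing only recognises line endings as record ends. The class RoundTripSketch and its write/read methods are hypothetical helpers, not Commons CSV API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the round trip; not the Commons CSV API.
class RoundTripSketch {

    /** Writer side: terminates every record with the configured separator. */
    static String write(String recordSeparator, String... records) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append(recordSeparator);
        }
        return out.toString();
    }

    /** Reader side: assumed to recognise only CRLF, CR and LF as record ends. */
    static List<String> read(String text) {
        return Arrays.asList(text.split("\\r\\n|\\r|\\n"));
    }

    public static void main(String[] args) {
        // Written with CRLF: both records come back.
        System.out.println(read(write("\r\n", "a,b", "c,d")).size()); // 2
        // Written with "!": the reader sees one long record.
        System.out.println(read(write("!", "a,b", "c,d")).size()); // 1
    }
}
```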

Regards,
Benedikt



Re: [CSV] Inconsistent record separator behavior

2018-08-23 Thread sebb
On 23 August 2018 at 07:10, Benedikt Ritter  wrote:
> Hey sebb,
>
> On Thu, 23 Aug 2018 at 01:23, sebb wrote:
>
>> On 23 August 2018 at 00:01, Bruno P. Kinoshita wrote:
>> >
>> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
>> >
>> >
>> > Mutual feeling, and +1 for consistency. From what I understood, users
>> > should be able to parse these crazy CSVs, but if they tried to re-create
>> > them, with comments, then they wouldn't be able to avoid the
>> > println/newline (so it wouldn't be parseable later with the same reader).
>> >
>> >
>> > We probably need a ticket for it to aggregate the discussion and maybe a
>> > possible solution.
>>
>> I'm wondering whether we need to be as flexible when *creating* the CSV
>> files.
>>
>> "Be liberal in what you accept, and conservative in what you send" (Jon
>> Postel)
>>
>> In this case send == create, as it might be sent to other less liberal
>> readers.
>>
>> I don't have a problem with the output being less flexible, so long as
>> it is sufficiently flexible (which I think it likely is already).
>>
>> I don't think consistency is necessary - or even desirable - here.
>>
>
> okay, but wouldn't you expect that you can use a CSVFormat instance to read
> a file that you created with it? This is currently not the case.

Sorry, I misread the problem.

Yes, it should be able to read what it writes.

So the issue remains: should the reader be able to parse the unusual
format, or should the writer not be able to create it?

I don't have a particular view on that, except that allowing LF and
CRLF only seems too restricting.
We should allow at least CR alone. I don't know whether there are any
other reasonable separators.

Perhaps we could just document the method to warn that using anything
other than CR, LF or CRLF will produce an output file that is not
parseable?


Re: [CSV] Inconsistent record separator behavior

2018-08-23 Thread Benedikt Ritter
Hi,

On Thu, 23 Aug 2018 at 12:11, sebb wrote:

> On 23 August 2018 at 07:10, Benedikt Ritter wrote:
> > Hey sebb,
> >
> > On Thu, 23 Aug 2018 at 01:23, sebb wrote:
> >
> >> On 23 August 2018 at 00:01, Bruno P. Kinoshita wrote:
> >> >
> >> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >> >
> >> >
> >> > Mutual feeling, and +1 for consistency. From what I understood, users
> >> > should be able to parse these crazy CSVs, but if they tried to
> >> > re-create them, with comments, then they wouldn't be able to avoid the
> >> > println/newline (so it wouldn't be parseable later with the same
> >> > reader).
> >> >
> >> >
> >> > We probably need a ticket for it to aggregate the discussion and
> >> > maybe a possible solution.
> >>
> >> I'm wondering whether we need to be as flexible when *creating* the CSV
> >> files.
> >>
> >> "Be liberal in what you accept, and conservative in what you send" (Jon
> >> Postel)
> >>
> >> In this case send == create, as it might be sent to other less liberal
> >> readers.
> >>
> >> I don't have a problem with the output being less flexible, so long as
> >> it is sufficiently flexible (which I think it likely is already).
> >>
> >> I don't think consistency is necessary - or even desirable - here.
> >>
> >
> > okay, but wouldn't you expect that you can use a CSVFormat instance to
> > read a file that you created with it? This is currently not the case.
>
> Sorry, I misread the problem.
>
> Yes, it should be able to read what it writes.
>
> So the issue remains: should the reader be able to parse the unusual
> format, or should the writer not be able to create it?
>
> I don't have a particular view on that, except that allowing LF and
> CRLF only seems too restricting.
> We should allow at least CR alone. I don't know whether there are any
> other reasonable separators.
>

As Bruno pointed out, there seem to be formats whose record separators
are not line breaks. So maybe CSVPrinter.printComment(String) should not
scan for CR and LF but for the record separator.
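
A rough sketch of that alternative, assuming a comment is split wherever the configured record separator occurs. The class SeparatorAwareComment and its static printComment method are hypothetical and only illustrate the idea; they do not reflect how Commons CSV is or would be implemented.

```java
// Hypothetical sketch: split comments on the record separator instead of CR/LF.
class SeparatorAwareComment {

    static String printComment(String comment, String recordSeparator, char marker) {
        if (recordSeparator.isEmpty()) {
            throw new IllegalArgumentException("record separator must not be empty");
        }
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (true) {
            int idx = comment.indexOf(recordSeparator, start);
            String part = idx < 0 ? comment.substring(start) : comment.substring(start, idx);
            // Each piece becomes its own comment "record".
            out.append(marker).append(' ').append(part).append(recordSeparator);
            if (idx < 0) {
                break;
            }
            start = idx + recordSeparator.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(printComment("first!second", "!", '#')); // prints "# first!# second!"
    }
}
```

Note that in this sketch a CR or LF inside the comment is passed through unchanged when the separator is something else, so whether that is acceptable would still need to be decided.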


>
> Perhaps we could just document the method to warn that using anything
> other than CR, LF or CRLF will produce an output file that is not
> parseable?
>

That sounds like a good approach. But how would you implement that? You
probably don't want to introduce a dependency on a logging framework just
for that, do you?

Regards,
Benedikt



Re: [CSV] Inconsistent record separator behavior

2018-08-23 Thread sebb
On 23 August 2018 at 17:31, Benedikt Ritter  wrote:
> Hi,
>
> On Thu, 23 Aug 2018 at 12:11, sebb wrote:
>
>> On 23 August 2018 at 07:10, Benedikt Ritter wrote:
>> > Hey sebb,
>> >
>> > On Thu, 23 Aug 2018 at 01:23, sebb wrote:
>> >
>> >> On 23 August 2018 at 00:01, Bruno P. Kinoshita wrote:
>> >> >
>> >> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
>> >> >
>> >> >
>> >> > Mutual feeling, and +1 for consistency. From what I understood, users
>> >> > should be able to parse these crazy CSVs, but if they tried to
>> >> > re-create them, with comments, then they wouldn't be able to avoid the
>> >> > println/newline (so it wouldn't be parseable later with the same
>> >> > reader).
>> >> >
>> >> >
>> >> > We probably need a ticket for it to aggregate the discussion and
>> >> > maybe a possible solution.
>> >>
>> >> I'm wondering whether we need to be as flexible when *creating* the CSV
>> >> files.
>> >>
>> >> "Be liberal in what you accept, and conservative in what you send" (Jon
>> >> Postel)
>> >>
>> >> In this case send == create, as it might be sent to other less liberal
>> >> readers.
>> >>
>> >> I don't have a problem with the output being less flexible, so long as
>> >> it is sufficiently flexible (which I think it likely is already).
>> >>
>> >> I don't think consistency is necessary - or even desirable - here.
>> >>
>> >
>> > okay, but wouldn't you expect that you can use a CSVFormat instance to
>> > read a file that you created with it? This is currently not the case.
>>
>> Sorry, I misread the problem.
>>
>> Yes, it should be able to read what it writes.
>>
>> So the issue remains: should the reader be able to parse the unusual
>> format, or should the writer not be able to create it?
>>
>> I don't have a particular view on that, except that allowing LF and
>> CRLF only seems too restricting.
>> We should allow at least CR alone. I don't know whether there are any
>> other reasonable separators.
>>
>
> As Bruno pointed out, there seem to be formats whose record separators
> are not line breaks. So maybe CSVPrinter.printComment(String) should not
> scan for CR and LF but for the record separator.
>

Makes sense.

>>
>> Perhaps we could just document the method to warn that using anything
>> other than CR, LF or CRLF will produce an output file that is not
>> parseable?
>>
>
> That sounds like a good approach. But how would you implement that? You
> probably don't want to introduce a dependency on a logging framework just
> for that, do you?

I meant: add a warning to the documentation.
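
Such a documentation warning might read roughly as follows. This is only a sketch: the class RecordSeparatorDocSketch, its factory method and the isReadBackSafe helper are hypothetical, and the wording of the warning is an assumption based on this thread.

```java
// Hypothetical class used only to show where a Javadoc warning could go.
class RecordSeparatorDocSketch {

    private final String recordSeparator;

    private RecordSeparatorDocSketch(String recordSeparator) {
        this.recordSeparator = recordSeparator;
    }

    /**
     * Returns a sketch format that prints the given record separator.
     *
     * <p><strong>Warning:</strong> output written with a separator other than
     * CR, LF or CRLF may not be parseable again, because parsing is assumed
     * to recognise only those three line endings.
     *
     * @param recordSeparator the character sequence written after each record
     * @return a new instance using that separator for printing
     */
    static RecordSeparatorDocSketch withRecordSeparator(String recordSeparator) {
        return new RecordSeparatorDocSketch(recordSeparator);
    }

    /** True if the separator is one that the (assumed) parser can read back. */
    boolean isReadBackSafe() {
        return recordSeparator.equals("\r")
                || recordSeparator.equals("\n")
                || recordSeparator.equals("\r\n");
    }

    public static void main(String[] args) {
        System.out.println(withRecordSeparator("!").isReadBackSafe()); // false
    }
}
```

This keeps the check out of the runtime path, so no logging dependency is needed; the helper method is optional and only shows how a caller could test a separator themselves.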

> Regards,
> Benedikt
>
>
>>
>> > Regards,
>> > Benedikt
>> >
>> >
>> >>
>> >> > Cheers
>> >> >
>> >> > 
>> >> > From: Benedikt Ritter 
>> >> > To: Commons Developers List ;
>> >> brunodepau...@yahoo.com.br
>> >> > Sent: Thursday, 23 August 2018 7:10 AM
>> >> > Subject: Re: [CSV] Inconsistent record separator behavior
>> >> >
>> >> >
>> >> >
>> >> > Hi Bruno,
>> >> >
>> >> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita wrote:
>> >> >
>> >> >> Hi,
>> >> >>
>> >> >>
>> >> >> Will try to look at the code and give a better answer during the
>> >> >> weekend. But risking a silly question, would it mean that users are
>> >> >> not able to parse a CSV unless each CSV row is separated by LF or CRLF?
>> >> >
>> >> >
>> >> > Yes.
>> >> >
>> >> >
>> >> >> I remember getting a CSV in a government website some time ago that
>> >> >> was formatted in a very strange way, and if I remember well it was a
>> >> >> small file, but without LF or CRLF. I think it was using | to separate
>> >> >> the rows, and , for columns.
>> >> >>

Re: [CSV] Inconsistent record separator behavior

2018-08-24 Thread Benedikt Ritter
On Thu, 23 Aug 2018 at 20:17, sebb wrote:

> On 23 August 2018 at 17:31, Benedikt Ritter  wrote:
> > Hi,
> >
> > On Thu, 23 Aug 2018 at 12:11, sebb wrote:
> >
> >> On 23 August 2018 at 07:10, Benedikt Ritter  wrote:
> >> > Hey sebb,
> >> >
> >> > On Thu, 23 Aug 2018 at 01:23, sebb wrote:
> >> >
> >> >> On 23 August 2018 at 00:01, Bruno P. Kinoshita
> >> >>  wrote:
> >> >> >
> >> >> >>Maybe I'm just not getting it, but it feels pretty messed up :-)
> >> >> >
> >> >> >
> >> >> > Mutual feeling, and +1 for consistency. From what I understood,
> >> >> > users should be able to parse these crazy CSVs, but if they tried to
> >> >> > re-create them, with comments, then they wouldn't be able to avoid the
> >> >> > println/newline (so it wouldn't be parseable later with the same
> >> >> > reader).
> >> >> >
> >> >> >
> >> >> > We probably need a ticket for it to aggregate the discussion and
> >> >> > maybe a possible solution.
> >> >>
> >> >> I'm wondering whether we need to be as flexible when *creating* the
> >> >> CSV files.
> >> >>
> >> >> "Be liberal in what you accept, and conservative in what you send"
> >> >> (Jon Postel)
> >> >>
> >> >> In this case send == create, as it might be sent to other less
> >> >> liberal readers.
> >> >>
> >> >> I don't have a problem with the output being less flexible, so long
> >> >> as it is sufficiently flexible (which I think it likely is already).
> >> >>
> >> >> I don't think consistency is necessary - or even desirable - here.
> >> >>
> >> >
> >> > okay, but wouldn't you expect that you can use a CSVFormat instance to
> >> > read a file that you created with it? This is currently not the case.
> >>
> >> Sorry, I misread the problem.
> >>
> >> Yes, it should be able to read what it writes.
> >>
> >> So the issue remains: should the reader be able to parse the unusual
> >> format, or should the writer not be able to create it?
> >>
> >> I don't have a particular view on that, except that allowing LF and
> >> CRLF only seems too restricting.
> >> We should allow at least CR alone. I don't know whether there are any
> >> other reasonable separators.
> >>
> >
> > As Bruno pointed out, there seem to be formats whose record separators
> > are not newlines. So maybe CSVPrinter.printComment(String) should not
> > scan for CR and LF but for the record separator.
> >
>
> Makes sense.
>
> >>
> >> Perhaps we could just document the method to warn that using anything
> >> other than CR, LF or CRLF will produce an output file that is not
> >> parseable?
> >>
> >
> > That sounds like a good approach. But how would you implement that? You
> > probably don't want to introduce a dependency on a logging framework just
> > for that, do you?
>
> I meant: add a warning to the documentation.
>

+1 for that! CSVPrinter has almost no class-level documentation, so I
wanted to improve that anyway.

Benedikt


>
> > Regards,
> > Benedikt
> >
> >
> >>
> >> > Regards,
> >> > Benedikt
> >> >
> >> >
> >> >>
> >> >> > Cheers
> >> >> >
> >> >> > 
> >> >> > From: Benedikt Ritter 
> >> >> > To: Commons Developers List ; brunodepau...@yahoo.com.br
> >> >> > Sent: Thursday, 23 August 2018 7:10 AM
> >> >> > Subject: Re: [CSV] Inconsistent record separator behavior
> >> >> >
> >> >> >
> >> >> >
> >> >> > Hi Bruno,
> >> >> >
> >> >> > On Wed, 22 Aug 2018 at 15:10, Bruno P. Kinoshita wrote:
> >> >> >
> >> >> >> Hi,
> >> >> >>
> >> >> >>
> >> >> >> Will try to look at the code and give a better answer during the
> >> >>