On Mon, 21 Sep 2009 11:39:14 -0400, Per-Erik Brodin <per-erik.bro...@ericsson.com> wrote:

Michael A. Puls II wrote:
On Fri, 18 Sep 2009 11:37:24 -0400, Per-Erik Brodin wrote:

When parsing an event stream, allowing carriage return, carriage return
line feed, and line feed to denote line endings introduces unnecessary
ambiguity into the spec. For example, the sequence "\r\r\n\n" could be
interpreted as three or four line endings.
That would always be 3 line endings: a Mac one (\r), a Windows one (\r\n), and a *nix one (\n). "\n\r\n\r" would be the reverse order, but still 3.
So what you are saying is that "\r\n" will always be a Windows line
ending and never a Mac line ending followed by a Unix line ending?

Ideally, yes, imo.
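To illustrate the "always 3" reading, here is a minimal JavaScript sketch. It assumes the intended tokenization is a regex alternation that tries "\r\n" before a lone "\r" or "\n". countLineEndings is a hypothetical helper and is not part of any spec:

```javascript
// Count line endings, treating "\r\n" as one Windows-style ending and a
// lone "\r" or "\n" as a Mac- or Unix-style ending. The alternation tries
// "\r\n" first, so a CR immediately followed by LF is never split in two.
function countLineEndings(text) {
  const matches = text.match(/\r\n|\r|\n/g);
  return matches === null ? 0 : matches.length;
}

console.log(countLineEndings("\r\r\n\n")); // 3: "\r", "\r\n", "\n"
console.log(countLineEndings("\n\r\n\r")); // 3: "\n", "\r\n", "\r"
```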

 Universal newline normalization for input with mixed newline formats:
 // normalize newlines to \n
 text = text.replace(/\r\n|\r/g, "\n");
 // normalize newlines to \r\n
 text = text.replace(/\r\n|\r|\n/g, "\r\n");
 // normalize newlines to \r
 text = text.replace(/\r\n|\n/g, "\r");
While regular expressions are greedy by default, I have been told that
there is no way to express such behavior using ABNF. For what it is
worth, that means that the current ABNF definition of the event stream
format can't stand on its own.

Ideally, I think it's often best to use the first replacement to normalize to \n for processing (for example, if you need a line count) and then normalize to a different format afterwards *if needed*. IMO.
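A rough JavaScript sketch of that normalize-first approach. The helper names normalizeToLF, splitLines and lfToCRLF are hypothetical and chosen only for illustration:

```javascript
// Step 1: normalize every line ending to "\n". "\r\n" is listed first so a
// CRLF pair becomes a single "\n" rather than two.
function normalizeToLF(text) {
  return text.replace(/\r\n|\r/g, "\n");
}

// Step 2: do the processing on the normalized text, e.g. split into lines.
function splitLines(text) {
  return normalizeToLF(text).split("\n");
}

// Step 3 (only if needed): convert "\n" to another convention afterwards.
function lfToCRLF(text) {
  return text.replace(/\n/g, "\r\n");
}

const input = "one\rtwo\r\nthree\nfour";
console.log(splitLines(input).length); // 4
console.log(JSON.stringify(lfToCRLF(normalizeToLF(input))));
```

Because every later step only ever sees "\n", the line count does not depend on which convention (or mix of conventions) the input used.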

Keep in mind that we are parsing a continuous stream where data arrives
in chunks. It is entirely possible for a "\r\n" pair to be split between
two chunks. This could be handled by either 1) dispatching an event
immediately upon receiving a carriage return and then, upon reception of
the next chunk, "remembering" that the last character of the previous
chunk was a carriage return and discarding the first character if it
happens to be a line feed, or 2) not dispatching an event until the
character after the carriage return has been received, which could delay
event dispatch. Both options are far from ideal.

#1 sounds like it makes great sense, imo. Ideally, even if you're handling things in chunks, the end result should be the same as if you got it all at once. In other words, if you can help it, don't let the chunkiness mess up your desired newline handling :).
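To make #1 concrete, here is a minimal JavaScript sketch. LineSplitter and collect are hypothetical names, and this is not the parser from any spec or browser. It emits a line as soon as it sees "\r", then drops a "\n" that immediately follows, even when that "\n" is the first character of the next chunk. It only splits lines; dispatching events on blank lines is left out:

```javascript
// Option #1: emit a line as soon as "\r" or "\n" is seen. If "\r" was the
// last character processed, a following "\n" (possibly at the start of the
// next chunk) is discarded, so a split "\r\n" still counts as one ending.
class LineSplitter {
  constructor(onLine) {
    this.onLine = onLine;   // callback invoked once per complete line
    this.buffer = "";       // characters of the line in progress
    this.lastWasCR = false; // true if the previous character was "\r"
  }

  push(chunk) {
    for (const ch of chunk) {
      if (this.lastWasCR && ch === "\n") {
        // Second half of a "\r\n" pair: the line was already emitted.
        this.lastWasCR = false;
        continue;
      }
      this.lastWasCR = false;
      if (ch === "\r" || ch === "\n") {
        this.onLine(this.buffer);
        this.buffer = "";
        this.lastWasCR = ch === "\r";
      } else {
        this.buffer += ch;
      }
    }
  }
}

// Feed a list of chunks through one splitter and collect the lines.
function collect(chunks) {
  const lines = [];
  const splitter = new LineSplitter((line) => lines.push(line));
  for (const chunk of chunks) splitter.push(chunk);
  return lines;
}

// The same stream, delivered whole and split right between "\r" and "\n":
console.log(collect(["a\r\nb\r\rc\n"]));         // [ 'a', 'b', '', 'c' ]
console.log(collect(["a\r", "\nb\r", "\rc\n"])); // [ 'a', 'b', '', 'c' ]
```

Both calls yield the same lines, which is the "same result as if you got it all at once" property. The only cost is one boolean of state carried from one chunk to the next, and no dispatch is delayed.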

Of course, it'd be nice if there were only ever \n to deal with.

--
Michael
