James Lyon wrote:
>
> ES> How should a *single* CVS executable "accept/deal with" all of
> ES> the following, which it *must* do if it's to defend itself
> ES> against the kinds of abuse you want to throw at it?
> ES> - Unix format: <LF>
> ES> - DOS format: <CR><LF>
> ES> - Mac format: <CR>
> ES> - Files in which some lines use one of the above conventions,
> ES> and some use another (because you edited a DOS-format file in
> ES> vi on a Unix box, and didn't religiously type the ^v^m's)
> ES> - Unix-format files that contain <CR>s as actual formatting
> ES> characters -- perhaps even at the ends of lines, for doing
> ES> overstriking, so looking specifically for <CR><LF> is unsafe
> ES> - Record-oriented formats which use length words and have no
> ES> terminator at all. This is old mainframe stuff -- dying, but
> ES> alas not dead yet. (For an example, see below.)
>
On the dinosaur I worked on for over a decade, lines were terminated
by two characters' worth of binary zeros. (They're retiring it now,
but its use did overlap the CVS era by quite a few years.)
> The request was to handle only the line-terminated approach, and not
> the record-orientated approach.
>
That was a request. It isn't the only possible request. Since it
is a request to change CVS, it has to be looked at globally.
> You just treat *any* <LF> *or* <CR> as a line terminator with the
> single *exception* that any <LF> that is found immediately following a
> <CR> is skipped without treating it as a line terminator.
>
Which violates the fifth case above. I don't know how often \r
is used for formatting, but the fact that it's got its own escape
sequence in C suggests that somebody's probably using it for
something - or at least that's a possibility you've got to
consider.
In short, this is a change that has the potential to really
screw up somebody's files somewhere. I don't think this is
something to do lightly.
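
To make the objection concrete, here is a minimal Python sketch of the
rule as quoted (any <CR> or <LF> ends a line, except an <LF> directly
after a <CR>). The function name is hypothetical and nothing here is
taken from CVS itself; it only illustrates the rule under discussion.

```python
from typing import List


def split_lines_any_terminator(data: bytes) -> List[bytes]:
    """Split on any CR or LF, skipping an LF that immediately follows a CR.

    Hypothetical helper illustrating the quoted rule; not CVS code.
    """
    lines = []
    current = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR always ends the line under this rule
            lines.append(bytes(current))
            current.clear()
            # Skip an LF directly after the CR so CR LF counts only once.
            if i + 1 < len(data) and data[i + 1] == 0x0A:
                i += 1
        elif byte == 0x0A:  # a bare LF also ends the line
            lines.append(bytes(current))
            current.clear()
        else:
            current.append(byte)
        i += 1
    if current:
        lines.append(bytes(current))  # last line had no terminator
    return lines
```

Unix, DOS, Mac and mixed input all come out the same: b"a\nb\r\nc\rd"
splits into a, b, c and d. But a Unix file holding the single
overstruck line b"total\r_____\n" comes back as two lines, "total" and
"_____", which is exactly the fifth case getting mangled.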
> This simple algorithm is *very* effective except when a "formatting"
> <CR> is used with something following it other than an <LF>. But
> that's logically ambiguous anyway and so you have to tell "it" what to
> do with such situations.
>
Um, no, that's not logically ambiguous. In a Unix text file,
the \r is very well defined. If you don't know whether it's a
Unix text file or not, then you've got an ambiguity problem.
> Having said all that, the real answer is to use utilities like
> unix2dos and dos2unix to make sure your files are fixed before using
> them in the particular environment... so the problem is evaded in the
> first place.
>
Right. Alternatively, if you know your files do not contain
embedded \ns and \rs, you can run a filter to make sure the
files have the right line-ending conventions. This is something
that should be done on a local basis, since you can be sure of
your own file contents but not everybody else's.
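
A local filter of that kind might look something like the sketch
below. It assumes the input is a DOS-format file being brought to a
Unix box, and it refuses to touch anything containing a <CR> that is
not part of a <CR><LF> pair rather than guessing. The name crlf_to_lf
is made up for illustration; in practice dos2unix does this job.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert CR LF line endings to LF, refusing ambiguous input.

    Hypothetical helper: assumes the caller believes the file is plain
    DOS-format text with no CR used as a formatting character.
    """
    converted = data.replace(b"\r\n", b"\n")
    # Any CR left over was not part of a CR LF pair. It could be a Mac
    # line ending or an overstrike, so stop instead of guessing.
    if b"\r" in converted:
        raise ValueError("lone CR found; not a plain DOS-format text file")
    return converted
```

The point of raising the error is that the filter only acts when the
assumption about the file contents actually holds; anything else is
left for a human to sort out.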
(And in reply to an earlier email: a three-l lllama is in fact
a pretty bad fire in Brooklyn.)
(Explanation for those who need it: in the US, fires are often
measured in number of alarms, where a one-alarmer is a routine
fire, and a five-alarmer is catastrophic. In the traditional
Brooklyn accent, "three-alarmer" would be pronounced much like
"three-l lllama". I suppose this is what I get for trying to
use an accent- and jargon-based joke in an international forum.)
--
David H. Thornley Software Engineer
at CES International, Inc.: [EMAIL PROTECTED] or (763)-694-2556
at home: (612)-623-0552 or [EMAIL PROTECTED] or
http://www.visi.com/~thornley/david/
_______________________________________________
Info-cvs mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/info-cvs