Re: [fossil-users] How to force text for all files?

Shal Farley Mon, 08 Dec 2014 12:16:27 -0800

Stephan,

> If it has ANY bytes above 127, it's not, by definition, ASCII. i.e.
> "it's binary."

I would disagree with part of this statement. I agree that ASCII defines only the 7-bit code values, but I think this whole thread has run off the rails in talking about the content values as determining whether the file is "text" or "binary".

But this discussion of content heuristics misses the point of why there is a distinction to be made in the first place. And that, I think, has more to do with whether the content is organized into "lines".

In a functional sense for Fossil, a "text" file is one for which it is useful to display a line-oriented difference. For all other files ("binary" files) the difference can only be displayed in a way that is agnostic of the internal structure (if any) of the content.

Given that there is no universal heuristic for discriminating "text" from "binary" files based on content, that determination must be treated as a bit of metadata about the file.

Likewise, it is necessary to know for a given file what representation is used to separate lines. Knowledge of the line separator is seldom carried as metadata, because it is usually uniform in a given system. But in these days of interoperable systems and multi-platform support, this detail also may be a necessary piece of metadata to know about a file. ASCII code calls out the CR (carriage return) and LF (line-feed) control characters. DOS-based systems (including Windows) follow the direct ASCII tradition of using CR and LF, paired in that order (and often represented as CRLF) as the line separator. That tradition is also embodied in the Internet Mail standards for message content, header and body (absent MIME extensions). Unix-based systems use the LF character alone as the line separator in files (aka "newline"). Other systems have used CR alone.
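
To make the three conventions concrete, here is a rough sketch in Python that tallies which separators a chunk of file content uses. It is only an illustration: the function name count_line_separators is made up for this message and has nothing to do with how Fossil itself inspects files.

```python
def count_line_separators(data: bytes) -> dict:
    """Count CRLF pairs, lone LF, and lone CR in raw file content.

    Hypothetical helper for illustration only; not Fossil's code.
    """
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one LF and one CR, so subtract the
    # pairs to get the separators that appear on their own.
    lone_lf = data.count(b"\n") - crlf
    lone_cr = data.count(b"\r") - crlf
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}
```

A file that reports more than one non-zero count has mixed line endings, which is exactly the case where a per-system default stops being enough.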

And additionally, the character set used to represent text in a file must also be carried as metadata (because of the ISO-8859 and other code-page based character sets).
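
A small Python sketch of why the bytes alone cannot settle this: the same single byte above 127 is a different character under each code page, and is not valid UTF-8 at all. The helper name decode_with is an assumption made for this example.

```python
def decode_with(data: bytes, charsets) -> dict:
    """Decode the same bytes under several character sets.

    Hypothetical helper for illustration; maps each charset name to
    the decoded string, or to None when the bytes are not valid in it.
    """
    results = {}
    for name in charsets:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# One byte, 0xE9, read three different ways.
decoded = decode_with(b"\xe9", ["latin-1", "iso8859_5", "utf-8"])
```

Nothing in the byte itself says which of these readings the author intended.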

Only if all these items of metadata are known can the file content, or differences in the file content, be displayed in a useful form. So returning to this thread, it is convenient to have a heuristic that works most of the time to discriminate "text" from "binary" files, but it is necessary to also have a way for the user to explicitly provide that metadata (and ideally the character set metadata).
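
The arrangement I have in mind could be sketched like this in Python: user-supplied metadata is consulted first, and a content heuristic is only the fallback. Everything here is an assumption for illustration, including the name classify, the (pattern, kind) override list, and the choice of "contains a NUL byte" as the fallback heuristic; it is not a description of what Fossil does.

```python
from fnmatch import fnmatchcase


def classify(name: str, data: bytes, overrides=()) -> str:
    """Return "text" or "binary" for a file.

    overrides is a hypothetical sequence of (glob_pattern, kind)
    pairs supplied by the user; the first matching pattern wins.
    """
    for pattern, kind in overrides:
        if fnmatchcase(name, pattern):
            return kind
    # Fallback heuristic (an assumption): a NUL byte suggests binary.
    # Bytes above 127 alone do not, since 8-bit text is common.
    return "binary" if b"\x00" in data else "text"
```

For example, classify("notes.txt", b"caf\xe9\n") falls through to the heuristic and is treated as text despite the byte above 127, while an override such as [("*.dat", "text")] lets the user force the answer for files the heuristic gets wrong.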


-- Shal
