In article <[EMAIL PROTECTED]>, Stefan Monnier <[EMAIL PROTECTED]> writes:
> > I don't think it is uncommon. People migrate from Windows to GNU/Linux
> > (or switch between both), people exchange files with Windows users,
> > ... (and on Windows, it's quite common to insert `smart quotes' and
> > other non-Latin-1 characters).
>
> True, but in my experience plain-text files using windows-1252 are still
> rather uncommon under GNU/Linux. Of course, it depends on the specifics,
> but adapting Emacs to the specific circumstance should be done via the
> .emacs, I think.
>
> > What is the benefit of treating it as raw-text instead of windows-1252,
> > assuming that the file only contains characters from windows-1252? We
> > are talking about a file (> 300000 chars of text) with mostly ASCII,
> > some Latin-1 [ÄÖÜäöüß] (1.3%, probably typical for a German text), and
> > 19 \202 characters (= 0.005%).
>
> Obviously, in the case where the file is using windows-1252 encoding, there's
> no harm in Emacs using the windows-1252 encoding. But what about the other
> cases, e.g. if the file is just binary, or slightly incorrect utf-8, or ...?

At least windows-1252 doesn't cover all eight-bit bytes. There are a few
invalid bytes: 0x81, 0x8d, 0x8f, 0x90 and 0x9d.

Anyway, how about looking at the situation this way? When one visits a
binary file and it is detected as windows-1252, one can usually notice at
once that auto-detection did the wrong thing, because a binary file tends
to contain many 8-bit bytes in the first page. So one can re-read the file
with C-x C-m c binary RET C-x C-v RET.

But when one visits a windows-1252 file and it is read as raw-text, it is
harder to notice that the file was not decoded correctly, because the
first page may not contain a single raw byte. In that case one notices the
problem only after doing some editing, and by then it is too late to
re-read the file.
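To make the byte-level argument concrete, here is a small Python sketch. It is not Emacs code and does not reproduce Emacs's own detection logic. It relies on the fact that Python's strict `cp1252` codec rejects the bytes Windows-1252 leaves undefined. The helper names and the 4096-byte "first page" in `high_byte_ratio` are hypothetical illustrations of the "many 8-bit bytes in the first page" heuristic described above.

```python
"""Sketch: which bytes Windows-1252 leaves undefined, plus a crude
"many 8-bit bytes in the first page" check. Not Emacs's algorithm."""


def cp1252_undefined_bytes() -> list[int]:
    """Return the bytes 0x80-0x9F that Python's strict cp1252 codec rejects."""
    undefined = []
    for b in range(0x80, 0xA0):
        try:
            bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.append(b)
    return undefined


def decodes_as_cp1252(data: bytes) -> bool:
    """True if every byte in data maps to a character in Windows-1252."""
    try:
        data.decode("cp1252")
    except UnicodeDecodeError:
        return False
    return True


def high_byte_ratio(data: bytes, page_size: int = 4096) -> float:
    """Fraction of bytes >= 0x80 in the first page_size bytes.

    page_size is a hypothetical stand-in for "the first page" of a buffer.
    """
    head = data[:page_size]
    if not head:
        return 0.0
    return sum(1 for b in head if b >= 0x80) / len(head)


if __name__ == "__main__":
    print([hex(b) for b in cp1252_undefined_bytes()])
    # \202 (octal) is byte 0x82, which Windows-1252 maps to U+201A.
    print(repr(b"\202".decode("cp1252")))
    print(decodes_as_cp1252(b"Stra\xdfe \x82quoted\x93"))  # text-like bytes
    print(decodes_as_cp1252(b"\x81binary"))  # contains an undefined byte
    print(high_byte_ratio(b"ab\xff\xfe"))
```

On CPython the first call lists 0x81, 0x8d, 0x8f, 0x90 and 0x9d, so a file containing any of these cannot be windows-1252. A low `high_byte_ratio` on the first page is exactly the case where a mis-decoded file is easy to overlook.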

Stefan Monnier <[EMAIL PROTECTED]> writes:

> So I'd rather have a tool that explains what's going on, so that the user
> can decide to use windows-1252 if it's a good choice for her, rather than
> force windows-1252 on all users, most of whom won't ever edit a file with
> windows-1252 encoding.

How about indicating a binary buffer in a more conspicuous way, for
instance by changing the mode-line color and showing "BINARY FILE" in the
mode line?

---
Kenichi Handa
[EMAIL PROTECTED]

_______________________________________________
emacs-pretest-bug mailing list
emacs-pretest-bug@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug