On Tuesday 27 September 2005 18:25, Christian Biere wrote:
> It's not a bug.

I think it's a POSIX violation, IIRC (see below).

> For almost all Unix-like systems the encoding of filenames is
> completely irrelevant, they are handled as opaque binary byte
> strings.

I know, and I completely agree with this. This is why it's the application's responsibility to interpret this byte string in the way the user wants. So there must be a way to tell applications how a given byte string should be interpreted. This could theoretically be done by an application-specific preference setting. But instead, POSIX suggests using a locale setting for this purpose.

> Actually, "[EMAIL PROTECTED]" does not imply any
> character set. It's just a language preference.

I think that's not true. A locale also contains information about character encoding. This information is supposed to be taken from the locale specified in LC_CTYPE (whereas LANG is the "general" language preference). [EMAIL PROTECTED] implies iso-8859-15.

> Further this is a very special case in which you're seemingly
> interested in files with filenames that are compatible with
> your locale encoding. If you're interested in files that have
> Asian or Arabic filenames (non-ASCII-fied) for example, those
> filenames couldn't be converted anyway. So either your other
> tools still wouldn't handle those properly or you'd have to
> live with bogus filenames containing mostly underscores or
> some trash.

I know. I know that this is bad, but it's my decision. Or, more accurately, it was the default when I installed my OS some years ago. Many Linux distros still use ISO-charset locales for Western languages, and users expect that to "work" as well as possible, i.e. at least for those few characters that are contained in that charset. In my case, I want it to work for ä, ö, ü and ß. For the moment, I can live with Japanese Kanji being converted to underscores. In fact, that would even be somewhat helpful, since I can't type arbitrary foreign-language filenames into my console.

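To make the two points above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical and nothing here is taken from the GTKG sources, and the target charset is hard-coded where a real application would take it from the LC_CTYPE locale. The first function shows that one byte string can be valid text in several charsets at once; the second shows the "underscore" conversion into a locale charset.

```python
# Illustrative sketch only: function names are hypothetical and are not
# part of gtk-gnutella. Charsets are given as Python codec names.

def interpretations(raw: bytes, charsets=("utf-8", "iso8859_15", "euc_jp")) -> dict:
    """Decode the same bytes under several charsets.

    Returns a dict mapping each charset to the decoded text, or to None
    if the bytes are not valid in that charset.
    """
    result = {}
    for cs in charsets:
        try:
            result[cs] = raw.decode(cs)
        except UnicodeDecodeError:
            result[cs] = None
    return result


def to_charset_filename(name: str, charset: str = "iso8859_15") -> bytes:
    """Encode a filename into the given charset.

    Characters that the charset cannot represent are replaced by '_'.
    Encoding character by character is fine for stateless charsets such
    as iso-8859-15; a stateful encoding would need more care.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out += b"_"
    return bytes(out)


if __name__ == "__main__":
    # The same two bytes, read three different ways.
    print(interpretations(b"\xc3\xa4"))
    # Umlauts and ß survive; each Kanji becomes one underscore.
    print(to_charset_filename("Grüße_日本.txt"))
```

On a POSIX system the charset could presumably be obtained with `locale.setlocale(locale.LC_CTYPE, "")` followed by `locale.nl_langinfo(locale.CODESET)` instead of being hard-coded; that lookup is left out here so the sketch behaves the same regardless of the environment it runs in.
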
> That sucks but I still think it's better to keep the UTF-8. I'd
> rather recommend to switch to UTF-8 ...
> Not because of Gtk-Gnutella but because UTF-8
> is the future

Yes, I know, me too. UTF-8 _IS_ the future (at least for Unix; on other platforms it might be UTF-16 or some other Unicode encoding). But that's not the point here. The point is that it has to work the way the user expects, and it must be possible for different applications to interoperate. That's why standards exist.

> and those apps
> should be fixed to allow UTF-8 *and* the locale encoding

That is logically impossible. If a filename contains the byte 0xc3 followed by 0xa4, there is no way for the application to know whether I mean "ä" (in UTF-8) or "Ã€" (in iso-8859-15) or "辰" (in EUC-JP). All three are, in theory, perfectly reasonable. I have to tell the application which encoding to assume, and in most cases this is done by a locale setting.

On Tuesday 27 September 2005 18:49, Daichi Kawahata wrote:
> Back to the first topic, what do you think about ideal handling?

GTKG should convert filenames to the charset of the locale given in LC_CTYPE before storing them to disk. Characters that can't be represented in that charset should be replaced by an underscore or the like. Of course, it should internally keep the original filename until the file is completely downloaded and closed.

This isn't a problem for partial file sharing at all. In most cases, the file should be requested by SHA1 anyway, not by name. And even if it is requested by name, GTKG of course internally knows the original UTF-8 file name(s) of the still-downloading partial file, not only the "underscored" on-disk name. And query hits are never returned for partial files anyway.

Greetz,
Hauke Hachmann

_______________________________________________
Gtk-gnutella-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gtk-gnutella-devel
