On Monday 03 October 2005 16:47, Christian Biere wrote:
> The problem with those functions is that the used encoding for
> wide-characters is implementation-defined and may even depend
> on the current locale. So it might simply be EUC-JP codepoints
> and wchar_t could as well be a plain 8-bit char on systems that
> offer no real Unicode support. As long as Gtk+ 2.x is usually
> on such systems, we can still fully support Unicode with the
> current way.
I don't believe that there are still systems in use that use a non-Unicode implementation of wchar_t. At least on GNU systems, it is always UCS-4. If anyone uses GTKG on a system with a non-Unicode wchar_t, please raise your hand.

> I really have no idea how to convert wchar_t * (or win_t *) strings
> to UTF-8 without that knowledge. Even if we blindly assume it's
> Unicode it may be UTF-8, UCS-2, UTF-16, UTF-32 maybe even a
> system-specific non-standard encoding of Unicode codepoints.

You don't need to know the internal encoding of wchar_t. iconv_open() understands "WCHAR_T" as an encoding name, so you can just use iconv to convert the strings from their internal wchar_t representation to UTF-8, which is required by the gnutella protocol.

> Almost, most people can use UTF-8 as their locale encoding, so it's
> not a question of luck. I would rather blame the OS vendor for using
> a bad default than the users of course.

Exactly. If my OS had installed a UTF-8 environment by default when I first installed it, I'd never have encountered any problems. But it didn't, and I suspect that most users don't know how to change that.

> > What would be more convenient for them?
>
> They should use UTF-8 (or Unicode in general) instead. So
> that they don't have problems with foreign strings/languages and
> foreigners don't have problems with their strings.

Yes, but see above.

> > Note that you already have an option "Convert 'evil' characters
> > (like shell meta characters) to underscores in generated
> > filenames".
>
> Which is something "all others" do not. Should I remove it again?

No, it's OK :-). That's not that big a point. I only mentioned it to point out that there are already options that cause GTKG to store a file on disk under a name that is not exactly the name by which it was found on the net. And that option also uses underscores, with all the implications (loss of information, separation of gnutella keywords).
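For illustration, here is a minimal sketch of the iconv approach in C. It assumes an iconv implementation that accepts "WCHAR_T" as a source encoding (GNU libiconv and glibc do); the helper name wcs_to_utf8 is hypothetical and not part of GTKG.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not GTKG code): convert a NUL-terminated wchar_t
 * string to a freshly malloc()ed UTF-8 string. Returns NULL on error.
 * Assumes iconv_open() accepts "WCHAR_T", as GNU libiconv and glibc do. */
static char *wcs_to_utf8(const wchar_t *ws)
{
    iconv_t cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t len = wcslen(ws);
    /* At most 4 UTF-8 bytes per wchar_t unit (a UTF-16 surrogate pair
     * uses two units), plus one byte for the terminating NUL. */
    size_t out_size = len * 4 + 1;
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)ws;          /* iconv() takes char **, not const */
    size_t in_left = len * sizeof(wchar_t);
    char *out_ptr = out;
    size_t out_left = out_size - 1;     /* reserve room for the NUL */

    /* Convert the whole string, then flush any pending shift state. */
    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &out_ptr, &out_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }

    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

The output buffer is sized for the worst case so that a single iconv() call suffices; a real implementation might instead grow the buffer when iconv() fails with E2BIG.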
> Sure but if no apps insist or at least prefer UTF-8, less people
> will switch to UTF-8.

And during my experiments with a UTF-8 environment, I already encountered the first problems in other applications. amaroK, for example, which is my preferred audio player, has a bug that causes problems if files with UTF-8-encoded filenames are passed on the command line. Shame on them. I had hoped to delay my switch until real-world software is mature and Unicode-aware enough that the move wouldn't cause problems. C'est la vie.

Hauke

_______________________________________________
Gtk-gnutella-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gtk-gnutella-devel
