On Monday 03 October 2005 16:47, Christian Biere wrote:
> The problem with those functions is that the used encoding for
> wide-characters is implementation-defined and may even depend
> on the current locale. So it might simply be EUC-JP codepoints
> and wchar_t could as well be a plain 8-bit char on systems that
> offer no real Unicode support. As long as Gtk+ 2.x is usually
> on such systems, we can still fully support Unicode with the
> current way.

I don't believe that there are still systems in use that use a 
non-unicode implementation of wchar_t. At least on GNU systems, it is 
always UCS-4. If anyone uses GTKG on a system with a non-unicode 
wchar_t, please raise your hand.

> I really have no idea how to convert wchar_t * (or win_t *) strings
> to UTF-8 without that knowledge. Even if we blindly assume it's
> Unicode it may be UTF-8, UCS-2, UTF-16, UTF-32 maybe even a
> system-specific non-standard encoding of Unicode codepoints.

You don't need to know the internal encoding of wchar_t. iconv_open() 
understands "WCHAR_T" as an encoding name. So you can just use iconv to 
convert the strings from their internal wchar_t representation to 
utf-8, which is required by the gnutella protocol.
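
Roughly like the sketch below. This is not GTKG code: wcs_to_utf8() is a
made-up helper name, and it assumes an iconv that accepts the "WCHAR_T"
name (glibc and GNU libiconv do; POSIX does not promise it), so the
iconv_open() failure path has to be handled.

```c
#include <iconv.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a NUL-terminated wide string to UTF-8
 * without knowing how wchar_t is encoded internally.
 * Returns 0 on success, -1 on any failure (unsupported "WCHAR_T",
 * invalid input, or destination buffer too small).
 */
static int
wcs_to_utf8(const wchar_t *src, char *dst, size_t dst_size)
{
    iconv_t cd;
    char *in = (char *) src;                     /* iconv() wants char ** */
    size_t in_left = wcslen(src) * sizeof *src;  /* input size in bytes */
    char *out = dst;
    size_t out_left;
    size_t ret;

    if (dst_size == 0)
        return -1;
    out_left = dst_size - 1;                     /* keep room for the NUL */

    /* Target encoding first, source encoding second. */
    cd = iconv_open("UTF-8", "WCHAR_T");
    if (cd == (iconv_t) -1)
        return -1;                               /* "WCHAR_T" not known here */

    ret = iconv(cd, &in, &in_left, &out, &out_left);
    iconv_close(cd);
    if (ret == (size_t) -1)
        return -1;                               /* EILSEQ, EINVAL or E2BIG */

    *out = '\0';
    return 0;
}
```

A caller would pass a wide string and a char buffer, e.g.
wcs_to_utf8(wide_name, buf, sizeof buf), and treat -1 as "fall back to
whatever is done today".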

> Almost, most people can use UTF-8 as their locale encoding, so it's
> not a question of luck. I would rather blame the OS vendor for using
> a bad default than the users of course.

Exactly. If my OS had installed a utf-8 environment by default 
when I first installed it, I'd never have encountered any problems. But 
it didn't, and I suspect that most users don't know how to change that.

> > What would be more convenient for them?
>
> They should use UTF-8 (or Unicode in general) instead. So
> that they don't have problems with foreign strings/languages and
> foreigners don't have problems with their strings.

Yes, but see above.

> > Note that you already have an option "Convert 'evil' characters
> > (like shell meta characters) to underscores in generated
> > filenames".
>
> Which is something "all others" do not. Should I remove it again?

No, it's OK :-). That's not that big a point. I only said that to point 
out that there are already options that cause GTKG to store a file on 
disk with a name that is not exactly the name by which it was found on 
the net. And it also uses underscores, with all implications (loss of 
information, separation of gnutella keywords).

> Sure but if no apps insist or at least prefer UTF-8, less people
> will switch to UTF-8.

And during my experiments with a utf-8 environment, I already 
encountered the first problems in other applications. amaroK, for 
example, which is my preferred audio player, has a bug that causes 
problems if files with utf-8 encoded filenames are called from the 
command line. Shame on them. I had hoped to delay my switch until all 
real-world software is really mature and unicode-aware enough that this 
move wouldn't cause problems. C'est la vie.

Hauke


_______________________________________________
Gtk-gnutella-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/gtk-gnutella-devel