Follow-up Comment #7, patch #4406 (project mldonkey):

Skip to ---HERE--- for the important part if you already know enough about charset encodings and locales, and how they work on Windows and Linux.
OK, I checked it. A locale's character encoding is called a codepage on Windows. We detect it correctly (at least for one common codepage: a German Win2003 displays "locale: CP1252" in buildinfo).

Windows has every string function in an ANSI (fooA) and a wide (fooW) version, plus a wrapper macro that resolves to one of them depending on whether UNICODE is defined (see http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_28c3.asp and its siblings for this and more Windows Unicode info). The wide versions take strings of the character type wchar_t, encoded as UCS-2LE (at least on i386, I think; strictly speaking UTF-16LE since Windows 2000). libiconv has a special encoding named wchar_t for this use (see http://www.gnu.org/software/libiconv/ ). The ANSI versions are for strings encoded in the codepage the system is set to.

On Linux the filename functions just work with anything that is thrown at them (a filename can contain any byte value except / and \0). How we encode a filename is entirely a matter of the locale's character encoding. Everyone should use UTF-8, but there are still people using other encodings, so we convert every filename to the locale encoding on write (that is what .to_locale is for) and should convert from it on read (I don't know if we currently do that).

So on Windows we either need to use an unpatched OCaml (which uses the ANSI functions) and output filenames in the system codepage (but that can normally represent only a limited set of characters, far fewer than Unicode), or use a patched OCaml that uses the wide-char functions and output filenames in a Unicode encoding suitable for them.

---HERE---

The problem with the current version of the Unicode patch to OCaml is that we don't know (in mldonkey code) whether we have an OCaml that uses the ANSI or the wide-char functions, and thus can't know whether we need to convert to the locale encoding or to wchar_t.
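To make that choice concrete, here is a minimal Python sketch of the decision mldonkey currently cannot make. It is not mldonkey or OCaml code: the helper name `encode_filename`, the flag `uses_wide_api` and the default codepage `cp1252` are all made up for illustration, and Python's `utf-16-le` codec is assumed to stand in for the Windows wchar_t encoding.

```python
def encode_filename(name: str, uses_wide_api: bool, codepage: str = "cp1252") -> bytes:
    """Encode a filename for the API family the runtime calls (hypothetical helper)."""
    if uses_wide_api:
        # fooW functions: wchar_t strings, assumed here to be UTF-16LE.
        return name.encode("utf-16-le")
    # fooA functions: system codepage. Raises UnicodeEncodeError for any
    # character the codepage cannot represent.
    return name.encode(codepage)


if __name__ == "__main__":
    # Representable in CP1252, so both API families can handle it.
    print(encode_filename("Größe.txt", uses_wide_api=False))
    print(encode_filename("Größe.txt", uses_wide_api=True))
    # Not representable in CP1252: only the wide path works.
    try:
        encode_filename("日本語.txt", uses_wide_api=False)
    except UnicodeEncodeError as err:
        print("not representable in cp1252:", err.reason)
```

The point of the sketch is only that the caller has to know the value of `uses_wide_api`. Without a way to ask the OCaml runtime which function family it was built against, there is nothing to pass in.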
Another problem with the patch is that it does an additional conversion between UTF-8 and UCS-2LE (where the latter should be wchar_t anyway, even though that is most likely an alias for UCS-2LE on many Windows systems). Not doing the additional conversion would simplify the patch a lot. Using the UNICODE define could perhaps reduce it further; see the MSDN link above for more.

Another solution would be to set the codepage to UTF-8 and just use the OCaml with the ANSI functions, but I don't know if one can do that. Windows has something like per-thread locales ("CP_THREAD_ACP"), but I don't know if or how that can be set to UTF-8.

_______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/patch/?func=detailitem&item_id=4406>
