[Mldonkey-users] [patch #4406] Improved Unicode filename support

Amorphous Sat, 17 Sep 2005 18:03:34 -0700

Follow-up Comment #7, patch #4406 (project mldonkey):

skip to ---HERE--- for the important things, if you know
enough about charset encoding and locale stuff and how it is
on win and linux.


Ok, i checked it. a locale charset encoding is called a
codepage on windows. we detect it correctly (at least for
one common codepage [german win2003 displays: "locale:
CP1252" in buildinfo]).

win has all string functions in an ansi (fooA) and a wide
(fooW) version, plus a wrapper that uses one of them
depending on whether UNICODE is defined (see
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_28c3.asp
and its siblings for this and more windows unicode info).
the wide versions use strings of the character type wchar_t
encoded as UCS-2LE (strictly speaking UTF-16LE since windows
2000; at least on i386 i think). libiconv has a special
encoding named wchar_t for this use (see
http://www.gnu.org/software/libiconv/ ). the ansi versions
are for use with strings that are encoded in the codepage
the system is set to.
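
a rough sketch of the difference, in python rather than
ocaml or C so that it runs anywhere. the sample filename and
the helper fits_codepage are made up for illustration, and
python's "utf-16-le" codec only stands in for what the wide
functions expect:

```python
# Sketch: the same filename as an "ansi" (codepage) string and as a
# "wide" (16-bit little-endian) string. The sample name is made up.
name = "Grüße.txt"

# What a fooA function would be handed on a system set to CP1252.
ansi = name.encode("cp1252")
# What a fooW function would be handed (terminating NUL left out).
wide = name.encode("utf-16-le")

print(len(ansi), ansi)  # one byte per character for this name
print(len(wide), wide)  # two bytes per character for this name


def fits_codepage(s, codepage="cp1252"):
    """Return True if every character of s exists in the codepage.

    Hypothetical helper, not part of mldonkey or the ocaml patch.
    """
    try:
        s.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False
```

a name that fits_codepage rejects can only reach the ansi
functions in a lossy form, while the wide encoding can carry
it unchanged.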

on linux the filename functions just work with anything that
is thrown at them (a filename can contain any char/byte
value except / and \0 on linux). how we encode that is
entirely a matter of the locale's character encoding.
everyone should use the utf-8 encoding, but there are still
people using other encodings and thus we encode any filename
to that locale on write (that is what .to_locale is for) and
should decode it from that locale on read (i don't know if
we do that currently).
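
a minimal sketch of that write/read conversion, again in
python. to_locale here only mimics what .to_locale is
described as doing above; from_locale and
is_valid_component are hypothetical names, and replacing
unrepresentable characters with "?" is an assumption, not
necessarily what mldonkey does:

```python
import locale


def _locale_encoding(encoding):
    """Use the given encoding, or fall back to the locale's charset."""
    return encoding or locale.getpreferredencoding(False)


def to_locale(name, encoding=None):
    """Encode a unicode filename to bytes in the locale's charset.

    Characters the charset cannot represent become "?" (assumption).
    """
    return name.encode(_locale_encoding(encoding), "replace")


def from_locale(raw, encoding=None):
    """Decode filename bytes read from disk back into unicode."""
    return raw.decode(_locale_encoding(encoding), "replace")


def is_valid_component(raw):
    """A linux filename component is any non-empty byte string
    that contains neither "/" nor a NUL byte."""
    return len(raw) > 0 and b"/" not in raw and b"\0" not in raw
```

for example, to_locale("ü", "utf-8") gives two bytes and
to_locale("ü", "iso-8859-1") gives one, and both pass
is_valid_component, which is why the kernel does not care
which encoding we pick.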

so we either need to use an unpatched ocaml (that uses the
ansi functions) and output filenames in the system codepage
(but normally that is something that can only represent a
limited set of chars, much much less than unicode) or use
a patched ocaml that uses the widechar functions and output
filenames in a unicode encoding suitable for it. a
solution to

---HERE---
the problem with the current version of the unicode patch to
ocaml is that we don't know (in mldonkey code) whether we
have an ocaml that uses the ansi or the widechar functions,
and thus can't know if we need to convert to the locale or
to wchar_t.
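
to make the missing piece concrete, a small python sketch.
the flag uses_wide_api is purely hypothetical (neither ocaml
nor the patch exposes such a thing as far as i know), and
cp1252 is just the codepage from the example above:

```python
def encode_for_runtime(name, uses_wide_api, codepage="cp1252"):
    """Pick the filename encoding that matches the runtime.

    uses_wide_api is a hypothetical flag saying whether the ocaml
    runtime calls the fooW functions; this is exactly the piece of
    information mldonkey currently cannot get.
    """
    if uses_wide_api:
        # Patched ocaml: hand over a wide (wchar_t-style) string.
        return name.encode("utf-16-le")
    # Unpatched ocaml: hand over a string in the system codepage,
    # replacing characters the codepage cannot represent.
    return name.encode(codepage, "replace")
```

without some way to fill in that flag at build or run time,
any conversion we pick is wrong for one of the two ocaml
variants.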

another problem with the patch is that it does an
additional conversion between utf-8 and ucs-2le (where the
latter should be wchar_t anyway, even though that is most
likely an alias for ucs-2le on many windows systems). (not
doing the additional conversion would simplify the patch a
lot. using the UNICODE define could perhaps reduce it
further. see the msdn link above for more.)
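
the extra step looks roughly like this in python. the
function names are made up, and python's "utf-16-le" codec is
used as a stand-in for ucs-2le (the two agree for every
character in the basic multilingual plane):

```python
def utf8_to_wide(raw_utf8):
    """Convert utf-8 bytes to 16-bit little-endian wide bytes."""
    return raw_utf8.decode("utf-8").encode("utf-16-le")


def wide_to_utf8(raw_wide):
    """Convert 16-bit little-endian wide bytes back to utf-8."""
    return raw_wide.decode("utf-16-le").encode("utf-8")


# Round trip for a made-up filename: nothing is lost, but every call
# into the filename functions pays for two conversions.
original = "Grüße.txt".encode("utf-8")
assert wide_to_utf8(utf8_to_wide(original)) == original
```

if mldonkey converted straight to wchar_t itself, this pair
of conversions inside the patched ocaml would not be needed.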

(another solution would be to set the codepage to utf-8 and
just use the ocaml with the ansi functions, but i don't know
if one can do that. windows has something like per-thread
locales, "CP_THREAD_ACP", but i don't know if/how that can
be set to utf-8)


    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/patch/?func=detailitem&item_id=4406>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/



_______________________________________________
Mldonkey-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/mldonkey-users
