Atsushi wrote:
> Mono does not support non-UTF8 multibyte conversion by design.
That's ok, but whatever we marshal out we should be able to marshal
back, yeah?

Okay, so after some more digging and realizing things are more
complicated than I thought, here's what I've learned:

  PtrToStringAnsi does a UTF-8-to-UTF-16 conversion.

  StringToHGlobalAnsi does exactly the reverse.

  StringToCoTaskMemAnsi does something totally different! It does
  something kind of like a conversion to ANSI (or maybe it is ANSI,
  I'm not sure). There's no way to marshal such pointers back.

While the Ptr and HGlobal methods are icalls, the CoTaskMem methods are
in C#. My confusion yesterday came from my assumption that
StringToCoTaskMemAnsi was simply wrapping StringToHGlobalAnsi, whose
implementation I was looking at. StringToHGlobalAnsi eventually calls
the glib conversion function, and so I was expecting to see UTF-8 in the
resulting bytes whereas I was only seeing ANSI.

Anyway, I think StringToCoTaskMemAnsi should be changed to do exactly
the same thing as StringToHGlobalAnsi, right?

StringToCoTaskMemUni also has a managed implementation, and while it
looks OK, it strangely also doesn't reuse the implementation of
StringToHGlobalUni.

-- 
- Joshua Tauberer

http://taubz.for.net

"Unfortunately, we're having this discussion. It's too bad, because
guess who listens to the discussion: the enemy."


Atsushi Eno wrote:
> Hello,
>
> Mono does not support non-UTF8 multibyte conversion by design. We
> shouldn't change its behavior from current one. Actually it is pretty
> classic matter which has been stated since 2003.
> http://lists.ximian.com/archives/public/mono-list/2003-June/014500.html
>
> It is Microsoft who should provide additional marshaling flags so that
> it will be truly functional on every platforms (especially considering
> that there is also Gtk+ on Windows which is apparently designed to work
> on Windows and uses UTF-8 based marshaling). AFAIK they are also aware
> on this matter through ECMA meetings.
>
> Atsushi Eno
>
>
>> While debugging a SqliteClient issue, I came across an interesting bug.
>> The following returns null when I'm pretty sure it should not (it
>> doesn't on Windows):
>>
>>     Marshal.PtrToStringAnsi(Marshal.StringToCoTaskMemAnsi("ü"))
>>
>> In case the encoding of this email gets messed up, that's a u with
>> umlauts, (char)0xFC.
>>
>> The encoding half "works" (Marshal.ReadByte reports the bytes (0xFC
>> 0x00)), on the assumption that I'm supposed to get ANSI out of this
>> method. Internally, g_utf16_to_utf8 is used, which means that (besides
>> being surprised this call doesn't actually do ANSI encoding) I would
>> actually expect a multibyte representation of that character. That's
>> from a few minutes of Googling for info on UTF-8.
>>
>> So I'm confused. Can someone with more knowledge about encodings tell
>> me whether this really doesn't make sense?
>>
>> I'm using the latest RPMs. Here's a test program:
>>
>> using System;
>> using System.Runtime.InteropServices;
>>
>> public class Test {
>>     public static void Main() {
>>         Console.WriteLine(Marshal.PtrToStringAnsi(Marshal.StringToCoTaskMemAnsi("ü")));
>>     }
>> }
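
For anyone following along, here is a rough sketch of why the round trip in this thread comes back null. It does not call Marshal at all; it only reproduces the byte-level mismatch using System.Text. The class and method names (EncodingRoundTrip, ToUtf8, ToLatin1, FromUtf8OrNull) are made up for illustration, and using ISO-8859-1 as a stand-in for whatever "ANSI-like" conversion StringToCoTaskMemAnsi performs is an assumption based on the 0xFC byte reported above, not something taken from the Mono source.

```csharp
using System;
using System.Text;

// Hypothetical helper class; none of these names come from Mono.
public static class EncodingRoundTrip
{
    // UTF-8 bytes, which is what the thread says the HGlobal path produces.
    public static byte[] ToUtf8(string s)
    {
        return Encoding.UTF8.GetBytes(s);
    }

    // ISO-8859-1 bytes: an ASSUMED stand-in for the "ANSI-like" output
    // observed from StringToCoTaskMemAnsi (a single 0xFC byte for U+00FC).
    public static byte[] ToLatin1(string s)
    {
        return Encoding.GetEncoding("iso-8859-1").GetBytes(s);
    }

    // Strict UTF-8 decode; returns null when the bytes are not valid UTF-8.
    public static string FromUtf8OrNull(byte[] bytes)
    {
        try
        {
            // Second argument: throw on invalid byte sequences.
            return new UTF8Encoding(false, true).GetString(bytes);
        }
        catch (ArgumentException)
        {
            // DecoderFallbackException derives from ArgumentException.
            return null;
        }
    }

    public static void Main()
    {
        string u = "\u00FC"; // u with umlaut, escaped to avoid source-encoding issues

        Console.WriteLine(BitConverter.ToString(ToUtf8(u)));   // C3-BC
        Console.WriteLine(BitConverter.ToString(ToLatin1(u))); // FC

        // UTF-8 out, UTF-8 back in: round-trips.
        Console.WriteLine(FromUtf8OrNull(ToUtf8(u)) == u);     // True

        // Latin-1 out, UTF-8 back in: a lone 0xFC is not valid UTF-8.
        Console.WriteLine(FromUtf8OrNull(ToLatin1(u)) == null); // True
    }
}
```

If that assumption holds, it would explain the null: the encoder and decoder simply disagree about the byte format, which is the argument for making StringToCoTaskMemAnsi match StringToHGlobalAnsi.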