On Tue, Jul 3, 2012 at 10:17 AM, Tatsuo Ishii <is...@postgresql.org> wrote:
> > OK.  So, in that case, I suggest that if the leading byte is non-zero,
> > we emit 0x9d followed by the three available bytes, instead of first
> > testing whether the first byte is >= 0xf0.  That test seems to serve
> > no purpose but to confuse the issue.
>
> Probably the code should look like this (see below comment):
>
>     else if (lb >= 0xf0 && lb <= 0xfe)
>     {
>         if (lb <= 0xf4)
>             *to++ = 0x9c;
>         else
>             *to++ = 0x9d;
>         *to++ = lb;
>         *to++ = (*from >> 8) & 0xff;
>         *to++ = *from & 0xff;
>         cnt += 4;

It's likely we also need to assign some names to all these numbers
(0xf0, 0xf4, 0xfe, 0x9c, 0x9d). But it's hard for me to invent such names.

> > I further suggest that we improve the comments on the mule functions
> > for both wchar->mb and mb->wchar to make all this more clear.
>
> I have added comments about mule internal encoding by refreshing my
> memory and from old document found on
> web(
> http://mibai.tec.u-ryukyu.ac.jp/cgi-bin/info2www?%28mule%29Buffer%20and%20string
> ).
>
> Please take a look at. BTW, it seems conversion between multibyte and
> wchar can be roundtrip in the leading character is LCPRV2 case:
>
> If the second byte of wchar (out of 4 bytes of wchar. The first byte
> is always 0x00) is in range of 0xf0 to 0xf4, then the first byte of
> multibyte must be 0x9c. If the second byte of wchar is in range of
> 0xf5 to 0xfe, then the first byte of multibyte must be 0x9d.

Should I integrate these code changes into my patch? Or would we like to
commit them first?

------
With best regards,
Alexander Korotkov.
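
P.S. To make the naming question concrete, here is a rough standalone sketch of the LCPRV2 case with the magic numbers given names. The macro and function names are placeholders I made up for illustration and are not taken from the tree. The mb->wchar direction is only my reading of the roundtrip rule Tatsuo described, so please treat it as an assumption rather than a proposal for the final code.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder names for the constants discussed in this thread. */
#define LCPRV2_A        0x9c    /* prefix for private charset ids 0xf0..0xf4 */
#define LCPRV2_B        0x9d    /* prefix for private charset ids 0xf5..0xfe */
#define LCPRV2_A_FIRST  0xf0
#define LCPRV2_A_LAST   0xf4
#define LCPRV2_B_FIRST  0xf5
#define LCPRV2_B_LAST   0xfe

/*
 * Encode one wchar whose layout is 0x00, charset id, byte, byte.
 * Returns the number of bytes written (4), or 0 if the wchar does not
 * belong to the LCPRV2 case.
 */
static int
lcprv2_wchar_to_mb(unsigned int wc, unsigned char *to)
{
    unsigned char lb = (wc >> 16) & 0xff;

    assert(to != NULL);

    if ((wc >> 24) != 0 || lb < LCPRV2_A_FIRST || lb > LCPRV2_B_LAST)
        return 0;

    /* The prefix byte is fully determined by the charset id. */
    *to++ = (lb <= LCPRV2_A_LAST) ? LCPRV2_A : LCPRV2_B;
    *to++ = lb;
    *to++ = (wc >> 8) & 0xff;
    *to++ = wc & 0xff;
    return 4;
}

/*
 * Decode four multibyte bytes back into a wchar.
 * Returns 1 on success, 0 if the prefix byte and the charset id disagree.
 */
static int
lcprv2_mb_to_wchar(const unsigned char *from, unsigned int *wc)
{
    unsigned char lb;

    assert(from != NULL && wc != NULL);
    lb = from[1];

    if (from[0] == LCPRV2_A)
    {
        if (lb < LCPRV2_A_FIRST || lb > LCPRV2_A_LAST)
            return 0;
    }
    else if (from[0] == LCPRV2_B)
    {
        if (lb < LCPRV2_B_FIRST || lb > LCPRV2_B_LAST)
            return 0;
    }
    else
        return 0;

    /* The prefix byte is dropped; it can be recomputed from lb. */
    *wc = ((unsigned int) lb << 16) | ((unsigned int) from[2] << 8) | from[3];
    return 1;
}
```

Because the prefix byte can always be recomputed from the charset id, the wchar does not need to store it, which is why the conversion can round-trip in this case.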