Hi Alexander,

It was good seeing you in Ottawa!

> Hello, Ishii-san!
>
> We've talked on PGCon that I've questions about mule to wchar
> conversion. My questions about pg_mule2wchar_with_len function are
> following. In these parts of code:
>
> else if (IS_LCPRV1(*from) && len >= 3)
> {
>     from++;
>     *to = *from++ << 16;
>     *to |= *from++;
>     len -= 3;
> }
>
> and
>
> else if (IS_LCPRV2(*from) && len >= 4)
> {
>     from++;
>     *to = *from++ << 16;
>     *to |= *from++ << 8;
>     *to |= *from++;
>     len -= 4;
> }
>
> we skip first character of original string. Are we able to restore it back
> from pg_wchar?

I think it's possible. The first bytes are defined like this:

#define IS_LCPRV1(c) ((unsigned char)(c) == 0x9a || (unsigned char)(c) == 0x9b)
#define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned char)(c) == 0x9d)

It seems IS_LCPRV1 is not used by any of the encodings PostgreSQL
currently supports, which means there is no chance that existing
databases contain LCPRV1 characters. So you can safely ignore it.

IS_LCPRV2 is only used for the Chinese encodings (EUC_TW and BIG5), in
backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c,
and there the byte is fixed to 0x9d. So you can always restore the
value to 0x9d.

> Also in this part of code we're shifting first byte by 16 bits:
>
> if (IS_LC1(*from) && len >= 2)
> {
>     *to = *from++ << 16;
>     *to |= *from++;
>     len -= 2;
> }
> else if (IS_LCPRV1(*from) && len >= 3)
> {
>     from++;
>     *to = *from++ << 16;
>     *to |= *from++;
>     len -= 3;
> }
>
> Why don't we shift it by 8 bits?

Because in the LC1 case we want the first byte to be placed in the
second byte of the wchar, i.e.:

byte 0: always 0
byte 1: leading byte (the first byte of the multibyte character)
byte 2: always 0
byte 3: the second byte of the multibyte character

Note that we always assume that byte 1 (the "leading byte", LB for
short) represents the id of the character set (from 0x81 to 0xff) in
the MULE INTERNAL encoding. For the mapping between LB and charsets,
see pg_wchar.h.
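
To make the layout concrete, here is a minimal sketch of packing and
unpacking for the LC1 and LCPRV2 cases only. It is not the PostgreSQL
code: sketch_wchar, SKETCH_LCPRV2_PREFIX, pack_lc1, unpack_lc1,
pack_lcprv2 and unpack_lcprv2 are hypothetical names made up for
illustration, and the inverse assumes (as described above) that the
dropped LCPRV2 prefix is always 0x9d. Deciding which case a given wchar
belongs to is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pg_wchar: a 32-bit unsigned value. */
typedef uint32_t sketch_wchar;

/* Assumption: the LCPRV2 prefix byte is always 0x9d (EUC_TW/BIG5 case). */
#define SKETCH_LCPRV2_PREFIX 0x9d

/* LC1 (2 bytes): leading byte goes to bits 16..23, second byte to bits 0..7. */
static sketch_wchar
pack_lc1(const unsigned char *from)
{
    return ((sketch_wchar) from[0] << 16) | from[1];
}

/* Inverse of pack_lc1: writes 2 bytes to "to" and returns the byte count. */
static size_t
unpack_lc1(sketch_wchar w, unsigned char *to)
{
    assert(to != NULL);
    to[0] = (unsigned char) ((w >> 16) & 0xff);
    to[1] = (unsigned char) (w & 0xff);
    return 2;
}

/* LCPRV2 (4 bytes): the prefix is dropped, the other three bytes are packed. */
static sketch_wchar
pack_lcprv2(const unsigned char *from)
{
    return ((sketch_wchar) from[1] << 16) |
           ((sketch_wchar) from[2] << 8) |
           from[3];
}

/* Inverse of pack_lcprv2: re-inserts the fixed prefix, writes 4 bytes. */
static size_t
unpack_lcprv2(sketch_wchar w, unsigned char *to)
{
    assert(to != NULL);
    to[0] = SKETCH_LCPRV2_PREFIX;
    to[1] = (unsigned char) ((w >> 16) & 0xff);
    to[2] = (unsigned char) ((w >> 8) & 0xff);
    to[3] = (unsigned char) (w & 0xff);
    return 4;
}
```

For example, the bytes 0x9d 0xf6 0xa1 0xa2 (values picked only for
illustration) pack to 0x00f6a1a2, and unpacking that value gives back
the same four bytes.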

> You can see my patch in this thread where I propose purely mechanical
> changes in this function which make inverse conversion possible.
>
> ------
> With best regards,
> Alexander Korotkov.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers