iconv looks busted to me.

Can you convert UTF-8 to UCS-4 using iconv()?  If you can't, then it's
entirely an iconv() problem.
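
A minimal sketch of such a check follows.  The helper name utf8_to_ucs4
is hypothetical, and I'm assuming the local iconv accepts the spellings
"UTF-8" and "UCS-4".  Note that iconv() wants the addresses of char *
cursors (not of the arrays themselves) and size_t byte counts that are
initialized before the call.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert a NUL-terminated UTF-8 string to UCS-4.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure
 * with errno set (for example EILSEQ for invalid input, E2BIG if dst is
 * too small).
 */
size_t
utf8_to_ucs4(const char *src, char *dst, size_t dstsz)
{
	assert(src != NULL && dst != NULL);

	iconv_t cd = iconv_open("UCS-4", "UTF-8");	/* (to, from) */
	if (cd == (iconv_t)-1)
		return ((size_t)-1);

	/*
	 * iconv() advances these cursors and decrements the counters.
	 * Some systems declare the input argument as const char **;
	 * adjust the cast if your headers do.
	 */
	char *inp = (char *)src;
	size_t inleft = strlen(src);
	char *outp = dst;
	size_t outleft = dstsz;

	size_t rv = iconv(cd, &inp, &inleft, &outp, &outleft);
	int saved = errno;	/* iconv_close() may clobber errno */
	(void) iconv_close(cd);

	if (rv == (size_t)-1) {
		errno = saved;
		return ((size_t)-1);
	}
	return (dstsz - outleft);
}
```

If this returns (size_t)-1 with a sensible errno for bad input, iconv is
behaving.  If it dumps core on valid input, that points at the converter
module.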

If you're working with wchar_t (which is *not* guaranteed to be UCS-4, or
anything else), you *must* use the libc wide character functions.  The only
reason to use iconv with UCS-4 strings is that UCS-4 is an external
format; do not use iconv when converting back and forth to wchar_t.  (It
*should* work if your locale is UTF-8, but it's technically incorrect; how
wchar_t values are encoded is an operating system implementation detail.)
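
As a rough sketch of the libc route: the helper below (wide_to_mb is a
name I made up) sizes the output with a first wcsrtombs() pass and then
converts.  The result is in whatever encoding the current locale uses, so
call setlocale() first if you want something other than the "C" locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a wide string to a newly allocated
 * multibyte string in the current locale's encoding.  Returns NULL if a
 * character cannot be represented in that locale or if malloc() fails.
 * The caller frees the result.
 */
char *
wide_to_mb(const wchar_t *ws)
{
	assert(ws != NULL);

	mbstate_t st;
	const wchar_t *p = ws;

	/* First pass: NULL destination just reports the byte count. */
	memset(&st, 0, sizeof (st));
	size_t need = wcsrtombs(NULL, &p, 0, &st);
	if (need == (size_t)-1)
		return (NULL);

	char *out = malloc(need + 1);	/* +1 for the terminating NUL */
	if (out == NULL)
		return (NULL);

	/* Second pass: convert for real, starting from a fresh state. */
	p = ws;
	memset(&st, 0, sizeof (st));
	if (wcsrtombs(out, &p, need + 1, &st) == (size_t)-1) {
		free(out);
		return (NULL);
	}
	return (out);
}
```

Only after this step does it make sense to hand the bytes to iconv, with
the locale's codeset as the source encoding.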

Out of curiosity, why are you working with UCS-4/UTF-32?  I rarely see
content in this encoding.


On Fri, Aug 15, 2014 at 9:18 AM, Alexander Pyhalov <[email protected]> wrote:

> On 08/15/2014 19:20, Garrett D'Amore wrote:
>
>> To get from wchar_t to a multibyte string, you use wcstombs().  Note that
>> the resulting output will only be UTF-8 if the locale is *.UTF-8.  (If
>> you're in a different locale, the multibyte string may well be in a
>> different encoding.)
>>
>> Visually, your code above looks OK, but I'm not sure what's wrong.  Is it a
>> bug in libiconv?  My guess is so, since it seems that even if the encoding
>> was invalid, it shouldn't just dump core.  Instead it should return an
>> error such as EILSEQ.
>>
>> Admittedly, I have less than perfect confidence in our libiconv
>> implementation.
>>
>>
> OK, wcstombs works. Since the application locale may not be UTF-8, I'd
> like to use iconv afterwards to convert the result from the application
> locale's encoding to UTF-8. And I get one more core dump...
>
> int main()
> {
>   char out[1024],res[1024];
>   int ret;
>   wchar_t *in;
>   size_t inlen,outlen;
>   char *locale;
>   char *second_part;
>   size_t outsz=sizeof(out);
>   char *enc;
>   iconv_t hdl;
>
>   in=L"Привет!";
>
>   locale=setlocale(LC_ALL,"");
>   second_part=strchr(locale,'.');
>   if(second_part){
>     enc=strdup(second_part+1);
>   } else {
>     enc=strdup(locale);
>   }
>
>   if(enc){
>         printf("enc is %s\n",enc);
>         hdl = iconv_open("UTF-8", enc);
>         if(hdl<0) {
>                 perror("iconv_open");
>                 return -1;
>         }
>          ret=wcsrtombs((char*)out,&in,sizeof(out),NULL);
>          printf("ret is %d\n",ret);
>          out[ret+1]='\0';
>          printf("result is %s\n",out);
>          ret=strlen(out);
>          iconv(hdl,&out,&ret,&res,&outlen);
>          printf("%s\n",res);
>         free(enc);
>   }
>   return 0;
> }
>
> $ ./test_utf8_mbchar
> enc is UTF-8
> ret is 13
> result is Привет!
> Segmentation Fault (core dumped)
> ...
>
> Core was generated by `./test_utf8_mbchar'.
> Program terminated with signal 11, Segmentation fault.
> #0  0xfedd06d5 in _icv_iconv () from /usr/lib/iconv/UTF-8%UTF-8.so
> (gdb) bt
> #0  0xfedd06d5 in _icv_iconv () from /usr/lib/iconv/UTF-8%UTF-8.so
>
> #1  0xfee7dc17 in iconv () from /lib/libc.so.1
> #2  0x080510c0 in main ()
>
>
>
> --
> Best regards,
> Alexander Pyhalov,
> system administrator of Computer Center of Southern Federal University
>


