Re: [HACKERS] [GENERAL] psql weird behaviour with charset encodings

hernan gonzalez Fri, 07 May 2010 19:31:39 -0700

Sorry about an error in my previous example (I mixed up width and precision).
But the conclusion is the same - printf works on bytes:


#include <stdio.h>
int main(void) {
        char s[] = "ni\xc3\xb1o"; /* 5 bytes, 4 UTF-8 chars */
        printf("|%*s|\n",6,s); /* width 6: pads one blank if bytes are counted */
        printf("|%.*s|\n",4,s); /* precision 4: eats the last char if bytes are counted */
        return 0;
}

[r...@myserv tmp]#  ./a.out | od -t cx1
0000000   |       n   i 303 261   o   |  \n   |   n   i 303 261   |  \n
         7c  20  6e  69  c3  b1  6f  7c  0a  7c  6e  69  c3  b1  7c  0a


Hernán



On Fri, May 7, 2010 at 10:48 PM,  <[email protected]> wrote:
>> However, it appears that glibc's printf
>> code interprets the parameter as the number of *characters* to print,
>> and to determine what's a character it assumes the string is in the
>> environment LC_CTYPE's encoding.
>
> Well, I myself find that hard to believe :-)
> This would be nasty... Are you sure?
>
> I couldn't reproduce that.
> I made a quick test, passing a UTF-8 encoded string
> (5 bytes corresponding to 4 Unicode chars: "niño"),
> and my glibc (same Fedora 12) seems to count bytes,
> as it should.
>
> #include<stdio.h>
> main () {
> char s[] = "ni\xc3\xb1o";
> printf("|%.*s|\n",5,s);
> }
>
> This, compiled with gcc 4.4.3 and run with my root locale (UTF-8),
> did not pad a blank, i.e. it worked as expected.
>
> Hernán

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
