Printing UTF-8 in C
I need to print several Hebrew characters (UTF-8) to the terminal. My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however printing from C gives me Chinese characters. My question is how to print one character such as 'א' to the terminal. -- Ori Idan ___ Linux-il mailing list Linux-il@cs.huji.ac.il http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
Re: Printing UTF-8 in C
From: Ori Idan o...@helicontech.co.il
Date: Sun, 12 Jan 2014 20:34:07 +0200

> I need to print several Hebrew characters (UTF-8) to the terminal. My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however printing from C gives me Chinese characters. My question is how to print one character such as 'א' to the terminal.

Is the C source stored on disk in UTF-8 encoding?
Re: Printing UTF-8 in C
On 12.01.2014 20:34, Ori Idan wrote:
> I need to print several Hebrew characters (UTF-8) to the terminal. My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however printing from C gives me Chinese characters. My question is how to print one character such as 'א' to the terminal.

Where does the character come from? Is it a verbatim literal in the source? Unfortunately, that is not portable, even though gcc supports it. See the docs for GNU CPP, section "Implementation details", "Implementation-defined behavior".

If you want a portable solution, you must escape the chars, best done with something like

    #define ALEPH "\x..."

so that the macro can be concatenated into a larger string literal.

Here is a nice stackoverflow thread with sample code that reads and outputs UTF-8 from C, without any literals in it: http://stackoverflow.com/questions/1373463/handling-special-characters-in-c-utf-8-encoding .

V.
Re: Printing UTF-8 in C
Writing Hebrew to the terminal is a bad idea because terminals do not support BiDi reordering.

That said, doing cat small-hello.utf8[1] works for me in gnome-term (though it is reversed). No special environment variables were defined.

Regards,
Dov

[1] http://paps.sourceforge.net/small-hello.utf8

On Sun, Jan 12, 2014 at 8:34 PM, Ori Idan o...@helicontech.co.il wrote:
> I need to print several Hebrew characters (UTF-8) to the terminal. My locale is set to he_IL.UTF-8 so it shows Hebrew on the terminal, however printing from C gives me Chinese characters. My question is how to print one character such as 'א' to the terminal.
>
> --
> Ori Idan
Re: Printing UTF-8 in C
Hi Dov,

On Sun, Jan 12, 2014 at 08:53:38PM +0200, Dov Grobgeld wrote:
> Writing hebrew to the terminal is a bad idea because terminals do not support BiDi reordering.
>
> That said, doing cat small-hello.utf8[1] works for me in gnome-term (though it is reversed). No special environment variables were defined.

But Ori specifically asked about sending just one character to the terminal. cat treats everything as binary data.

baruch

--
http://baruch.siach.name/blog/ ~. .~ Tk Open Systems
=}ooO--U--Ooo{=
- bar...@tkos.co.il - tel: +972.2.679.5364, http://www.tkos.co.il -
Re: Printing UTF-8 in C
On Sun, Jan 12, 2014 at 9:02 PM, Baruch Siach bar...@tkos.co.il wrote:
> Hi Dov,
>
> On Sun, Jan 12, 2014 at 08:53:38PM +0200, Dov Grobgeld wrote:
>> Writing hebrew to the terminal is a bad idea because terminals do not support BiDi reordering.
>>
>> That said, doing cat small-hello.utf8[1] works for me in gnome-term (though it is reversed). No special environment variables were defined.
>
> But Ori has specifically asked about sending just one character to terminal. cat treats everything like binary data.
>
> baruch

I don't care about bidi at this stage. I still could not find out how to print even one character. I tried printf and putwchar.

--
Ori Idan
Re: Printing UTF-8 in C
The most unixy way is to treat everything as binary UTF-8 and then forget about encodings. The following program works just fine:

    #include <stdio.h>

    int main(void)
    {
        /* The literal holds raw UTF-8 bytes; printf passes them through. */
        printf("Hello שלום!\n");
        return 0;
    }

Compile and run with:

    cc -o hello hello.c
    ./hello

which prints:

    Hello שלום!

(Though שלום is reversed in the terminal.)

On Sun, Jan 12, 2014 at 9:02 PM, Baruch Siach bar...@tkos.co.il wrote:
> But Ori has specifically asked about sending just one character to terminal. cat treats everything like binary data.
Re: Printing UTF-8 in C
On Sun, Jan 12, 2014 at 9:26 PM, Dov Grobgeld dov.grobg...@gmail.com wrote:
> The most unixy way is to treat everything as binary UTF-8 and then forget about encodings. The following program works just fine:
>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         printf("Hello שלום!\n");
>         return 0;
>     }
>
> Compile with:
>
>     cc -o hello hello.c
>     ./hello
>     Hello שלום!
>
> (Though שלום is inversed in the terminal).

That works, but I need to print a single character such as 'א', and to be able to print 'ב' as 'א' + 1.

Does someone have any idea how to do it?

--
Ori Idan
Re: Printing UTF-8 in C
Create a list of all Hebrew characters and index into the list by the position of the character.

    /* Each entry is the UTF-8 encoding of one letter, in octal escapes. */
    const char *alefbet[] = {
        "\327\220",   /* א */
        "\327\221",   /* ב */
        /* : (remaining letters elided) */
    };

    printf("%s\n", alefbet[index]);  // For index in 0..26

Am I missing something?

Dov

On Sun, Jan 12, 2014 at 9:29 PM, Ori Idan o...@helicontech.co.il wrote:
> That works, but I need one character such as 'א' to be printed and to be able to print 'ב' as 'א' + 1
>
> Does someone have any idea how to do it?
>
> --
> Ori Idan
Re: Printing UTF-8 in C
You may want to review the following StackOverflow item:
http://stackoverflow.com/questions/4607413/c-library-to-convert-unicode-code-points-to-utf8

One answer describes how to do it yourself. Another answer uses the iconv library.

On Sun, 2014-01-12 at 21:29 +0200, Ori Idan wrote:
> That works, but I need one character such as 'א' to be printed and to be able to print 'ב' as 'א' + 1
>
> Does someone have any idea how to do it?

--
cal 09 1752
My own blog is at http://www.zak.co.il/tddpirate/
My opinions, as expressed in this E-mail message, are mine alone. They do not represent the official policy of any organization with which I may be affiliated in any way.
WARNING TO SPAMMERS: at http://www.zak.co.il/spamwarning.html
Re: Printing UTF-8 in C
From: Ori Idan o...@helicontech.co.il
Date: Sun, 12 Jan 2014 20:46:50 +0200

>> Is the C source stored on disk in UTF-8 encoding?
>
> Yes but what's the difference? latin characters in UTF-8 are the same in latin1 encoding and UTF-8

No, Latin-1 and UTF-8 agree only on the ASCII range (0x00 to 0x7F). Latin-1 characters above that range are a single byte in Latin-1 but two bytes in UTF-8. You are mixing up the UTF-8 encoding with the Unicode code points that UTF-8 encodes.