Hi Nicolas,

At Sun, 22 Dec 2013 09:20:57 +0100,
Nicolas Goaziou wrote:
>
> Yasushi SHOJI <ya...@atmark-techno.com> writes:
>
> > Ah, OK. Those coding keys are for the back-ends to select proper
> > strings, not for the string encoding.
>
> This is also related to string encoding. You will get garbage if you
> insert a string containing characters outside the encoding you use to
> save the file, won't you?

Right. However, as you describe below, the output file's encoding is
not determined by the language option but by the current buffer's
coding system, org-export-coding-system, or a back-end specific
variable, e.g. org-html-coding-system.

That means that whenever your-choice-of-coding-system can handle the
"characters" of the translation string (that is, the coding system has
code points for all of the characters in the translation string and
Emacs can convert between them), the translation string is free to use
any such character in the output, right?

If one wants to use French, she sets the current buffer's coding system
to any coding system that can handle French and sets the language
option to "fr". In that case, her/his Org buffer should already contain
French characters, so there is no need for the translation string to
be strictly ASCII when you export with plain/ascii, no?

I just don't see any use case. I must have missed something here.
Please enlighten me.

BTW, here is part of a quick test I've done (o-e-c-s is
org-export-coding-system, o-h-c-s is org-html-coding-system):

source  lang  exporter     o-e-c-s  o-h-c-s  target buffer                          target file
------  ----  -----------  -------  -------  -------------------------------------  -------------------------------------
euc-jp  ja    plain/ascii  nil      -        euc-jp                                 euc-jp
euc-jp  ja    plain/utf-8  nil      -        euc-jp                                 euc-jp
euc-jp  ja    plain/ascii  utf-8    -        euc-jp                                 utf-8
euc-jp  ja    plain/utf-8  utf-8    -        euc-jp                                 utf-8
euc-jp  ja    html         nil      utf-8    euc-jp w/ charset=utf-8                utf-8
euc-jp  ja    html         nil      euc-jp   euc-jp w/ charset=euc-jp               euc-jp w/ charset=euc-jp
------  ----  -----------  -------  -------  -------------------------------------  -------------------------------------
euc-jp  fr    plain/ascii  nil      -        euc-jp w/ fr trans                     euc-jp w/ fr translation
euc-jp  fr    plain/utf-8  nil      -        euc-jp w/ fr trans & utf-8 decoration  euc-jp w/ fr trans & utf-8 decoration

All the major encodings for Japanese (euc-jp, iso2022, shift-jis, and
utf-8) can handle the current translation strings without problems, so
I'm assuming that the encodings for other languages must have some
problem.
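
The condition "the coding system has code points for all of the
characters of the translation string" can be checked mechanically. A
minimal sketch follows, assuming Python codecs as a stand-in for Emacs
coding systems (their names and coverage do not match Emacs exactly).
The helpers can_encode and to_html and the sample strings are
hypothetical, for illustration only. to_html uses Python's
errors="xmlcharrefreplace" to turn any character the target charset
cannot hold into an HTML numeric character reference.

```python
# Sketch only: Python codecs stand in for Emacs coding systems
# (assumption; codec names and coverage differ from Emacs).


def can_encode(text: str, coding: str) -> bool:
    """Return True if every character in TEXT has a code point in CODING."""
    try:
        text.encode(coding)
    except UnicodeEncodeError:
        return False
    return True


def to_html(text: str, charset: str) -> bytes:
    """Encode TEXT for an HTML file saved in CHARSET.

    Characters that CHARSET cannot represent are replaced with numeric
    character references such as &#30446;, which refer to Unicode code
    points independently of the declared charset.
    """
    return text.encode(charset, errors="xmlcharrefreplace")


if __name__ == "__main__":
    # Hypothetical sample translation strings, not Org's actual table.
    ja = "目次"
    fr = "Table des matières"
    for coding in ("euc-jp", "shift_jis", "iso2022_jp", "utf-8",
                   "iso-8859-1", "ascii"):
        print(f"{coding:10} ja={can_encode(ja, coding)!s:5} "
              f"fr={can_encode(fr, coding)}")
    print(to_html(ja, "iso-8859-1"))  # b'&#30446;&#27425;'
    print(to_html(fr, "ascii"))       # b'Table des mati&#232;res'
```

Falling back to numeric references keeps the HTML output valid in any
charset, at the cost of a less readable raw file.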

> > Then, is there any restriction with HTML back-ends? Why does it need
> > numeric character reference instead of just plain characters, if the
> > coding system is not a concern?
>
> See above. You may want to save your html file in a different encoding
> than UTF-8. IIUC, numeric character reference are more generic.

I agree that numeric references are more generic. As I've just checked,
HTML even allows us to put characters outside of the current content
charset into a document by using numeric references!

  # Italian text exported as HTML with the "ja" language option. Even
  # if the HTML has iso-8859-1 as its charset, the web browser shows
  # the Japanese characters.

> > If my understanding is ok, all entries of Japanese translation should
> > have :default instead of :utf-8.
>
> :default instead of :utf-8 means Org will use these translations also
> for LaTeX, HTML and ASCII export. If you think that is correct, then we
> can switch to :default, indeed.

Since I don't use LaTeX, I have no idea about it. I hope some LaTeX
user can help me here. I'm checking the exporters I use, including
plain text and HTML, and nothing seems to go wrong. But I really need
some help with the other back-ends. I'll post a patch for testing if
anyone's interested.

Thanks,
--
yashi