Follow-up Comment #7, bug #62830 (project groff):

I have updated my patch.

1. Font description files

There is a precedent for font description files in the Japanese support
for groff written by Japanese developers (Fumitoshi UKAI et al.):

https://answers.launchpad.net/ubuntu/+source/groff/1.18.1.1-12

They defined font descriptions named "M" and "G" for Japanese.

  M : Japanese Mincho style
  G : Japanese Gothic style

"M" and "G" are possible candidates, but I wonder whether Chinese and
Korean users might feel uncomfortable with them. That is why I proposed
the font descriptions "JPM" and "JPG", and the CK fonts.

2. src/devices/grohtml/post-html.cpp

2a & 2e. Encoding (US-ASCII or UTF-8), -U option

I tried a three-level option setting:

  -U0 : US-ASCII        : use named character references or numerical
                          character references
  -U1 : UTF-8 (partial) : use named character references for known
                          characters, UTF-8 literals for unknown
                          characters (default)
  -U2 : UTF-8 (full)    : use UTF-8 literals

2b. `to_utf8_string`. I have moved it to libgroff/font.cpp as a trial.

2c. Switching text styling properties. I have removed the function from
my patch.

2d. `to_unicode`. I have renamed to_unicode() to
to_numerical_char_ref().

3. src/devices/grops/ps.cpp

3a. I have renamed is_utf16 to is_utf16be.

3b. I have replaced wchar_t with uint16_t.

3c. PostScript name and encoding.

For CJK fonts, the encoding is always stated explicitly in the
PostScript font name, which has the structure

  (Specific font name)-(style)(-(character set))-(encoding)-(direction)

For example:

  /Ryumin-Light-Identity-H
  /Ryumin-Light-UniJIS-UTF16-H
  /Ryumin-Light-UniJIS-UTF8-H
  /Ryumin-Light-EUC-H
  /Ryumin-Light-RKSJ-H
  /GothicBBB-Medium-Identity-H
  /GothicBBB-Medium-UniJIS-UTF16-H
  /GothicBBB-Medium-UniJIS-UTF8-H
  /GothicBBB-Medium-EUC-H
  /GothicBBB-Medium-RKSJ-H

This is a sample PostScript file:

https://github.com/t-tk/PostScript-CJK-samples/blob/master/box-multi.eps

Therefore, I think it is reasonable to get the encoding information from
the PostScript font name. I guess most PostScript interpreters do so.

4. src/include/font.h, src/libs/libgroff/font.cpp

I removed the "ENABLE_UCSRANGE" macro from my patch.

5. Smoke tests

I replaced the UTF-8 literals with octal code expressions.

(file #54631)

    _______________________________________________________

Additional Item Attachment:

File name: cjk-ps-html_20230415.patch     Size:86 KB
    <https://file.savannah.gnu.org/file/cjk-ps-html_20230415.patch?file_id=54631>

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62830>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
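
To illustrate the three -U levels described in item 2a & 2e, here is a minimal sketch of how one character could be emitted at each level. It is not the code from the patch: the names encode_html_char, named_ref, to_numeric_ref and to_utf8 are hypothetical, the two-entry named-reference table is only a placeholder, and the decimal "&#N;" form of the numerical reference is an assumption. ASCII is passed through unchanged here; real output code would still have to escape "&", "<" and ">".

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encode one Unicode code point as UTF-8 (hypothetical stand-in for the
// patch's to_utf8_string; the real signature may differ).
std::string to_utf8(uint32_t cp)
{
  std::string s;
  if (cp < 0x80) {
    s += static_cast<char>(cp);
  } else if (cp < 0x800) {
    s += static_cast<char>(0xC0 | (cp >> 6));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    s += static_cast<char>(0xE0 | (cp >> 12));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    s += static_cast<char>(0xF0 | (cp >> 18));
    s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return s;
}

// Numerical character reference; the decimal form is an assumption.
std::string to_numeric_ref(uint32_t cp)
{
  char buf[16];
  std::snprintf(buf, sizeof buf, "&#%u;", static_cast<unsigned>(cp));
  return buf;
}

// Placeholder table of "known" characters; a real table is much larger.
const char *named_ref(uint32_t cp)
{
  switch (cp) {
  case 0x00A9: return "&copy;";
  case 0x00E9: return "&eacute;";
  default:     return nullptr;
  }
}

// level 0: named or numerical references only (US-ASCII output)
// level 1: named references if known, otherwise UTF-8 literals
// level 2: UTF-8 literals only
std::string encode_html_char(uint32_t cp, int level)
{
  if (cp < 0x80)
    return std::string(1, static_cast<char>(cp));
  if (level >= 2)
    return to_utf8(cp);
  if (const char *name = named_ref(cp))
    return name;
  return level == 0 ? to_numeric_ref(cp) : to_utf8(cp);
}
```

With this sketch, U+65E5 comes out as a numerical reference at level 0 and as a three-byte UTF-8 literal at levels 1 and 2, while U+00E9 stays "&eacute;" until level 2.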

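
As a sketch of the idea in item 3c, the encoding can be read off the tail of the PostScript font name. The helper below is hypothetical and is not taken from the patch. It assumes that a name ending in "-UTF16-H" means UTF-16 big-endian text, and that a vertical "-V" counterpart follows the same pattern; only the "-H" forms appear in the examples above.

```cpp
#include <string>

// True if `name` ends with `suffix`.
static bool ends_with(const std::string &name, const std::string &suffix)
{
  return name.size() >= suffix.size()
    && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: decide from the PostScript font name whether
// strings for this font should be written as UTF-16BE.
// The "-V" (vertical) case is an assumption.
bool ps_font_is_utf16be(const std::string &ps_name)
{
  return ends_with(ps_name, "-UTF16-H") || ends_with(ps_name, "-UTF16-V");
}
```

For instance, ps_font_is_utf16be("Ryumin-Light-UniJIS-UTF16-H") is true, while the EUC, RKSJ and UTF8 variants return false and would need their own handling.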