Follow-up Comment #5, bug #62830 (project groff):

Thank you for the update.
I don't think I am going to have time to consider this patch in the depth it requires before the groff 1.23.0 release, for which I hope to have at least a release candidate out before the end of the calendar year. Only 5 open Savannah tickets remain for that release.

I would, however, like to take a fresh look at this issue (and at Russian localization, bug #63076) early in the groff 1.24 development cycle. How soon that begins will depend on how many urgent bug reports we get against 1.23.0. On the bright side, you can expect relatively few changes between now and then that might make your patches difficult to apply or to maintain out of tree.

Here are some thoughts I have for when I can return to this work (or for another groff developer who wants to step up and discuss or address them).

1. I was uncertain about the wisdom of shipping more font description files, but it's not as if there is no precedent: except for the FreeEuro font, we don't ship _any_ fonts proper, just descriptions of fonts that the user must obtain elsewhere. So shipping CSH, CSS, CTH, CTS, JPG, JPM, KOG, and KOM font descriptions for the "dvi", "html", "ps", and "utf8" output devices would fit an existing pattern.

2. src/devices/grohtml/post-html.cpp:

2a. I wonder whether defaulting to ASCII for the html output device is necessary. Apparently, UTF-8 is the encoding used by the overwhelming majority of web pages. [https://w3techs.com/technologies/details/en-utf8]

2b. The new `to_utf8_string` function might be better housed in libgroff or libdriver, if in fact there is not already a suitable function in one of those libraries. Another possibility is that some gnulib module could do this job, so that we would not have to carry our own implementation at all.

2c. I am uneasy with switching text styling properties (bold, italic) off based on the groff font _name_ in use. I think it might be better to have a new font description file directive (see groff_font(5)) that tags a font as unstyled. Selecting any font with this property would disable the bold and italic flags.

2d. Maybe the existing `to_unicode` function should be renamed; from the name alone, it's not obvious how it differs from `to_utf8_string`.

2e. The `-U` option seems like a good idea, and perhaps its flag letter is one we can reuse elsewhere in groff as we improve its Unicode support.

3. src/devices/grops/ps.cpp:

3a. `is_utf16` should be renamed to reflect whether it uses UTF-16BE or UTF-16LE.

3b. I'm uneasy with the use of wchar_t. I think maybe we want to use int32_t, or, if that can't be assumed to be available in C++98 (check this), then we should have a type alias ("typedef" [sic]) and use an int, which must be at least 32 bits on any GNU system.

3c. Again, it looks like we're inferring properties from font names:

    +  const char *psname = f->get_internal_name();
    +
    +  if (psname && strstr(psname,"-UTF16-")) {

And again, I think I'd prefer a font description file property to communicate this information.

4. src/include/font.h, src/libs/libgroff/font.cpp:

I wouldn't use a preprocessor-based feature gate like the "ENABLE_UCSRANGE" macro. I would enable the feature in all builds; that will give it exercise and help uncover bugs.

5. Thank you for the 'dvi' and 'ps' device smoke tests! Sadly, for portability it might be necessary to rewrite the UTF-8-encoded literals for CJK glyphs as octal escape sequences in the arguments to the printf(1) command. Surprising things go wrong on *BSD and macOS systems.

I emphasize that I don't require any changes to be made at this time to address the above points; they are for consideration and discussion by developers (including the patch author!) before any revision occurs. I simply wanted to get these points down while they were fresh in my mind.
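For illustration only, here is a minimal C++98-compatible sketch of the kind of helpers points 2b, 3a, 3b, and 5 touch on. It is not the patch's code. `unicode_code_point`, `is_scalar_value`, `to_utf16be_string`, and `to_printf_octal` are hypothetical names, and the signature given to `to_utf8_string` here (one code point in, a byte string out) is an assumption. It uses an int typedef rather than wchar_t. For what it's worth, int32_t is not part of the C++98 standard library (it arrived with C++11's <cstdint>, though C99's <stdint.h> is widely available), so a compile-time size check stands in for that guarantee. The UTF-16 encoder is named for its byte order, as suggested for `is_utf16`.

```cpp
// Hypothetical sketch; none of these names are taken from the patch.
#include <cassert>
#include <climits>
#include <string>

// Code point type per point 3b: a plain int, assumed to be >= 32 bits.
// C++98 has no static_assert, so a negative array size breaks the
// build if that assumption ever fails.
typedef int unicode_code_point;
typedef char int_is_at_least_32_bits[(sizeof(int) * CHAR_BIT >= 32) ? 1 : -1];

// True for Unicode scalar values: 0..0x10FFFF, excluding surrogates.
static bool is_scalar_value(unicode_code_point cp)
{
  return cp >= 0 && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as UTF-8; returns "" for invalid input.
std::string to_utf8_string(unicode_code_point cp)
{
  std::string out;
  if (!is_scalar_value(cp))
    return out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}

// Encode one code point as UTF-16BE: most significant byte first,
// surrogate pair above the BMP; returns "" for invalid input.
std::string to_utf16be_string(unicode_code_point cp)
{
  std::string out;
  if (!is_scalar_value(cp))
    return out;
  if (cp < 0x10000) {
    out += static_cast<char>(cp >> 8);
    out += static_cast<char>(cp & 0xFF);
  } else {
    unicode_code_point v = cp - 0x10000;
    unicode_code_point high = 0xD800 | (v >> 10);
    unicode_code_point low = 0xDC00 | (v & 0x3FF);
    out += static_cast<char>(high >> 8);
    out += static_cast<char>(high & 0xFF);
    out += static_cast<char>(low >> 8);
    out += static_cast<char>(low & 0xFF);
  }
  return out;
}

// Render each byte as a three-digit octal escape (\ddd), the form the
// printf(1) format operand accepts, for portable test scripts.
std::string to_printf_octal(const std::string &bytes)
{
  std::string out;
  for (std::string::size_type i = 0; i < bytes.size(); i++) {
    unsigned char b = static_cast<unsigned char>(bytes[i]);
    out += '\\';
    out += static_cast<char>('0' + ((b >> 6) & 7));
    out += static_cast<char>('0' + ((b >> 3) & 7));
    out += static_cast<char>('0' + (b & 7));
  }
  return out;
}
```

For example, U+4E2D yields the UTF-8 bytes E4 B8 AD, which `to_printf_octal` renders as \344\270\255, and the UTF-16BE bytes 4E 2D. Invalid input (surrogates, values above U+10FFFF) yields an empty string in this sketch; whether groff would rather substitute U+FFFD or emit a diagnostic is left open.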
_______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62830>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/