Follow-up Comment #6, bug #65232 (group groff):

Hi Robin,

Sorry for the very belated (by almost 2 years) reply.

[comment #5 comment #5:]
> [comment #4 comment #4:]
>>
>> [comment #3 comment #3:]
>>> After switching from pdfroff (-Tps) to pdfmom (-Tpdf), hyphenation suddenly
>>> works fine.
>>
>> Glad to hear it.
>>
> I forgot to mention, I also had to install a new version of the
> LiberationSerif fonts as the previous ones I was using apparently weren't
> fully compatible with gropdf. There were for instance some space characters
> that were not displayed correctly.
>
>>> Moreover, it will even work with UTF8 input (-Kutf-8), even though that
>>> causes other glitches.
>>
>> What glitches are you seeing?
>>
> With -Kutf-8, link texts generated by .pdfhref were sometimes missing -
> seemingly random - characters.
>
>> The input is [converted] from UTF-8 to KOI8-R. The hyphenation patterns are
>> defined in terms of KOI8-R code points. The formatter (GNU _troff_) decides
>> where the hyphens should go and performs the breaks. The formatter converts
>> the input characters into internal data structures called "nodes" that do
>> not use an externally visible encoding. Then, when generating
>> device-independent output, each glyph node is converted to a
>> device-independent special character command _if_ the output device supports
>> its code point. (If it doesn't, you get a warning like "special character
>> 'u0413' not defined".)
>>
> Are you telling me that pdfmom is actually internally converting my text to
> KOI8-R after noticing I did -mru?

No. A preprocessor called _preconv_(1), which is introduced to _groff_'s pipeline via the `-K` option, runs before all other preprocessors, converting the non-ASCII characters in its input stream to Unicode special character escape sequences of the form `\[uXXXX]`.

> This is obviously not the case as I tried to print some Cyrillic using .tm
> and it comes out as Unicode escapes as would be expected after the sources
> are run through preconv.

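As a rough illustration of the Unicode escapes mentioned above, here is a minimal Python sketch of that kind of transformation. It is not _preconv_'s actual implementation: the function name `to_groff_escapes` is hypothetical, and the sketch assumes every code point at or above U+0080 becomes a `\[uXXXX]` escape while ASCII passes through untouched. The real program also handles details such as encoding detection, which are omitted here.

```python
def to_groff_escapes(text: str) -> str:
    """Replace each non-ASCII code point with a groff \\[uXXXX] escape.

    Hypothetical helper for illustration only; it mimics the general
    shape of preconv's output, not its full behavior.
    """
    pieces = []
    for ch in text:
        code_point = ord(ch)
        if code_point < 0x80:
            # ASCII characters are passed through unchanged.
            pieces.append(ch)
        else:
            # Non-ASCII characters become Unicode special character
            # escapes with at least four uppercase hex digits.
            pieces.append("\\[u%04X]" % code_point)
    return "".join(pieces)


if __name__ == "__main__":
    # Cyrillic capital GHE is U+0413, the code point named in the
    # warning message quoted earlier in this thread.
    print(to_groff_escapes("Г"))
```

Under these assumptions, feeding the sketch a line of Cyrillic yields only ASCII text, which matches what `.tm` reported.
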
In the past two years I undertook a large amount of GNU _troff_ refactoring (with much vetting and scrutiny from _gropdf_ author and maintainer Deri James) to address the problem of embedding non-ASCII characters in arguments to device extension commands, which is the mechanism GNU _troff_ uses to get such code points into PDF metadata.

Quoting our "NEWS" file for the forthcoming _groff_ 1.24.0 release:

* GNU troff now performs some limited processing/transformation of the argument to the `\X` escape sequence and its counterpart `device` request, to address the requirement that some documents have to pass metadata that must encode non-ASCII characters in device extension commands. (For example, a document author may desire a document's section headings containing non-ASCII code points to appear correctly in PDF bookmarks. Further, GNU troff encodes its output page description language only in ASCII.) This change is expected to be of significance mainly to developers of output drivers for groff; groff_diff(7) describes the transformations. If you have been using `\X` or `.device` to pass ASCII data to the output driver as a device extension command and require that it remain precisely as-is, use the `\!` escape sequence or `output` request, and prefix your data with "x X ", the device-independent troff means of expressing a device extension command (see groff_out(5)).

[comment #3 comment #3:]
> pdfroff should perhaps be marked as deprecated or pdfmom should outright
> replace it.

_pdfroff_(1) was supplied by the "pdfmark" project in our "contrib" directory. Quoting "NEWS" again:

* Keith Marshall's pdfmark package is no longer distributed with groff, but is now separately maintained. Please visit <https://savannah.nongnu.org/projects/groff-pdfmark> for the latest version.

> From my perspective, you can close this ticket.

_______________________________________________________ Reply to this item at: <https://savannah.gnu.org/bugs/?65232> _______________________________________________ Message sent via Savannah https://savannah.gnu.org/
