[bug #65232] Russian hyphenation is not working

G. Branden Robinson Sat, 13 Dec 2025 07:13:44 -0800

Follow-up Comment #6, bug #65232 (group groff):

Hi Robin,


Sorry for the very belated (by almost 2 years) reply.

[comment #5 comment #5:]
> [comment #4 comment #4:]
>> 
>> [comment #3 comment #3:]
>>> After switching from pdfroff (-Tps) to pdfmom (-Tpdf), hyphenation suddenly
>>> works fine.
>> 
>> Glad to hear it.
>> 
> I forgot to mention, I also had to install a new version of the
> LiberationSerif fonts, as the previous ones I was using apparently weren't
> fully compatible with gropdf. There were, for instance, some space characters
> that were not displayed correctly.
> 
>>> Moreover, it will even work with UTF8 input (-Kutf-8), even though that
>>> causes other glitches.
>> 
>> What glitches are you seeing?
>> 
> With -Kutf-8, link texts generated by .pdfhref were sometimes missing -
> seemingly random - characters.
> 
>> The input is [converted] from UTF-8 to KOI8-R.  The hyphenation patterns are
>> defined in terms of KOI8-R code points.  The formatter (GNU _troff_) decides
>> where the hyphens should go and performs the breaks.  The formatter converts
>> the input characters into internal data structures called "nodes" that do
>> not use an externally visible encoding.  Then, when generating
>> device-independent output, each glyph node is converted to a
>> device-independent special character command _if_ the output device supports
>> its code point.  (If it doesn't, you get a warning like "special character
>> 'u0413' not defined".)
>> 
> Are you telling me that pdfmom is actually internally converting my text to
> KOI8-R after noticing I did -mru?

No.  A preprocessor called _preconv_(1), which was introduced to _groff_'s
pipeline via the `-K` option, ran before all other preprocessors, converting
the encoding of its input stream to UTF-8. 

> This is obviously not the case as I tried to print some Cyrillic using .tm
> and it comes out as Unicode escapes as would be expected after the sources
> are run through preconv.
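
The Unicode escapes you saw from `.tm` have the form `\[uXXXX]`, the same
special character notation that appears in the "special character 'u0413' not
defined" warning quoted above.  As a rough illustration only (this is not
_preconv_'s actual implementation, and the function name `to_groff_escapes` is
made up for this example), the visible effect can be approximated in Python:

```python
def to_groff_escapes(text: str) -> str:
    # Hypothetical helper: keep ASCII as-is and replace every other code
    # point with a groff special character escape of the form \[uXXXX].
    # The real preconv does more than this (encoding detection, for one).
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            # At least four uppercase hexadecimal digits, e.g. U+0413 -> u0413.
            out.append("\\[u{:04X}]".format(cp))
    return "".join(out)


if __name__ == "__main__":
    # A Cyrillic capital letter GHE (U+0413) as the argument to `tm`.
    print(to_groff_escapes(".tm Г"))
```

The hyphenation patterns and output device then have to cope with those
escapes rather than with raw bytes in the input encoding.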

In the past two years I undertook a large amount of GNU _troff_ refactoring
(with much vetting and scrutiny from _gropdf_ author and maintainer Deri
James) to address the problem of embedding non-ASCII characters in arguments
to device control commands, which is the mechanism GNU _troff_ uses to get
such code points into PDF metadata.

Quoting our "NEWS" file for the forthcoming _groff_ 1.24.0 release:


*  GNU troff now performs some limited processing/transformation of the
   argument to the `\X` escape sequence and its counterpart `device`
   request, to address the requirement that some documents have to pass
   metadata that must encode non-ASCII characters in device extension
   commands.  (For example, a document author may desire a document's
   section headings containing non-ASCII code points to appear correctly
   in PDF bookmarks.  Further, GNU troff encodes its output page
   description language only in ASCII.)  This change is expected to be
   of significance mainly to developers of output drivers for groff;
   groff_diff(7) describes the transformations.  If you have been using
   `\X` or `.device` to pass ASCII data to the output driver as a device
   extension command and require that it remain precisely as-is, use the
   `\!` escape sequence or `output` request, and prefix your data with
   "x X ", the device-independent troff means of expressing a device
   extension command (see groff_out(5)).


[comment #3 comment #3:]
> pdfroff should perhaps be marked as deprecated or pdfmom should outright
> replace it.

_pdfroff_(1) was supplied by the "pdfmark" project in our "contrib" directory.
Quoting "NEWS" again:


*  Keith Marshall's pdfmark package is no longer distributed with groff,
   but is now separately maintained.  Please visit
   <https://savannah.nongnu.org/projects/groff-pdfmark> for the latest
   version.


> From my perspective, you can close this ticket.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65232>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
