Given that unimap.c <https://github.com/n-t-roff/heirloom-doctools/blob/19c8adab5f59a2c8eba9f546fb36bdbbff86937d/troff/troff.d/unimap.c> uses an int <https://github.com/n-t-roff/heirloom-doctools/blob/f3a16e2ba0c411441fd5de5340be73674bd51307/troff/troff.d/unimap.h#L29> for storing Unicode codepoints, it might be that Heirloom uses a data type of insufficient size somewhere along the way. The C standard only guarantees that an unsigned int can hold values from 0 to 0xFFFF (most current platforms make it 32 bits wide), and wherever a 16-bit type is used, astral codepoints above U+FFFF get truncated in memory.
In other words, it's a bug in Heirloom Doctools.

On Wed, 5 Aug 2020 at 14:26, T. Kurt Bond <tkurtb...@gmail.com> wrote:

> Thanks for the tip! As it turns out, I am using the OTF.
>
> On Wed, Aug 5, 2020 at 12:24 AM John Gardner <gardnerjo...@gmail.com>
> wrote:
>
>> > The version I got was .ttf, not .otf
>>
>> I opened both the original OTF <https://dn-works.com/ufas/> and the
>> FontLibrary.org TTF <https://fontlibrary.org/en/font/symbola> in Glyphs
>> <https://glyphsapp.com/>; the OTF has 12,589 glyphs, whereas the TTF
>> only has 7,956 glyphs.
>>
>> Try the OTF version of Symbola. In fact, *always* prefer an OTF over TTF
>> when possible.
>>
>> On Wed, 5 Aug 2020 at 13:10, Richard Morse <pu...@mac.com> wrote:
>>
>>> Hm. Just for my edification, I tried a few things.
>>>
>>> I’m on a Mac, and I don’t know when I compiled Heirloom troff, but it
>>> was a year or two ago, so some things may be different.
>>>
>>> I downloaded the Symbola font from fontlibrary.org. The version I got
>>> was .ttf, not .otf.
>>>
>>> The various things that you tried did not work for me either. \[u1F0A1]
>>> did work, but that’s because (according to fret, at least), that’s the
>>> font’s internal name for the symbol, which is not guaranteed to be true
>>> across all fonts, so you can’t really use that for a “fallback” system.
>>>
>>> Looking at the output of troff without going through dpost, it looks
>>> like it is completely ignoring the character. I tried explicitly setting
>>> LC_CTYPE to ‘en_US.UTF-8’ and ‘UTF-8’ (both in the terminal, and using the
>>> .lc_ctype command), but that had no effect.
>>>
>>> I wonder if troff has a compiled-in list of Unicode characters that it
>>> understands, and if you try to use one it deems invalid it just ignores it?
>>> (This may be borne out by
>>> <https://github.com/n-t-roff/heirloom-doctools/blob/master/troff/troff.d/unimap.c>,
>>> but I don’t really know enough about the code to be certain.)
>>>
>>> Ricky
>>>
>>> > On Aug 4, 2020, at 10:14 PM, T. Kurt Bond <tkurtb...@gmail.com> wrote:
>>> >
>>> > In Emacs M-x describe-coding-system tells me the coding system for
>>> > saving the buffer is utf-8-unix. I don't have any LC_* environment
>>> > variables set, but LANG=en_US.UTF-8.
>>> >
>>> > I'm not very knowledgeable about the insides of Unicode fonts,
>>> > unfortunately.
>>> >
>>> > On Tue, Aug 4, 2020 at 4:27 PM Richard Morse <pu...@mac.com> wrote:
>>> > Huh. I’m afraid I’m out of my depth then; you might check and see if
>>> > your LC_* environment variables are set to something incompatible with
>>> > UTF-8 (or, maybe, check and make sure the file is in UTF-8, not UTF-16
>>> > or something if you’re on Windows), but hopefully someone with more
>>> > experience and knowledge will speak up…
>>> >
>>> > Ricky
>>> >
>>> > > On Aug 4, 2020, at 3:59 PM, T. Kurt Bond <tkurtb...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > And if I add "and explicit unicode character reference \U'1F0A1'" to
>>> > > the file, that character doesn't show up either.
>>> > >
>>> > > On Tue, Aug 4, 2020 at 2:47 PM Richard Morse <pu...@mac.com> wrote:
>>> > >
>>> > >> According to the Heirloom Troff manual, I think that you cannot just
>>> > >> insert Unicode characters (although maybe if your LC_* environment
>>> > >> variables are set correctly, you can?). It says:
>>> > >>
>>> > >>> Both nroff and troff allow references to specific Unicode characters
>>> > >>> with the \U'X' escape sequence; it causes the character at position
>>> > >>> U+X to be printed (X is a hexadecimal number). For troff, it is
>>> > >>> required that this character is available in one of the fonts
>>> > >>> mounted at this point. As an example, \U'20AC' prints the Euro
>>> > >>> character €. When register .g is set to 1 Unicode characters can
>>> > >>> also be accessed with \[uXXXX] where XXXX is a four digit
>>> > >>> hexadecimal number.
>>> > >>
>>> > >> So I think you would need to use `\U'1F0A1'` for the character to
>>> > >> show up?
>>> > >>
>>> > >> Ricky
>>> > >>
>>> > >>
>>> > >>> On Aug 4, 2020, at 12:28 PM, T. Kurt Bond <tkurtb...@gmail.com>
>>> > >>> wrote:
>>> > >>>
>>> > >>> (The heirloom-doctools README.md
>>> > >>> <https://github.com/n-t-roff/heirloom-doctools/blob/master/README.md>
>>> > >>> says to ask Heirloom doctools questions on this list.)
>>> > >>>
>>> > >>> I'd like to use the Symbola font in Heirloom troff. I tried the
>>> > >>> following:
>>> > >>>
>>> > >>> .do xflag 3
>>> > >>> .\" fp 5 Optima Optima-Regular ttf
>>> > >>> .fp 5 Symbola Symbola otf
>>> > >>> .LP
>>> > >>> Here is some normal text.
>>> > >>> .\" PLAYING CARD ACE OF SPADES is Unicode 0x1F0A1
>>> > >>> .ft Symbola
>>> > >>> 🂡 And some normal text. ❊
>>> > >>> .ft P
>>> > >>> More normal text.
>>> > >>>
>>> > >>> That's a literal PLAYING CARD ACE OF SPADES Unicode character at the
>>> > >>> start of the line between the two .ft requests. That character does
>>> > >>> not show up in the troff output, even though the EIGHT TEARDROP-SPOKED
>>> > >>> PROPELLER ASTERISK Unicode character at the end of the line *does*
>>> > >>> show up, as CPSuni274A, where CPS<name> outputs the character of that
>>> > >>> name. The Symbola font is embedded in the PDF output (created from the
>>> > >>> PostScript output), and the text "And some normal text" and the EIGHT
>>> > >>> TEARDROP-SPOKED PROPELLER ASTERISK Unicode character are in the
>>> > >>> Symbola font in the troff output.
>>> > >>>
>>> > >>> However, if I manually add a CPSuni1F0A1 to the troff output, *that*
>>> > >>> character *does* show up.
>>> > >>>
>>> > >>> Any ideas as to why the literal PLAYING CARD ACE OF SPADES Unicode
>>> > >>> character in the document source is being ignored and not written to
>>> > >>> the troff output?
>>> > >>>
>>> > >>> I actually have a document that needs to use the PLAYING CARD ACE OF
>>> > >>> SPADES Unicode character. The ultimate goal is to have the Symbola
>>> > >>> font used as a fallback font, which should happen automatically in
>>> > >>> Heirloom troff, since it searches all the fonts when a font is missing
>>> > >>> a character, but I made the example use the Symbola font explicitly
>>> > >>> because that shows the problem directly.
>>> > >>>
>>> > >>> --
>>> > >>> T. Kurt Bond, tkurtb...@gmail.com, https://tkurtbond.github.io