Hi! The issue arises before it even gets to the PostScript. If you run the following commands:

.do xflag 3
.lc_ctype UTF-8
.fp 5 Symbola Symbola ttf
.ft Symbola
❊ works
.sp
🂡 char
.sp
\U'1F0A1' uesc
.sp
\[u1F0A1] name
.sp

through Heirloom troff as `troff test.roff | less`, you can see that the output is (in part, once the heading is all set up):

H72000 V12000 CPSspoked8teardroppropellerstar wh11510cw h7670co h5140cr h4010ck h5560cs n12000 0
H72000 V36000 h6660cc h4490ch h5760ca h5220cr n12000 0
H72000 V60000 h6660cu h5760ce h4550cs h3920cc n12000 0
H72000 V84000 CPSu1F0A1 wh11270cn h5760ca h5220cm h8660ce n12000 0

You’ll notice that the star character, which works in the PDF, and the named character (remember that, inside the font file, u1F0A1 is the character name) both show up in ‘CPS’ statements. But in the two other places where you would expect to see something (the literal character and the \U escape), it is missing entirely. You get the ‘H72000’ command, the ‘V’ command (with the vertical offset), and then the output goes straight into the Latin text (seemingly without even the space that should be there?).

So for whatever reason, troff isn’t seeing the character as something that should be output.

Ricky

> On Aug 5, 2020, at 1:30 AM, T. Kurt Bond <tkurtb...@gmail.com> wrote:
>
> Looking at the postscript output there is a "/uni1F0A1 9429 def" and a
> "/uni1F10A" in a "/Encoding-@15@36 [...] def"; is that part of the font
> machinery? (I'm sadly ignorant of PostScript, alas.)
>
> Looking at troff/troff.d/otf.c I see that there is a struct WGL that contains
> female and male entries. At the beginning of the struct is a comment that
> consists of "/* WGL4 */". Googling that led to Windows Glyph List 4. Taking
> a leap, I added the unicode characters FEMALE SIGN and MALE SIGN to my test
> document. Those show up fine in the final PDF output. Maybe this is
> connected? At this point I suspect without much evidence that characters
> that are not in the StandardStrings array, the MacintoshStrings array, or the
> WGL array don't get output. Maybe.
> I'll have to investigate some more.
>
> On Tue, Aug 4, 2020 at 11:10 PM Richard Morse <pu...@mac.com> wrote:
> Hm. Just for my edification, I tried a few things.
>
> I’m on a Mac, and I don’t know when I compiled Heirloom troff, but it was a
> year or two ago, so some things may be different.
>
> I downloaded the Symbola font from fontlibrary.org. The version I got was
> .ttf, not .otf.
>
> The various things that you tried did not work for me either. \[u1F0A1] did
> work, but that’s because (according to fret, at least), that’s the font’s
> internal name for the symbol, which is not guaranteed to be true across all
> fonts, so you can’t really use that for a “fallback” system.
>
> Looking at the output of troff without going through dpost, it looks like it
> is completely ignoring the character. I tried explicitly setting LC_CTYPE to
> ‘en_US.UTF-8’ and ‘UTF-8’ (both in the terminal, and using the .lc_ctype
> command), but that had no effect.
>
> I wonder if troff has a compiled in list of unicode characters that it
> understands, and if you try to use one it deems invalid it just ignores it?
> (This may be borne out by
> https://github.com/n-t-roff/heirloom-doctools/blob/master/troff/troff.d/unimap.c
> , but I don’t really know enough about the code to be certain.)
>
> Ricky
>
> > On Aug 4, 2020, at 10:14 PM, T. Kurt Bond <tkurtb...@gmail.com> wrote:
> >
> > In Emacs M-x describe-coding-system tells me the coding system for saving
> > the buffer is utf-8-unix. I don't have any LC_* environment variables set,
> > but LANG=en_US.UTF-8.
> >
> > I'm not very knowledgeable about the insides of Unicode fonts,
> > unfortunately.
> >
> > On Tue, Aug 4, 2020 at 4:27 PM Richard Morse <pu...@mac.com> wrote:
> > Huh. I’m afraid I’m out of my depth then; you might check and see if your
> > LC_* environment variables are set to something incompatible with utf-8
> > (or, maybe, check and make sure the file is in UTF-8, not UTF-16 or
> > something if you’re on Windows), but hopefully someone with more
> > experience and knowledge will speak up…
> >
> > Ricky
> >
> > > On Aug 4, 2020, at 3:59 PM, T. Kurt Bond <tkurtb...@gmail.com> wrote:
> > >
> > > And if I add "and explicit unicode character reference \U'1F0A1'" to the
> > > file, that character doesn't show up either.
> > >
> > > On Tue, Aug 4, 2020 at 2:47 PM Richard Morse <pu...@mac.com> wrote:
> > >
> > >> According to the Heirloom Troff manual, I think that you cannot just
> > >> insert Unicode characters (although maybe if your LC* environment
> > >> variables are set correctly, you can?). It says:
> > >>
> > >>> Both nroff and troff allow references to specific Unicode characters
> > >>> with the \U'X' escape sequence; it causes the character at position
> > >>> U+X to be printed (X is a hexadecimal number). For troff, it is
> > >>> required that this character is available in one of the fonts
> > >>> mounted at this point. As an example, \U'20AC' prints the Euro
> > >>> character €. When register .g is set to 1 Unicode characters can
> > >>> also be accessed with \[uXXXX] where XXXX is a four digit
> > >>> hexadecimal number.
> > >>
> > >> So I think you would need to use `\U'1F0A1'` for the character to show
> > >> up?
> > >>
> > >> Ricky
> > >>
> > >>
> > >>> On Aug 4, 2020, at 12:28 PM, T. Kurt Bond <tkurtb...@gmail.com> wrote:
> > >>>
> > >>> (The heirloom-doctools README.md
> > >>> <https://github.com/n-t-roff/heirloom-doctools/blob/master/README.md>
> > >>> says to ask Heirloom doctools questions on this list.)
> > >>>
> > >>> I'd like to use the Symbola font in Heirloom troff. I tried the
> > >>> following:
> > >>>
> > >>> .do xflag 3
> > >>> .\" fp 5 Optima Optima-Regular ttf
> > >>> .fp 5 Symbola Symbola otf
> > >>> .LP
> > >>> Here is some normal text.
> > >>> .\" PLAYING CARD ACE OF SPADES is Unicode 0x1F0A1
> > >>> .ft Symbola
> > >>> 🂡 And some normal text. ❊
> > >>> .ft P
> > >>> More normal text.
> > >>>
> > >>> That's a literal PLAYING CARD ACE OF SPADES Unicode character at the
> > >>> start of the line between the two .ft requests. That character does
> > >>> not show up in the troff output, even though the EIGHT TEARDROP-SPOKED
> > >>> PROPELLER ASTERISK Unicode character at the end of the line *does*
> > >>> show up, as CPSuni274A where the CPS<name> outputs the character of
> > >>> that name. The Symbola font is embedded in the PDF output (created
> > >>> from the PostScript output), and the text "And some normal text" and
> > >>> the EIGHT TEARDROP-SPOKED PROPELLER ASTERISK Unicode character are in
> > >>> the Symbola font in the troff output.
> > >>>
> > >>> However, if I manually add a CPSuni1F0A1 to the troff output, *that*
> > >>> character *does* show up.
> > >>>
> > >>> Any ideas as to why the literal PLAYING CARD ACE OF SPADES Unicode
> > >>> character in the document source is being ignored and not written to
> > >>> the troff output?
> > >>>
> > >>> I actually have a document that needs to use the PLAYING CARD ACE OF
> > >>> SPADES Unicode character. The ultimate goal is to have the Symbola
> > >>> font used as a fallback font, which should happen automatically in
> > >>> Heirloom troff, since it searches all the fonts when a font is missing
> > >>> a character, but I made the example use the Symbola font directly
> > >>> because that shows the problem directly.
> > >>>
> > >>> --
> > >>> T. Kurt Bond, tkurtb...@gmail.com, https://tkurtbond.github.io
> > >>
> > >>
> > >
> > > --
> > > T. Kurt Bond, tkurtb...@gmail.com, https://tkurtbond.github.io
> >
> >
> > --
> > T. Kurt Bond, tkurtb...@gmail.com, https://tkurtbond.github.io
>
> --
> T. Kurt Bond, tkurtb...@gmail.com, https://tkurtbond.github.io
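
In case it helps anyone check their own files for the same symptom, here is a rough Python sketch that turns the kind of troff output shown at the top of this message back into readable text, so a dropped character is easy to spot. It is not part of Heirloom: the `decode` name and the `<name>` notation for named glyphs are my own, it only recognises the handful of commands that appear in that excerpt (`H`, `V`, `h`, `c`, `C`, `w`, `n`), and it simply skips lines starting with `x`. I am treating `w` as a word break and printing it as a space, which is an assumption about presentation rather than something troff emits.

```python
import re

# Only the commands that appear in the excerpt are recognised; this is
# an illustrative subset, not a full parser for troff's output language.
TOKEN = re.compile(
    r"(?P<named>C\S+)"          # named glyph, e.g. CPSu1F0A1
    r"|(?P<word>w)"             # word-boundary marker
    r"|(?P<move>[HVhv]-?\d+)"   # horizontal or vertical motion
    r"|(?P<char>c.)"            # one literal character
    r"|(?P<eol>n\d+ \d+)"       # end of an output line
)


def decode(intermediate):
    """Return the text of each output line; named glyphs appear as <name>."""
    lines, current = [], []
    for raw in intermediate.splitlines():
        if raw.startswith("x"):          # device-control lines are skipped
            continue
        for m in TOKEN.finditer(raw):
            kind = m.lastgroup
            if kind == "named":
                current.append("<" + m.group()[1:] + ">")
            elif kind == "word":
                current.append(" ")
            elif kind == "char":
                current.append(m.group()[1])
            elif kind == "eol":
                lines.append("".join(current))
                current = []
            # motions carry no text, so they are ignored
    if current:                          # text not closed by an 'n' command
        lines.append("".join(current))
    return lines


sample = """\
H72000 V12000 CPSspoked8teardroppropellerstar wh11510cw h7670co h5140cr h4010ck h5560cs n12000 0
H72000 V36000 h6660cc h4490ch h5760ca h5220cr n12000 0
H72000 V60000 h6660cu h5760ce h4550cs h3920cc n12000 0
H72000 V84000 CPSu1F0A1 wh11270cn h5760ca h5220cm h8660ce n12000 0
"""

for line in decode(sample):
    print(repr(line))
```

On the excerpt, the first and last lines come back with a `<...>` glyph followed by the word, while the second and third come back as just 'char' and 'uesc', with neither the glyph nor the space before the word.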