Pierre MacKay wrote:
I am following this discussion with great interest, but I wonder whether the problems of using a font with the Adobe Expert Character set names have been looked at.

Adobe seems (it is difficult to be sure of the causes) to have set up Acrobat Reader 8 and 9 so that they trap names like Asmall . . . Zsmall, the old-style figures, and the ff ligatures. Unless I use the online distiller at Acrobat.com, I get PDFs in which all characters from the Expert character set are replaced by blank space.

*What do you mean by* "replaced by blank space", exactly? Do the characters fail to show up when you view the document, are they missing when you print it, or are they missing when you copy text from it?

Actually, not all, because the accented glyphs in the range E0--FF come through.

It is, of course, possible to bypass the problem by using something other than Reader 8 or 9.

/me typically uses Reader 5 (unless the document has compressed object streams), because the GUI quality seems (IMHO) to be a decreasing function of version number. ;-)

Reader 6 and 7 did not have the problem, so it is something introduced by Adobe in the later versions of Reader. I submitted a bug report about the problem when Reader 8 came out. It was acknowledged, and I was told that it would be corrected "in the next major release." It clearly has not been. One of the worst aspects of this bug is that it destroys the archival value of all PDFs distilled before the arrival of Reader 8. (I don't know exactly when the change was made in Acrobat Distiller, but I suspect it was contemporaneous with Reader 8.)

A comparison of output from the online distiller at Adobe.com with output from Ghostscript 8.63 shows that in the Adobe distiller, any font with the names Asmall . . . Zsmall is subjected to two consecutive operations, the first of which is associated with "/ToUnicode". I have been unable to find out what /ToUnicode does. Does it recode the entire Adobe Expert Character set into a page in the Private Use Area?

If the difference involves /ToUnicode, then it should only be Copy text and Search operations that misbehave, right? (IMO, that wouldn't destroy the archival value of PDFs, but neither would bugs specific to one PDF reader.)

FYI, the /ToUnicode entry in a PDF font dictionary sets up a mapping from slots in the font to Unicode code points; the PDF 1.5 spec describes this in Section 5.9, "Extraction of Text Content". Providing such a map explicitly is really the only general way to assign an interpretation to the text in a PDF, but originally Acrobat Reader also had heuristics for guessing an interpretation from the glyph names. It is possible that the change you observed in AR8 was merely the retirement of some of these heuristics, so that "Asmall" is no longer on the list of known names, even though "a" might still be.
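To make the mechanism concrete: a /ToUnicode CMap is a small PostScript-syntax stream of bfchar/bfrange mappings from font slots to UTF-16BE code points. A minimal sketch (the slot numbers and CMap name here are invented for illustration) mapping an "Asmall" slot to U+0041 and an ff-ligature slot to the two code points f f could look like:

```postscript
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapName /Example-ToUnicode def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
2 beginbfchar
<41> <0041>     % slot 0x41 ("Asmall") -> U+0041 LATIN CAPITAL LETTER A
<FB> <00660066> % ff ligature slot -> "f" "f" (two code points)
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
```

With such a map in place, Copy and Search should work regardless of whether the reader recognizes the glyph names.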

Fontinst has had the ability to generate /ToUnicode CMaps since v1.928 (or thereabouts), through the \etxtocmap command. Getting PDF generators to insert the CMap in the right place is, however, not so straightforward: pdfTeX only gives such access to font dictionaries from the TeX side (whereas access via the map file would be more useful), and it only works for fonts that have been \font'defed (hence not for base fonts of virtual fonts). OTOH, recent pdfTeXes seem to have some built-in heuristics of their own for generating ToUnicode data; I haven't studied those in detail. Nor do I know what gs or dvipdfmx can currently do in this respect.
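For concreteness, the TeX-side access in pdfTeX goes through \pdfobj and \pdffontattr; a sketch (the font and CMap file names are made up) would be:

```tex
% Embed the CMap file as a PDF stream object, then point the
% font dictionary's /ToUnicode entry at that object.
\immediate\pdfobj stream file {myfont.cmap}
\font\myfont=myfont at 10pt
\pdffontattr\myfont{/ToUnicode \the\pdflastobj\space 0 R}
```

As said, this only reaches fonts selected with \font on the TeX side, not the base fonts underneath a virtual font.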

There is also the possibility of putting /ActualText data directly into the page content stream by using pdf: \specials. I've recently considered adding support for this to fontinst (the specials would be embedded into the VF; I have figured out how to do it elegantly), but that's probably only appropriate for faked glyphs (e.g. Euro from C and two rules). See also the accsupp LaTeX package.
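For the record, an /ActualText wrapper amounts to a marked-content span in the page content stream; in pdfTeX terms, a rough sketch (using the faked Euro as the example, U+20AC in UTF-16BE with BOM) is:

```tex
% Open a marked-content span whose /ActualText gives the
% intended character; close it again after the faked glyph.
\pdfliteral page {/Span << /ActualText <FEFF20AC> >> BDC}
% ... typeset the faked glyph (e.g. C plus two rules) here ...
\pdfliteral page {EMC}
```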

Lars Hellström
