[ https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
J Frank updated FOP-2701:
-------------------------
    Attachment:     (was: fop.xconf)

> Some of the latin ligatures make text not searchable in PDF
> -----------------------------------------------------------
>
>                 Key: FOP-2701
>                 URL: https://issues.apache.org/jira/browse/FOP-2701
>             Project: FOP
>          Issue Type: Bug
>          Components: font/opentype
>    Affects Versions: 2.1
>         Environment: Windows 10, Calibri font.
>            Reporter: Dan Caprioara
>            Priority: Major
>         Attachments: latn-ligatures-Antenna-House.pdf,
> latn-ligatures-FOP.pdf, out.pdf, test.fo
>
>
> This problem occurs with the Calibri font, which ships with the MS Office
> suite and Windows 10.
> I tested with the following text: {{file settings}}.
> The resulting PDF text contains ligatures: {{(fi)le se(tti)ngs}}.
> Searching for {{file}} in Acrobat Reader selects the first word, which is
> correct. But searching for {{set}} or {{settings}} gives no results.
> The same example run with Antenna House works fine: searching for
> {{settings}} returns results.
> Here is the complete FO file:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
>     <fo:layout-master-set>
>         <fo:simple-page-master master-name="a">
>             <fo:region-body/>
>         </fo:simple-page-master>
>     </fo:layout-master-set>
>     <fo:page-sequence master-reference="a">
>         <fo:flow flow-name="xsl-region-body">
>             <fo:block font-family="Calibri" font-size="40pt">file settings</fo:block>
>         </fo:flow>
>     </fo:page-sequence>
> </fo:root>
> {code}
> Some considerations:
> # A workaround would be to reject all the substitutions that are not part of
> org.apache.fop.fonts.type1.AdobeStandardEncoding. This would keep the (fi)
> ligature but reject the (tti) one. However, this seems to work only for
> Calibri and not for Roboto.
> # I think there might be an issue with the font embedding, where some
> substitution mapping data is lost. This is just a guess; I am not sure how
> PDF deals with substitutions.
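
The first workaround listed under "Some considerations" could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `LigatureFilter`, the method `keepStandard`, and the glyph name `t_t_i` are hypothetical and not part of FOP, and the two-entry name set merely stands in for the glyph names of org.apache.fop.fonts.type1.AdobeStandardEncoding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch: keep a ligature substitution only when the resulting
 * glyph has a name found in a standard encoding. Nothing here is FOP API.
 */
final class LigatureFilter {

    // Assumption: a tiny stand-in for the glyph names of
    // org.apache.fop.fonts.type1.AdobeStandardEncoding. Adobe's standard
    // encoding defines the "fi" and "fl" ligatures but no "tti" ligature.
    static final Set<String> STANDARD_GLYPH_NAMES =
            new HashSet<>(Arrays.asList("fi", "fl"));

    /**
     * Takes candidate substitutions (input characters -> ligature glyph name)
     * and returns only those whose glyph name is in the standard set.
     */
    static Map<String, String> keepStandard(Map<String, String> candidates) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : candidates.entrySet()) {
            if (STANDARD_GLYPH_NAMES.contains(e.getValue())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> candidates = new LinkedHashMap<>();
        candidates.put("fi", "fi");
        candidates.put("tti", "t_t_i"); // glyph name is an assumption
        // Only the (fi) substitution survives; (tti) is rejected and would
        // be rendered as three separate glyphs.
        System.out.println(keepStandard(candidates));
    }
}
```

A filter like this only decides which substitutions are applied; it does not explain why the same approach fails for Roboto.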
> I know that setting xml:lang to "en" in the FO disables the ligatures, but
> that is not a solution for my project. I would appreciate any suggestions.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)