[
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
J Frank updated FOP-2701:
-------------------------
Attachment: image-2022-05-31-15-52-01-435.png
> Some of the latin ligatures make text not searchable in PDF
> -----------------------------------------------------------
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, image-2022-05-31-15-50-26-058.png,
> image-2022-05-31-15-50-39-029.png, image-2022-05-31-15-52-01-435.png,
> latn-ligatures-Antenna-House.pdf, latn-ligatures-FOP.pdf, out.pdf, test.fo
>
>
> This problem occurs with the Calibri font, which ships with the MS Office
> suite and Windows 10.
> I tested with the following text: {{file settings}}.
> The resulting PDF text contains ligatures: {{(fi)le se(tti)ngs}}
> Searching for {{file}} in Acrobat Reader selects the first word, as
> expected. But searching for {{set}} or {{settings}} gives no results.
> The same example, run with Antenna House, works fine: searching for
> {{settings}} returns results.
> Here is the complete FO file:
> {code:xml}
> <?xml version="1.0" encoding="UTF-8"?>
> <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
> <fo:layout-master-set>
> <fo:simple-page-master master-name="a">
> <fo:region-body/>
> </fo:simple-page-master>
> </fo:layout-master-set>
> <fo:page-sequence master-reference="a">
> <fo:flow flow-name="xsl-region-body">
> <fo:block font-family="Calibri" font-size="40pt">file
> settings</fo:block>
> </fo:flow>
> </fo:page-sequence>
> </fo:root>
> {code}
> Some considerations:
> # A workaround would be to reject all the substitutions that are not part of
> org.apache.fop.fonts.type1.AdobeStandardEncoding. This would keep the (fi)
> ligature but reject the (tti) one. However, this seems to work only for
> Calibri and not for Roboto.
> # I suspect there is an issue with the font embedding that causes some
> substitution mapping data to be lost. This is only a guess; I am not sure how
> PDF deals with substitutions.
> I know that setting xml:lang to "en" in the FO disables the ligatures, but
> that is not a solution for my project. I would appreciate any suggestions.
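
On the second consideration above: in PDF, text search and copy typically rely on the font's ToUnicode CMap, where a bfchar entry maps one glyph code to a UTF-16BE string that may hold several characters. The sketch below only illustrates what such entries look like for a ligature glyph; it is not FOP code. The class name, the bfcharEntry helper, and both glyph codes are made up for illustration, and whether FOP omits such an entry for (tti) is an assumption, not something confirmed in this issue.

```java
import java.nio.charset.StandardCharsets;

public class ToUnicodeSketch {

    /**
     * Formats one bfchar line: a 2-byte glyph code followed by the
     * UTF-16BE encoding of the text that glyph stands for.
     */
    public static String bfcharEntry(int glyphCode, String text) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("<%04X> <", glyphCode));
        // UTF_16BE emits two bytes per BMP character and no byte-order mark.
        for (byte b : text.getBytes(StandardCharsets.UTF_16BE)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.append('>').toString();
    }

    public static void main(String[] args) {
        // Hypothetical glyph codes; real values depend on the embedded font.
        System.out.println(bfcharEntry(0x0123, "fi"));
        System.out.println(bfcharEntry(0x0456, "tti"));
    }
}
```

If the (tti) glyph had no entry of this kind, or an entry pointing at something other than the three letters, a viewer would have no way to match {{settings}} against the page text, which would be consistent with the behaviour described. Inspecting the ToUnicode stream of latn-ligatures-FOP.pdf would confirm or rule this out.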