Package: ghostscript
Version: 9.56.1~dfsg-1
Severity: normal
Tags: upstream
Forwarded: https://bugs.ghostscript.com/show_bug.cgi?id=705246

When an input PDF file has a character like U+2308 LEFT CEILING and
has a ToUnicode CMap, the new PDF interpreter may yield an incorrect
ToUnicode CMap in the generated PDF.

The issue seems to be limited to characters like math symbols (in
the same font as the problematic character?), though; letters,
including accented ones, do not seem to be affected.

Here's the shell script I used for testing:

────────────────────────────────────────────────────────────────────────
#!/bin/sh

set -e

# Print the text extracted by pdftotext from chartest9$i$j$2.pdf,
# labelled with the test case and the producer name given in $1.
# The command substitution is left unquoted on purpose (word splitting).
out()
{
  echo -n "$i$j ($1):"
  printf " %s" $(pdftotext chartest9$i$j$2.pdf - | tr -d '\f')
  echo
}

for i in a b
do
  for j in 0 1
  do
    # Case a: ":a" becomes \lceil. Case b: ":a" is removed.
    # "J" is replaced by the \pdfgentounicode value (0 or 1).
    cat <<'EOF' | sed "s/:$i/\\\\lceil/" | \
      sed "s/:a//" | \
      sed "s/J/$j/" > chartest9.tex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\pdfgentounicode=J
\begin{document}
\thispagestyle{empty}
$\in:a$
\end{document}
EOF
    pdflatex chartest9.tex > /dev/null
    mv chartest9.pdf chartest9$i$j.pdf
    out "pdfTeX" ""
    # New PDF interpreter (default)
    ps2pdf14 chartest9$i$j.pdf chartest9$i$j-new.pdf
    out "gs new" "-new"
    # Old PDF interpreter
    ps2pdf14 -dNEWPDF=false chartest9$i$j.pdf chartest9$i$j-old.pdf
    out "gs old" "-old"
  done
done
────────────────────────────────────────────────────────────────────────

See the upstream bug for the obtained PDF files.

Four kinds of PDF inputs are tested (a0, a1, b0, b1), where
  * a: the content corresponds to "∈⌈" (ELEMENT OF + LEFT CEILING)
  * b: the content corresponds to "∈" (ELEMENT OF)
  * 0: \pdfgentounicode=0 (pdfTeX does not generate a ToUnicode CMap)
  * 1: \pdfgentounicode=1 (pdfTeX generates a ToUnicode CMap)

I've compared (see the above script for details):
  * pdfTeX: PDF file generated by pdfTeX from TeX Live 2022
  * gs new: PDF file obtained with the new PDF interpreter (default)
  * gs old: PDF file obtained with the old PDF interpreter
            (-dNEWPDF=false)

I've done the tests with the ghostscript 9.56.1~dfsg-1 Debian package.

If LEFT CEILING is not present, Ghostscript does not generate a
ToUnicode CMap in any of these cases, which is fine.
But if this character is present:

  1. With the old PDF interpreter, Ghostscript generates a correct
     ToUnicode CMap.

  2. With the new PDF interpreter and no input ToUnicode CMap,
     Ghostscript does not generate a ToUnicode CMap (the only practical
     issue is that one cannot get unusual characters like LEFT CEILING,
     but this is not worse than what TeX Live 2022 can yield in any
     case).

  3. With the new PDF interpreter and an input ToUnicode CMap like the
     one from TeX Live 2022, Ghostscript generates an incorrect
     ToUnicode CMap, which prevents one from getting usual math
     characters such as ELEMENT OF.

The results, where I've added ToUnicode CMap information (which I have
obtained with "qpdf --stream-data=uncompress" on these PDF files):

  a0 (pdfTeX): ∈d (no CMap)
  a0 (gs new): ∈d (no CMap)
  a0 (gs old): ∈⌈ (CMap old)
  a1 (pdfTeX): ∈d (CMap 1)
  a1 (gs new): (CMap 1-new)
  a1 (gs old): ∈⌈ (CMap old)
  b0 (pdfTeX): ∈ (no CMap)
  b0 (gs new): ∈ (no CMap)
  b0 (gs old): ∈ (no CMap)
  b1 (pdfTeX): ∈ (CMap 1)
  b1 (gs new): ∈ (no CMap)
  b1 (gs old): ∈ (no CMap)

with the following ToUnicode CMaps:

CMap old:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><2208>
<64><64><2308>
endbfrange
endcmap
────────────────────────────────────────

CMap 1:
────────────────────────────────────────
begincmap
/CIDSystemInfo
<< /Registry (TeX)
/Ordering (lmsy10-lm-mathsy)
/Supplement 0
>> def
/CMapName /TeX-lmsy10-lm-mathsy-0 def
/CMapType 2 def
1 begincodespacerange
<00> <FF>
endcodespacerange
0 beginbfrange
endbfrange
0 beginbfchar
endbfchar
endcmap
────────────────────────────────────────

CMap 1-new:
────────────────────────────────────────
begincmap
/CMapType 2 def
/CMapName/R11 def
1 begincodespacerange
<00><ff>
endcodespacerange
2 beginbfrange
<32><32><00>
<64><64><00>
endbfrange
endcmap
────────────────────────────────────────

-- System Information:
Debian Release: bookworm/sid
  APT prefers unstable-debug
  APT policy: (500, 'unstable-debug'), (500, 'stable-updates'), (500, 'stable-security'), (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 5.17.0-1-amd64 (SMP w/8 CPU threads; PREEMPT)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=POSIX, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ghostscript depends on:
ii  libc6   2.33-7
ii  libgs9  9.56.1~dfsg-1

ghostscript recommends no packages.

Versions of packages ghostscript suggests:
ii  ghostscript-x  9.56.1~dfsg-1

-- no debconf information

-- 
Vincent Lefèvre <vinc...@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
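
Addendum: for anyone wanting to check other files for the same symptom,
the CMap listings earlier in this report can be pulled out of a PDF and
screened roughly as follows. This is only a minimal sketch, not the
exact commands used for this report: the helper names extract_cmaps and
zero_destinations are made up for illustration, the input is assumed to
have been uncompressed first with "qpdf --stream-data=uncompress", and
the awk filter assumes one bfrange/bfchar entry per line, as in the
listings in this report.

```shell
# Hypothetical helpers (not part of qpdf or Ghostscript).

# Print every begincmap..endcmap section found on standard input.
# The input is expected to be a PDF whose streams are uncompressed.
extract_cmaps()
{
  LC_ALL=C sed -n '/begincmap/,/endcmap/p'
}

# Read CMap text on standard input and print the bfrange/bfchar
# entries whose destination consists only of zeros, as in the
# "CMap 1-new" listing. Exit status: 0 if such an entry was found,
# 1 otherwise. Assumes one entry per line.
zero_destinations()
{
  LC_ALL=C awk '
    /beginbfrange/ || /beginbfchar/ { inside = 1; next }
    /endbfrange/ || /endbfchar/     { inside = 0; next }
    inside {
      # The destination is the last <...> group on the line.
      dst = $0
      sub(/>[^>]*$/, "", dst)   # drop the final ">" and what follows
      sub(/.*</, "", dst)       # drop everything up to the last "<"
      if (dst ~ /^0+$/) { print; found = 1 }
    }
    END { exit(found ? 0 : 1) }
  '
}

# Possible usage (file names taken from the test script in this report):
#   qpdf --stream-data=uncompress chartest9a1-new.pdf uncompressed.pdf
#   extract_cmaps < uncompressed.pdf | zero_destinations
```

On the "CMap 1-new" listing, such a filter would be expected to flag
both bfrange entries, while "CMap old" would pass.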