David Kastrup <d...@gnu.org> writes:

> Davide Liessi <davide.lie...@gmail.com> writes:
>
>> On Sun, 20 May 2018 at 18:35, Davide Liessi
>> <davide.lie...@gmail.com> wrote:
>>> The file
>>>
>>> \version "2.19.81"
>>> \header { title = "č" }
>>> { b1 }
>>>
>>> results in a PDF with correct printed title (lowercase c with caron)
>>> but wrong title field in metadata (Ċ, i.e. uppercase c with dot
>>> above).
>>
>> On Sun, 20 May 2018 20:52:58 +0200 David Kastrup wrote:
>>> Ghostscript bug when converting PostScript output to PDF.  The
>>> PostScript reads (pasted from less' display)
>>>
>>> mark /Creator (LilyPond 2.21.0)
>>> /Title (<FE><FF>^A^M)
>>> /DOCINFO pdfmark
>>>
>>> which is the correct UTF-16BE string with BOM.  Ghostscript however
>>> converts the ^M (0x0d) into ^J (0x0a), basically converting an ASCII CR
>>> to an ASCII LF.  Unfortunately, we are not in the middle of ASCII here.
>>
>> Actually, it turns out that the behaviour of Ghostscript is not wrong
>> and this is probably a bug in how LilyPond produces the PostScript
>> file.
>>
>> PostScript strings must either properly escape non-ASCII or ASCII
>> non-printable bytes, e.g., as \ddd with ddd the octal representation,
>> or they must be defined as a hexadecimal string (see [1], pages
>> 29–31).
>
> Uh WHAT?  To quote:
>
>     The \ddd form may be used to include any 8-bit character constant in
>     a string.  One, two, or three octal digits may be specified, with
>     high-order overflow ignored.  This notation is preferred for
>     specifying a character outside the recommended ASCII character set
>     for the PostScript language, since the notation itself stays within
>     the standard set and thereby avoids possible difficulties in
>     transmitting or storing the text of the program.  It is recommended
>     that three octal digits always be used, with leading zeros as
>     needed, to prevent ambiguity.  The string (\0053), for example,
>     contains two characters—an ASCII 5 (Control-E) followed by the digit
>     3—whereas the strings (\53) and (\053) contain one character, the
>     ASCII character whose code is octal 53 (plus sign).
>
> Recommended/preferred is not at all equivalent to "must".  However, one
> problem indeed is that strings as such have no notion of encoding and
> CR, LF, CRLF are all equivalent.  So at least those bytes, when they
> occur as part of UTF-16, would warrant escaping.
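
To make the CR/LF point concrete, here is a minimal Python sketch.  It is
not LilyPond's actual code, and the helper names (utf16be_with_bom,
scanner_eol_normalize, ps_escape) are made up for illustration.  It encodes
the title from the report as UTF-16BE with a BOM, mimics the end-of-line
normalization a PostScript scanner applies to a raw CR inside a (...)
string, and then writes every byte outside printable ASCII as a \ddd octal
escape.  Escaping all such bytes is an assumption of the sketch and is
broader than strictly needed.

```python
import codecs


def utf16be_with_bom(text: str) -> bytes:
    """Encode text as UTF-16BE, prefixed with the FE FF byte order mark."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


def scanner_eol_normalize(data: bytes) -> bytes:
    """Mimic a PostScript scanner: an unescaped CR or CRLF inside a
    literal string is read as a single LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def ps_escape(data: bytes) -> str:
    """Return the body of a PostScript literal string for these bytes.

    Parentheses and backslash get a backslash prefix; any byte outside
    printable ASCII is written as a three-digit octal escape.
    """
    out = []
    for b in data:
        if b in (0x28, 0x29, 0x5C):       # ( ) \
            out.append("\\" + chr(b))
        elif 0x20 <= b < 0x7F:            # printable ASCII passes through
            out.append(chr(b))
        else:                             # everything else becomes \ddd
            out.append("\\%03o" % b)
    return "".join(out)


raw = utf16be_with_bom("\u010d")          # U+010D, c with caron

# Written unescaped, the 0x0d byte is turned into 0x0a by the scanner,
# so the code unit 0x010d silently becomes 0x010a.
damaged = scanner_eol_normalize(raw)[2:].decode("utf-16-be")
print(hex(ord(damaged)))                  # 0x10a

# Written with octal escapes, no raw CR ever reaches the scanner.
print("/Title (" + ps_escape(raw) + ")")  # /Title (\376\377\001\015)
```

The escaped form contains only printable ASCII, so it no longer matters
how CR, LF, or CRLF are treated on the way to pdfmark.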

Tracker issue: 5422 (https://sourceforge.net/p/testlilyissues/issues/5422/)
Rietveld issue: 345090043 (https://codereview.appspot.com/345090043)
Issue description:
  Escape nul, cr, newline in PDF metadata

  I wasn't really aware that the strings remain pure 8-bit strings on
  input and that the UTF-16 interpretation is the private business of
  the pdfmark command.  So thanks for that pointer, which allows me to
  tackle this fairly long-known bug.

-- 
David Kastrup

_______________________________________________
bug-lilypond mailing list
bug-lilypond@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-lilypond