Follow-up Comment #16, bug #64061 (project groff):

The switch from using grep to sed, which seems to have caused issues, may well
have been unnecessary, given that the example PDF provided in bug #58206 was
in fact an invalid PDF, which is why pdfinfo did not handle it correctly. The
PDF reference says:

"For text strings encoded in Unicode, the first two bytes must be 254 followed
by 255, representing the Unicode byte order marker, U+FEFF."
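
For illustration, here is a minimal Python sketch of the rule quoted above.
It assumes the string's bytes have already been extracted and unescaped from
the PDF. The function name is hypothetical, and Latin-1 is only a stand-in
for PDFDocEncoding, which it does not match exactly.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode the raw bytes of a PDF text string (hypothetical helper)."""
    # Per the PDF reference, a Unicode text string starts with the bytes
    # 254, 255 (0xFE 0xFF), i.e. the UTF-16BE byte order marker U+FEFF.
    if raw[:2] == b"\xfe\xff":
        # Skip the BOM and decode the rest as big-endian UTF-16.
        return raw[2:].decode("utf-16-be")
    # No BOM: the string is not flagged as Unicode, so treat it as a
    # single-byte encoding. Latin-1 stands in for PDFDocEncoding here.
    return raw.decode("latin-1")
```

Without the BOM, a UTF-16 title such as b"\x00T\x00i" falls through to the
single-byte path and comes out with embedded NUL characters, which is the
kind of mangled title seen in bug #58206.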

From the byte dump in that bug you can see that no such BOM is present. Please
also note that later versions of pdfinfo (checked with v. 0.26.4) now handle
the mangled title correctly (it must be recognising the alternating zero bytes
and "guessing" that it is UTF-16 with a missing BOM).
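
A sketch of the kind of fallback guessed at above, under stated assumptions:
the check below (even length, every high byte zero) is a hypothetical
heuristic written for illustration, not poppler's actual pdfinfo code, and it
again uses Latin-1 as a stand-in for PDFDocEncoding.

```python
def looks_like_bomless_utf16be(raw: bytes) -> bool:
    """Hypothetical heuristic: even length and every byte at an even offset
    is zero, i.e. ASCII-range text stored as 16-bit big-endian units."""
    if len(raw) < 2 or len(raw) % 2:
        return False
    return all(b == 0 for b in raw[0::2])


def decode_with_guess(raw: bytes) -> str:
    # Standard case: the string carries the 0xFE 0xFF byte order marker.
    if raw[:2] == b"\xfe\xff":
        return raw[2:].decode("utf-16-be")
    # Non-standard case: no BOM, but alternating zero bytes suggest UTF-16BE.
    if looks_like_bomless_utf16be(raw):
        return raw.decode("utf-16-be")
    # Otherwise fall back to a single-byte encoding (Latin-1 stand-in).
    return raw.decode("latin-1")
```

With such a fallback, a title stored as alternating zero bytes, as in the
bug #58206 dump, decodes to readable text. The heuristic only catches
ASCII-range characters; any non-ASCII character in a BOM-less string defeats
it and the string takes the single-byte path.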

So, if it makes things any easier, we could go back to a simple grep, since if
it fails we know it is a non-standard PDF and they are using an older version
of pdfinfo.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64061>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/

