[ https://issues.apache.org/jira/browse/PDFBOX-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Helger updated PDFBOX-2009:
----------------------------------

Description:

Given a text-showing operation such as

    <FEFF21222193219103B103A003A6> Tj

PDFStreamEngine.processEncodedText does not handle it correctly. Am I correct that, if a BOM is detected, the code length should be set to 2 (and not be changed afterwards)? Or should the BOM simply be skipped instead?

It may be related to PDFBOX-920.

was:

The same description, without the final sentence referencing PDFBOX-920.

> PDFStreamEngine.processEncodedText incorrectly handling UTF-16 text with BOM FEFF
> ---------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2009
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2009
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Philip Helger
>             Fix For: 2.0.0
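A minimal Java sketch of the two options raised in the report, combined: skip the FE FF BOM, then read the rest of the string as fixed 2-byte (UTF-16BE) codes. It assumes the Tj operand is meant as UTF-16BE text, as the report suggests. The class BomCodeSplitter and its methods are hypothetical and are not part of PDFBox. The fallback to 1-byte codes when no BOM is present is a simplification; real code lengths would normally come from the font's encoding or CMap.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the PDFBox implementation.
public class BomCodeSplitter {

    // Converts a hex string such as "FEFF2122" into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd hex length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // If the bytes start with FE FF, skip the BOM and read fixed 2-byte codes;
    // otherwise (simplifying assumption) read 1-byte codes.
    // A trailing incomplete code is ignored.
    static List<Integer> splitCodes(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
        int start = hasBom ? 2 : 0;
        int codeLength = hasBom ? 2 : 1;
        List<Integer> codes = new ArrayList<>();
        for (int i = start; i + codeLength <= bytes.length; i += codeLength) {
            int code = 0;
            for (int j = 0; j < codeLength; j++) {
                code = (code << 8) | (bytes[i + j] & 0xFF);
            }
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) {
        byte[] bytes = hexToBytes("FEFF21222193219103B103A003A6");
        List<Integer> codes = splitCodes(bytes);
        StringBuilder text = new StringBuilder();
        for (int code : codes) {
            System.out.printf("U+%04X%n", code);
            text.append((char) code); // every code here fits in one UTF-16 unit
        }
        // Cross-check: Java's UTF-16 charset consumes a leading FE FF BOM
        // and decodes the remainder as big-endian.
        String viaCharset = new String(bytes, StandardCharsets.UTF_16);
        System.out.println(text.toString().equals(viaCharset));
    }
}
```

For the string in the report, this yields six codes (U+2122, U+2193, U+2191, U+03B1, U+03A0, U+03A6), so the BOM itself never reaches the glyph lookup. Fixing the code length to 2 without skipping the BOM would instead produce a seventh, spurious code 0xFEFF.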