[jira] [Updated] (PDFBOX-2009) PDFStreamEngine.processEncodedText incorrectly handling UTF-16 text with BOM FEFF

Philip Helger (JIRA) Wed, 02 Apr 2014 10:42:44 -0700

     [ 
https://issues.apache.org/jira/browse/PDFBOX-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Philip Helger updated PDFBOX-2009:
----------------------------------

    Description: 
Given a text-showing operation like
<FEFF21222193219103B103A003A6> Tj
PDFStreamEngine.processEncodedText does not handle the string correctly.
Am I correct that once a BOM has been detected, the codelength should be set to 2
(and not be changed afterwards)? Or should the BOM simply be skipped instead?

It may be related to PDFBOX-920
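
For illustration only, here is a minimal sketch of what the bytes in the
example decode to if the leading FEFF is treated as a UTF-16 big-endian BOM,
skipped, and the rest read two bytes per code. The class name
Utf16HexStringSketch and the helpers hexToBytes and decodeUtf16 are
hypothetical and are not part of PDFBox; whether
PDFStreamEngine.processEncodedText should behave this way is exactly the open
question of this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not PDFBox code.
class Utf16HexStringSketch {

    // Converts the hex digits between '<' and '>' into bytes.
    // Assumes an even number of digits and no embedded whitespace.
    static byte[] hexToBytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    // Skips a leading FE FF marker if present, then decodes the
    // remaining bytes as UTF-16BE (two bytes per code unit).
    static String decodeUtf16(byte[] bytes) {
        int offset = 0;
        if (bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF) {
            offset = 2;
        }
        return new String(bytes, offset, bytes.length - offset, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String text = decodeUtf16(hexToBytes("FEFF21222193219103B103A003A6"));
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        // Prints: U+2122 U+2193 U+2191 U+03B1 U+03A0 U+03A6
        System.out.println(out);
    }
}
```

Under that assumption the string holds six characters after the BOM, which
matches the "codelength of 2" reading suggested above.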

  was:
Given a text-showing operation like
<FEFF21222193219103B103A003A6> Tj
PDFStreamEngine.processEncodedText does not handle the string correctly.
Am I correct that once a BOM has been detected, the codelength should be set to 2
(and not be changed afterwards)? Or should the BOM simply be skipped instead?


> PDFStreamEngine.processEncodedText incorrectly handling UTF-16 text with BOM 
> FEFF
> ---------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2009
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2009
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.0
>            Reporter: Philip Helger
>             Fix For: 2.0.0
>
>
> Given a text-showing operation like
> <FEFF21222193219103B103A003A6> Tj
> PDFStreamEngine.processEncodedText does not handle the string correctly.
> Am I correct that once a BOM has been detected, the codelength should be set to 2
> (and not be changed afterwards)? Or should the BOM simply be skipped instead?
> It may be related to PDFBOX-920



--
This message was sent by Atlassian JIRA
(v6.2#6252)
