[ https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167885#comment-14167885 ]
Robert Simms commented on PDFBOX-1709:
--------------------------------------

The convenience function processEncodedText lumps text together when the y-coords are the same, even when separated horizontally by a large distance. I'll try with the new package. We need to upgrade to keep up with the newer PDF formats anyway. =)

- Robert

> processEncodedText gives wrong coordinates
> ------------------------------------------
>
>                 Key: PDFBOX-1709
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1709
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.2
>         Environment: Windows 7 sp1, Javac 1.6.0_30, Java 1.7.0_17
>            Reporter: Robert Simms
>              Labels: test
>         Attachments: PDFBOX1709-0.pdf, PDFBOX1709-1.pdf, PDFBOX1709-2.pdf
>
>
> PDFStreamEngine#processEncodedText gives an x-coord. that is short by the width of the previous text, for the next text at the same y-coord.
> ---
> Use this PostScript to create PDFs that demonstrate the x-coordinate issue with processEncodedText().
> %!
> /Helvetica findfont 20 scalefont setfont
> 100 72 moveto
> (Hello) show
> % CASES
> % Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, or acrobat distiller),
> % then process the PDF with the java implementation of PDFBox PDFTextStripper,
> % listing text and x,y positions obtained by overriding the processEncodedText() method.
> % For example, the x-coord. of a text item may be printed in that method with
> % System.out.format("%.2f\n", this.getTextMatrix().getXPosition());
> % % 0. Works to convince processEncodedText that string 'Hello world.' was at 100,72. This is good.
> %
> % ( world.) show
> % % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' + 'world.'
> % % Instead,
> % % x-coord. of 'world.' reported as being actual position minus width of 'Hello', plus width of ' '
> % % which is x=105.56 in this case.
> %
> %( ) stringwidth pop 0 rmoveto
> %(world.) show
> % % 2. Positioning 'world.' within about 500 points from 'Hello', at same vertical position causes
> % % processEncodedText to give
> % % x-coord. of 'world.' as actual position minus width of 'Hello'
> % % which is x=200 in this case.
> %
> %100 0 rmoveto
> %(world.) show
> showpage

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
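
For anyone checking the figures in cases 1 and 2, here is a minimal sketch of the arithmetic in plain Java. It does not call PDFBox. It assumes the standard Helvetica AFM advance widths (in 1/1000 of the font size) and takes "actual position" to mean the x-coordinate at which 'world.' really starts on the page; the class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Worked arithmetic for the PostScript test cases; illustration only, not PDFBox code. */
final class Pdfbox1709Arithmetic {
    // Assumed: standard Helvetica AFM advance widths, in 1/1000 of the font size.
    private static final Map<Character, Integer> WIDTHS = new HashMap<>();
    static {
        WIDTHS.put(' ', 278);
        WIDTHS.put('H', 722);
        WIDTHS.put('e', 556);
        WIDTHS.put('l', 222);
        WIDTHS.put('o', 556);
    }

    /** Width of s in points at the given font size (only the glyphs listed above). */
    static double width(String s, double fontSize) {
        int units = 0;
        for (char c : s.toCharArray()) {
            Integer w = WIDTHS.get(c);
            if (w == null) {
                throw new IllegalArgumentException("no width for '" + c + "'");
            }
            units += w;
        }
        return units * fontSize / 1000.0;
    }

    public static void main(String[] args) {
        double size = 20;     // "20 scalefont"
        double startX = 100;  // "100 72 moveto"
        double hello = width("Hello", size);
        double space = width(" ", size);

        // Case 1: "( ) stringwidth pop 0 rmoveto" moves right by one space width.
        double actual1 = startX + hello + space;
        // Case 2: "100 0 rmoveto" moves right by 100 points.
        double actual2 = startX + hello + 100;

        System.out.printf("width(Hello)=%.2f width(space)=%.2f%n", hello, space);
        System.out.printf("case 1: actual=%.2f, minus width(Hello)=%.2f%n",
                actual1, actual1 - hello);
        System.out.printf("case 2: actual=%.2f, minus width(Hello)=%.2f%n",
                actual2, actual2 - hello);
    }
}
```

Under these assumptions 'Hello' is 45.56 points wide and a space is 5.56, so subtracting the width of 'Hello' from the true start of 'world.' gives 105.56 in case 1 and 200 in case 2, which matches the values quoted in the report.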