[ https://issues.apache.org/jira/browse/PDFBOX-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998004#comment-13998004 ]
Tilman Hausherr commented on PDFBOX-2053:
-----------------------------------------

I can't tell. Currently there is my patch, but there is also a change in the making by Andreas, who is using a different strategy. The last release was just a few days ago, so the next one would take at least three months. But if you're building from source, why bother? You already fixed it yourself.

> Issue with PDFBox position reading
> ----------------------------------
>
>                 Key: PDFBOX-2053
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2053
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 1.8.3
>            Reporter: Orbel Mkrtchyan
>         Attachments: test.pdf
>
>
> Using PDFBox 1.8.4,
> bug #1:
>     PDDocument doc = new PDDocument();
>     doc.load("test-pcc7247.pdf");
>     doc.save("out.pdf");
>     doc.close();
> The resulting file is corrupted, contains 0 pages and cannot be viewed by
> Acrobat Reader.
> bug #2: consider the following code snippet. The code runs like this:
>     Extractor extractor = new Extractor();
>     extractor.writeText(pdDoc, output);
> Using the code defined like this:
>     public class Extractor extends PDFTextStripper {
>     ...
>         protected void writePage() throws IOException
>         {
>             for( int i = 0; i < charactersByArticle.size(); i++)
>             {
>                 List<TextPosition> textList = charactersByArticle.get( i );
>                 Iterator textIter = textList.iterator();
>                 while( textIter.hasNext() )
>                 {
>                     TextPosition position = (TextPosition)textIter.next();
> In the given piece of code, position variable correctly iterates through the
> letters of the first line of the provided pdf document, but its coordinates
> (x, y, widths, etc) are always the same. Just to be clear, 1 position always
> relates to 1 letter, and its widths array's length always equals 1. So we get
> the same coordinates for every letter in a line. Expected behaviour is either
> having new coordinates per letter or having widths[] contain widths for the
> characters of a whole line of text

--
This message was sent by Atlassian JIRA
(v6.2#6252)