[ https://issues.apache.org/jira/browse/PDFBOX-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tilman Hausherr closed PDFBOX-167.
----------------------------------
    Resolution: Cannot Reproduce

In October 2013, I e-mailed both people mentioned in this issue:
{quote}
Is this still an issue? I looked at the code and it is different than the one mentioned. But I can't test the code mentioned because the links are broken.
{quote}
I never got a response. I am thus closing this issue.

> wrong words highlighted
> -----------------------
>
>                 Key: PDFBOX-167
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-167
>             Project: PDFBox
>          Issue Type: Bug
>            Priority: Minor
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552832&aid=1487217
> Originally submitted by nobody on 2006-05-12 01:51.
> PDFBox appears to have a problem properly highlighting
> words from the following PDF. I am using a very simple
> servlet to do this, and it works fine for most PDFs.
> With this one, however, it highlights the wrong words.
> Unfortunately I am not smart enough to figure out what
> is going on myself, so could anybody help me with this?
> The files can be found here:
> http://www.impressie.nl/matthijs/PDFHighlight.java
> http://www.impressie.nl/matthijs/Rectificatie%20van%20Richtlijn%20Handhaving%20van%20Intellectuele-eigendomsrechten.pdf
> Matthijs Bierman
> matth...@impressie.nl
> [comment on SourceForge]
> Originally sent by nobody.
> Logged In: NO
> That document is in a password-protected area, so it can't be read by anyone
> else! I have a similar problem with this doc:
> http://www.usc.edu/schools/business/FBE/seminars/papers/AE_4-28-06_FISMAN-parking.pdf
> ... but I think I've figured this one out. The second page of this document
> is entirely blank, and checking by hand I can see that the highlights after
> p1 are all in positions that would be correct if they were one page further
> on; it appears that the page count isn't being incremented for the blank
> page.
> Tracing this back in the code I see this:
>     PDStream contentStream = nextPage.getContents();
>     if( contentStream != null )
>     {
>         COSStream contents = contentStream.getStream();
>         processPage( nextPage, contents );
>     }
> (PDFTextStripper.java line 255). That's skipping the blank page and giving me
> the wrong page no, I think - and I guess that the problem can be resolved by
> moving currentPageNo++ from inside processPage to just above that test.
> -- brian.ew...@gmail.com



--
This message was sent by Atlassian JIRA
(v6.2#6252)
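
The page-numbering problem described in the SourceForge comment can be illustrated with a small stand-alone sketch. It does not use PDFBox at all: pages are modelled as a list of strings in which null stands for a blank page with no content stream, and the class and method names (PageCounterSketch, pageNumbersSkippingBlank, pageNumbersCountingBlank) are hypothetical. It only compares incrementing the counter inside the null check with incrementing it just above the check, as the comment suggests; it is not the actual PDFTextStripper code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageCounterSketch {

    /**
     * Mirrors the behaviour described in the comment: the counter is only
     * incremented when a page has content, so blank pages are not counted.
     */
    public static List<Integer> pageNumbersSkippingBlank(List<String> pageContents) {
        List<Integer> reported = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pageContents) {
            if (contents != null) {
                currentPageNo++; // stands in for the increment inside processPage
                reported.add(currentPageNo);
            }
        }
        return reported;
    }

    /**
     * The suggested change: increment the counter just above the null check,
     * so every page is counted whether or not it has content.
     */
    public static List<Integer> pageNumbersCountingBlank(List<String> pageContents) {
        List<Integer> reported = new ArrayList<>();
        int currentPageNo = 0;
        for (String contents : pageContents) {
            currentPageNo++; // moved above the test
            if (contents != null) {
                reported.add(currentPageNo);
            }
        }
        return reported;
    }

    public static void main(String[] args) {
        // Hypothetical three-page document whose second page is blank (null).
        List<String> pages = Arrays.asList("page one text", null, "page three text");
        System.out.println(pageNumbersSkippingBlank(pages)); // [1, 2]
        System.out.println(pageNumbersCountingBlank(pages)); // [1, 3]
    }
}
```

In the first variant the text of the third page is reported as page 2, which matches the symptom of highlights landing one page too early after a blank page.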