[ https://issues.apache.org/jira/browse/PDFBOX-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ilija Pavlic updated PDFBOX-1201:
---------------------------------

       Priority: Major  (was: Minor)
    Description: The text stripper region doesn't capture text starting and finishing outside the capture region but flowing through the capture region.  (was: The text stripper region seems to be shifted up from the given coordinates, causing lines below the region to be included and ones above the defined region to be included.

...
PDPage page = (PDPage) allPages.get(0);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
Rectangle2D.Float region = new Rectangle2D.Float(x, y, width, height);
stripper.addRegion("test region", region);

// overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
PDPageContentStream contentStream = new PDPageContentStream(document, page, true, true);
contentStream.setNonStrokingColor( Color.CYAN );
contentStream.fillRect(x, y, width, height);
contentStream.close();

stripper.extractRegions(page);
String content = stripper.getTextForRegion("test region");
...
document.save(...);
...

The cyan rectangle overlays the desired region exactly when viewing the saved output document. On the other hand, stripper misses a couple of lines at the bottom of the rectangle and includes couple of lines above the rectangle.)
        Summary: PDFTextStripperByArea doesn't capture text that flows inside the capture region  (was: PDFTextStripperByArea y coordinate shifted "up")

> PDFTextStripperByArea doesn't capture text that flows inside the capture region
> -------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1201
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1201
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 1.6.0
>            Reporter: Ilija Pavlic
>
> The text stripper region doesn't capture text starting and finishing outside
> the capture region but flowing through the capture region.

--
This message is automatically generated by JIRA.
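
The situation in the updated description of PDFBOX-1201 (text that starts and finishes outside the capture region but flows through it) can be illustrated with plain geometry. The sketch below is only an illustration using java.awt.geom; it is not PDFBox's implementation and makes no claim about how PDFTextStripperByArea decides what to include. The class name, the method names and the coordinates are hypothetical. It contrasts a test on the starting point of a text run with a test on whether the run's bounding box overlaps the region.

```java
import java.awt.geom.Rectangle2D;

// Illustration only: hypothetical names, not part of PDFBox.
public class RegionOverlapSketch {

    // True only when the run's origin (its x/y corner) lies inside the region.
    public static boolean startsInside(Rectangle2D region, Rectangle2D run) {
        return region.contains(run.getX(), run.getY());
    }

    // True when any part of the run's bounding box overlaps the region.
    public static boolean overlaps(Rectangle2D region, Rectangle2D run) {
        return region.intersects(run);
    }

    public static void main(String[] args) {
        // Capture region: x from 100 to 300, y from 100 to 150 (assumed values).
        Rectangle2D region = new Rectangle2D.Float(100f, 100f, 200f, 50f);

        // A run that begins left of the region (x = 50), ends right of it
        // (x = 350), and sits vertically inside it (y from 110 to 122).
        Rectangle2D run = new Rectangle2D.Float(50f, 110f, 300f, 12f);

        System.out.println("starts inside: " + startsInside(region, run));
        System.out.println("overlaps:      " + overlaps(region, run));
    }
}
```

With these assumed coordinates the first check reports false and the second reports true: a run can pass through the region even though neither of its ends lies inside it.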