[ https://issues.apache.org/jira/browse/PDFBOX-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028065#comment-17028065 ]
ASF subversion and git services commented on PDFBOX-4738:
---------------------------------------------------------

Commit 1873471 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873471 ]

PDFBOX-4738: improve javadoc

> getDocument().getObjects() returns nothing for split result documents
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-4738
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4738
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Yuguang Huang
>            Priority: Minor
>             Fix For: 2.0.19, 3.0.0 PDFBox
>
>
> Hi PDFBOX community, we want to get the object count per page instead of
> for the whole document.
> Our approach is to split the document into multiple documents, each
> containing only one page. However, the resulting documents/pages appear to
> have no objects: getDocument().getObjects() returns an empty list.
> If we instead save each page to bytes and then load those bytes into a
> PDDocument, we do get the object counts.
>
> Is there any way to get the per-page object count without involving so
> much IO?
> Thanks!
>
> Output of the below code with a three-page PDF document:
>
> Page objects count from splitted pages:
> page [1] num of objs [0]
> page [2] num of objs [0]
> page [3] num of objs [0]
> Page objects count from pages generated from bytes:
> page [1] num of objs [20]
> page [2] num of objs [51]
> page [3] num of objs [20]
>
> {code:java}
> private static void printNumObjects(String pdfFilename) throws IOException {
>     byte[] fileContent = Files.readAllBytes((new File(pdfFilename)).toPath());
>     PDDocument document = PDDocument.load(fileContent);
>     List<PDDocument> pages = new Splitter().split(document);
>     List<byte[]> pageBytes = pages.stream().map(page -> {
>         try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
>             page.save(baos);
>             page.close();
>             return baos.toByteArray();
>         } catch (IOException e) {
>             LOG.error("Failed to get bytes from page.", e);
>             return new byte[0];
>         }
>     }).collect(Collectors.toList());
>     System.out.println("Page objects count from splitted pages:");
>     IntStream.range(0, pages.size()).forEach(i ->
>         System.out.println(String.format("page [%d] num of objs [%d]", i + 1,
>             pages.get(i).getDocument().getObjects().size())));
>     System.out.println("Page objects count from pages generated from bytes:");
>     IntStream.range(0, pageBytes.size()).forEach(i -> {
>         try {
>             System.out.println(String.format("page [%d] num of objs [%d]", i + 1,
>                 PDDocument.load(pageBytes.get(i)).getDocument().getObjects().size()));
>         } catch (IOException e) {
>             LOG.error("Failed to load page.", e);
>         }
>     });
> }{code}
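
On the question of counting per-page objects without the save/reload round trip: one possible approach, offered here as an assumption and not something proposed in this issue, is to walk the object graph reachable from a page and count each indirect object exactly once. The sketch below uses a hypothetical `Node` class as a stand-in for PDF objects so that it runs without PDFBox; `Node`, `indirect`, and `countIndirect` are illustrative names, not PDFBox API. With PDFBox, such a walk would presumably start at the page's COS dictionary and would need to skip the /Parent entry so that it does not wander into the rest of the page tree. That wiring is not shown and is untested here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class PageObjectCount {

    /** Hypothetical stand-in for a PDF object; this is NOT a PDFBox class. */
    static final class Node {
        final String name;
        final boolean indirect; // true if it would be written as its own "n 0 obj"
        final List<Node> children = new ArrayList<>();

        Node(String name, boolean indirect) {
            this.name = name;
            this.indirect = indirect;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Counts the indirect nodes reachable from root, visiting each node once. */
    static int countIndirect(Node root) {
        // Identity-based set: two distinct nodes are never merged, and a
        // node reached via several paths (or via a cycle) is visited once.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        int count = 0;
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited
            }
            if (node.indirect) {
                count++;
            }
            for (Node child : node.children) {
                stack.push(child);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node font = new Node("Font", true);
        Node resources = new Node("Resources", false).add(font); // direct dictionary
        Node contents = new Node("Contents", true);
        Node annot = new Node("Annot", true).add(font);          // shares the font
        Node page = new Node("Page", true).add(resources).add(contents).add(annot);
        annot.add(page);                                         // back reference (cycle)

        System.out.println(countIndirect(page)); // prints 4
    }
}
```

Tracking visited nodes by identity matters in this sketch because shared resources such as fonts are referenced from several places, and back references from child objects to the page create cycles.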