[jira] [Commented] (PDFBOX-4738) getDocument().getObjects() returns nothing for split result documents

ASF subversion and git services (Jira) Sat, 01 Feb 2020 04:13:07 -0800


    [ 
https://issues.apache.org/jira/browse/PDFBOX-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17028065#comment-17028065
 ]


ASF subversion and git services commented on PDFBOX-4738:
---------------------------------------------------------

Commit 1873471 from Tilman Hausherr in branch 'pdfbox/branches/issue45'
[ https://svn.apache.org/r1873471 ]

PDFBOX-4738: improve javadoc

> getDocument().getObjects() returns nothing for split result documents
> ---------------------------------------------------------------------
>
>                 Key: PDFBOX-4738
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4738
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Documentation
>            Reporter: Yuguang Huang
>            Priority: Minor
>             Fix For: 2.0.19, 3.0.0 PDFBox
>
>
>  
> Hi PDFBox community, we want to get the object count per page rather than 
> for the whole document. 
> Our approach is to split the document into multiple single-page documents. 
> However, the split documents appear to contain no objects: 
> getDocument().getObjects() returns an empty list for each of them. 
> If we instead save each page to a byte array and load it back into a 
> PDDocument, we do get the object counts. 
>  
> Is there a way to get the per-page object count without so much I/O? 
> Thanks! 
>  
> Output of the code below for a three-page PDF document:
>  
> Page objects count from splitted pages:
> page [1] num of objs [0]
> page [2] num of objs [0]
> page [3] num of objs [0]
> Page objects count from pages generated from bytes:
> page [1] num of objs [20]
> page [2] num of objs [51]
> page [3] num of objs [20]
>  
> {code:java}
> private static void printNumObjects(String pdfFilename) throws IOException {
>     byte[] fileContent = Files.readAllBytes((new File(pdfFilename)).toPath());
>     PDDocument document = PDDocument.load(fileContent);
>     List<PDDocument> pages = new Splitter().split(document);
>     List<byte[]> pageBytes = pages.stream().map(page -> {
>         try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
>             page.save(baos);
>             page.close();
>             return baos.toByteArray();
>         } catch (IOException e) {
>             LOG.error("Failed to get bytes from page.", e);
>             return new byte[0];
>         }
>     }).collect(Collectors.toList());
>     System.out.println("Page objects count from splitted pages:");
>     IntStream.range(0, pages.size()).forEach(i ->
>         System.out.println(String.format("page [%d] num of objs [%d]", i + 1,
>             pages.get(i).getDocument().getObjects().size())));
>     System.out.println("Page objects count from pages generated from bytes:");
>     IntStream.range(0, pageBytes.size()).forEach(i -> {
>         try {
>             System.out.println(String.format("page [%d] num of objs [%d]", i + 1,
>                 PDDocument.load(pageBytes.get(i)).getDocument().getObjects().size()));
>         } catch (IOException e) {
>             LOG.error("Failed to load page.", e);
>         }
>     });
> }
> {code}
>  
>  
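
One possible way to count a page's objects without the save-and-reload round trip is to walk the object graph starting at the page dictionary and count each distinct object once. The sketch below illustrates only that traversal idea on a toy model: `Node`, `countReachable`, and `ReachableObjectCounter` are hypothetical names, not PDFBox classes, and mapping the idea onto PDFBox's COS types is left as an assumption. The `Parent` entry is skipped so the walk does not climb back into the page tree, and an identity-based set keeps shared or cyclic references from being counted twice.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ReachableObjectCounter {

    /** Hypothetical stand-in for a PDF dictionary object; not a PDFBox class. */
    static final class Node {
        final String name;
        final Map<String, Node> entries = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        Node put(String key, Node value) {
            entries.put(key, value);
            return this;
        }
    }

    /** Counts distinct nodes reachable from root, never following entries named skipKey. */
    static int countReachable(Node root, String skipKey) {
        // Identity semantics: two references to the same object count once.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        seen.add(root);
        pending.push(root);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            for (Map.Entry<String, Node> entry : current.entries.entrySet()) {
                if (entry.getKey().equals(skipKey)) {
                    continue; // do not climb back up into the page tree
                }
                if (seen.add(entry.getValue())) {
                    pending.push(entry.getValue());
                }
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Node pageTree = new Node("Pages");
        Node font = new Node("Font");
        Node resources = new Node("Resources").put("Font", font);
        Node contents = new Node("Contents");
        Node page = new Node("Page")
                .put("Parent", pageTree)
                .put("Resources", resources)
                .put("Contents", contents);
        pageTree.put("Kids", page);                    // cycle via the page tree
        Node annot = new Node("Annot").put("P", page); // back reference to the page
        page.put("Annots", annot);

        // Page, Resources, Font, Contents, Annot; the page tree is skipped.
        System.out.println(countReachable(page, "Parent")); // prints 5
    }
}
```

A count obtained this way need not match what `getObjects()` reports after a reload, since that list reflects the objects found while parsing the file rather than graph reachability.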



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@pdfbox.apache.org
For additional commands, e-mail: dev-h...@pdfbox.apache.org
