On 10/08/2011 0:29, vfeki wrote:
Need to retrieve all existing paragraphs or chapters/sections from it, so
I'll be able to create bookmarks upon that list I retrieve from existing
pdf.

This is easy if the PDF contains an outline tree. But as you've tried this:

Tried to get Outlines, like ...
        PdfDictionary root = reader.getCatalog();
        PdfDictionary outlines = root.getAsDict(PdfName.OUTLINES);
        ...
but get null as a result.
it seems as if the PDF doesn't have an outline tree.
By the way, there's an easier way to extract bookmarks:

        PdfReader reader = new PdfReader(src);
        List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
        SimpleBookmark.exportToXML(list,
                new FileOutputStream(dest), "ISO8859-1", true);

However, this won't work if root.getAsDict(PdfName.OUTLINES) returns null,
because that means that there are no bookmarks in the PDF.

Now there are two options:
[1] The PDF is tagged: you can check this by consulting File > Document
Properties. In this case, you can convert the PDF to an XML file and parse
the XML file for its structure.
[2] The PDF is NOT tagged: in this case, your requirement is a mission
impossible!
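
For option [1], parsing the XML for its structure could look roughly like the sketch below. It uses only the JDK's DOM parser and assumes that the converter writes every structure element as an XML element named after its standard structure type (H1 to H6 for headings, P for paragraphs, and so on). That naming, the class name HeadingOutline and the sample XML are assumptions made for illustration; check what your converter actually produces before relying on it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: collects heading elements from the XML version of a tagged PDF.
class HeadingOutline {

    /** Returns one indented "Hn: title" entry per heading element, in document order. */
    static List<String> collectHeadings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> outline = new ArrayList<String>();
        walk(doc.getDocumentElement(), outline);
        return outline;
    }

    private static void walk(Element element, List<String> outline) {
        String name = element.getTagName();
        if (name.matches("H[1-6]")) {
            // Assumption: H1..H6 mark headings; indent two spaces per level below H1.
            int level = name.charAt(1) - '0';
            StringBuilder entry = new StringBuilder();
            for (int i = 1; i < level; i++) {
                entry.append("  ");
            }
            entry.append(name).append(": ").append(element.getTextContent().trim());
            outline.add(entry.toString());
            return; // the heading's own children are part of its text
        }
        // Any other element (sections, paragraphs, ...) is only searched for headings.
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                walk((Element) child, outline);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample; real input would be the XML file produced from the tagged PDF.
        String xml = "<Document><H1>Introduction</H1><P>Some text.</P>"
                + "<Sect><H2>Scope</H2><P>More text.</P></Sect></Document>";
        for (String line : collectHeadings(xml)) {
            System.out.println(line);
        }
    }
}
```

The resulting list of headings is the kind of input you would need to build bookmarks from. Mapping each heading to a page number is a separate problem that this sketch does not address.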

You must understand that (unless the PDF is "tagged") all structure gets lost when a PDF is created. What used to be a title, a paragraph, a caption, is just a string of glyphs drawn on a canvas.
What used to be a table is a bunch of paths and glyphs drawn on a page.
You can extract the lines of text using iText, but no tool can tell you if that line is part of a paragraph (and if so: which paragraph?) or if it's a title.