On 10/08/2011 0:29, vfeki wrote:
> Need to retrieve all existing paragraphs or chapters/sections from it, so
> I'll be able to create bookmarks upon that list I retrieve from existing
> pdf.
This is easy if the PDF contains an outline tree. But you've already tried this:
> Tried to get Outlines, like ... PdfDictionary root = reader.getCatalog();
> PdfDictionary outlines =
> root.getAsDict(PdfName.OUTLINES); ...
> but get null as a result.
So it seems that the PDF doesn't have an outline tree.
By the way, there's an easier way to extract bookmarks:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
        new FileOutputStream(dest), "ISO8859-1", true);
However, this won't work if root.getAsDict(PdfName.OUTLINES) returns null,
because that means that there are no bookmarks in the PDF.
Now there are two options:
[1] The PDF is tagged: you can check this by consulting File > Document
Properties (for instance in Adobe Reader).
In this case, you can convert the PDF to an XML file and parse the XML file for
its structure.
[2] The PDF is NOT tagged: in this case, your requirement is mission
impossible!
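For option [1], once the tagged PDF has been converted to XML, pulling out the headings is ordinary XML parsing. The sketch below is only an illustration using the JDK's DOM parser. It assumes the XML uses the standard structure type names (H1 to H6 for headings, P for paragraphs, Sect for sections) as element names; the class name TaggedXmlOutline and the sample document are made up for this example, and the real XML you get depends on how the PDF was tagged.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the headings found in XML exported from a tagged PDF.
public class TaggedXmlOutline {

    /** Returns one "tag: text" entry per H1..H6 element, in document order. */
    public static List<String> headings(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        collect(doc.getDocumentElement(), result);
        return result;
    }

    // Depth-first walk; assumes headings are tagged H1..H6 (an assumption about the input).
    private static void collect(Element element, List<String> result) {
        String name = element.getTagName();
        if (name.matches("H[1-6]")) {
            result.add(name + ": " + element.getTextContent().trim());
        }
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, result);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for the XML converted from a tagged PDF.
        String xml = "<Document><H1>Chapter 1</H1><P>Some text.</P>"
                + "<Sect><H2>Section 1.1</H2><P>More text.</P></Sect></Document>";
        for (String heading : headings(xml)) {
            System.out.println(heading);
        }
    }
}
```

The resulting list of headings is what you would then turn into bookmarks. If the PDF was tagged with custom structure types instead of H1 to H6, the name test in collect() has to be adapted.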
You must understand that (unless the PDF is "tagged") all structure gets
lost when a PDF is created.
What used to be a title, a paragraph, a caption, is just a string of
glyphs drawn on a canvas.
What used to be a table is a bunch of paths and glyphs drawn on a page.
You can extract the lines of text using iText, but no tool can tell you
if that line is part of a paragraph (and if so: which paragraph?) or if
it's a title.
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions