On 24/06/2011 16:20, modie wrote:
> Ever solve this? I am trying to do something similar.

Yes, it is possible to extract text from a PDF.
No, it's not possible to extract "h1" stuff from a PDF, because that concept doesn't exist in PDF, UNLESS the PDF is tagged.

So the first counter-question is: are your PDFs tagged?
If not: please consider your requirement to be impossible.
If so: please read the documentation: http://itextpdf.com/book/
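
If you are not sure whether your PDFs are tagged, here is a rough first check. A tagged PDF carries a /StructTreeRoot entry in its document catalog, so the sketch below simply looks for that name in the raw bytes of the file. It uses only the Java standard library, not iText, and the class and method names (TaggedPdfCheck, looksTagged) are made up for this example. Treat the result as a hint only: if the catalog is stored in a compressed object stream the name will not be visible and you get a false negative, and a stray occurrence of the name elsewhere in the file would give a false positive. A PDF library that actually parses the catalog is the reliable way to answer the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: a crude check for tagged PDFs.
final class TaggedPdfCheck {

    // Catalog entry that points at the structure tree of a tagged PDF.
    private static final String STRUCT_TREE_ROOT = "/StructTreeRoot";

    // Returns true if the raw bytes contain the /StructTreeRoot name.
    // Heuristic only: it does not parse the PDF and cannot see inside
    // compressed object streams.
    static boolean looksTagged(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so this is
        // effectively a byte-wise search.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains(STRUCT_TREE_ROOT);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TaggedPdfCheck <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksTagged(bytes)
                ? "structure tree entry found: probably tagged"
                : "no structure tree entry found in the uncompressed bytes");
    }
}
```

If this reports a structure tree entry, it is worth going on to the documentation on reading tagged PDF. If it does not, open the file in a viewer that can show document properties before concluding that it is untagged.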
