Hi all, We are going to implement a module that uses iText to extract information from PDF documents. We receive various PDFs (most of them invoices) from our clients' suppliers, and we would like to extract the text information they contain, such as the order number, the description of the items, the price, and so on.
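To make the goal concrete, here is a rough sketch of the kind of per-supplier extraction we have in mind, assuming the text of a page is already available as a plain string. Everything in it is hypothetical: the class name, the field names, the regular expressions and the sample invoice text are made up for illustration, and no iText API is used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InvoiceFieldSketch {

    // Hypothetical template for one supplier: field name -> regex with one capturing group.
    // A real template would have to be written by hand for each supplier's layout.
    public static Map<String, Pattern> supplierATemplate() {
        Map<String, Pattern> template = new LinkedHashMap<>();
        template.put("orderNumber",
                Pattern.compile("Order\\s*(?:Number|No\\.?)\\s*:?\\s*(\\S+)"));
        template.put("total",
                Pattern.compile("Total\\s*:?\\s*([0-9]+(?:[.,][0-9]{2})?)"));
        return template;
    }

    // Applies every pattern of the template to the page text and keeps the first match.
    // Fields that do not match are simply left out of the result.
    public static Map<String, String> extractFields(String pageText,
                                                    Map<String, Pattern> template) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> entry : template.entrySet()) {
            Matcher matcher = entry.getValue().matcher(pageText);
            if (matcher.find()) {
                result.put(entry.getKey(), matcher.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for what a text extractor might return.
        String pageText = "ACME Supplies\nOrder No: PO-12345\nWidget 2 x 10.00\nTotal: 20.00\n";
        System.out.println(extractFields(pageText, supplierATemplate()));
    }
}
```

This only covers single-value fields. The variable number of item lines and the ordering of the extracted text are exactly the parts we do not know how to handle well.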
So far, I have succeeded in extracting the entire text of a document or of a single page, although the text does not come out in the order in which we read it in the original PDF. I believe we will have to accept some tolerance in our recognition, since the number of items in an invoice will vary. At least we expect each supplier to maintain the same template. Still, the problem seems too complicated to build something flexible enough quickly. Has anyone already dealt with this problem? Is there any framework, tool, or part of iText that could help us? After searching on Google and in the iText documentation, I have not found anything useful.

Best regards,
Enrique