Hi all,

We are going to implement a module that uses iText to extract information
contained in PDF files. We receive various PDFs (mostly invoices) from our
clients' suppliers, and we would like to extract the text information they
contain, such as the order number, the descriptions of the items, the prices,
and so on.

So far, I have succeeded in extracting the entire text of a document or of
a single page (although the text does not come out in the order in which we
read it in the original PDF).
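A minimal sketch, assuming iText 5.x (the com.itextpdf packages) is on the classpath. Passing a LocationTextExtractionStrategy explicitly asks iText to sort text chunks by their position on the page rather than by content-stream order. Wrapping that strategy in a FilteredTextRenderListener with a RegionTextRenderFilter restricts extraction to one rectangle, which might help if each supplier always prints a field in the same place. The class name InvoiceTextExtractor and its methods are hypothetical, and multi-column layouts may still not come out in visual reading order.

```java
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.FilteredTextRenderListener;
import com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.RegionTextRenderFilter;
import com.itextpdf.text.pdf.parser.RenderFilter;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.IOException;

// Hypothetical helper class; assumes iText 5.x.
public class InvoiceTextExtractor {

    // Extracts the text of one page (page numbers start at 1), with text chunks
    // sorted by their location on the page by LocationTextExtractionStrategy.
    public static String extractPage(PdfReader reader, int page) throws IOException {
        return PdfTextExtractor.getTextFromPage(reader, page, new LocationTextExtractionStrategy());
    }

    // Extracts only the text that RegionTextRenderFilter accepts for the given area.
    // Coordinates are PDF user-space units (1/72 inch), origin at the lower-left corner.
    public static String extractRegion(PdfReader reader, int page, Rectangle area) throws IOException {
        RenderFilter filter = new RegionTextRenderFilter(area);
        TextExtractionStrategy strategy =
                new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
        return PdfTextExtractor.getTextFromPage(reader, page, strategy);
    }

    // Prints the text of every page of the PDF whose path is given as the first argument.
    public static void main(String[] args) throws IOException {
        PdfReader reader = new PdfReader(args[0]);
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                System.out.println(extractPage(reader, i));
            }
        } finally {
            reader.close();
        }
    }
}
```

The region coordinates for a given supplier would have to be measured once from one of that supplier's invoices.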

I believe our recognition will have to tolerate some variation, since the
number of items on an invoice will vary. At least we expect each supplier to
keep the same template. Still, the subject seems too complicated to build
something flexible enough quickly. Has anyone already dealt with this
problem? Is there any framework/tool, or any part of iText, that could help
us? After searching on Google and in the iText documentation, I have not
found anything useful.
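Assuming each supplier keeps the same layout, one hypothetical way to cope with a varying number of items is to keep one set of regular expressions per supplier: one pattern for a single field such as the order number, and one pattern applied repeatedly to every line-item row of the extracted text. The class SupplierTemplate, the patterns and the sample text are illustrative assumptions, not an iText feature.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical per-supplier template applied to text already extracted from the PDF.
public class SupplierTemplate {

    // One parsed invoice row.
    public static final class Item {
        public final String description;
        public final int quantity;
        public final BigDecimal price;

        Item(String description, int quantity, BigDecimal price) {
            this.description = description;
            this.quantity = quantity;
            this.price = price;
        }
    }

    private final Pattern orderNumber; // group 1 = order number
    private final Pattern itemLine;    // groups 1..3 = description, quantity, price

    public SupplierTemplate(String orderNumberRegex, String itemLineRegex) {
        // MULTILINE lets ^ and $ match at the start and end of every text line.
        this.orderNumber = Pattern.compile(orderNumberRegex, Pattern.MULTILINE);
        this.itemLine = Pattern.compile(itemLineRegex, Pattern.MULTILINE);
    }

    // Returns the first order number found, or null if the pattern does not match.
    public String findOrderNumber(String text) {
        Matcher m = orderNumber.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Collects every line matching the item pattern, however many there are.
    public List<Item> findItems(String text) {
        List<Item> items = new ArrayList<>();
        Matcher m = itemLine.matcher(text);
        while (m.find()) {
            items.add(new Item(m.group(1).trim(),
                               Integer.parseInt(m.group(2)),
                               new BigDecimal(m.group(3))));
        }
        return items;
    }
}
```

For example, a template built with the order pattern `^Order No\.?:[ \t]*(\S+)` and the item pattern `^(.+?)[ \t]+(\d+)[ \t]+(\d+\.\d{2})[ \t]*$` would pick up every "description quantity price" row, whether an invoice has two items or twenty. A Map from supplier name to SupplierTemplate could then select the right patterns for each incoming invoice.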

Best regards
Enrique



--
View this message in context: 
http://itext-general.2136553.n4.nabble.com/Content-extraction-of-documents-tp4658664.html
Sent from the iText - General mailing list archive at Nabble.com.

_______________________________________________
iText-questions mailing list
iText-questions@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/itext-questions

iText(R) is a registered trademark of 1T3XT BVBA.
Many questions posted to this list can (and will) be answered with a reference 
to the iText book: http://www.itextpdf.com/book/
Please check the keywords list before you ask for examples: 
http://itextpdf.com/themes/keywords.php