The broad answer to your question is that iText can pull a PDF apart into
its elements, but that probably won't solve your problem. PDF files do not
necessarily contain text in reading order. One glyph (letter) can be drawn
at one point and the remaining characters emitted later as a series of
separate strings. So there is no guarantee that the word "confidential"
will ever appear as a whole word in the extracted elements, and therefore
no guarantee that what you're looking to do can be done. Some other tools
can do limited editing if the PDF happens to have the words written out
together, but that depends on the PDF generator and some luck.
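
To make the fragmentation problem concrete, here is a toy Java sketch. It
does not use iText at all: the Fragment record, its made-up coordinates and
the helper methods are hypothetical stand-ins for what a content-stream
parser might hand you. It shows that searching each fragment for
"confidential" fails, and that sorting the fragments top-to-bottom, then
left-to-right (one simple heuristic, assumed here) can reassemble the word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class FragmentDemo {
    // Hypothetical fragment: a string drawn at position (x, y) on the page.
    // Not an iText type; invented for this illustration.
    record Fragment(String text, double x, double y) {}

    // Naive search: checks each fragment on its own for the whole word.
    static boolean foundInAnyFragment(List<Fragment> frags, String word) {
        for (Fragment f : frags) {
            if (f.text().contains(word)) {
                return true;
            }
        }
        return false;
    }

    // Joins fragments in the order they were emitted in the stream.
    static String inStreamOrder(List<Fragment> frags) {
        StringBuilder sb = new StringBuilder();
        for (Fragment f : frags) {
            sb.append(f.text());
        }
        return sb.toString();
    }

    // Joins fragments after sorting by position: higher y first
    // (PDF y grows upward), then smaller x first.
    static String inReadingOrder(List<Fragment> frags) {
        List<Fragment> sorted = new ArrayList<>(frags);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y())
                .thenComparingDouble(Fragment::x));
        StringBuilder sb = new StringBuilder();
        for (Fragment f : sorted) {
            sb.append(f.text());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "confidential" drawn as "c" first, the rest emitted later and out
        // of order. Coordinates are invented for the example.
        List<Fragment> frags = List.of(
                new Fragment("c", 100.0, 700.0),
                new Fragment("ential", 130.0, 700.0),
                new Fragment("onfid", 106.0, 700.0));
        System.out.println(foundInAnyFragment(frags, "confidential")); // false
        System.out.println(inStreamOrder(frags));  // centialonfid
        System.out.println(inReadingOrder(frags)); // confidential
    }
}
```

Sorting by position is only a heuristic: real pages can have multiple
columns, rotated text, and overlapping strings, so even reassembled text is
not guaranteed to match what a reader sees on screen.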

---mr. bean


guitar-4 wrote:
> 
> Please help if you can answer this question.  I need iText for a specific
> purpose.  I need to pick apart PDF documents and analyze the content of a
> PDF for possibly unintentionally included sensitive information.  I need
> to find an API that has a complete set of packages/classes which will
> parse and pull apart PDFs.  Is iText mainly suited for creating PDF
> documents, or can it also be used as a complete API for, for example,
> pulling/parsing PDF documents, extracting any and all objects which may be
> embedded in them, finding text whose color is similar to its background,
> finding hidden items, etc.?  This is very important at the moment for me,
> so responses would be greatly appreciated.
> 
> ------------------------------------------------------------------------------
> SF.Net email is Sponsored by MIX09, March 18-20, 2009 in Las Vegas,
> Nevada.
> The future of the web can't happen without you.  Join us at MIX09 to help
> pave the way to the Next Web now. Learn more and register at
> http://ad.doubleclick.net/clk;208669438;13503038;i?http://2009.visitmix.com/
> _______________________________________________
> iText-questions mailing list
> iText-questions@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/itext-questions
> 
> Buy the iText book: http://www.1t3xt.com/docs/book.php
> 
> 

-- 
View this message in context: 
http://www.nabble.com/iText-question-tp20857833p20860994.html
Sent from the iText - General mailing list archive at Nabble.com.
