Dear All,

I am using iTextSharp in my application and find its text extraction capabilities excellent. I am facing one problem, though. I use the PdfTextExtractor.GetTextFromPage method, but when two pieces of text on the same line are far apart, it returns them separated by only a single space. Take the following example (as displayed in Acrobat):

User name: abcdef                               Password: Cool1234

In the PDF there are no space characters between "abcdef" and "Password"; the gap comes only from how the text is positioned. If I extract the above text using PdfTextExtractor.GetTextFromPage, I get the following result:

User name: abcdef Password: Cool1234

So the distance between the two words was cut down to a single space. What I need is for words that are separated by a larger distance, rather than by a space character, to be separated by a TAB in the resulting text.
I am guessing that I should abandon PdfTextExtractor.GetTextFromPage and use the LocationTextExtractionStrategy class combined with TextRenderInfo, but I have no clue how.
I'd be eternally grateful if anyone could point me in the right direction. Samples in C#, VB.NET, or Java are all appreciated.
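
To make the requirement concrete, here is a rough sketch in plain Java of the rule I have in mind, with no iText calls at all. The Chunk class, its fields, and the tabFactor threshold are names I made up for illustration. I assume the real start and end positions and the space width would come from TextRenderInfo (its baseline and getSingleSpaceWidth()), and that the chunks of one line are already sorted left to right; that wiring is exactly the part I do not know how to do.

```java
import java.util.Arrays;
import java.util.List;

class GapJoiner {
    /** One run of text on a single line (hypothetical type, not an iText class). */
    static final class Chunk {
        final String text;
        final float startX;      // x of the first glyph, in user-space units
        final float endX;        // x just after the last glyph
        final float spaceWidth;  // width of one space in the chunk's font

        Chunk(String text, float startX, float endX, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.spaceWidth = spaceWidth;
        }
    }

    /**
     * Joins chunks that are assumed to be on one line and sorted left to right.
     * A gap wider than tabFactor space widths becomes a TAB; a gap wider than
     * half a space width becomes a single space; anything smaller is joined directly.
     */
    static String join(List<Chunk> chunks, float tabFactor) {
        StringBuilder sb = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : chunks) {
            if (prev != null) {
                float gap = c.startX - prev.endX;
                if (gap > tabFactor * prev.spaceWidth) {
                    sb.append('\t');
                } else if (gap > prev.spaceWidth / 2f
                        && !prev.text.endsWith(" ")
                        && !c.text.startsWith(" ")) {
                    sb.append(' ');
                }
            }
            sb.append(c.text);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates that mimic the example line above.
        List<Chunk> line = Arrays.asList(
                new Chunk("User name:", 0f, 50f, 5f),
                new Chunk("abcdef", 55f, 85f, 5f),
                new Chunk("Password:", 300f, 350f, 5f),
                new Chunk("Cool1234", 355f, 400f, 5f));
        // Make the TAB visible when printing.
        System.out.println(join(line, 3f).replace("\t", "<TAB>"));
    }
}
```

With these made-up numbers, only the 215-unit gap before "Password:" exceeds three space widths, so that is the only place a TAB is emitted. The threshold of three is an arbitrary choice and would probably need tuning per document.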

Thank you in advance for your kind help.

Best Regards,
Daniel