[iText-questions] Text Extraction with TABs instead of spaces

Kékesi Dániel Thu, 21 Jul 2011 00:00:10 -0700

Dear All,

I am using iTextSharp in my application and have found its text extraction 
capabilities excellent, but I am facing one problem. I use the 
PdfTextExtractor.GetTextFromPage method, and it returns pieces of text that 
are far apart on the page separated by only a single space. Take the 
following example (as displayed in Acrobat):


User name: abcdef                               Password: Cool1234

In the PDF there are no spaces between "abcdef" and "Password". If I extract 
the above text using PdfTextExtractor.GetTextFromPage I'll get the following 
result:

User name: abcdef Password: Cool1234

So the distance between the two words was cut down to a single space. What I 
need is for words that are separated by a larger distance, rather than by a 
space, to be separated by a TAB in the resulting text.
I am guessing that I should abandon PdfTextExtractor.GetTextFromPage and use 
the LocationTextExtractionStrategy class combined with TextRenderInfo, but I 
have no clue how.
I'd be eternally grateful if anyone could point me in the right direction. 
C#, VB.NET, or Java samples are all appreciated.
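
To make the desired behaviour concrete, here is a minimal sketch of the gap rule in plain Java, independent of iText. The Chunk class, the joinLine method, and the tabFactor threshold are all hypothetical names made up for illustration. The assumption is that a custom extraction strategy could collect, for each piece of text on a line, its start and end x coordinates and the width of one space in its font (presumably from TextRenderInfo), and then pick the separator from the horizontal gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TabJoiner {

    /** Hypothetical stand-in for one extracted piece of text on a single line. */
    public static final class Chunk {
        final String text;
        final float startX;     // x coordinate of the left edge
        final float endX;       // x coordinate of the right edge
        final float spaceWidth; // width of one space in this chunk's font

        public Chunk(String text, float startX, float endX, float spaceWidth) {
            this.text = text;
            this.startX = startX;
            this.endX = endX;
            this.spaceWidth = spaceWidth;
        }
    }

    /**
     * Joins the chunks of one line from left to right. A gap wider than
     * tabFactor space widths becomes a TAB, a gap wider than half a space
     * width becomes a single space, and anything smaller is joined directly.
     * Both thresholds are assumptions chosen for this sketch.
     */
    public static String joinLine(List<Chunk> chunks, float tabFactor) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.startX));

        StringBuilder sb = new StringBuilder();
        Chunk prev = null;
        for (Chunk c : sorted) {
            if (prev != null) {
                float gap = c.startX - prev.endX;
                if (gap > tabFactor * prev.spaceWidth) {
                    sb.append('\t');
                } else if (gap > prev.spaceWidth / 2f) {
                    sb.append(' ');
                }
            }
            sb.append(c.text);
            prev = c;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up coordinates that mimic the example line above.
        List<Chunk> line = Arrays.asList(
            new Chunk("User name:", 0f, 50f, 3f),
            new Chunk("abcdef", 53f, 85f, 3f),
            new Chunk("Password:", 250f, 300f, 3f),
            new Chunk("Cool1234", 303f, 350f, 3f));
        // Replace the TAB with a visible marker so the result is easy to read.
        System.out.println(joinLine(line, 3f).replace("\t", "<TAB>"));
    }
}
```

With these made-up coordinates, the gap between "abcdef" and "Password:" is far wider than three space widths, so only that gap becomes a TAB while the others stay single spaces. The right value for tabFactor depends on the documents being processed and would need tuning.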

Thank you for your kind help in advance.

Best Regards,
Daniel
