Hi Scott,
The duplication can happen if the string has a shadow or faux bold. These
effects are sometimes achieved by drawing the same text twice with a very
slight offset or scale. So the text really IS in the PDF twice; it only
APPEARS to be displayed once. In these cases iText does the right thing and
reports both copies.
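
If you only want the visible text once, one option is to drop a chunk when an
earlier chunk has the same text and starts at almost the same position. The
code below is only a rough sketch of that idea and does not use the iText API:
the Chunk class, the dropOverlappingDuplicates method and the tolerance value
are hypothetical names I made up for illustration, and the coordinates would
have to come from whatever your own render listener reports.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateChunkFilter {

    /** Hypothetical holder for one extracted text run and where it starts. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Keeps the first occurrence of each chunk and drops any later chunk that
     * has identical text and starts within `tolerance` units (in both x and y)
     * of a chunk that was already kept.
     */
    static List<Chunk> dropOverlappingDuplicates(List<Chunk> chunks, float tolerance) {
        List<Chunk> kept = new ArrayList<Chunk>();
        for (Chunk candidate : chunks) {
            boolean duplicate = false;
            for (Chunk k : kept) {
                if (k.text.equals(candidate.text)
                        && Math.abs(k.x - candidate.x) <= tolerance
                        && Math.abs(k.y - candidate.y) <= tolerance) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<Chunk>();
        chunks.add(new Chunk("iText Parse Error", 72.0f, 700.0f));
        chunks.add(new Chunk("iText Parse Error", 72.3f, 700.0f)); // faux-bold copy, slightly shifted
        chunks.add(new Chunk("iText Parse Error", 72.0f, 650.0f)); // same words on a different line
        for (Chunk c : dropOverlappingDuplicates(chunks, 0.5f)) {
            System.out.println(c.text + " @ " + c.x + "," + c.y);
        }
    }
}
```

With a tolerance of 0.5 the shifted copy is dropped, while the same words on a
different line are kept. The right tolerance depends on the document, so treat
the value as an assumption to tune.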
Daniel
On 2012.02.16. 1:43, Scott Selvia wrote:
> I have been parsing a PDF and I have an issue with the text that is returned
> from the PdfTextExtractor.getTextFromPage method. The reason I'm using this
> version is an exception that I am getting with the 5.1.3 version.
> However, when I get the text from the page(s), I have noticed that the
> words run together, e.g. "iTextParseError" instead of "iText Parse
> Error". I made the change below in TextRendererInfo.java and that
> resolved the missing-space issue. Finally, lines of text from the PDF I'm
> parsing are duplicated; I confirmed that the text only appears once on
> the page in the PDF. E.g. "iText Parse Error\niText Parse Error".
>
> public String getText() {
>     return (text == null || text.length() == 0) ? " " : text;
> }
>
>
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
>     at java.lang.String.charAt(String.java:686)
>     at com.itextpdf.text.pdf.parser.LocationTextExtractionStrategy.getResultantText(LocationTextExtractionStrategy.java:121)
>     at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:73)
>     at com.itextpdf.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:88)
>
> _______________________________________________
> iText-questions mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/itext-questions
>
> iText(R) is a registered trademark of 1T3XT BVBA.
> Many questions posted to this list can (and will) be answered with a
> reference to the iText book: http://www.itextpdf.com/book/
> Please check the keywords list before you ask for examples:
> http://itextpdf.com/themes/keywords.php