Hi All,

I am extracting text from a PDF using itext5.1.1.jar, with the following
source code:

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PRIndirectReference;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PRTokeniser;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfWriter;


public class PdfTest {

    /**
     * @param args
     * @throws IOException
     * @throws DocumentException
     */
    public static void main(String[] args)
            throws IOException, FileNotFoundException, DocumentException {

        PdfReader pdfReader = new PdfReader("D:/test/pdf/test.pdf");

        PdfDictionary page = pdfReader.getPageN(1);

        PRIndirectReference objectReference =
                (PRIndirectReference) page.get(PdfName.CONTENTS);

        PRStream stream = (PRStream) PdfReader.getPdfObject(objectReference);

        byte[] streamBytes = PdfReader.getStreamBytes(stream);

        PRTokeniser tokenizer = new PRTokeniser(streamBytes);

        StringBuffer contentStringBuffer = new StringBuffer();

        while (tokenizer.nextToken()) {
            if (tokenizer.getTokenType() == PRTokeniser.TokenType.STRING) {
                contentStringBuffer.append(tokenizer.getStringValue());
            }
        }

        System.out.println("contentStringBuffer :" + contentStringBuffer);
        // Prints only "Line one Line two": a single line instead of two.

        try {

            String paraText = String.valueOf(contentStringBuffer) ;

            Document document = new Document();
            PdfWriter.getInstance(document,
                    new FileOutputStream("D:/test/pdf/result.pdf"));

            document.open();

            Paragraph paragraph = new Paragraph(paraText);
            document.add(paragraph);

            pdfReader.close();
            document.close();

        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

    }
}



The file *test.pdf* has two lines:

*Line one
Line two*

but *result.pdf* ends up with the content:

*Line one Line two* (a space instead of a line break, so everything is on
one line).
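
To illustrate what may be going on: in a PDF content stream, the strings themselves usually carry no newline characters. A new line is normally expressed by text-positioning operators such as Td, TD, T* or ', so appending only the STRING tokens drops that information. The sketch below uses plain Java (no iText) on a made-up, heavily simplified content stream; the class name ContentStreamLines, the sample stream, and the rule "a non-zero vertical offset means a new line" are all assumptions for illustration, not what test.pdf is known to contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rebuilds line breaks from a simplified text content stream. */
final class ContentStreamLines {

    // A literal string "(...)" or any other whitespace-delimited token.
    // Simplification: no escaped/nested parentheses, hex strings or TJ arrays.
    private static final Pattern TOKEN = Pattern.compile("\\(([^)]*)\\)|(\\S+)");
    private static final Pattern NUMBER = Pattern.compile("[-+]?\\d*\\.?\\d+");

    static String extract(String contentStream) {
        StringBuilder text = new StringBuilder();
        List<String> operands = new ArrayList<String>();
        Matcher m = TOKEN.matcher(contentStream);
        while (m.find()) {
            String literal = m.group(1);
            String token = m.group(2);
            if (literal != null) {
                operands.add(literal);                 // string operand
            } else if (token.startsWith("/") || NUMBER.matcher(token).matches()) {
                operands.add(token);                   // name or number operand
            } else {
                apply(token, operands, text);          // operator: consume operands
                operands.clear();
            }
        }
        return text.toString();
    }

    private static void apply(String operator, List<String> operands, StringBuilder text) {
        String last = operands.isEmpty() ? "" : operands.get(operands.size() - 1);
        if (operator.equals("Tj")) {
            text.append(last);                         // show text
        } else if (operator.equals("'")) {
            newLine(text);                             // move to next line, then show text
            text.append(last);
        } else if (operator.equals("T*")) {
            newLine(text);                             // move to start of next line
        } else if (operator.equals("Td") || operator.equals("TD")) {
            // Operands are "tx ty"; treat a non-zero vertical offset as a new line.
            if (operands.size() >= 2 && NUMBER.matcher(last).matches()
                    && Double.parseDouble(last) != 0.0) {
                newLine(text);
            }
        }
        // All other operators (BT, ET, Tf, ...) are ignored in this sketch.
    }

    private static void newLine(StringBuilder text) {
        if (text.length() > 0) {
            text.append('\n');
        }
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf 72 720 Td (Line one) Tj 0 -14 Td (Line two) Tj ET";
        System.out.println(extract(stream));
    }
}
```

With the sample stream above, the two strings come out on separate lines because the second Td has a non-zero vertical offset. A real page can position text in other ways (Tm, TJ arrays, several content streams), so this is only meant to show where the line-break information lives.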

If the text can be extracted with its line breaks intact, I have a lot more
to do with it. This is just sample data; the actual content file has much
more data, with placeholders that will be replaced with actual user data and
written to new files.
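
To make the placeholder step concrete, here is a minimal sketch in plain Java of what I mean. The ${name} placeholder syntax, the class name PlaceholderFiller and the sample values are only assumptions for illustration; the real files may use a different convention. The point is that line breaks in the extracted text have to survive the replacement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: replaces ${key} placeholders in extracted text. */
final class PlaceholderFiller {

    // Assumed placeholder syntax: ${key}, where key consists of word characters.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            // Placeholders without a value are left unchanged.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<String, String>();
        values.put("name", "Alice");
        // Line breaks in the template are carried through untouched.
        System.out.println(fill("Dear ${name},\nYour id is ${id}.", values));
    }
}
```

Text outside the placeholders, including any newline characters, is copied as-is, which is why getting the line breaks out of the PDF correctly matters for the later steps.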

Could anyone help me extract the text exactly as it appears, with its line
breaks?

Please let me know how to achieve this. Any help is greatly appreciated.


Thanks in advance,

spandana V
_______________________________________________
iText-questions mailing list
https://lists.sourceforge.net/lists/listinfo/itext-questions