Re: FW: Word Merging Problem

Tilman Hausherr Wed, 24 Jan 2018 23:07:40 -0800

Hi,

Please upload your file to a sharehoster; PDF attachments don't go through. Also, please tell us which PDFBox version you're using (hopefully 2.0.8), and please post to the users mailing list, not the dev mailing list.

I was able to access your file because your post was stuck in moderation. I don't have time to try your code now (I will tonight). I tried the ExtractText command line utility, and that one did produce blanks.
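
As background on where the blanks come from: a text extractor generally has to decide from glyph positions whether two neighbouring characters belong to the same word. The following is only a simplified sketch of that idea in plain Java. The names WordGapSketch, Glyph and the tolerance parameter are made up for this illustration; this is not PDFBox API and not its actual algorithm.

```java
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; NOT PDFBox's actual algorithm or API.
class WordGapSketch {

    // A glyph with its left edge and width on the baseline (hypothetical type).
    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    // Joins glyphs from left to right and inserts wordSeparator whenever the
    // gap to the previous glyph exceeds tolerance * (width of previous glyph).
    static String join(List<Glyph> glyphs, float tolerance, String wordSeparator) {
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : glyphs) {
            if (prev != null) {
                float gap = g.x - (prev.x + prev.width);
                if (gap > tolerance * prev.width) {
                    sb.append(wordSeparator);
                }
            }
            sb.append(g.text);
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Gap a-b is 0.5 units, gap b-c is 3.5 units; every glyph is 5 units wide.
        List<Glyph> line = Arrays.asList(
                new Glyph("a", 0f, 5f),
                new Glyph("b", 5.5f, 5f),
                new Glyph("c", 14f, 5f));
        System.out.println(join(line, 0.5f, " ")); // threshold 2.5: separator before "c"
        System.out.println(join(line, 1.0f, " ")); // threshold 5.0: no separator, words merge
    }
}
```

With a looser tolerance the same glyphs come out merged, which is the kind of symptom described in the report below. Whether that is what happens with this particular file is something I have not checked.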


Tilman



On 25.01.2018 at 04:22, Laxmi Narayan wrote:

Hi Team,
I have a problem extracting text from a PDF. When we extract the text, words merge together. Can you suggest what we have to do about this?
I have attached the PDF file from which I am extracting the text, and I am using the code below to extract it.
Please help me as soon as possible.
private static string GetTextByArea_Orgnal(PDDocument doc, int x, int y, int w, int h)
{
    PDFTextStripperByArea stripper = new PDFTextStripperByArea("UTF-8");
    stripper.setLineSeparator(" ");
    stripper.setDropThreshold(3);
    stripper.setWordSeparator(" ");
    stripper.setParagraphStart("<p>");
    stripper.setParagraphEnd("</p>");
    stripper.setIndentThreshold(1);
    stripper.setSortByPosition(true);

    Dimension d = new Dimension(w, h);
    Rectangle rect = new Rectangle(new Point(x, y), d);
    stripper.addRegion("class1", rect);

    java.util.List allPages = doc.getDocumentCatalog().getAllPages();
    PDPage firstPage = (PDPage)allPages.get(0);

    // overlay the region with a cyan rectangle to check if I got the coordinates and dimensions right
    PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);
    contentStream.setNonStrokingColor(Color.CYAN);
    contentStream.fillRect(x, y, w, h);
    contentStream.close();

    stripper.extractRegions(firstPage);
    return stripper.getTextForRegion("class1");
}

Thanks,

Laxmi Narayan


