Hi,
Please upload your file to a file hosting service; PDF attachments don't get through the mailing list.
Please also tell us which PDFBox version you're using (hopefully 2.0.8). And please post to the users mailing list, not to the dev mailing list.
I was able to access your file only because your post was stuck in moderation. I don't have time to try your code now (I will tonight), but I tried the ExtractText command-line utility and its output did have blanks between the words.
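For context: where a PDF contains no explicit space characters, PDFTextStripper infers word boundaries from the horizontal gaps between glyphs, and its setSpacingTolerance() and setAverageCharTolerance() setters influence how large a gap must be before a word separator is inserted. The following is a simplified, self-contained sketch of that kind of gap-based heuristic. It is not PDFBox's actual code, and the class WordGap, the Glyph type, the space width and the tolerance values are hypothetical, chosen only to show how a lower tolerance turns a small gap into a space:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified sketch of a gap-based word-separation heuristic.
 * NOT PDFBox code: it only mimics the idea behind PDFTextStripper's
 * spacing tolerance and average-character tolerance.
 */
public class WordGap {

    /** One positioned glyph: its text, left x coordinate and advance width. */
    static final class Glyph {
        final String text;
        final float x;
        final float width;

        Glyph(String text, float x, float width) {
            this.text = text;
            this.x = x;
            this.width = width;
        }
    }

    /**
     * Joins glyphs left to right and inserts a space when the gap between the
     * end of one glyph and the start of the next exceeds the smaller of
     * spaceWidth * spacingTolerance and avgCharWidth * averageCharTolerance.
     */
    static String join(List<Glyph> glyphs, float spaceWidth,
                       float spacingTolerance, float averageCharTolerance) {
        if (glyphs.isEmpty()) {
            return "";
        }
        float totalWidth = 0f;
        for (Glyph g : glyphs) {
            totalWidth += g.width;
        }
        float avgCharWidth = totalWidth / glyphs.size();
        float threshold = Math.min(spaceWidth * spacingTolerance,
                                   avgCharWidth * averageCharTolerance);

        StringBuilder sb = new StringBuilder(glyphs.get(0).text);
        for (int i = 1; i < glyphs.size(); i++) {
            Glyph prev = glyphs.get(i - 1);
            Glyph cur = glyphs.get(i);
            float gap = cur.x - (prev.x + prev.width); // horizontal whitespace
            if (gap > threshold) {
                sb.append(' ');
            }
            sb.append(cur.text);
        }
        return sb.toString();
    }

    /** Builds glyphs for a word of equally wide characters starting at x. */
    static List<Glyph> word(String s, float x, float charWidth) {
        List<Glyph> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            out.add(new Glyph(String.valueOf(s.charAt(i)), x + i * charWidth, charWidth));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>(word("Hello", 0f, 5f)); // "Hello" ends at x = 25
        glyphs.addAll(word("World", 26f, 5f));                        // 1-unit gap before "World"
        System.out.println(join(glyphs, 2.5f, 0.5f, 0.3f)); // HelloWorld (threshold 1.25)
        System.out.println(join(glyphs, 2.5f, 0.3f, 0.3f)); // Hello World (threshold ~0.75)
    }
}
```

With the same glyphs, only the spacing tolerance changes between the two calls; whether adjusting the real PDFTextStripper tolerances helps with the attached file would have to be tried.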
Tilman
On 25.01.2018 at 04:22, Laxmi Narayan wrote:
Hi Team,
I have a problem extracting text from a PDF: when we extract the text, the words merge together. Can you suggest what we have to do to fix this?
I have attached the PDF file from which I am extracting the text, and I am using the code below to extract it.
Please help me as soon as possible.
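import java.awt.Color;
import java.awt.Dimension;
import java.awt.Point;
import java.awt.Rectangle;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.util.PDFTextStripperByArea;

// Uses the PDFBox 1.8 API (getAllPages(), the encoding constructor of
// PDFTextStripperByArea, fillRect()); these were removed or deprecated in 2.0.
private static String GetTextByArea_Orgnal(PDDocument doc, int x, int y,
        int w, int h) throws IOException
{
    PDFTextStripperByArea stripper = new PDFTextStripperByArea("UTF-8");
    stripper.setLineSeparator(" ");
    stripper.setDropThreshold(3);
    stripper.setWordSeparator(" ");
    stripper.setParagraphStart("<p>");
    stripper.setParagraphEnd("</p>");
    stripper.setIndentThreshold(1);
    stripper.setSortByPosition(true);

    // Region to extract, in Java coordinates (y == 0 at the top of the page)
    Dimension d = new Dimension(w, h);
    Rectangle rect = new Rectangle(new Point(x, y), d);
    stripper.addRegion("class1", rect);

    List<?> allPages = doc.getDocumentCatalog().getAllPages();
    PDPage firstPage = (PDPage) allPages.get(0);

    // Overlay the region with a cyan rectangle to check the coordinates and
    // dimensions. Note: fillRect() uses PDF coordinates (y == 0 at the bottom),
    // so the overlay lands at the wrong height unless y is flipped.
    PDPageContentStream contentStream = new PDPageContentStream(doc, firstPage, true, true);
    contentStream.setNonStrokingColor(Color.CYAN);
    contentStream.fillRect(x, y, w, h);
    contentStream.close();

    stripper.extractRegions(firstPage);
    return stripper.getTextForRegion("class1");
}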
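The cyan overlay in this method is drawn with fillRect(x, y, w, h) in PDF user space, where y == 0 is the bottom of the page, whereas the PDFBox 2.0 Javadoc of PDFTextStripperByArea.addRegion() describes region rectangles in Java coordinates, with y == 0 at the top. The overlay therefore only marks the extracted area if y is flipped. A minimal sketch of that conversion, assuming an unrotated page whose box starts at (0, 0); the class RegionCoords, the method toPdfY and the example page height are hypothetical:

```java
/**
 * Hypothetical helper: converts a rectangle given in Java/AWT coordinates
 * (origin top-left, y grows downward) into PDF user-space coordinates
 * (origin bottom-left, y grows upward).
 * Assumes an unrotated page whose box starts at (0, 0).
 */
public class RegionCoords {

    /** Returns the lower-left y in PDF user space for a top-left-origin rectangle. */
    static float toPdfY(float pageHeight, float y, float h) {
        return pageHeight - y - h;
    }

    public static void main(String[] args) {
        float pageHeight = 792f; // US Letter height in points, assumed for the example
        int x = 10, y = 280, w = 275, h = 60;
        float pdfY = toPdfY(pageHeight, y, h);
        // x, w and h stay the same; only the vertical origin is flipped.
        System.out.println("fillRect(" + x + ", " + pdfY + ", " + w + ", " + h + ")");
        // prints "fillRect(10, 452.0, 275, 60)"
    }
}
```

In practice the page height would come from the page's box rather than a constant; which box (media box or crop box) the text stripper measures against is worth checking for the file at hand.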
Thanks,
Laxmi Narayan