Hi all,
I was trying to manually feed TextPosition objects to the
processTextPosition method of the PDFTextStripper class. I created a
subclass of PDFTextStripper and overrode its processStream method. In
processStream I manually created two TextPosition objects for the
text "W" and "H". At the end I passed them to processTextPosition:
processTextPosition(textPosition1);
processTextPosition(textPosition2);
Then I tested it using:
PDFTextStripper ocrStripper = new PDFOCRTextStripper();
PDDocument document = PDDocument.load("some pdf file");
String data = ocrStripper.getText(document);
System.out.println(data);
Output was: H W
Then I reversed the order in which the TextPosition objects are passed
(full code in [1]):
processTextPosition(textPosition2);
processTextPosition(textPosition1);
Output was: WH
------------------------------
As far as I understand, processTextPosition works with the position
metadata of the input text, such as its x and y coordinates, so the
output should not depend on the order in which the objects are passed.
But in this case it seems that processTextPosition follows the order
of the input.
E.g. if I input W first, it prints W first without considering its
actual position.
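To make that expectation concrete, here is a minimal sketch in plain
Java. It is not PDFBox code and does not claim to show how
PDFTextStripper works internally: the Glyph class, the textByPosition
helper and the coordinates are all hypothetical, and it assumes that y
grows downward on the page. It only illustrates what I mean by
"ordering by position instead of by input order".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PositionOrderSketch {

    /** Hypothetical stand-in for a text position: a string plus page coordinates. */
    static final class Glyph {
        final String text;
        final float x;
        final float y;

        Glyph(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Joins the glyphs in reading order: smaller y first (assumed to be
     * higher on the page), then smaller x first (left to right).
     */
    static String textByPosition(List<Glyph> glyphs) {
        List<Glyph> sorted = new ArrayList<>(glyphs); // leave the caller's list untouched
        sorted.sort(Comparator.<Glyph>comparingDouble(g -> g.y)
                .thenComparingDouble(g -> g.x));
        StringBuilder out = new StringBuilder();
        for (Glyph g : sorted) {
            out.append(g.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed coordinates: "W" sits to the left of "H" on the same line.
        Glyph w = new Glyph("W", 10f, 100f);
        Glyph h = new Glyph("H", 20f, 100f);

        // Both input orders give the same result, "WH".
        System.out.println(textByPosition(Arrays.asList(w, h)));
        System.out.println(textByPosition(Arrays.asList(h, w)));
    }
}
```

With this kind of ordering the result is the same whichever object is
passed first, which is the behaviour I expected from
processTextPosition.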
Is this the normal behaviour? Or am I missing something here?
[1] https://gist.github.com/DImuthuUpe/5dcfa9758f017794c649
--
Regards
W.Dimuthu Upeksha
Undergraduate
Department of Computer Science And Engineering
University of Moratuwa, Sri Lanka