I think I did something similar in 2018 that you might be able to use: see the
FilteredTextStripper class in ExtractText.java. That one only extracts
text with angle 0.
/**
 * TextStripper that only processes glyphs that have angle 0.
 */
class FilteredTextStripper extends PDFTextStripper
{
Sometimes the watermark will overlap with normal text that we want to
extract, so it would be great if it were possible to insert a filter and skip
some useless TextPositions (e.g. the text of the watermark may have a
rotation). I think the 'writePage' method in 'PDFTextStripper' is an
appropriate
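
To make the "only angle 0" idea concrete, here is a minimal standalone sketch of the filtering logic. It deliberately does not use PDFBox: AngleFilter, Glyph, angleDegrees and keepUpright are hypothetical names made up for illustration, and Glyph is only a stand-in for a TextPosition. In a real PDFTextStripper subclass the two values would presumably come from the glyph's text matrix (I believe via getTextMatrix() with getShearY() and getScaleY(), but please check the Javadoc), and the check would sit in an overridden processTextPosition rather than in a list filter.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: keeps glyphs whose text matrix has a rotation of 0 degrees. */
final class AngleFilter {

    /** Hypothetical stand-in for a glyph: its text plus two text-matrix components. */
    static final class Glyph {
        final String text;
        final double shearY;
        final double scaleY;

        Glyph(String text, double shearY, double scaleY) {
            this.text = text;
            this.shearY = shearY;
            this.scaleY = scaleY;
        }
    }

    /** Rotation angle in whole degrees, derived from the two matrix components. */
    static int angleDegrees(double shearY, double scaleY) {
        return (int) Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
    }

    /** Returns only the glyphs that are not rotated, in their original order. */
    static List<Glyph> keepUpright(List<Glyph> glyphs) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (angleDegrees(g.shearY, g.scaleY) == 0) {
                kept.add(g);
            }
        }
        return kept;
    }
}
```

With this, a diagonal watermark glyph (equal, non-zero shearY and scaleY) is dropped, while ordinary horizontal text (shearY of 0, positive scaleY) is kept. Rounding to whole degrees is a design choice of this sketch, so that tiny floating-point deviations do not cause normal text to be skipped.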