Re: Feature request for filtering TextPosition in PDFTextStripperByArea and PDFTextStripper

2024-03-05 Tilman Hausherr
I think I did something similar in 2018 that you might use; see the FilteredTextStripper class in ExtractText.java. That one only extracts text with angle 0.

/**
 * TextStripper that only processes glyphs that have angle 0.
 */
class FilteredTextStripper extends PDFTextStripper {

Feature request for filtering TextPosition in PDFTextStripperByArea and PDFTextStripper

2024-03-05 Hengyu Weng
Sometimes a watermark overlaps the normal text that we want to extract, so it would be great if it were possible to insert a filter and skip the unwanted TextPositions (e.g. the watermark text may be rotated). I think the 'writePage' method in 'PDFTextStripper' is an appropriate
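
For illustration only, here is a minimal standalone sketch of the kind of filter being requested: drop rotated glyphs and keep the rest. It deliberately does not use PDFBox classes. Glyph, angleOf and extract are hypothetical names, and the four matrix entries stand in for whatever a real TextPosition exposes through its text matrix. In PDFBox itself the same check would presumably go into an override of PDFTextStripper.processTextPosition(TextPosition), which is an assumption about where to hook in, not something stated in this thread.

```java
import java.util.List;
import java.util.function.Predicate;

final class RotationFilterSketch {

    /** Hypothetical stand-in for a positioned glyph: text plus the a, b, c, d entries of its text matrix. */
    static final class Glyph {
        final String text;
        final double a, b, c, d;

        Glyph(String text, double a, double b, double c, double d) {
            this.text = text;
            this.a = a;
            this.b = b;
            this.c = c;
            this.d = d;
        }

        /** Builds a glyph rotated counter-clockwise by the given number of degrees. */
        static Glyph rotated(String text, double degrees) {
            double r = Math.toRadians(degrees);
            return new Glyph(text, Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r));
        }
    }

    /** Rotation in whole degrees, derived from the matrix entries b (sine part) and d (cosine part). */
    static int angleOf(Glyph g) {
        return (int) Math.round(Math.toDegrees(Math.atan2(g.b, g.d)));
    }

    /** Concatenates the text of every glyph the filter accepts, preserving input order. */
    static String extract(List<Glyph> glyphs, Predicate<Glyph> filter) {
        StringBuilder sb = new StringBuilder();
        for (Glyph g : glyphs) {
            if (filter.test(g)) {
                sb.append(g.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Normal text ("H", "i") interleaved with 45-degree watermark glyphs ("D", "R").
        List<Glyph> page = List.of(
                Glyph.rotated("H", 0), Glyph.rotated("D", 45),
                Glyph.rotated("i", 0), Glyph.rotated("R", 45));
        System.out.println(extract(page, g -> angleOf(g) == 0)); // prints "Hi"
    }
}
```

Passing the filter in as a Predicate keeps the skipping rule (angle, font, colour, position) separate from the extraction loop, which is roughly what the request asks for.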