Hi,

I want to extract only the characters that are visible, i.e. not covered by an image.

Here is a link to a one-page PDF sample:
https://drive.google.com/file/d/14qy_GPS3dzXI-meJiCKkvqwUb59Q1yWk/view?usp=sharing

It has some text at the top right corner that is covered by the image: ANNUAL REPORT 2018
All other characters are printed on top of the image.

I tried running the code from here:
https://stackoverflow.com/questions/66607663/how-to-use-pdfbox-to-extract-all-text-on-a-page-that-is-not-behind-an-image#

and the code from here:
https://stackoverflow.com/questions/69703154/differ-between-text-above-image-and-text-covered-by-image

With both approaches, I cannot get the string "ANNUAL REPORT 2018" detected as hidden (covered) and the string "Destination2050" detected as visible (on top of the image).

Any help would be much appreciated!

Thanks,
Orit
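
P.S. To make the expected result concrete, here is a minimal sketch of the rule I have in mind: a piece of text counts as hidden if an image painted after it fully contains its bounding box. This is plain geometry only and does not use any PDFBox API; the class and method names (VisibilitySketch, Item, visibleText), the coordinates, and the assumed paint order are all made up for illustration and are not taken from the sample file.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public class VisibilitySketch {

    /** One painted object (hypothetical type): a text run or an image with its page bounding box. */
    static final class Item {
        final String label;
        final boolean isImage;
        final Rectangle2D box;

        Item(String label, boolean isImage, Rectangle2D box) {
            this.label = label;
            this.isImage = isImage;
            this.box = box;
        }
    }

    /** Returns the labels of text items that are not fully covered by an image painted later. */
    static List<String> visibleText(List<Item> paintOrder) {
        List<String> visible = new ArrayList<>();
        for (int i = 0; i < paintOrder.size(); i++) {
            Item item = paintOrder.get(i);
            if (item.isImage) {
                continue; // only text is reported
            }
            boolean covered = false;
            // Only objects painted AFTER the text can hide it.
            for (int j = i + 1; j < paintOrder.size(); j++) {
                Item later = paintOrder.get(j);
                if (later.isImage && later.box.contains(item.box)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                visible.add(item.label);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Assumed paint order and made-up coordinates, for illustration only.
        List<Item> page = new ArrayList<>();
        page.add(new Item("ANNUAL REPORT 2018", false, new Rectangle2D.Double(400, 700, 150, 20)));
        page.add(new Item("cover image", true, new Rectangle2D.Double(0, 0, 600, 800)));
        page.add(new Item("Destination2050", false, new Rectangle2D.Double(50, 400, 200, 30)));
        System.out.println(visibleText(page));
    }
}
```

With this input I would expect only "Destination2050" to be reported, since "ANNUAL REPORT 2018" is painted before the image that contains it. The sketch ignores partial overlap, clipping paths, and image transparency, any of which may matter for the real file.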
