[ https://issues.apache.org/jira/browse/PDFBOX-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
MartinV updated PDFBOX-1545:
----------------------------

    Description: 
org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF:
https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
(anyone with the link can view and download it...)

As I found while iterating over the "Tj" and "tj" operations:

    COSString previous = (COSString)tokens.get( j-1 );
    String string = previous.getString();

Those strings are either empty or have a length of 2 (whitespace only) ... I would expect to get separated groups of words from my PDF.

I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); both versions showed the same behaviour. Is my PDF special in any way, or which objects should be explored next? I tried two other PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?).

I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way... Thank you

PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of internal PDF structure. Hints welcomed.

  was:
org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF:
https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
(anyone with the link can view and download it...)

As I found while iterating over the "Tj" and "tj" operations:

    COSString previous = (COSString)tokens.get( j-1 );
    String string = previous.getString();

Those strings are either empty or have a length of 2 (whitespace only), so they cannot be replaced.

I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); both versions showed the same behaviour. Is my PDF special in any way, or which objects should be explored next? I tried two other PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?).

I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way... Thank you

PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of internal PDF structure. Hints welcomed.


> ReplaceString fails to replace text, however RemoveText or TextExtraction works fine
> ------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1545
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1545
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.7.1
>         Environment: ubuntu 32bit, Java 6
>            Reporter: MartinV
>              Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> org.apache.pdfbox.examples.pdmodel.ReplaceString does not replace any strings in this PDF:
> https://docs.google.com/file/d/0B4SxNalgkoJ3VjRDTEN0VER6WGc/edit?usp=sharing
> (anyone with the link can view and download it...)
> As I found while iterating over the "Tj" and "tj" operations:
>     COSString previous = (COSString)tokens.get( j-1 );
>     String string = previous.getString();
> Those strings are either empty or have a length of 2 (whitespace only) ... I would expect to get separated groups of words from my PDF.
> I tried this on version 1.7.1 and then downloaded the latest code from SVN (today); both versions showed the same behaviour. Is my PDF special in any way, or which objects should be explored next? I tried two other PDFs downloaded from Google Drive and both had the same issue (maybe Google formats PDFs in a special way?).
> I am surprised that RemoveText works fine on this PDF and that text extraction also gives me a good result, so there must be a way... Thank you
> PS: I don't mind fixing the bug on my own, but I do not have any significant knowledge of internal PDF structure. Hints welcomed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
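
Editorial note on the symptom reported above: one possible explanation, which is an assumption and is not confirmed anywhere in this report, is that the text-showing operands in the PDF are two-byte character codes for a composite font and not readable text. In that case a byte-per-character read such as the one `getString()` appears to perform would yield strings of length 2 that look empty, while text extraction could still succeed by looking each code up in the font's mapping to Unicode. The sketch below uses only the Java standard library and no PDFBox calls; the class `GlyphCodeSketch`, its methods, and the code-to-text table are all hypothetical and exist purely to illustrate the idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not PDFBox code and not taken from the reported PDF.
class GlyphCodeSketch {

    // Invented stand-in for a font's code-to-Unicode table (two-byte code -> text).
    static Map<Integer, String> sampleToUnicode() {
        Map<Integer, String> map = new HashMap<Integer, String>();
        map.put(0x002B, "H");
        map.put(0x0048, "e");
        map.put(0x004F, "l");
        map.put(0x0052, "o");
        return map;
    }

    // Reads the operand bytes one character per byte, as a naive string search would.
    static String naiveString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Reads the operand as big-endian two-byte codes and maps each code to text.
    // Codes missing from the table become U+FFFD (the replacement character).
    static String decodeTwoByteCodes(byte[] raw, Map<Integer, String> toUnicode) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i + 1 < raw.length; i += 2) {
            int code = ((raw[i] & 0xFF) << 8) | (raw[i + 1] & 0xFF);
            String mapped = toUnicode.get(code);
            text.append(mapped != null ? mapped : "\uFFFD");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Five two-byte codes that the invented table maps to "Hello".
        byte[] operand = {0x00, 0x2B, 0x00, 0x48, 0x00, 0x4F, 0x00, 0x4F, 0x00, 0x52};

        // The naive read interleaves NUL characters, so a plain search misses the word.
        System.out.println(naiveString(operand).contains("Hello")); // prints false

        // The mapped read recovers the visible text.
        System.out.println(decodeTwoByteCodes(operand, sampleToUnicode())); // prints Hello
    }
}
```

If this assumption holds for the attached file, a replacement would have to encode the new text into the font's own codes before writing it back, and could only use characters for which the embedded font actually has glyphs.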