[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361451#comment-14361451 ]

Tyler Palsulich commented on TIKA-1095:
---

Just commented on PDFBOX-2451. Still have this issue with PDFBox 1.8.9-SNAPSHOT, so we still have it in Tika.

Only gibberish extracted from this PDF
--
Key: TIKA-1095
URL: https://issues.apache.org/jira/browse/TIKA-1095
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.3
Environment: Probably any
Reporter: Bas van Meurs
Labels: pdfbox
Attachments: ALG 2010-05-19 03 bijlage 1 - besluitenlijst dagelijks bestuur d d 10 februari 2010.pdf, test.txt

java -jar /usr/share/tika/tika-app-1.3.jar -t /home/adrupal/www/sites/stadsregio.nl/files/files/Agendastukken/ALG 2010-05-19 03 bijlage 1 - besluitenlijst dagelijks bestuur d d 10 februari 2010.pdf /tmp/test.txt

This produces all gibberish.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061818#comment-14061818 ]

Stefan Postema commented on TIKA-1095:
--

I'm having the same problem. The file is also in Dutch.

Only gibberish extracted from this PDF
--
Key: TIKA-1095
URL: https://issues.apache.org/jira/browse/TIKA-1095
Project: Tika
Issue Type: Bug
Components: general
Affects Versions: 1.3
Environment: Probably any
Reporter: Bas van Meurs
Labels: patch
Attachments: ALG 2010-05-19 03 bijlage 1 - besluitenlijst dagelijks bestuur d d 10 februari 2010.pdf, test.txt

java -jar /usr/share/tika/tika-app-1.3.jar -t /home/adrupal/www/sites/stadsregio.nl/files/files/Agendastukken/ALG 2010-05-19 03 bijlage 1 - besluitenlijst dagelijks bestuur d d 10 februari 2010.pdf /tmp/test.txt

This produces all gibberish.

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF
[ https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061867#comment-14061867 ]

Hong-Thai Nguyen commented on TIKA-1095:
--

Even the latest Tika can't convert this file. It seems to be a font problem in this PDF file. Can you report this to the PDFBox tracker: https://issues.apache.org/jira/browse/PDFBOX/ ?