[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF

2015-03-13 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361451#comment-14361451
 ] 

Tyler Palsulich commented on TIKA-1095:
---

I just commented on PDFBOX-2451. The issue still occurs with PDFBox 
1.8.9-SNAPSHOT, so it still affects Tika.

 Only gibberish extracted from this PDF
 --

 Key: TIKA-1095
 URL: https://issues.apache.org/jira/browse/TIKA-1095
 Project: Tika
  Issue Type: Bug
  Components: parser
Affects Versions: 1.3
 Environment: Probably any
Reporter: Bas van Meurs
  Labels: pdfbox
 Attachments: ALG 2010-05-19 03 bijlage 1 -  besluitenlijst dagelijks 
 bestuur d d  10 februari 2010.pdf, test.txt


 java -jar /usr/share/tika/tika-app-1.3.jar -t \
   "/home/adrupal/www/sites/stadsregio.nl/files/files/Agendastukken/ALG 2010-05-19 03 bijlage 1 -  besluitenlijst dagelijks bestuur d d  10 februari 2010.pdf" \
   > /tmp/test.txt
 This produces all gibberish.
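
 A minimal sketch of scripting the same extraction, in case it helps anyone
 reproduce the report. It assumes java is on the PATH and that the jar and PDF
 paths are supplied by the caller; the function names build_tika_command and
 extract_text are hypothetical and not part of Tika. Only the -t option is
 taken from the command above. Passing the arguments as a list means the
 spaces in the file name need no shell quoting.

```python
import subprocess
from pathlib import Path


def build_tika_command(jar_path, pdf_path):
    """Return the argument list for a plain-text (-t) run of tika-app."""
    # Each list element reaches java as exactly one argument, so a PDF
    # path containing spaces is passed through unchanged.
    return ["java", "-jar", str(jar_path), "-t", str(pdf_path)]


def extract_text(jar_path, pdf_path, out_path):
    """Run tika-app and write its standard output to out_path."""
    with open(out_path, "wb") as out:
        # check=True raises CalledProcessError if java exits non-zero.
        subprocess.run(build_tika_command(jar_path, pdf_path),
                       stdout=out, check=True)
    return Path(out_path)
```

 Calling extract_text with the jar, the attached PDF, and /tmp/test.txt should
 leave the extracted text in /tmp/test.txt for inspection.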



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Stefan Postema (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061818#comment-14061818
 ] 

Stefan Postema commented on TIKA-1095:
--

I'm having the same problem. The file is also in Dutch.

 Only gibberish extracted from this PDF
 --

 Key: TIKA-1095
 URL: https://issues.apache.org/jira/browse/TIKA-1095
 Project: Tika
  Issue Type: Bug
  Components: general
Affects Versions: 1.3
 Environment: Probably any
Reporter: Bas van Meurs
  Labels: patch
 Attachments: ALG 2010-05-19 03 bijlage 1 -  besluitenlijst dagelijks 
 bestuur d d  10 februari 2010.pdf, test.txt




[jira] [Commented] (TIKA-1095) Only gibberish extracted from this PDF

2014-07-15 Thread Hong-Thai Nguyen (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061867#comment-14061867
 ] 

Hong-Thai Nguyen commented on TIKA-1095:


Even the latest Tika can't convert this file. It seems to be a font problem in 
this PDF file. Can you report this to the PDFBox tracker: 
https://issues.apache.org/jira/browse/PDFBOX/ ?

 Only gibberish extracted from this PDF
 --

 Key: TIKA-1095
 URL: https://issues.apache.org/jira/browse/TIKA-1095
 Project: Tika
  Issue Type: Bug
  Components: general
Affects Versions: 1.3
 Environment: Probably any
Reporter: Bas van Meurs
  Labels: patch
 Attachments: ALG 2010-05-19 03 bijlage 1 -  besluitenlijst dagelijks 
 bestuur d d  10 februari 2010.pdf, test.txt

