[ 
https://issues.apache.org/jira/browse/TIKA-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691929#comment-13691929
 ] 

Nick Burch commented on TIKA-1138:
----------------------------------

That's often a sign that the parser can't handle them. There's some discussion 
on the dev list at the moment about how best to report that, but it hasn't 
concluded yet.

As an example, solupro.xls is an Excel-95 file, which Apache POI (the library 
Tika uses for .xls) doesn't handle. That's why you're able to get metadata but 
not text.
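
If you want to check your other .xls files for the same problem, something 
along these lines might help. It's only a rough heuristic sketch, not what Tika 
or POI do internally: it assumes the file is an OLE2 container and that the 
workbook stream is named "Book" for Excel 5/95 and "Workbook" for Excel 97 and 
later. It scans 128-byte-aligned directory-style entries instead of walking the 
sector chain properly, and the function name and return labels are made up for 
this example.

```python
import struct

# Signature at the start of every OLE2 compound document.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def guess_xls_flavour(data: bytes) -> str:
    """Heuristically guess which Excel generation wrote an OLE2 .xls file.

    Assumption (not taken from Tika or POI): looking for stream entries
    named "Workbook" or "Book" on 128-byte boundaries is good enough,
    without following the FAT / directory sector chain.
    """
    if data[:8] != OLE2_MAGIC:
        return "not-ole2"

    found = set()
    # Directory entries are 128 bytes long; the header fills the first 512.
    for off in range(512, len(data) - 127, 128):
        # Offset 64: name length in bytes (including the NUL terminator).
        # Offset 66: entry type, where 2 means "stream".
        name_len, entry_type = struct.unpack_from("<HB", data, off + 64)
        if entry_type != 2 or not 2 <= name_len <= 64 or name_len % 2:
            continue
        # Names are UTF-16LE; drop the 2-byte terminator before decoding.
        name = data[off:off + name_len - 2].decode("utf-16-le", errors="replace")
        found.add(name)

    # Some files carry both streams, so prefer the newer one.
    if "Workbook" in found:
        return "excel-97-or-later"
    if "Book" in found:
        return "excel-5-or-95"
    return "ole2-unknown"
```

Calling guess_xls_flavour(open("solupro.xls", "rb").read()) on a local copy 
would then give a hint as to whether a file is in the older format before you 
send it to Tika.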
                
> I got empty body and empty title with some documents
> ----------------------------------------------------
>
>                 Key: TIKA-1138
>                 URL: https://issues.apache.org/jira/browse/TIKA-1138
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.3
>         Environment: Windows 7 (my desktop)
>            Reporter: Koutsoulis Philippe
>              Labels: test
>
> *+Tested version:+* Apache Tika 1.3 (with the Apache Tika GUI)
> Hi all,
> I have empty body and empty title with some documents.
> Do you have an idea?
> *+Extract from my "Structured Text"+*
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?><html 
> xmlns="http://www.w3.org/1999/xhtml">
> <head>
> ...
> <title/>
> </head>
> <body/></html>
> {noformat}
> *+Files to reproduce+*
> [http://www.justice.gouv.fr/art_pix/declaration_sexe_20091016.xls]
> [http://ge.ch/ssco_gestats/excel/deinfo_par_ht2004.xls]
> [http://homepage.swissonline.ch/ccvaf1/stock_divers/palmares_ccvaf.xls]
> [http://top1000.anthologeek.net/participants.current.txt]
> [http://ge.ch/ssco_gestats/excel/refona_par_ht2006.xls]
> [http://www.rad.fr/solupro.xls]
> [http://www.pfynschiessen.ch/TClassementgroupeinvite.xls]
> [http://www.gregdonner.org/workbench/wb_31rev.txt]
> (i) No error in logs :(

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
