[ https://issues.apache.org/jira/browse/NUTCH-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512910#comment-14512910 ]

Chris A. Mattmann commented on NUTCH-1994:
------------------------------------------

OK, so here's some more info. I printed out the set of parsers returned by a 
TikaConfig created with the class's system class loader, along with the set 
from Tika's default config. Both return {} as the list of supported parsers, 
which indicates something screwy in SPI loading:

{noformat}
CREATE OUR OWN TIKA CONFIG default parser is 
org.apache.tika.parser.DefaultParser
supported parsers {}
PARSER RETRIEVED! NULL!
2015-04-25 23:25:34,046 ERROR tika.TikaParser (TikaParser.java:getParse(87)) - 
Can't retrieve Tika parser for mime-type text/plain
RESULT TEXT! textfile.txt  
HERE IS THE PARSE TEXT textfile.txt  
{noformat}
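An empty {} is what you would see if no parser registrations are visible to the class loader Tika is handed. Tika's DefaultParser discovers its parsers through the Java service-provider (SPI) mechanism, i.e. META-INF/services/org.apache.tika.parser.Parser files read through a class loader, so the choice of loader matters. The stand-alone sketch below reproduces the symptom with the JDK's java.util.ServiceLoader rather than Tika's own loader; the Greeter interface, the EnglishGreeter provider and the SpiDemo class are hypothetical stand-ins, not Nutch or Tika code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class SpiDemo {

    /** Hypothetical service interface, standing in for org.apache.tika.parser.Parser. */
    public interface Greeter {
        String greet();
    }

    /** Hypothetical provider, standing in for one concrete Tika parser. */
    public static class EnglishGreeter implements Greeter {
        public EnglishGreeter() {}

        @Override
        public String greet() {
            return "hello";
        }
    }

    /** Instantiates every Greeter provider that the given class loader can see. */
    public static List<Greeter> discover(ClassLoader loader) {
        List<Greeter> found = new ArrayList<>();
        for (Greeter g : ServiceLoader.load(Greeter.class, loader)) {
            found.add(g);
        }
        return found;
    }

    /** Writes a META-INF/services registration for EnglishGreeter into a new temp directory. */
    static Path writeRegistration() throws IOException {
        Path root = Files.createTempDirectory("spi-demo");
        Path services = root.resolve("META-INF").resolve("services");
        Files.createDirectories(services);
        // File name = binary name of the service interface; each line = one provider class.
        Files.write(services.resolve(Greeter.class.getName()),
                (EnglishGreeter.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));
        return root;
    }

    /** Returns {providers seen without the registration, providers seen with it}. */
    public static int[] compareLoaders() {
        try {
            Path root = writeRegistration();
            int blindCount;
            // Parent = bootstrap loader, empty path: the registration file is invisible.
            try (URLClassLoader blind = new URLClassLoader(new URL[0], null)) {
                blindCount = discover(blind).size();
            }
            int sightedCount;
            // Temp dir on the path: the file is found, and EnglishGreeter itself is
            // loaded by delegating to the parent (the loader that loaded SpiDemo).
            try (URLClassLoader sighted = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, SpiDemo.class.getClassLoader())) {
                sightedCount = discover(sighted).size();
            }
            return new int[] {blindCount, sightedCount};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = compareLoaders();
        System.out.println("loader without registration: " + counts[0] + " provider(s)");
        System.out.println("loader with registration:    " + counts[1] + " provider(s)");
    }
}
```

Running main should report 0 providers for the first loader and 1 for the second. If the {} in the log has the same cause, handing TikaConfig a class loader that can actually see the tika-parsers registration would be the thing to try, but that is a hypothesis the log alone does not confirm.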

Furthermore, the upgrade needed more updates to plugin.xml; see the attached 
patch. It didn't fix the issue, but it's needed regardless. I will keep digging.


> Upgrade to Apache Tika 1.8
> --------------------------
>
>                 Key: NUTCH-1994
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1994
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build, parser
>    Affects Versions: 1.10, 2.3.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.10, 2.3.1
>
>         Attachments: NUTCH-1994-2.x.patch, NUTCH-1994-trunk.patch
>
>
> Tika 1.8 was released this morning.
> Let's upgrade, then release Nutch trunk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
