[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200495#comment-16200495 ]
Markus Jelsma commented on NUTCH-2439:
--
Ah, I removed slf4j-api from plugin.xml and it works. But errors are logged:
{code}
fetching: https://www.sitesearch.io/
robots.txt whitelist not configured.
Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
parsing: https://www.sitesearch.io/
{code}
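The warnings above each name an optional library that was not found on the classpath. A minimal sketch of how one might check for such a class is below. The class name ClasspathCheck and the helper isLoadable are hypothetical, and org.sqlite.JDBC is assumed to be the driver class shipped in sqlite-jdbc. Nutch loads plugins through separate class loaders, so the result only reflects the class loader this check runs under, not necessarily what parse-tika sees.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded ClasspathCheck.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing or it could not be linked
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: org.sqlite.JDBC is the driver class provided by sqlite-jdbc.
        // Other class names can be passed as command-line arguments.
        String[] candidates = args.length > 0 ? args : new String[] {"org.sqlite.JDBC"};
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "missing"));
        }
    }
}
```

Running it with the same classpath as the parse job should indicate whether adding the jar would make the corresponding warning go away.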
> Upgrade to Apache Tika 1.16
> ---
>
> Key: NUTCH-2439
> URL: https://issues.apache.org/jira/browse/NUTCH-2439
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.13
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.14
>
> Attachments: NUTCH-2439.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)