[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200495#comment-16200495 ]
Markus Jelsma commented on NUTCH-2439:
--------------------------------------

Ah, I removed slf4j-api from plugin.xml and it works. But warnings are logged:

{code}
fetching: https://www.sitesearch.io/
robots.txt whitelist not configured.
Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
parsing: https://www.sitesearch.io/
{code}

> Upgrade to Apache Tika 1.16
> ---------------------------
>
>                 Key: NUTCH-2439
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2439
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.14
>
>         Attachments: NUTCH-2439.patch