[ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292450#comment-16292450 ]
Sebastian Nagel commented on NUTCH-2439:
----------------------------------------

Really? I'm almost done with a PR for the upgrade (I had to resolve a dependency conflict which breaks multiple parse-tika tests), but the number of errors written to stderr is still hardly acceptable:

{noformat}
$ bin/nutch parsechecker -Dplugin.includes="protocol-http|parse-tika" http://localhost/nutch/test.pdf >/dev/null
Dec 15, 2017 1:37:59 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io for optional dependencies.
Dec 15, 2017 1:37:59 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: Tesseract OCR is installed and will be automatically applied to image files unless you've excluded the TesseractOCRParser from the default parser.
Tesseract may dramatically slow down content extraction (TIKA-2359).
As of Tika 1.15 (and prior versions), Tesseract is automatically called.
In future versions of Tika, users may need to turn the TesseractOCRParser on via TikaConfig.
Dec 15, 2017 1:37:59 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
{noformat}

> Upgrade to Apache Tika 1.17
> ---------------------------
>
> Key: NUTCH-2439
> URL: https://issues.apache.org/jira/browse/NUTCH-2439
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 1.13
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.14
>
> Attachments: NUTCH-2439.patch, NUTCH-2439.patch
>
>

--
This message was sent by Atlassian JIRA (v6.4.14#64029)