[ https://issues.apache.org/jira/browse/NUTCH-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846401#comment-17846401 ]
Hudson commented on NUTCH-3039:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #161 (See [https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/161/])
NUTCH-3039 Failure to handle ftp:// URLs (snagel: [https://github.com/apache/nutch/commit/ea9c7ee5d6635405b31b4a1d462cca746478b040])
* (edit) src/java/org/apache/nutch/plugin/URLStreamHandlerFactory.java

> Failure to handle ftp:// URLs
> -----------------------------
>
>                 Key: NUTCH-3039
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3039
>             Project: Nutch
>          Issue Type: Bug
>          Components: plugin, protocol
>    Affects Versions: 1.19
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.21
>
>
> Nutch fails to handle ftp:// URLs:
> - URLNormalizerBasic returns the empty string because creating the URL
>   instance fails with a MalformedURLException:
> {noformat}
> echo "ftp://ftp.example.com/path/file.txt" \
>   | nutch normalizerchecker -stdin -normalizer urlnormalizer-basic{noformat}
> - fetching a ftp:// URL with the protocol-ftp plugin enabled also fails due
>   to a MalformedURLException:
> {noformat}
> bin/nutch parsechecker -Dplugin.includes='protocol-ftp|parse-tika' \
>    "ftp://ftp.example.com/path/file.txt"
> ...
> Exception in thread "main" org.apache.nutch.protocol.ProtocolNotFound: java.net.MalformedURLException
>         at org.apache.nutch.protocol.ProtocolFactory.getProtocol(ProtocolFactory.java:113)
> ...{noformat}
> The issue is caused by NUTCH-2429:
> - we do not provide a dedicated URL stream handler for ftp URLs
> - but also do not pass ftp:// URLs to the standard JVM handler

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
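
The root cause named in the issue (no dedicated handler for ftp, and no hand-off to the standard JVM handler) can be illustrated with a minimal sketch. It relies on one documented behavior of java.net.URL: when an installed URLStreamHandlerFactory returns null for a protocol, the JVM falls back to its built-in handler lookup. The class names FtpFallbackSketch and DelegatingFactory, the protocol set, and the stub handler are all assumptions made for illustration; this is not the code of Nutch's URLStreamHandlerFactory nor the committed fix.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Set;

public class FtpFallbackSketch {

  /** Hypothetical factory for illustration only; not the Nutch class. */
  public static class DelegatingFactory implements URLStreamHandlerFactory {

    // Assumed list of protocols left to the JVM's built-in handlers.
    // If "ftp" were missing here, ftp:// URLs would get the stub below.
    private static final Set<String> JVM_PROTOCOLS =
        Set.of("http", "https", "file", "jar", "ftp");

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
      if (JVM_PROTOCOLS.contains(protocol.toLowerCase(Locale.ROOT))) {
        // Returning null tells java.net.URL to use its default handler.
        return null;
      }
      // Stand-in for a plugin-provided handler (hypothetical).
      return new URLStreamHandler() {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
          throw new IOException(
              "no plugin handler in this sketch for " + u.getProtocol());
        }
      };
    }
  }

  public static void main(String[] args) throws Exception {
    // The factory can be installed only once per JVM.
    URL.setURLStreamHandlerFactory(new DelegatingFactory());
    // With "ftp" delegated to the JVM, creating the URL succeeds.
    URL url = URI.create("ftp://ftp.example.com/path/file.txt").toURL();
    System.out.println(url.getProtocol() + " " + url.getHost());
  }
}
```

In this sketch, delegating by returning null keeps the factory small: it only needs to know which protocols it does not own, and the JVM supplies the actual ftp handler.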