[
https://issues.apache.org/jira/browse/NUTCH-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18057940#comment-18057940
]
Sebastian Nagel commented on NUTCH-3127:
----------------------------------------
Curlie now provides data downloads again: https://curlie.org/docs/en/rdf.html
> Deprecate or remove DmozParser
> ------------------------------
>
> Key: NUTCH-3127
> URL: https://issues.apache.org/jira/browse/NUTCH-3127
> Project: Nutch
> Issue Type: Improvement
> Components: tool
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.22
>
>
> The tool
> [DmozParser|https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/tools/DmozParser.java]
> imports links from [DMOZ|https://en.wikipedia.org/wiki/DMOZ] RDF dumps.
> - DMOZ was closed in 2017
> - The "successor" Curlie does not provide RDF dumps, although they "are busy
> on preparing a clean download" ([Curlie Data -
> RDF|https://curlie.org/docs/en/rdf.html])
> We should either deprecate the tool, adding a notice about the state of DMOZ,
> or simply remove it.