[ https://issues.apache.org/jira/browse/NUTCH-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974565#comment-14974565 ]
Lewis John McGibbney commented on NUTCH-2147:
---------------------------------------------

Hi [~markus17],
I never took a look at that patch and was not aware that it was in the codebase :)
I have no problems changing the issue name and scope. It makes much more sense, Markus, thanks.

> LanguagePreferenceScoringFilter for Nutch
> -----------------------------------------
>
>                 Key: NUTCH-2147
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2147
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>    Affects Versions: 1.10
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.12
>
>
> Based on the implementation of a LanguagePreferenceScoringFilter, Nutch could
> easily be made into a directed crawler based on the crawl administrator's ranking
> preferences for the languages we wish to crawl.
> Right now this is not possible.
> We already detect and index language within the language-identifier plugin as
> well as within parse-tika IIRC; however, currently the presence of a language
> does not affect the scoring of pages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)