[ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144264#comment-16144264 ]
Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
-------------------------------------------------------

+1

This would also help to deprecate the {{mimetype-filter}} plugin and avoid having the responsibility of indexing/allowing/blocking documents (from being indexed) scattered across several plugins.

> Allow LanguageIndexingFilter to actually filter documents by language.
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.13
>            Reporter: Yossi Tamari
>            Priority: Minor
>
> It is often useful to only index pages in select languages (e.g. only those
> languages that we intend to search in). At first glance it seems that this is
> done by LanguageIndexingFilter, but currently all the filter does is add the
> language as a field to the index.
> We can add a configuration property to LanguageIndexingFilter that will allow
> it to only index languages specified in this property.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
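
For illustration only, the configuration property proposed in the issue description could be backed by a small allow-list along these lines. This is a rough sketch, not the actual patch: the class name {{LanguageAllowList}} and the comma-separated property format are assumptions made here, and the code deliberately avoids the Nutch API so that it stands alone. Inside the plugin, the filter would presumably read the property value from the job configuration and skip the document whenever {{accepts}} returns false.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical helper (not part of Nutch): holds the languages that may be
 * indexed, parsed from a comma-separated property value such as "en, de".
 */
final class LanguageAllowList {

    private final Set<String> allowed;

    /** An empty or missing property value means "no restriction". */
    LanguageAllowList(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            this.allowed = Collections.emptySet();
        } else {
            this.allowed = Arrays.stream(propertyValue.split(","))
                    .map(code -> code.trim().toLowerCase(Locale.ROOT))
                    .filter(code -> !code.isEmpty())
                    .collect(Collectors.toSet());
        }
    }

    /** Returns true if a document with the detected language should be indexed. */
    boolean accepts(String detectedLanguage) {
        if (allowed.isEmpty()) {
            return true; // property not set: keep the current index-everything behaviour
        }
        if (detectedLanguage == null) {
            return false; // restriction active but no language was detected
        }
        return allowed.contains(detectedLanguage.trim().toLowerCase(Locale.ROOT));
    }
}
```

Treating an empty property as "no restriction" keeps existing installations unchanged, which seems the safer default for an improvement of this kind. Whether documents with an undetected language should be dropped or kept is a separate design decision; the sketch drops them once a restriction is configured.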