[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lufeng updated NUTCH-1547:
--------------------------

    Attachment: NUTCH-1547-2x.patch

Added a patch for Nutch 2.x.

> BasicIndexingFilter - Problem to index full title
> -------------------------------------------------
>
>                 Key: NUTCH-1547
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1547
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.6, 2.1
>            Reporter: Gustavo Rauber
>            Assignee: lufeng
>            Priority: Minor
>             Fix For: 1.7, 2.2
>
>         Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I ran into this issue when trying to index the entire title, as can be done
> for the content, by setting indexer.max.title.length to -1 in
> nutch-default.xml. I think the title should behave the same way as the
> content.
>
> To fix it, replace line 90:
>
>     if (title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
>
> with this one:
>
>     if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) { // truncate title if needed
>
> Stack Trace:
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>         at java.lang.String.substring(String.java:1937)
>         at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
>         at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
>         at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
>
> Cheers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
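
For reference, the guard proposed in the report can be tried in isolation. The following is only a minimal sketch, not the actual BasicIndexingFilter code: the class `TitleTruncation`, the helper `truncateTitle`, and the parameter `maxTitleLength` (standing in for MAX_TITLE_LENGTH) are hypothetical names introduced for illustration. It assumes that a negative limit means "do not truncate", as the reporter expects. Without the guard, a limit of -1 makes any title longer than -1 characters reach `substring(0, -1)`, which throws the StringIndexOutOfBoundsException seen in the stack trace.

```java
// Hypothetical stand-alone illustration of the proposed guard.
class TitleTruncation {

    // Returns the title cut to maxTitleLength characters.
    // Assumption: a negative limit (for example -1) disables truncation.
    static String truncateTitle(String title, int maxTitleLength) {
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            // Only reached when the limit is non-negative, so
            // substring never receives a negative end index.
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        // Limit of 6: the title is truncated to its first 6 characters.
        System.out.println(truncateTitle("Apache Nutch", 6));
        // Limit of -1: the title is returned unchanged.
        System.out.println(truncateTitle("Apache Nutch", -1));
    }
}
```

With the guard in place, a limit of -1 skips the `substring` call entirely, so the full title is kept.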