[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616227#comment-13616227 ]

lufeng commented on NUTCH-1547:
-------------------------------

Feng committed revision 1462078 to trunk and revision 1462079 to 2.x.

BasicIndexingFilter - Problem to index full title
-------------------------------------------------

Key: NUTCH-1547
URL: https://issues.apache.org/jira/browse/NUTCH-1547
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
Fix For: 1.7, 2.2
Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
Original Estimate: 1h
Remaining Estimate: 1h

I ran into this issue when trying to index the entire title, just like the content, by setting indexer.max.title.length to -1 in nutch-default.xml. With that value the indexer throws the exception shown in the stack trace below.
I think the behavior should be the same as for the content.

If you would like to fix it, just replace line 90:

    if (title.length() > MAX_TITLE_LENGTH) { // truncate title if needed

with this one:

    if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) { // truncate title if needed

Stack Trace:

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(String.java:1937)
    at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)

Cheers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
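The one-line change proposed in the issue description above can be illustrated in isolation. The following is only a minimal sketch, not the committed patch: the class name TitleTruncation and the methods truncate and truncateUnguarded are hypothetical, and the limit is passed as a parameter instead of being read from the indexer.max.title.length setting. It assumes, as the report suggests, that a negative limit should mean "do not truncate".

```java
public class TitleTruncation {

    // Guarded check, following the condition proposed in the report:
    // a negative limit is treated as "no limit" (assumption for this sketch).
    public static String truncate(String title, int maxTitleLength) {
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    // Unguarded check, as in the line the report asks to replace.
    // With a limit of -1 the condition is true for every title,
    // so substring(0, -1) is called and throws.
    public static String truncateUnguarded(String title, int maxTitleLength) {
        if (title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        String title = "A fairly long page title";

        System.out.println(truncate(title, 8));   // prints "A fairly"
        System.out.println(truncate(title, -1));  // prints the full title

        try {
            truncateUnguarded(title, -1);
        } catch (StringIndexOutOfBoundsException e) {
            // Same exception type as in the reported stack trace.
            System.out.println("unguarded check failed: " + e.getMessage());
        }
    }
}
```

The guard is evaluated before the length comparison, so a limit of -1 never reaches substring, while a limit of 0 or more behaves as before.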
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617043#comment-13617043 ]

Hudson commented on NUTCH-1547:
-------------------------------

Integrated in Nutch-trunk #2148 (See [https://builds.apache.org/job/Nutch-trunk/2148/])
    NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 1462078)

    Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1462078
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617048#comment-13617048 ]

Hudson commented on NUTCH-1547:
-------------------------------

Integrated in Nutch-nutchgora #548 (See [https://builds.apache.org/job/Nutch-nutchgora/548/])
    NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 1462079)

    Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1462079
Files :
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/conf/nutch-default.xml
* /nutch/branches/2.x/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615434#comment-13615434 ]

Lewis John McGibbney commented on NUTCH-1547:
---------------------------------------------

+1
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613770#comment-13613770 ]

Gustavo Rauber commented on NUTCH-1547:
---------------------------------------

Of course, the setting can also be documented in nutch-default.xml, if its behavior changes.

BasicIndexingFilter - Problem to index full title
-------------------------------------------------

Key: NUTCH-1547
URL: https://issues.apache.org/jira/browse/NUTCH-1547
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.6
Reporter: Gustavo Rauber
Priority: Minor
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614098#comment-13614098 ]

lufeng commented on NUTCH-1547:
-------------------------------

Hi Gustavo,

I will add this patch tomorrow. Thanks, Gustavo.

BasicIndexingFilter - Problem to index full title
-------------------------------------------------

Key: NUTCH-1547
URL: https://issues.apache.org/jira/browse/NUTCH-1547
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.6
Reporter: Gustavo Rauber
Priority: Minor
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614418#comment-13614418 ]

Lewis John McGibbney commented on NUTCH-1547:
---------------------------------------------

Is this for trunk or 2.x?

BasicIndexingFilter - Problem to index full title
-------------------------------------------------

Key: NUTCH-1547
URL: https://issues.apache.org/jira/browse/NUTCH-1547
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.6
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
Attachments: NUTCH-1547.patch
Original Estimate: 1h
Remaining Estimate: 1h
[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614639#comment-13614639 ]

Sebastian Nagel commented on NUTCH-1547:
----------------------------------------

+1 (should be fixed for trunk and 2.x)

BasicIndexingFilter - Problem to index full title
-------------------------------------------------

Key: NUTCH-1547
URL: https://issues.apache.org/jira/browse/NUTCH-1547
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
Fix For: 1.7, 2.2
Attachments: NUTCH-1547.patch
Original Estimate: 1h
Remaining Estimate: 1h