[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616227#comment-13616227 ]

lufeng commented on NUTCH-1547:
---

Feng committed revision 1462078 to trunk and revision 1462079 to 2.x.


 BasicIndexingFilter - Problem to index full title
 -

 Key: NUTCH-1547
 URL: https://issues.apache.org/jira/browse/NUTCH-1547
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I ran into this issue when trying to index the entire title, as is already
 possible for the content, by setting indexer.max.title.length to -1 in
 nutch-default.xml. I think the title should behave the same way as the
 content.
 If you would like to fix it, just replace line 90:
   if (title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
 with this one:
   if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
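A minimal sketch of the proposed check, pulled out into a standalone method so it can be run in isolation. The class `TitleTruncation` and the method `truncateTitle` are hypothetical names used only for illustration; they are not Nutch's actual code. Only the condition itself comes from the proposed patch line. With this guard, -1 (in fact any negative limit) skips truncation entirely.

```java
public class TitleTruncation {

    // Hypothetical helper mirroring the proposed condition:
    // a negative limit (e.g. -1) means "index the full title",
    // otherwise the title is cut to at most maxTitleLength characters.
    static String truncateTitle(String title, int maxTitleLength) {
        if (maxTitleLength > -1 && title.length() > maxTitleLength) { // truncate title if needed
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        String title = "Apache Nutch - BasicIndexingFilter";
        System.out.println(truncateTitle(title, 6));   // prints "Apache"
        System.out.println(truncateTitle(title, -1));  // prints the full title
        System.out.println(truncateTitle(title, 100)); // limit longer than title: unchanged
    }
}
```

With a limit of 6 the sketch returns `Apache`; with -1 the title is returned unchanged instead of reaching `substring`.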
 Stack Trace:
 java.lang.StringIndexOutOfBoundsException: String index out of range: -1
   at java.lang.String.substring(String.java:1937)
   at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
   at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
 Cheers.
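Why the unpatched line fails: with a limit of -1, the test `title.length() > -1` is true for every title, even an empty one, so `substring(0, -1)` is always reached and throws `StringIndexOutOfBoundsException`. A minimal reproduction, using a hypothetical standalone copy (`OriginalTitleCheck.truncateTitleOriginal`, not Nutch's actual code) of the condition quoted in the report. The exact exception message depends on the JDK version, so only the exception type is checked here.

```java
public class OriginalTitleCheck {

    // Hypothetical copy of the pre-fix condition: no guard for a negative limit.
    static String truncateTitleOriginal(String title, int maxTitleLength) {
        if (title.length() > maxTitleLength) { // truncate title if needed
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        try {
            // With -1, title.length() > -1 holds for any title,
            // so substring(0, -1) is called and throws.
            truncateTitleOriginal("Any title", -1);
            System.out.println("No exception");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e.getClass().getSimpleName());
        }
    }
}
```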

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617043#comment-13617043 ]

Hudson commented on NUTCH-1547:
---

Integrated in Nutch-trunk #2148 (See 
[https://builds.apache.org/job/Nutch-trunk/2148/])
NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 
1462078)

 Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1462078
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* /nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Hudson (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617048#comment-13617048 ]

Hudson commented on NUTCH-1547:
---

Integrated in Nutch-nutchgora #548 (See 
[https://builds.apache.org/job/Nutch-nutchgora/548/])
NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 
1462079)

 Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1462079
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/conf/nutch-default.xml
* /nutch/branches/2.x/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-27 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615434#comment-13615434 ]

Lewis John McGibbney commented on NUTCH-1547:
-

+1


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Gustavo Rauber (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613770#comment-13613770 ]

Gustavo Rauber commented on NUTCH-1547:
---

Of course, the description of this setting in nutch-default.xml should also be
updated if its behavior changes.

 BasicIndexingFilter - Problem to index full title
 -

 Key: NUTCH-1547
 URL: https://issues.apache.org/jira/browse/NUTCH-1547
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6
Reporter: Gustavo Rauber
Priority: Minor
   Original Estimate: 1h
  Remaining Estimate: 1h


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 lufeng (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614098#comment-13614098 ]

lufeng commented on NUTCH-1547:
---

Hi Gustavo,

I will add this patch tomorrow. Thanks, Gustavo.


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614418#comment-13614418 ]

Lewis John McGibbney commented on NUTCH-1547:
-

Is this for trunk or 2.x?

 BasicIndexingFilter - Problem to index full title
 -

 Key: NUTCH-1547
 URL: https://issues.apache.org/jira/browse/NUTCH-1547
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
 Attachments: NUTCH-1547.patch

   Original Estimate: 1h
  Remaining Estimate: 1h


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Sebastian Nagel (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614639#comment-13614639 ]

Sebastian Nagel commented on NUTCH-1547:


+1
(should be fixed for trunk and 2.x)

 BasicIndexingFilter - Problem to index full title
 -

 Key: NUTCH-1547
 URL: https://issues.apache.org/jira/browse/NUTCH-1547
 Project: Nutch
  Issue Type: Bug
  Components: indexer
Affects Versions: 1.6, 2.1
Reporter: Gustavo Rauber
Assignee: lufeng
Priority: Minor
 Fix For: 1.7, 2.2

 Attachments: NUTCH-1547.patch

   Original Estimate: 1h
  Remaining Estimate: 1h
