one "bad" link on a page kills parsing
--------------------------------------

         Key: NUTCH-120
         URL: http://issues.apache.org/jira/browse/NUTCH-120
     Project: Nutch
        Type: Bug
  Components: fetcher  
    Versions: 0.7    
 Environment: ubuntu 5.10
    Reporter: Earl Cahill


In the getOutlinks method of
src/java/org/apache/nutch/parse/OutlinkExtractor.java, the try block
wraps the whole

while (matcher.contains(input, pattern)) {
...
}

loop, so if one URL causes an exception, no more links are extracted
from the page.
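
One possible way around this is to move the try inside the loop, so that a
bad URL is skipped and the remaining links are still collected. The
following is only a minimal sketch of that idea, not the actual Nutch code:
it uses java.util.regex instead of the matcher used in OutlinkExtractor, and
the class name OutlinkSketch, the method extractOutlinks, and the URL
pattern are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; names and pattern are assumptions, not Nutch code.
final class OutlinkSketch {

    // Simplified URL pattern (assumption): a scheme, "://", then non-whitespace.
    private static final Pattern URL_PATTERN =
            Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*://\\S+");

    static List<String> extractOutlinks(String text) {
        List<String> outlinks = new ArrayList<>();
        Matcher matcher = URL_PATTERN.matcher(text);
        while (matcher.find()) {
            String candidate = matcher.group();
            // The try sits inside the loop, so a failure on one candidate
            // only skips that candidate instead of ending the extraction.
            try {
                // toURL() throws MalformedURLException for unknown protocols.
                new URI(candidate).toURL();
                outlinks.add(candidate);
            } catch (Exception e) {
                // Skip this link and continue with the next match.
            }
        }
        return outlinks;
    }

    public static void main(String[] args) {
        String page = "see http://a.example/ then bogus://bad then http://b.example/x";
        System.out.println(extractOutlinks(page));
    }
}
```

With the try placed this way, the "bogus://bad" entry in the sample input is
dropped and the link that follows it is still returned. Whether skipped
links should also be logged is left open here.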

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
