[ http://issues.apache.org/jira/browse/NUTCH-89?page=all ]

Piotr Kosiorowski closed NUTCH-89:
----------------------------------

    Fix Version: 0.8-dev
                 0.7
     Resolution: Fixed

Applied in trunk and 0.7 branch. Thanks.

> parse-rss null pointer exception
> --------------------------------
>
>          Key: NUTCH-89
>          URL: http://issues.apache.org/jira/browse/NUTCH-89
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.7, 0.8-dev
>     Reporter: Michael Nebel
>      Fix For: 0.7, 0.8-dev
>  Attachments: parse-rss.20050910.patch
>
> The rss-parser causes an exception. The reason is a syntax error in the page.
> On hitting such a page, the parser tries to add an outlink with "null" as the anchor.
> The anchor of an outlink must not be null.
> java.lang.NullPointerException
>         at org.apache.nutch.io.UTF8.writeString(UTF8.java:236)
>         at org.apache.nutch.parse.Outlink.write(Outlink.java:51)
>         at org.apache.nutch.parse.ParseData.write(ParseData.java:111)
>         at org.apache.nutch.io.SequenceFile$Writer.append(SequenceFile.java:137)
>         at org.apache.nutch.io.MapFile$Writer.append(MapFile.java:127)
>         at org.apache.nutch.io.ArrayFile$Writer.append(ArrayFile.java:39)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.outputPage(Fetcher.java:281)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.handleFetch(Fetcher.java:261)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:148)
> Exception in thread "main" java.lang.RuntimeException: SEVERE error logged. Exiting fetcher.
>         at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:354)
>         at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:488)
>         at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:140)
> I suggest the following patch:
> Index: src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java
> ===================================================================
> --- src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java  (revision 279397)
> +++ src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss/RSSParser.java  (working copy)
> @@ -157,11 +157,13 @@
>                  if (r.getLink() != null) {
>                      try {
>                          // get the outlink
> -                        theOutlinks.add(new Outlink(r.getLink(), r
> -                                .getDescription()));
> +                        if (r.getDescription()!= null ) {
> +                            theOutlinks.add(new Outlink(r.getLink(), r.getDescription()));
> +                        } else {
> +                            theOutlinks.add(new Outlink(r.getLink(), ""));
> +                        }
>                      } catch (MalformedURLException e) {
> -                        LOG
> -                                .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> +                        LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
>                                  + r.getLink()
>                                  + ": Attempting to continue processing outlinks");
>                          e.printStackTrace();
> @@ -185,12 +187,13 @@
>
>                      if (whichLink != null) {
>                          try {
> -                            theOutlinks.add(new Outlink(whichLink, theRSSItem
> -                                    .getDescription()));
> -
> +                            if (theRSSItem.getDescription()!=null) {
> +                                theOutlinks.add(new Outlink(whichLink, theRSSItem.getDescription()));
> +                            } else {
> +                                theOutlinks.add(new Outlink(whichLink, ""));
> +                            }
>                          } catch (MalformedURLException e) {
> -                            LOG
> -                                    .info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
> +                            LOG.info("nutch:parse-rss:RSSParser Exception: MalformedURL: "
>                                      + whichLink
>                                      + ": Attempting to continue processing outlinks");
>                              e.printStackTrace();

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
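
The patch quoted above repeats the same null check at both places where an outlink is created. As a rough illustration only, the same idea can be factored into one small helper. Everything in this sketch is hypothetical and not part of Nutch or of the applied patch: the class `NullSafeAnchor`, the helper `safeAnchor`, and the stand-in `Link` class (a simplified placeholder for `Outlink` whose `serializedLength` merely imitates a write step that dereferences the anchor).

```java
import java.util.ArrayList;
import java.util.List;

public class NullSafeAnchor {

    /** Hypothetical, simplified stand-in for an outlink; NOT the real Nutch class. */
    public static final class Link {
        final String url;
        final String anchor;

        Link(String url, String anchor) {
            this.url = url;
            this.anchor = anchor;
        }

        /** Imitates a write step: dereferences the anchor, so a null anchor would throw here. */
        int serializedLength() {
            return url.length() + anchor.length();
        }
    }

    /** Returns the description unchanged, or an empty string when it is null. */
    public static String safeAnchor(String description) {
        return description != null ? description : "";
    }

    public static void main(String[] args) {
        List<Link> links = new ArrayList<>();
        // An item with a description keeps it as the anchor text.
        links.add(new Link("http://example.com/a", safeAnchor("First item")));
        // An item without a description gets an empty anchor instead of null.
        links.add(new Link("http://example.com/b", safeAnchor(null)));

        for (Link l : links) {
            // Safe for both entries because no anchor is null.
            System.out.println(l.url + " -> [" + l.anchor + "] length=" + l.serializedLength());
        }
    }
}
```

With such a helper, each call site would shrink to a single line, at the cost of one extra method; whether that trade is worthwhile for two call sites is a judgment call.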