Doğacan Güney wrote:
On Mon, Jan 5, 2009 at 7:00 AM, Vlad Cananau <vlad...@gmail.com> wrote:
Hello
I'm trying to make RSSParser do something similar to FeedParser (which
doesn't work quite right), that is, instead of indexing the whole contents

Why doesn't FeedParser work? Let's fix whatever is broken in it :D

of the feed, I want it to show individual items, each with its own title
and a proper link to the article. I realize that I could crawl one level
deeper, but I'd like to index just the feed, not the articles it links to
(to keep the index small and the crawl fast).

For each item in each RSS channel I do something like the following (the
rest of the code does not differ much from getParse() in RSSParser.java):

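 // One outlink per item, pointing at the article the item describes
 Outlink[] outlinks = new Outlink[1];
 try {
  outlinks[0] = new Outlink(whichLink, theRSSItem.getTitle());
 } catch (Exception e) {
  continue; // skip items whose link cannot be turned into an Outlink
 }

 // Store one parse entry per item, keyed by the item's link
 parseResult.put(
  whichLink,
  // space keeps the title's last word apart from the description's first
  new ParseText(theRSSItem.getTitle() + " " + theRSSItem.getDescription()),
  new ParseData(
    ParseStatus.STATUS_SUCCESS,
    theRSSItem.getTitle(),
    outlinks,
    new Metadata() // was content.getMetadata()
  )
 );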
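The per-item split above can be modelled in plain Java, without the Nutch classes, to check the keying and text-building logic in isolation. This is a minimal sketch under stated assumptions: `RssItem` and `ItemParse` are hypothetical stand-ins for the RSS library's item type and for the `ParseText`/`ParseData` pair, and a `LinkedHashMap` plays the role of `parseResult`. It keeps one entry per item link and skips items whose link is not an absolute URL, roughly as the `catch ... continue` in the loop does.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of splitting a feed into per-item entries.
// RssItem and ItemParse are hypothetical stand-ins, not Nutch or RSS-library types.
class FeedItemSplit {

    record RssItem(String title, String description, String link) {}

    record ItemParse(String text, String title, String outlink) {}

    static Map<String, ItemParse> split(List<RssItem> items) {
        // Keyed by item link, mirroring parseResult.put(whichLink, ...);
        // putting a repeated key replaces the earlier entry.
        Map<String, ItemParse> result = new LinkedHashMap<>();
        for (RssItem item : items) {
            String link = item.link();
            if (link == null) {
                continue;
            }
            try {
                // Skip links that are not absolute, well-formed URLs.
                new URI(link).toURL();
            } catch (URISyntaxException | MalformedURLException
                     | IllegalArgumentException e) {
                continue;
            }
            // Title and description separated by a space in the indexed text.
            String text = item.title() + " " + item.description();
            result.put(link, new ItemParse(text, item.title(), link));
        }
        return result;
    }
}
```

For example, three items where one has the link `not a url` yield a map with two entries, and the text stored for the first item (title `A`, description `first`) is `A first`.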

The problem, however, is that only one item from the whole feed gets into
the index, although I can see all of them in the log (I've tried feeds
from CNN and Reuters). What is happening? Why do they get overwritten in a
seemingly random order? The item that makes it into the index is neither the
first nor the last, but it seems to stay the same until new items appear in
the feed.
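Since every entry is stored under its item link, one possibility worth ruling out first is that several items map to the same key: a map-style `put` keeps only the last value stored under a given key. A minimal check, assuming the keys passed to `parseResult.put(...)` can be collected into a list beforehand (`KeyCollisionCheck` is a hypothetical helper, not part of Nutch):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical debugging helper: reports every key that occurs more than
// once, together with how many times it occurs.
class KeyCollisionCheck {
    static Map<String, Integer> duplicates(List<String> keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        // Keep only keys seen at least twice.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }
}
```

If it returns an empty map for a feed's item links, key collisions in the parser can be ruled out as the reason entries disappear.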

Thank you,
Vlad

To show you what I mean by "only one item gets into the index", check out these results: http://tinyurl.com/7hkkoo (it links to http://vladk2k.homeip.net:8080, my own server).
