Hi Jeremy,

On 8/28/06 10:18 AM, "HUYLEBROECK Jeremy RD-ILAB-SSF"
<[EMAIL PROTECTED]> wrote:

> 
> The Nutch Feed/RSS plugin (parse-rss) only allows you to search the
> entire channel/feed text, not items individually.

Actually, this isn't entirely the case. parse-rss also indexes the item
text (see line 148 in RSSParser.java). Additionally, parse-rss adds the
individual item links to the Outlinks (see lines 161 and 163 in
RSSParser.java), so the items get crawled as well. This is in addition to
the channel text (see line 123 in RSSParser.java) and the channel outlink
(see lines 130 and 132 in RSSParser.java).
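To make the channel/item distinction concrete, here is a minimal, hypothetical sketch. It is NOT the actual RSSParser code and does not use the commons-feedparser or Nutch APIs. It assumes a plain RSS 2.0 document and uses only the JDK's DOM parser. It collects channel and item title/description text (what would be indexed) and the channel and item links (what would become outlinks). The class and method names (FeedSketch, parse, childText) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, NOT the Nutch RSSParser or commons-feedparser code.
// Assumes RSS 2.0, where <item> elements are nested inside <channel>.
class FeedSketch {
    final StringBuilder indexText = new StringBuilder(); // text that would be indexed
    final List<String> outlinks = new ArrayList<>();     // links that would be crawled

    static FeedSketch parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            FeedSketch result = new FeedSketch();
            NodeList channels = doc.getElementsByTagName("channel");
            for (int c = 0; c < channels.getLength(); c++) {
                Element channel = (Element) channels.item(c);
                // Channel-level text and the channel's own link.
                result.addText(childText(channel, "title"));
                result.addText(childText(channel, "description"));
                result.addLink(childText(channel, "link"));
                // Each item contributes its own text and its own outlink.
                NodeList items = channel.getElementsByTagName("item");
                for (int i = 0; i < items.getLength(); i++) {
                    Element item = (Element) items.item(i);
                    result.addText(childText(item, "title"));
                    result.addText(childText(item, "description"));
                    result.addLink(childText(item, "link"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }

    // Trimmed text of the first DIRECT child element with this name, or null.
    // Looking only at direct children keeps item titles out of the channel title.
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    private void addText(String s) {
        if (s == null || s.isEmpty()) return;
        if (indexText.length() > 0) indexText.append(' ');
        indexText.append(s);
    }

    private void addLink(String s) {
        if (s != null && !s.isEmpty()) outlinks.add(s);
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\"><channel><title>News</title>"
                + "<link>http://example.com/</link><description>All news</description>"
                + "<item><title>First</title><link>http://example.com/1</link>"
                + "<description>One</description></item>"
                + "<item><title>Second</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        FeedSketch f = parse(xml);
        System.out.println(f.indexText); // News All news First One Second
        System.out.println(f.outlinks);  // [http://example.com/, http://example.com/1, http://example.com/2]
    }
}
```

The sketch only looks at direct children of channel and item, so nested elements such as a channel's image/link are ignored. RSS 1.0 (RDF) feeds, where items sit beside the channel rather than inside it, would need different handling.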

> You'll have to develop your own if it's what you are trying to do.
> I also found that the feedparser library used by parse-rss doesn't properly
> read all formats, and I myself moved to the ROME library for now.

I haven't noticed any formats that commons-feedparser doesn't handle.
Which formats have you found that it doesn't handle?



Cheers,
  Chris


> 
> 
> -----Original Message-----
> From: Dima Gritsenko [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 28, 2006 10:44 AM
> To: [email protected]
> Subject: RSS search by nutch
> 
> Hi, 
> 
> Does Nutch have a class for searching incoming RSS feeds in real time?
> Thank you. 
> Dima. 



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
