The Nutch Feed/RSS plugin (parse-rss) only lets you search the text of
the entire channel/feed, not individual items.
If per-item search is what you need, you'll have to develop your own plugin.
I also found that the feedparse library used by parse-rss doesn't read
all feed formats properly, so I have moved to the ROME library for now.
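
To illustrate what "items individually" means, here is a minimal sketch in
Python that splits an RSS 2.0 feed into one record per item, so that each
item could be indexed as its own document. It is not how parse-rss or ROME
work; the function name, the field names and the sample feed are all made up
for the example, and only plain RSS 2.0 is handled.

```python
import xml.etree.ElementTree as ET


def split_feed_into_items(rss_xml):
    """Return one dict per <item>, so each item can be indexed separately.

    Hypothetical helper: assumes a plain RSS 2.0 layout (rss/channel/item).
    """
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    if channel is None:
        return []
    channel_title = channel.findtext("title", default="")
    docs = []
    for item in channel.findall("item"):
        docs.append({
            "channel": channel_title,  # kept so hits can show their source feed
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "text": item.findtext("description", default=""),
        })
    return docs


# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example channel</title>
<item><title>First post</title><link>http://example.org/1</link>
<description>About Lucene</description></item>
<item><title>Second post</title><link>http://example.org/2</link>
<description>About Nutch</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for doc in split_feed_into_items(SAMPLE_FEED):
        print(doc["link"], "-", doc["title"])
```

Each returned record would then be handed to the indexer as a separate
document, with the item link as its key, instead of indexing the channel
text as one blob.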

By "real time" I guess you mean aggregating, indexing and adding the new
content to the main index as fast as possible.
Nutch is batch-oriented, so IMHO it doesn't allow this without heavy
modification.

Maybe the way to go is to use buffer/intermediate indices and merge them
into the main index once in a while.
Once an intermediate index is created, a searcher for it is added to a
list of dynamic searchers.
Once that index has been merged into the main one, its searcher is turned
off. I am not sure this scales well, because new content keeps arriving and
merging takes a long time. Are there any architecture ideas that someone
can share?
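
The buffer idea can be sketched roughly as follows. This is a toy,
single-threaded, in-memory model in Python, not Nutch or Lucene code:
TinyIndex, BufferedSearcher and their methods are hypothetical names, and a
real setup would need locking and on-disk indices.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # Return a copy so callers can modify the result freely.
        return set(self.postings.get(term.lower(), ()))

    def absorb(self, other):
        """Merge another index's postings into this one."""
        for term, doc_ids in other.postings.items():
            self.postings[term] |= doc_ids


class BufferedSearcher:
    """Main index plus a list of small, recently built buffer indices."""

    def __init__(self):
        self.main = TinyIndex()
        self.buffers = []  # the "dynamic searchers" for unmerged content

    def publish(self, docs):
        """Build a small index from new docs and make it searchable at once."""
        buf = TinyIndex()
        for doc_id, text in docs.items():
            buf.add(doc_id, text)
        self.buffers.append(buf)

    def search(self, term):
        hits = self.main.search(term)
        for buf in self.buffers:
            hits |= buf.search(term)
        return hits

    def merge(self):
        """Fold every buffer into the main index, then turn the buffers off."""
        for buf in self.buffers:
            self.main.absorb(buf)
        self.buffers = []


if __name__ == "__main__":
    searcher = BufferedSearcher()
    searcher.publish({"item-1": "nutch rss feed"})
    print(searcher.search("rss"))  # found via the buffer, before any merge
    searcher.merge()
    print(searcher.search("rss"))  # same hit, now served by the main index
```

New content is searchable as soon as publish() returns, and the cost of
merge() is paid only once in a while. The scaling worry remains: every
search touches all live buffers, so the merge interval trades query cost
against merge cost.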

For example, I wonder how Yahoo indexes emails in real time.
As soon as I send an email, it can be searched by keyword.


Cheers,
Jeremy.


-----Original Message-----
From: Dima Gritsenko [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 28, 2006 10:44 AM
To: [email protected]
Subject: RSS search by nutch

Hi, 

Does nutch have a class for searching incoming RSS feeds in real time?
Thank you. 
Dima. 

