The Nutch Feed/RSS plugin (parse-rss) only lets you search the text of the entire channel/feed, not the individual items. If per-item search is what you are after, you'll have to develop your own plugin. I also found that the feedparser library used by parse-rss doesn't read all feed formats properly, so I have moved to the ROME library for now.
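To illustrate what "items individually" means, here is a minimal sketch that splits a feed into one record per item, so that each item could be indexed as its own document. It is not the parse-rss plugin, not a Nutch extension point, and not ROME: it uses only the JDK's built-in DOM parser and assumes a plain RSS 2.0 layout. The `RssItemSplitter` and `FeedItem` names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssItemSplitter {

    /** One searchable unit per feed item (hypothetical type, not a Nutch class). */
    public static final class FeedItem {
        public final String title;
        public final String link;
        public final String description;

        public FeedItem(String title, String link, String description) {
            this.title = title;
            this.link = link;
            this.description = description;
        }
    }

    /** Parses an RSS document and returns each <item> element as a separate record. */
    public static List<FeedItem> split(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Feeds come from untrusted hosts, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));

        List<FeedItem> items = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("item");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new FeedItem(
                    childText(item, "title"),
                    childText(item, "link"),
                    childText(item, "description")));
        }
        return items;
    }

    /** Returns the trimmed text of the first matching child, or "" if it is absent. */
    private static String childText(Element parent, String tag) {
        NodeList children = parent.getElementsByTagName(tag);
        return children.getLength() == 0 ? "" : children.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss version=\"2.0\"><channel><title>Demo feed</title>"
                + "<item><title>First post</title><link>http://example.com/1</link>"
                + "<description>Hello</description></item>"
                + "</channel></rss>";
        for (FeedItem item : split(xml)) {
            System.out.println(item.title + " | " + item.link + " | " + item.description);
        }
    }
}
```

Each `FeedItem` would then be handed to the indexer as its own document, keyed by its link, instead of the whole channel being indexed as a single blob of text. Feeds in other formats (Atom, for instance) would need different element names, which is the kind of variation a library such as ROME is meant to absorb.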
By "real time" I guess you mean aggregating the feeds, indexing them, and adding the result to the main index, all as fast as possible. Nutch is batch-oriented, so IMHO it doesn't allow this without heavy modification.

Maybe the way to go is to use buffer/intermediate indices that are merged into the main index once in a while. As soon as a buffer index is created, it is added to a list of dynamic searchers; once that index has been merged into the main one, its searcher is turned off. I am not sure this scales well, because new content keeps arriving and merging takes a long time.

Are there any architecture ideas that someone can share? For example, I wonder how Yahoo indexes emails in real time: as soon as I send an email, it can be searched by keyword.

Cheers,
Jeremy.

-----Original Message-----
From: Dima Gritsenko [mailto:[EMAIL PROTECTED]
Sent: Monday, August 28, 2006 10:44 AM
To: [email protected]
Subject: RSS search by nutch

Hi,

Does nutch have a class for searching incoming RSS feeds in real time?

Thank you.
Dima.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
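
The buffer-index idea from the reply above (small intermediate indices that are searchable right away and folded into the main index later) can be sketched roughly as follows. This is only a toy model under stated assumptions: `TinyIndex` is a hypothetical in-memory inverted index standing in for a real on-disk index, none of the names are Nutch or Lucene APIs, and everything is synchronized for simplicity, whereas a real system would presumably merge in the background.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BufferedIndexDemo {

    /** Toy inverted index (term -> document ids); a stand-in for a real index. */
    static final class TinyIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, String text) {
            for (String term : text.toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
                }
            }
        }

        Set<String> search(String term) {
            Set<String> ids = postings.get(term.toLowerCase());
            return ids == null ? Collections.<String>emptySet() : ids;
        }

        /** Copies every posting of the other index into this one. */
        void absorb(TinyIndex other) {
            for (Map.Entry<String, Set<String>> e : other.postings.entrySet()) {
                postings.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
    }

    private final TinyIndex mainIndex = new TinyIndex();
    // The "dynamic searchers": buffer indices not yet merged into the main index.
    private final List<TinyIndex> liveBuffers = new ArrayList<>();

    /** Indexes a fresh batch (doc id -> text) into its own buffer and makes it searchable at once. */
    public synchronized void publishBatch(Map<String, String> docs) {
        TinyIndex buffer = new TinyIndex();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            buffer.add(e.getKey(), e.getValue());
        }
        liveBuffers.add(buffer);
    }

    /** Queries the main index plus every live buffer and unions the hits. */
    public synchronized Set<String> search(String term) {
        Set<String> hits = new TreeSet<>(mainIndex.search(term));
        for (TinyIndex buffer : liveBuffers) {
            hits.addAll(buffer.search(term));
        }
        return hits;
    }

    /** Folds all buffers into the main index, then turns their searchers off. */
    public synchronized void mergeBuffers() {
        for (TinyIndex buffer : liveBuffers) {
            mainIndex.absorb(buffer);
        }
        liveBuffers.clear();
    }

    public synchronized int liveBufferCount() {
        return liveBuffers.size();
    }

    public static void main(String[] args) {
        BufferedIndexDemo index = new BufferedIndexDemo();
        index.publishBatch(Collections.singletonMap("item-1", "Nutch feed indexing"));
        System.out.println(index.search("nutch") + " live buffers: " + index.liveBufferCount());
        index.mergeBuffers();
        System.out.println(index.search("nutch") + " live buffers: " + index.liveBufferCount());
    }
}
```

The sketch makes the scaling concern visible: every query pays for one lookup per live buffer, so the buffer list has to be kept short, yet each merge touches the whole main index. How often to merge is exactly the trade-off the reply is unsure about.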
