Thanks for this clarification. The opportunity I like best about the Nutch technology is the ability to create and run topic-specific search engines (for medical, social science, or technology searches, say), each with a comprehensive, specially constructed index to support such searches. These indexes need not be that large - certainly a few million pages would do the trick in most cases.
I was under the false impression that Nutch wanted to be a single catch-all interface designed to match or surpass something like Google. Naturally, I felt that goal was not realistic.

Also, I feel it's worth mentioning that Matt Wells is doing something similar at his site, http://www.gigablast.com/. As far as I know, however, his is not an open-source project. He's in business, and who can blame him? He has done some good work.

best,
Dave

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Doug Cutting
Sent: Thursday, February 05, 2004 10:57 AM
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-general] Re: Assumptions About Internet Search

Dave Cohen wrote:
> Maybe it could, that's a design issue. I'm not sure Nutch will ever
> achieve "lift off" if it's just like Google and requires the same
> resources and money to exist.

Nutch's core differentiator from Google, Inktomi, etc. is that it's open source: anyone can run the software. In the short term, we don't aim to be better, but rather just to bring more-or-less equivalent technology to the open source world. This alone creates lots of opportunities. For example: it enables folks to run their own search engines with unlimited API-based queries per day; folks can analyze query data; folks can experiment with ranking algorithms; etc.

So equivalent technology doesn't mean that nothing new can happen. In many ways Linux is equivalent to XP, yet the fact that it is open source enables many things that are impossible with XP.

As for the costs and resources: on a per-query basis Google's operation is actually quite inexpensive. Things only get really expensive when you have the quantity of traffic that Google has. A Nutch implementation with a few hundred million URLs that can handle a few queries per second can be maintained on a handful of inexpensive machines.
If even that is beyond your means, we're deploying a ~200M page installation (hosted by the Internet Archive) that we can give experimental access to. So if you develop Nutch-based code that you'd like to run on a larger collection than you can afford to maintain, please contact us, and we can set you up with an account.

Cheers,
Doug

-------------------------------------------------------
The SF.Net email is sponsored by EclipseCon 2004
Premiere Conference on Open Tools Development and Integration
See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
http://www.eclipsecon.org/osdn
_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
