The idea of NDFS is a means of cheap/decentralized storage - has there been any thought given to cheap/"brokered" processing of the WebDB?
My concern is that at 100 million pages the WebDB takes days to process on a dual/quad Xeon machine. To compete with the likes of Google in keeping up with generate segments, update db requests and the like 24/7, I'm not sure a single "processing" node could work.

Is it feasible to think of a "broker" server that takes a WebDB request, sends it to the correct "bucket/client" server for processing, and then takes the results from the bucket servers and makes a decision based upon them (much like the distributed index processes)?

The idea is that update db would be streamed to a broker server that could compute link statistics almost in real time by sending a simple query to the db servers asking for the mechanics of the incoming document, and then distributing the decision to the appropriate bucket. The premise is that the bucket servers maintain themselves on a smaller scale than the "whole" and communicate diffs/changes/updates and inserts/deletes to each other.

One way to manage the process and scale it according to cost-effective use of CPU & funds would be to sort the "buckets" by a rank method, so that as the db grows and is analyzed it naturally distributes itself according to the ranks of the documents. You could then possibly build segments to be fetched based upon each "bucket", cut down on analyze time as well, and send the fetched segments' WebDB updates to the broker servers, which would repeat the computational/ranking process in top-down fashion through a somewhat "natural selection" process :)

I may be off the wall, but I'm just trying to think of ideas here. It's sort of like partitioning an Oracle database on specific ranges, querying only the necessary ranges for your "heavy" tasks, and then having a process by which you manage/massage & update the partitioned data accordingly.
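To make the broker/bucket idea a bit more concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration only: the `Broker` and `Bucket` names are hypothetical, the buckets are plain in-process objects rather than separate servers, and a simple inlink count stands in for whatever rank method would really be used. None of this is existing Nutch code.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """One hypothetical bucket server holding pages in a single score range."""
    pages: dict = field(default_factory=dict)  # url -> inlink count

    def lookup(self, url):
        # Returns the stored inlink count, or None if this bucket lacks the page.
        return self.pages.get(url)


class Broker:
    """Routes link updates to the bucket whose score range covers the page."""

    def __init__(self, boundaries):
        # boundaries=[10, 100] gives three buckets: <10, 10..99, >=100 inlinks.
        self.boundaries = sorted(boundaries)
        self.buckets = [Bucket() for _ in range(len(self.boundaries) + 1)]

    def bucket_for_score(self, score):
        # Index of the range that contains this score.
        return bisect_right(self.boundaries, score)

    def find(self, url):
        """Ask every bucket for the page; return (bucket index, score)."""
        for i, bucket in enumerate(self.buckets):
            score = bucket.lookup(url)
            if score is not None:
                return i, score
        return None, 0  # page not seen before

    def add_inlink(self, url):
        """Record one new inlink and move the page if it crosses a boundary."""
        old_idx, score = self.find(url)
        score += 1
        new_idx = self.bucket_for_score(score)
        if old_idx is not None and old_idx != new_idx:
            del self.buckets[old_idx].pages[url]
        self.buckets[new_idx].pages[url] = score
        return new_idx

    def fetch_list(self, idx):
        """URLs held by one bucket, e.g. to build a segment from that bucket only."""
        return sorted(self.buckets[idx].pages)


broker = Broker([2, 4])
for _ in range(3):
    broker.add_inlink("http://a.example/")
broker.add_inlink("http://b.example/")
```

With the boundaries above, the page with three inlinks ends up in the middle bucket and the page with one inlink stays in the lowest one, so a segment could be generated from `broker.fetch_list(1)` without touching the rest of the db. The scatter lookup in `find` is the "simple query to the db servers" step; a real version would presumably need an index or hashing on top so the broker does not have to ask every bucket each time.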
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
