Jack Tang wrote:
Below is the Google architecture as I picture it:

                DataNode A
Master      DataNode B               GoogleCrawler
                DataNode C
                ......
GoogleCrawler is kept running all the time. One day, it gets a fetchlist
from DataNode A, crawls all the pages and indexes them, then it tells Master
"I want to update DataNode A's index", finally it acquires the "read
lock" and "write lock", and the index is updated. Then a similar operation
is applied to DataNodes B and C.

Do you have evidence that this is how Google updates their index? I've never seen much published about that.

In the future I would like to implement a more automated distributed search system than Nutch currently has. One way to do this might be to use MapReduce. Each map task's input could be an index and some segment data. The map method would serve queries, i.e., run a Nutch DistributedSearch.Server. It would first copy the index out of NDFS to the local disk, for better performance. It would never exit normally, but rather "map" forever. When a new version of the index (new set of segments, new boosts, and/or new deletions, etc.) is ready to deploy, then a new job could be submitted. If the number of map tasks (i.e., indexes) is kept equal to or less than the number of nodes, and each node is permitted to run two or more tasks, then two versions of the index can be served at once. Once the new version has been deployed (listening for searches on different ports), and search front-ends are using it, then the old version can be stopped by killing its MapReduce job. If a node dies, the MapReduce job tracker would automatically re-start its task on another node.
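To make the "map forever" idea concrete, here is a minimal Java sketch, not actual Nutch or Hadoop code: copyToLocal() and startSearchServer() are hypothetical helpers standing in for the NDFS-to-local copy and the launch of a Nutch DistributedSearch.Server, and the local path and heartbeat interval are made up for illustration.

import java.io.File;
import java.io.IOException;

public class IndexServingTask {

  /** Stand-in for copying the index out of NDFS to local disk for faster searching. */
  static File copyToLocal(String ndfsIndexPath, File localDir) throws IOException {
    // ... an NDFS client copy would go here; omitted for brevity ...
    return new File(localDir, new File(ndfsIndexPath).getName());
  }

  /** Stand-in for starting a Nutch DistributedSearch.Server over the local index. */
  static void startSearchServer(File localIndex, int port) {
    // ... open the index and begin answering search requests on the given port ...
  }

  /**
   * The "map" method: deploy one index, then serve queries forever and never
   * exit normally.  A new index version is deployed by submitting a new job
   * (listening on a different port); this task runs until its job is killed.
   */
  public void map(String ndfsIndexPath, int port) throws Exception {
    File local = copyToLocal(ndfsIndexPath, new File("/tmp/search"));  // hypothetical local dir
    startSearchServer(local, port);
    while (true) {              // "map" forever
      Thread.sleep(10_000);     // in a real task, report progress so the tracker keeps it alive
    }
  }
}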

If there were an affinity mechanism between tasks and task trackers, then new versions of indexes where, e.g., only the boosts or deletions have changed could be re-deployed to the same nodes as before. Then the copy of the index to the local disk could be incremental, only copying the parts of the index/segment that have changed.
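As a rough illustration of that incremental copy, the sketch below compares the new index files against whatever is already on the node and only re-copies files that are missing or changed. It uses the local filesystem as a stand-in for NDFS, and comparing by length and modification time is just an assumption; a checksum would be more robust.

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class IncrementalIndexCopy {

  /** Copy only the files of newIndexDir that are missing or changed in localDir. */
  static void syncIndex(File newIndexDir, File localDir) throws IOException {
    localDir.mkdirs();
    File[] files = newIndexDir.listFiles();
    if (files == null) return;
    for (File src : files) {
      File dst = new File(localDir, src.getName());
      // Skip files that look unchanged, e.g. the large postings files when only
      // boosts or deletions were updated; copy everything else.
      if (dst.exists() && dst.length() == src.length()
          && dst.lastModified() >= src.lastModified()) {
        continue;
      }
      Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
  }
}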

Doug
