What is the best way to create a master index on a Nutch 0.8 / Hadoop system?

Is it to merge all of the segments together, and then create an index?

Or is it better to do as Roberto Navoni does in his tutorial: first index
all the segments separately, and then merge the indexes into one master
index?

-.-.-.-.-.-.-
# Create a new index, indexe0
bin/nutch index /user/root/crawld/indexe0 /user/root/crawld/ \
  /user/root/crawld/linkdb /user/root/crawld/segments/20060722153133
# Create a new index, indexe1
bin/nutch index /user/root/crawld/indexe1 /user/root/crawld/ \
  /user/root/crawld/linkdb /user/root/crawld/segments/20060722182213
# Dedup the new index indexe0
bin/nutch dedup /user/root/crawld/indexe0
# Dedup the new index indexe1
bin/nutch dedup /user/root/crawld/indexe1
# Delete the old index
# Merge the new indexes into the merged index directory
bin/nutch merge /user/root/crawld/index /user/root/crawld/indexe0 \
  /user/root/crawld/indexe1 ...
# (and the other indexes created for the fetched segments)
-.-.-.-.-.-.-
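
With more than a couple of segments, the per-segment steps above could be generated by a small loop. The following is only a dry-run sketch: it prints the commands instead of running them, the helper name print_index_plan is made up for illustration, and the argument order is copied from the commands quoted above rather than checked against the Nutch usage text.

```shell
#!/bin/sh
# Dry-run sketch: print one "index" and one "dedup" command per segment,
# then a single "merge" over all per-segment indexes.
# print_index_plan is a hypothetical helper, not part of Nutch.
# Assumes paths contain no whitespace.

print_index_plan() {
    crawl=$1    # crawl directory, e.g. /user/root/crawld
    shift       # remaining arguments are segment names
    n=0
    parts=""
    # Index each segment into its own indexeN directory.
    for segment in "$@"; do
        echo "bin/nutch index $crawl/indexe$n $crawl/ $crawl/linkdb $crawl/segments/$segment"
        parts="$parts $crawl/indexe$n"
        n=$((n + 1))
    done
    # Dedup each per-segment index.
    for part in $parts; do
        echo "bin/nutch dedup $part"
    done
    # Merge all per-segment indexes into the master index.
    echo "bin/nutch merge $crawl/index$parts"
}

print_index_plan /user/root/crawld 20060722153133 20060722182213
```

Once the printed commands look right, the output could be piped to sh to actually run them.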
