What is the best way to create a master index on a Nutch 0.8 / Hadoop system?
Is it better to merge all of the segments together and then create a single index? Or, as Roberto Navoni does in his tutorial, to index each segment separately first and then merge the resulting indexes into one master index?

-.-.-.-.-.-.-
# Create a new indexe0
bin/nutch index /user/root/crawld/indexe0 /user/root/crawld/ /user/root/crawld/linkdb /user/root/crawld/segments/20060722153133

# Create a new indexe1
bin/nutch index /user/root/crawld/indexe1 /user/root/crawld/ /user/root/crawld/linkdb /user/root/crawld/segments/20060722182213

# Dedup the new indexe0
bin/nutch dedup /user/root/crawld/indexe0

# Dedup the new indexe1
bin/nutch dedup /user/root/crawld/indexe1

# Delete the old index

# Merge the new indexes into the merged index directory
bin/nutch merge /user/root/crawld/index /user/root/crawld/indexe0 /user/root/crawld/indexe1 ...
# (and the other indexes created for the fetched segments)
-.-.-.-.-.-.-
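
To make the per-segment variant concrete, here is a rough dry-run sketch of how the commands above could be generated for any number of segments. It only prints the command lines and does not run Nutch. The function name print_index_plan and the CRAWL variable are my own (hypothetical), and the argument order simply copies the tutorial commands quoted above, so treat it as an illustration rather than a tested recipe.

```shell
#!/bin/sh
# Dry-run sketch: prints the nutch commands instead of executing them.
# CRAWL and the segment names come from the quoted example; adjust as needed.
CRAWL=/user/root/crawld

# print_index_plan SEGMENT... (hypothetical helper)
# Emits one index + dedup command per segment, then a single merge command.
print_index_plan() {
  i=0
  parts=""
  for seg in "$@"; do
    idx="$CRAWL/indexe$i"
    echo "bin/nutch index $idx $CRAWL/ $CRAWL/linkdb $CRAWL/segments/$seg"
    echo "bin/nutch dedup $idx"
    parts="$parts $idx"
    i=$((i + 1))
  done
  # Merge every per-segment index into the master index directory.
  echo "bin/nutch merge $CRAWL/index$parts"
}

print_index_plan 20060722153133 20060722182213
```

Once the printed commands look right, the output could be piped to sh, after deleting the old index directory as in the tutorial.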
