Does anyone have good configuration ideas for indexing and merging with Nutch 0.9 using Hadoop on a local filesystem? Our segment merging takes far longer than it did with Nutch 0.7. I am currently trying to merge 300 segments, about 1 GB of data in total. The merge has been running for hours and still hasn't finished. The box has dual 2.8 GHz Xeon processors and 4 GB of RAM.
So I figure there must be a better setup in mapred-default.xml for a single machine. Should I increase the I/O buffer size, the sort buffers, etc.? Should I reduce the number of tasks or increase it? I'm at a loss. Any advice would be greatly appreciated.

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
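For reference, here is a minimal sketch of the kind of single-machine overrides the question is about. It assumes a Hadoop version of that era that reads site-specific overrides from hadoop-site.xml (check which files your version loads). The property names come from Hadoop's hadoop-default.xml, but the values are illustrative guesses, not tuned or benchmarked numbers. In local mode the job runs inside the JVM started by bin/nutch, so a larger io.sort.mb also needs a larger heap for that JVM (bin/nutch reads NUTCH_HEAPSIZE for this, assuming a stock script).

```xml
<?xml version="1.0"?>
<!-- Hypothetical hadoop-site.xml overrides for one machine on a local
     filesystem. Values are illustrative only; compare against the
     defaults in your version's hadoop-default.xml. -->
<configuration>
  <!-- Run jobs in-process with the local job runner. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>
  <!-- Number of sorted streams merged at once (default 10). A higher
       value means fewer intermediate merge passes over the disk. -->
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <!-- Memory in MB used to sort map output before spilling to disk.
       Must fit comfortably inside the JVM heap. -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- Buffer size in bytes for file reads and writes (default 4096). -->
  <property>
    <name>io.file.buffer.size</name>
    <value>65536</value>
  </property>
</configuration>
```

With the local job runner, tasks run one after another in a single JVM, so the sort and buffer settings are likely to matter more than the task counts.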