> Are you running this in a distributed setup, or in "local" mode? Local
> mode is not designed to cope with such large datasets, so it's likely
> that you will be getting OOM errors during sorting ... I can only
> recommend that you use a distributed setup with several machines, and
> adjust RAM consumption with the number of reduce tasks.

Currently we are running in local mode. We do not have a setup for distributed operation, which is why I want to merge these segments. Would that not help? Instead of having potentially tens of thousands of segments, I want to create several large segments and index those.

Sorry for my ignorance, but I am not really sure how to scale Nutch correctly. Do you know of a document, or have some pointers, on how segment/index data should be stored?

<briggs />

"Conscious decisions by conscious minds are what make reality real"
