I'm using Nutch 0.8.1 (so with the included hadoop-0.4.0-patched.jar) on 2 servers, with the namenode and the jobtracker on the first one. My hadoop-site.xml contains:

mapred.map.tasks=2
mapred.reduce.tasks=2
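Written out as hadoop-site.xml properties, those two settings would look roughly like this. This is only a sketch: the two property names and values are the ones from my file, and everything else in the file is left out.

```xml
<?xml version="1.0"?>
<!-- Sketch of the relevant part of hadoop-site.xml; other properties omitted. -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>2</value>
  </property>
</configuration>
```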
So I should have one node of each type running on each server?

server1
- data-node
- task-tracker
server2
- data-node
- task-tracker

But sometimes the tasks are not allocated across the two servers, only on one of them, like this:

server1
- data-node
server2
- data-node
- task-tracker
- task-tracker

When this happens for the "fetch" task, performance suffers badly, because only one server is doing the work!

. Can't we manually specify how the tasktracker nodes are allocated to each server?
. Why does Hadoop behave this way?
. I assume that it decides to allocate the task nodes dynamically according to the load of the servers, so should I set mapred.map.tasks=2*number-of-servers in my params?
