Re: Poor scalability with map reduce application
Hi guys, I suspected that the problem was due to overhead introduced by the filesystem, so I tried setting the dfs.replication.max property to different values. First I tried 2, and I got a message saying that I was requesting a value of 3, which was bigger than the limit, so I couldn't do the run (it seems this 3 is hardcoded somewhere; I read that in JIRA). Then I tried 3. I could generate the input files for the map reduce app, but when trying to run it I got this:

Exception in thread "main" java.io.IOException: file /tmp/hadoop-aandre/mapred/staging/aandre/.staging/job_201106230004_0003/job.jar. Requested replication 10 exceeds maximum 3
        at org.apache.hadoop.hdfs.server.namenode.BlockManager.verifyReplication(BlockManager.java:468)

It seems the framework is trying to replicate the output on as many nodes as possible. Could this be the source of the degradation? I have also attached the log for the run with 7 nodes. Alberto.
-- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto

[attached log, run with 7 nodes:]
11/06/23 01:09:38 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/06/23 01:09:38 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
11/06/23 01:09:38 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/06/23 01:09:38 INFO input.FileInputFormat: Total input paths to process : 1
11/06/23 01:09:40 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
11/06/23 01
Re: Poor scalability with map reduce application
Alberto, I can assure you that fiddling with default replication factors can't be the solution here. Most of us running 3+ node clusters still use the default replication factor of 3, and it hardly introduces a performance lag. As long as your Hadoop cluster's network is not shared with other network applications, you shouldn't be seeing any network slowdowns. Anyhow, dfs.replication.max is not what you were looking to change; it is dfs.replication instead (to affect all new files' replication values). AFAIK, there is no replication factor hardcoded anywhere in the code; it's all configurable, so it's just a matter of setting the right configuration :) Regarding the 10 thing: the MR components try to load their jars and other submitted code/files with a replication factor of 10 by default, so that they propagate to all racks etc. and lead to a fast startup of tasks. I do not think that's a problem in your case either (if it gets 4, it will use 4; if it gets 7, it will use 7 -- it won't take too long).
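To make the two knobs concrete, here is a hedged sketch of where each one lives. Values are illustrative, not recommendations; verify the property names and defaults against your Hadoop version's hdfs-default.xml / mapred-default.xml. The per-file default lives in hdfs-site.xml, while the job client uses a separate mapred.submit.replication setting (default 10) for the submitted job.jar, which is what the "Requested replication 10 exceeds maximum 3" error refers to:

```xml
<!-- hdfs-site.xml: default replication factor for newly created files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<!-- mapred-site.xml: replication used for the submitted job.jar and other
     job files; the default of 10 spreads them widely for fast task startup.
     On a small cluster it must not exceed dfs.replication.max. -->
<property>
  <name>mapred.submit.replication</name>
  <value>3</value>
</property>
```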
Poor scalability with map reduce application
Hello, I'm working on an application that calculates the temperatures of a square board. I divide the board into a mesh and represent it as a list of (key, value) pairs, the key being the linear position of a cell within the mesh and the value its temperature. I distribute the data during the map and calculate the temperature for the next step in the reduce. You can see a more detailed explanation here, http://code.google.com/p/heat-transfer/source/browse/trunk/informe/Informe.pdf but the basic idea is the one I have just mentioned. The funny thing is that the more nodes I add, the slower it runs! With 7 nodes it takes 16 minutes, but with 4 nodes it takes only 8. You can see the code in the file HeatTransfer.java, which is found here, http://code.google.com/p/heat-transfer/source/browse/#svn%2Ftrunk%2Ffine%253Fstate%253Dclosed Thanks in advance! Alberto. -- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto
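The scheme described above can be sketched in plain Python. This is a hypothetical illustration, not the actual HeatTransfer.java code: one Jacobi-style map/shuffle/reduce round over (key, temperature) pairs, where key = row * N + col is the linear position of a cell in an N x N mesh. The boundary treatment (boundary cells hold their value) is a simplifying assumption.

```python
# Hypothetical sketch -- NOT the code from HeatTransfer.java. One
# Jacobi-style map/reduce round over (key, temperature) pairs, where
# key = row * N + col is the linear index of a cell in an N x N mesh.

N = 4  # mesh side length (illustrative)

def mapper(key, temp):
    """Distribute a cell's temperature to itself and its 4-neighbours."""
    row, col = divmod(key, N)
    yield key, (0, temp)  # tag 0: the cell's own current value
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < N and 0 <= c < N:
            yield r * N + c, (1, temp)  # tag 1: contribution to a neighbour

def reducer(key, values):
    """Next-step temperature: the average of the 4 neighbour contributions;
    boundary cells (fewer than 4 neighbours) hold their current value."""
    own = next(t for tag, t in values if tag == 0)
    neigh = [t for tag, t in values if tag == 1]
    return key, own if len(neigh) < 4 else sum(neigh) / len(neigh)

def step(board):
    """One map -> shuffle -> reduce round over a dict {key: temperature}."""
    shuffled = {}
    for k, t in board.items():
        for out_k, v in mapper(k, t):
            shuffled.setdefault(out_k, []).append(v)
    return dict(reducer(k, vs) for k, vs in shuffled.items())
```

With a uniform board a step is a fixed point; a single hot interior cell hands its heat to its neighbours on the next step. Note that per iteration every cell's value crosses the shuffle, which is why network cost grows with cluster size while the per-node compute stays tiny.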
Re: Poor scalability with map reduce application
Hi Harsh, thanks for your answer! The cluster is homogeneous; every node has the same number of cores and amount of memory and is equally reachable on the network. The data is generated specifically for each run. I mean, I write the input data on 4 nodes for one run and on 7 nodes for another, so the input file will be replicated on 4 nodes when running the map reduce with 4 nodes, and on 7 when running it with 7. I don't know if speculative maps are on; I'll check it. One thing I observed is that reduces begin before all maps have finished. Let me check also whether the difference is on the map side or in the reduce. I believe it's balanced, both are slower when adding more nodes, but I'll confirm that. I would appreciate any other comment, thanks again. On 21 June 2011 13:33, Harsh J ha...@cloudera.com wrote: Alberto, Please add more practical info, like whether your cluster is homogeneous, whether the number of maps and reduces in both runs is consistent (i.e., same data and same number of reducers on 4 vs. 7?), and whether map speculatives are on. Also, do you notice a difference in time for a single map task across the two runs? Or is the difference on the reduce task side? -- Harsh J -- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto
RE: Poor scalability with map reduce application
Harsh, Is it possible for mapred.reduce.slowstart.completed.maps to even play a significant role in this? The only benefit he would find in tweaking that for his problem would be to spread network traffic from the shuffle over a longer period of time, at the cost of having the reducers using resources earlier. Either way, he would see this effect across both sets of runs if he is using the default parameters. I guess it would all depend on what kind of network layout the cluster is on. Matt -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, June 21, 2011 12:09 PM To: common-user@hadoop.apache.org Subject: Re: Poor scalability with map reduce application Alberto, On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti albertoandreo...@gmail.com wrote: I don't know if speculative maps are on, I'll check it. One thing I observed is that reduces begin before all maps have finished. Let me check also if the difference is on the map side or in the reduce. I believe it's balanced, both are slower when adding more nodes, but I'll confirm that. Maps and reduces are speculative by default, so they must have been ON. Could you also post general input vs. output record counts and statistics like that between your job runs, to correlate? The reducers get scheduled early but do not actually reduce() until all maps are done; they just keep fetching outputs. Their scheduling can be controlled with some configurations (say, to start only after X% of maps are done -- by default they start when 5% of maps are done). -- Harsh J This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. 
All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of Viruses or other Malware. Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment. The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations.
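For reference, the setting under discussion is a fraction in mapred-site.xml (it can also be passed per job with -D at submission time). A hedged sketch with what is believed to be the default value in this Hadoop generation; check mapred-default.xml for your release:

```xml
<!-- mapred-site.xml: fraction of a job's maps that must complete before
     its reducers are scheduled. With the default of 0.05 (5%), reducers
     start fetching map outputs early, but do not run reduce() until all
     maps have finished. -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
</property>
```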
Re: Poor scalability with map reduce application
Thank you guys, I really appreciate your answers. I don't have access to the cluster right now; I'll check the info you are asking for and come back in a couple of hours. BTW, I tried the app on two clusters, with similar results. I'm using 0.21.0. Thanks again, Alberto. -- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto
Re: Poor scalability with map reduce application
I saw that the link I sent you may not be working; please take a look here to see what it is all about: https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B5AOpwg8IzVANjJlODZhZDctNWUzMS00MmNhLWI3OWMtMWNhMTdjODQwNjVl&hl=en_US Thanks again! -- José Pablo Alberto Andreotti. Tel: 54 351 4730292 Móvil: 54351156526363. MSN: albertoandreo...@gmail.com Skype: andreottialberto
Re: Poor scalability with map reduce application
Matt, You're right that it (slowstart) does not / would not affect much. I was merely explaining the reason behind his observation of reducers getting scheduled early, not really recommending a tweak there for performance. -- Harsh J