Replication Factor Modification
Hi, We have a requirement to change our Hadoop cluster's replication factor without restarting the cluster. We are running our cluster on Amazon EMR. Can you please suggest a way to achieve this? Any pointers will be very helpful. Thanks And Regards Uddipan Mukherjee CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
Re: Replication Factor Modification
Hi You can change the replication factor of an existing directory using '-setrep' http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep The below command will recursively set the replication factor to 1 for all files within the given directory '/user' hadoop fs -setrep -w 1 -R /user
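A minimal sketch of wrapping the recursive setrep call so it can be reviewed before it touches a large tree. The function name `setrep_dir` and the `HADOOP_CMD` and `DRY_RUN` variables are hypothetical helpers, not part of Hadoop. The options are placed before the replication value, following the `-setrep [-R] [-w] <rep> <path>` usage string; some releases may also accept other orderings.

```shell
#!/bin/sh
# Hypothetical wrapper around `hadoop fs -setrep`; not part of Hadoop itself.
# HADOOP_CMD and DRY_RUN are illustrative variables introduced for this sketch.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

setrep_dir() {
    # $1 = target replication factor, $2 = HDFS directory
    rep="$1"
    dir="$2"
    case "$rep" in
        ''|*[!0-9]*|0)
            echo "replication factor must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be checked first.
        echo "$HADOOP_CMD fs -setrep -R -w $rep $dir"
    else
        # -R recurses into the directory, -w waits until replication completes.
        "$HADOOP_CMD" fs -setrep -R -w "$rep" "$dir"
    fi
}
```

Calling `setrep_dir 1 /user` prints the command; `DRY_RUN=0 setrep_dir 1 /user` runs it. On a large tree the `-w` flag can block for a long time, so dropping it may be preferable when waiting is not required.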
Re: Replication Factor Modification
Replication factor is a per-file option, so you may have to write a small program that iterates over all files and sets the replication factor to the desired value. API: FileSystem#setReplication Regards, Uma
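The suggestion above refers to the Java API. The sketch below is not that API; it is a shell approximation of the same per-file idea, assuming the usual recursive listing layout (permissions, replication, owner, group, size, date, time, path). The function name `files_needing_setrep` is hypothetical.

```shell
#!/bin/sh
# Reads recursive listing output (`hadoop fs -ls -R <dir>`, or `-lsr` on older
# releases) on stdin and prints the paths of regular files whose replication
# column differs from the wanted value ($1).
# Limitation: paths containing whitespace are not handled by this sketch.
files_needing_setrep() {
    # Regular files have permissions starting with "-"; directories start
    # with "d" and show "-" in the replication column, so they are skipped.
    awk -v want="$1" '$1 ~ /^-/ && $2 != want { print $8 }'
}

# Possible use (left commented out so that sourcing this file runs nothing):
#   hadoop fs -ls -R /user | files_needing_setrep 2 |
#   while read -r path; do hadoop fs -setrep 2 "$path"; done
```

Each `hadoop fs` call starts a new JVM, so the per-file loop is slow on large trees. It is mainly useful for checking which files still differ from the target; for whole directories the recursive `-setrep` is simpler.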
Re: Replication Factor Modification
Replication factor is per-file, and is a client-side property. So, this is doable. 1. Change the replication factor of all existing files (or needed ones): $ hadoop fs -setrep -R value / 2. Change the dfs.replication parameter in all client configs to the desired value -- Harsh J
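Because the property is read on the client, it can also be overridden for a single command rather than in the config file. A minimal sketch, assuming the shell in use accepts the generic `-D property=value` option; the function name `put_with_replication` and the `HADOOP_CMD` and `DRY_RUN` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upload one file with an explicit replication factor.
# HADOOP_CMD and DRY_RUN are illustrative variables introduced for this sketch.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

put_with_replication() {
    # $1 = replication factor, $2 = local source, $3 = HDFS destination
    if [ "$#" -ne 3 ]; then
        echo "usage: put_with_replication <rep> <local-src> <hdfs-dst>" >&2
        return 2
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command.
        echo "$HADOOP_CMD fs -D dfs.replication=$1 -put $2 $3"
    else
        # The generic -D option must come before the -put subcommand.
        "$HADOOP_CMD" fs -D "dfs.replication=$1" -put "$2" "$3"
    fi
}
```

This only affects files written by that one command. Jobs and other clients keep using whatever `dfs.replication` their own configuration supplies, so step 2 above is still needed for a lasting change.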
RE: Replication Factor Modification
Hi, Thanks for the help. But how do I set the replication factor so that new files automatically pick up the new value of dfs.replication, without a cluster restart? Please note that we have a 200-node cluster. Thanks and Regards, Uddipan Mukherjee
Re: Replication Factor Modification
Hi Uddipan, As Harsh mentioned, replication factor is a client-side property. So you need to update the value of 'dfs.replication' in hdfs-site.xml, as per your requirement, on your edge nodes or on whichever machines you copy files to HDFS from. If you are using some of the existing DataNodes for this purpose (as clients), you need to update the value there as well. There is no need to restart any services.
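To confirm which value a given client machine will apply to new files, the property can be read back from its hdfs-site.xml. A rough sketch, assuming the usual layout in which `<value>` directly follows `<name>` inside a `<property>` element. It uses text matching rather than a real XML parser, and the function name `get_dfs_replication` is hypothetical.

```shell
#!/bin/sh
# Print the value of dfs.replication from a Hadoop-style XML config file.
# Text-based sketch: expects <name>dfs.replication</name> followed directly
# by <value>N</value>; prints nothing if the property is not set in the file.
get_dfs_replication() {
    # Join all lines so that name and value can be matched across line breaks.
    tr -d '\n' < "$1" |
    sed -n 's|.*<name>[[:space:]]*dfs\.replication[[:space:]]*</name>[[:space:]]*<value>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*</value>.*|\1|p'
}

# Possible use; the path is an assumption and varies by installation:
#   get_dfs_replication /etc/hadoop/conf/hdfs-site.xml
```

If nothing is printed, the file does not set the property and the client falls back to the built-in default, which is 3 in stock Hadoop.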
Re: Reg: Replication Factor Modification
Hi Uddipan, Check out the following link for the setrep command in Hadoop: http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep You don't need to restart the cluster after running the command. HTH, Anil -- Thanks and Regards, Anil Gupta