Replication Factor Modification

2012-09-05 Thread Uddipan Mukherjee
Hi,

We have a requirement to change our Hadoop cluster's replication factor
without restarting the cluster. We are running our cluster on Amazon EMR.

Can you please suggest a way to achieve this? Any pointers will be very
helpful.


Thanks And Regards
Uddipan Mukherjee

 CAUTION - Disclaimer *
This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely
for the use of the addressee(s). If you are not the intended recipient, please
notify the sender by e-mail and delete the original message. Further, you are
not to copy, disclose, or distribute this e-mail or its contents to any other
person and any such actions are unlawful. This e-mail may contain viruses.
Infosys has taken every reasonable precaution to minimize this risk, but is not
liable for any damage you may sustain as a result of any virus in this e-mail.
You should carry out your own virus checks before opening the e-mail or
attachment. Infosys reserves the right to monitor and review the content of all
messages sent to or from this e-mail address. Messages sent to or from this
e-mail address may be stored on the Infosys e-mail system.
***INFOSYS End of Disclaimer INFOSYS***


Re: Replication Factor Modification

2012-09-05 Thread Bejoy Ks
Hi

You can change the replication factor of an existing directory using
'-setrep'

http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep

The command below recursively sets the replication factor to 1 for all
files under the directory '/user'; the '-w' flag makes it wait until the
re-replication has completed:

hadoop fs -setrep -w 1 -R /user
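A hedged sketch of how the change might be applied and then checked. The wrapper name `set_replication`, the `DRY_RUN` switch, and the example values are illustrative assumptions, not part of Hadoop; only `hadoop fs -setrep` (with `-R` and `-w`) and `hadoop fs -ls` come from the HDFS shell. The options are placed before the replication value because newer Hadoop releases appear to be stricter about option placement. By default the function only prints the command, so it can be tried on a machine without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: set the replication factor for everything under a path.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
set_replication() {
    rep="$1"    # desired replication factor, e.g. 2
    path="$2"   # HDFS file or directory, e.g. /user
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hadoop fs -setrep -R -w $rep $path"
    else
        hadoop fs -setrep -R -w "$rep" "$path"
    fi
}

set_replication 2 /user
# After a real run (DRY_RUN=0), a listing shows each file's replication
# factor in the second column:
#   hadoop fs -ls /user
```

Run with `DRY_RUN=0` on a node that has the Hadoop client configured to actually apply the change.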




Re: Replication Factor Modification

2012-09-05 Thread Uma Maheswara Rao G
Replication factor is a per-file option, so you may have to write a small
program that iterates over all files and sets the replication factor to
the desired value.
API: FileSystem#setReplication

Regards,
Uma


Re: Replication Factor Modification

2012-09-05 Thread Harsh J
Replication factor is per-file, and is a client-side property. So, this is
doable.

1. Change the replication factor of all existing files (or needed ones):

$ hadoop fs -setrep -R value /

2. Change the dfs.replication parameter in all client configs to the
desired value
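For step 2, the client-side default normally lives in `hdfs-site.xml`. The sketch below only prints the property block for a chosen value so that it can be pasted into each client's configuration; the function name `dfs_replication_property` and the value 2 are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print the hdfs-site.xml property that sets the default
# replication factor for files created by this client.
dfs_replication_property() {
    cat <<EOF
<property>
  <name>dfs.replication</name>
  <value>$1</value>
</property>
EOF
}

dfs_replication_property 2
```

The printed block belongs inside the `<configuration>` element of the client's hdfs-site.xml. Files written afterwards by that client pick up the new default, while existing files keep whatever value `-setrep` gave them.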

-- 
Harsh J


RE: Replication Factor Modification

2012-09-05 Thread Uddipan Mukherjee
Hi,

Thanks for the help. But how do I set the replication factor so that new
files automatically pick up the new value of dfs.replication, without a
cluster restart? Please note that we have a 200-node cluster.

Thanks and Regards,
Uddipan Mukherjee


Re: Replication Factor Modification

2012-09-05 Thread Bejoy Ks
Hi Uddipan

As Harsh mentioned, replication factor is a client-side property. So you
need to update the value of 'dfs.replication' in hdfs-site.xml, as per your
requirement, on your edge nodes or on whichever machines you copy files
to HDFS from. If you are using some of the existing DataNodes as clients
for this purpose, you need to update the value there as well. There is no
need to restart the services.
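Where editing hdfs-site.xml on every client is inconvenient, the same property can usually be overridden for a single command with the generic `-D` option. A minimal sketch under that assumption; the function name `put_with_replication`, the `DRY_RUN` switch, and the file paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: upload one file with an explicit replication factor,
# overriding the client's dfs.replication for this command only.
# DRY_RUN=1 (the default here) only prints the command instead of running it.
put_with_replication() {
    rep="$1"     # replication factor for the new file
    src="$2"     # local source file
    dest="$3"    # HDFS destination path
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "hadoop fs -D dfs.replication=$rep -put $src $dest"
    else
        hadoop fs -D "dfs.replication=$rep" -put "$src" "$dest"
    fi
}

put_with_replication 2 data.csv /user/data.csv
```

This affects only the files written by that one command; the default in the client's configuration stays as it was.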


Re: Reg: Replication Factor Modification

2012-09-05 Thread anil gupta
Hi Uddipan,

Check out the following link for the setrep command in Hadoop:
http://hadoop.apache.org/common/docs/r0.20.0/hdfs_shell.html#setrep

You don't need to restart the cluster after running the command.

HTH,
Anil

-- 
Thanks & Regards,
Anil Gupta