Re: hadoop datanode capacity issue

2013-07-06 Thread Ian Wrigley
You're correct: each directory listed in dfs.data.dir is assumed to be a separate storage device, and the DataNode simply sums the capacity of each one. Two directories on the same physical disk therefore count that disk's capacity twice. There's really no reason to specify two directories on the same physical disk in dfs.data.dir -- just use one directory per disk.
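
For instance, assuming two physical disks mounted at /data/1 and /data/2 (hypothetical mount points), the hdfs-site.xml entry would list one directory per disk:

```xml
<!-- hdfs-site.xml: one dfs.data.dir entry per physical disk -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn</value>
</property>
```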

Ian.


On Jul 6, 2013, at 9:18 AM, Amit Anand aan...@aquratesolutions.com wrote:

 Hi All,
  
 I have configured a three node cluster:
  
 For each DataNode, the configured capacity is showing double the size of the 
 actual storage. Below are screenshots of the configuration files, "dfsadmin 
 -report" and "df -h" from each node. Any idea why it would show the configured 
 capacity as double the actual storage?
  
 After looking at the configuration files, I am assuming each directory mentioned 
 under "dfs.data.dir" is being treated as a separate storage device, hence 
 doubling the configured capacity. Am I correct? Is this a bug, or is 
 something wrong with my configuration?
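
The arithmetic matches that guess; a toy sketch (made-up numbers, not real dfsadmin output) of how one disk listed twice would inflate the reported capacity:

```shell
# Toy illustration: one 50 GB disk backing two entries in
# dfs.data.dir gets counted once per entry.
disk_gb=50
dirs_on_same_disk=2
echo "Actual storage:      ${disk_gb} GB"
echo "Configured Capacity: $(( disk_gb * dirs_on_same_disk )) GB"
```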
  
 CORE-SITE.XML, HDFS-SITE.XML, MAPRED-SITE.XML (From all nodes)
  
 image003.png
  
 R1NN1 (NAMENODE, DATANODE, JOBTRACKER)
  
 image004.png
  
 R1SN1 (SECONDARY NAMENODE, DATANODE, TASKTRACKER)
  
 image005.png
  
 R1DN1 (DATANODE, TASKTRACKER)
  
 image006.png
  
 DFSADMIN -REPORT
  
 image002.png
  
 Thank You,
 Amit Anand
 (Mob) 484.682.3065 , 215-995-1058
 (Fax) 215.359.9674
 (Desk). 215-774-9959
 aan...@aquratesolutions.com
  


---
Ian Wrigley
Sr. Curriculum Manager
Cloudera, Inc
Cell: (323) 819 4075



Re: Shuffle phase replication factor

2013-05-21 Thread Ian Wrigley
Intermediate data is written to local disk, not to HDFS.
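
The local directories used for that intermediate data are set by mapred.local.dir rather than anything in hdfs-site.xml, so dfs.replication never applies to it. A sketch with hypothetical paths:

```xml
<!-- mapred-site.xml: local directories for intermediate (shuffle) data.
     These are plain local-filesystem paths, so no HDFS replication
     factor applies to the map outputs written there. -->
<property>
  <name>mapred.local.dir</name>
  <value>/data/1/mapred/local,/data/2/mapred/local</value>
</property>
```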

Ian.

On May 21, 2013, at 1:57 PM, John Lilley john.lil...@redpoint.net wrote:

 When MapReduce enters “shuffle” to partition the tuples, I am assuming that 
 it writes intermediate data to HDFS.  What replication factor is used for 
 those temporary files?
 john
  

