Re: Not a host:port pair when running balancer

2009-03-11 Thread Hairong Kuang
Please try using the port number 8020.
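For example, the fs.default.name entry would become (assuming hvcwydev0601 is your
namenode host and nothing else is already listening on port 8020 there):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://hvcwydev0601:8020/</value>
  </property>

After changing it, restart the cluster so the namenode and all clients (including
the balancer) agree on the same host:port.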

Hairong

On 3/11/09 9:42 AM, Stuart White stuart.whi...@gmail.com wrote:

 I've been running hadoop-0.19.0 for several weeks successfully.
 
 Today, for the first time, I tried to run the balancer, and I'm receiving:
 
 java.lang.RuntimeException: Not a host:port pair: hvcwydev0601
 
 In my hadoop-site.xml, I have this:
 
 <property>
   <name>fs.default.name</name>
   <value>hdfs://hvcwydev0601/</value>
 </property>
 
 What do I need to change to get the balancer to work?  It seems I need
 to add a port to fs.default.name.  If so, what port?  Can I just pick
 any port?  If I specify a port, do I need to specify any other parms
 accordingly?
 
 I searched the forum, and found a few posts on this topic, but it
 seems that the configuration parms have changed over time, so I'm not
 sure what the current correct configuration is.
 
 Also, if fs.default.name is supposed to have a port, I'll point out
 that the docs don't say so:
 http://hadoop.apache.org/core/docs/r0.19.1/cluster_setup.html
 
 The example given for fs.default.name is hdfs://hostname/.
 
 Thanks!



Re: Question about HDFS capacity and remaining

2009-01-29 Thread Hairong Kuang
It's taken by non-dfs files.

Hairong


On 1/29/09 3:23 PM, Bryan Duxbury br...@rapleaf.com wrote:

 Hey all,
 
 I'm currently installing a new cluster, and noticed something a
 little confusing. My DFS is *completely* empty - 0 files in DFS.
 However, in the namenode web interface, the reported capacity is
 3.49 TB, but the remaining is 3.25TB. Where'd that .24TB go? There
 are literally zero other files on the partitions hosting the DFS data
 directories. Where am I losing 240GB?
 
 -Bryan



Re: hadoop balanceing data

2009-01-23 Thread Hairong Kuang
%Remaining fluctuates much more than %dfs used. This is because dfs shares
the disks with mapred, and mapred tasks may use a lot of disk temporarily. So
trying to keep the same %free is impossible most of the time.

Hairong


On 1/19/09 10:28 PM, Billy Pearson sa...@pearsonwholesale.com wrote:

 Why do we not use the Remaining % in place of the Used % when we are
 selecting datanodes for new data and when running the balancer?
 From what I can tell we are using the % used, and we do not factor in
 non-DFS used at all.
 I see a datanode with only a 60GB hard drive fill up completely to 100% before
 the other servers that have 130+GB hard drives get half full.
 Seems like trying to keep the same % free on the drives in the cluster would
 be more optimal in production.
 I know this still may not be perfect but it would be nice if we tried.
 
 Billy
 
 



Re: getting HDFS to rack-aware mode

2008-10-14 Thread Hairong Kuang
Using the -w option with the setrep command will wait until replication
is done. Then run fsck to check whether all blocks are on at least two racks.
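A minimal sketch of the two commands (the replication factor 3 and the path / are
illustrative; the fsck -racks option, if your version has it, prints the rack of
every replica):

  bin/hadoop fs -setrep -w 3 -R /
  bin/hadoop fsck / -files -blocks -racks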

Hairong


On 10/14/08 12:06 PM, Sriram Rao [EMAIL PROTECTED] wrote:

 Hi,
 
 We have a cluster where we running HDFS in non-rack-aware mode.  Now,
 we want to switch HDFS to run in rack-aware mode.  Apart from the
 config changes (and restarting HDFS), to rackify the existing data, we
 were thinking of increasing/decreasing replication level a few times
 to get the data spread.  Are there any tools that will enable us to
 know when we are done?
 
 Sriram



RE: Could not get block locations. Aborting... exception

2008-09-26 Thread Hairong Kuang
Does your failed map task open a lot of files to write? Could you please check
the log of the datanode running on the machine where the map tasks failed? Do
you see any error message containing "exceeds the limit of concurrent xcievers"?
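If the datanode log does show that message, a common workaround at the time was to
raise the datanode transceiver limit in hadoop-site.xml and restart the datanodes
(the property name really is spelled "xcievers"; 4096 is only an illustrative
value, the shipped default was much lower):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>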
 
Hairong



From: Bryan Duxbury [mailto:[EMAIL PROTECTED]
Sent: Fri 9/26/2008 4:36 PM
To: core-user@hadoop.apache.org
Subject: Could not get block locations. Aborting... exception



Hey all.

We've been running into a very annoying problem pretty frequently 
lately. We'll be running some job, for instance a distcp, and it'll 
be moving along quite nicely, until all of the sudden, it sort of 
freezes up. It takes a while, and then we'll get an error like this one:

attempt_200809261607_0003_m_02_0: Exception closing file /tmp/
dustin/input/input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile
attempt_200809261607_0003_m_02_0: java.io.IOException: Could not 
get block locations. Aborting...
attempt_200809261607_0003_m_02_0:   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError
(DFSClient.java:2143)
attempt_200809261607_0003_m_02_0:   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400
(DFSClient.java:1735)
attempt_200809261607_0003_m_02_0:   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run
(DFSClient.java:1889)

At approximately the same time, we start seeing lots of these errors 
in the namenode log:

2008-09-26 16:19:26,502 WARN org.apache.hadoop.dfs.StateChange: DIR* 
NameSystem.startFile: failed to create file /tmp/dustin/input/
input_dataunits/_distcp_tmp_1dk90o/part-01897.bucketfile for 
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 
because current leaseholder is trying to recreate file.
2008-09-26 16:19:26,502 INFO org.apache.hadoop.ipc.Server: IPC Server 
handler 8 on 7276, call create(/tmp/dustin/input/input_dataunits/
_distcp_tmp_1dk90o/part-01897.bucketfile, rwxr-xr-x, 
DFSClient_attempt_200809261607_0003_m_02_1, true, 3, 67108864) 
from 10.100.11.83:60056: error: 
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create 
file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/
part-01897.bucketfile for 
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 
because current leaseholder is trying to recreate file.
org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create 
file /tmp/dustin/input/input_dataunits/_distcp_tmp_1dk90o/
part-01897.bucketfile for 
DFSClient_attempt_200809261607_0003_m_02_1 on client 10.100.11.83 
because current leaseholder is trying to recreate file.
 at org.apache.hadoop.dfs.FSNamesystem.startFileInternal
(FSNamesystem.java:952)
 at org.apache.hadoop.dfs.FSNamesystem.startFile
(FSNamesystem.java:903)
 at org.apache.hadoop.dfs.NameNode.create(NameNode.java:284)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke
(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)



Eventually, the job fails because of these errors. Subsequent job 
runs also experience this problem and fail. The only way we've been 
able to recover is to restart the DFS. It doesn't happen every time, 
but it does happen often enough that I'm worried.

Does anyone have any ideas as to why this might be happening? I 
thought that https://issues.apache.org/jira/browse/HADOOP-2669 might 
be the culprit, but today we upgraded to hadoop 0.18.1 and the 
problem still happens.

Thanks,

Bryan




Re: Unknown protocol to name node: JobSubmissionProtocol

2008-07-30 Thread Hairong Kuang
JobClient is supposed to talk to a JobTracker, but the stack trace shows that
it talked to a namenode. Could you check your configuration to see whether the
jobtracker port # was set to the same value as the namenode port #?
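A minimal sketch of the relevant hadoop-site.xml entries (the host name and port
numbers are illustrative; the point is that the two values must differ):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:8020/</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:8021</value>
  </property>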

Hairong 


On 7/30/08 6:56 AM, Arv Mistry [EMAIL PROTECTED] wrote:

  
 Can anyone provide any hints as to why this might be happening;
 
 I have hadoop running all process' on one machine (for trouble-shooting)
 and when I go to submit a job from another machine I get the following
 exception;
 
 INFO   | jvm 2| 2008/07/30 06:05:05 | 2008-07-30 06:05:05,117 ERROR
 [HadoopJobTool] java.io.IOException: Unknown protocol to name node:
 org.apache.hadoop.mapred.JobSubmissionProtocol
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.dfs.NameNode.getProtocolVersion(NameNode.java:84)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
 Impl.java:25)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 java.lang.reflect.Method.invoke(Method.java:597)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
 INFO   | jvm 2| 2008/07/30 06:05:05 |
 INFO   | jvm 2| 2008/07/30 06:05:05 |
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: Unknown
 protocol to name node: org.apache.hadoop.mapred.JobSubmissionProtocol
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.dfs.NameNode.getProtocolVersion(NameNode.java:84)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessor
 Impl.java:25)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 java.lang.reflect.Method.invoke(Method.java:597)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)
 INFO   | jvm 2| 2008/07/30 06:05:05 |
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.Client.call(Client.java:557)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 $Proxy4.getProtocolVersion(Unknown Source)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.RPC.getProxy(RPC.java:313)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.ipc.RPC.getProxy(RPC.java:300)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:383)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.mapred.JobClient.init(JobClient.java:376)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.mapred.JobClient.init(JobClient.java:346)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:958)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.run(Hadoop
 JobTool.java:129)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 com.rialto.profiler.profiler.clickstream.hadoop.HadoopJobTool.launchJob(
 HadoopJobTool.java:142)
 INFO   | jvm 2| 2008/07/30 06:05:05 |   at
 com.rialto.profiler.profiler.clickstream.RawStreamGenerator.run(RawStrea
 mGenerator.java:138)



Re: utility to get block locations for a HDFS file

2008-07-30 Thread Hairong Kuang
Try bin/hadoop fsck.
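For example (the path is illustrative):

  bin/hadoop fsck /user/jun/mydir -files -blocks -locations

prints every file under the directory along with its blocks and the datanodes
holding each replica, and the output can be redirected to a text file.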


On 7/30/08 8:23 AM, Jun Rao [EMAIL PROTECTED] wrote:

 
 Hi,
 
 Is there a Hadoop utility that takes a directory and dumps the block
 locations for each file in that directory to a text output? Thanks,
 
 Jun
 IBM Almaden Research Center
 K55/B1, 650 Harry Road, San Jose, CA  95120-6099
 
 [EMAIL PROTECTED]
 (408)927-1886 (phone)
 (408)927-3215 (fax)
 



Re: how does one rebalance data nodes

2008-05-29 Thread Hairong Kuang
If you set dfs.datanode.du.reserved to be 10G, this guarantees that dfs
won't use more than (the total partition space - 10G).
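For example, 10G would be expressed in bytes in hadoop-site.xml (10 * 1024^3 =
10737418240; the value applies per volume):

  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>
  </property>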

In my opinion, dfs.datanode.du.pct is not of much use. So you can ignore it
for now.

Hairong 

On 5/29/08 8:32 AM, prasana.iyengar [EMAIL PROTECTED] wrote:

 
 1. After adding new data nodes, is there a way to force a rebalance of the data
 blocks across the new nodes?
 We recently added 6 nodes to the cluster - the original 4 nodes seem to have
 80+% hdfs usage.
 2. In 0.16.0 I also have the following settings in hadoop-site.xml:
  dfs.datanode.du.reserved - 10G [default = 0]
  dfs.datanode.du.pct - 0.9f [default = 0.98f]
 
 Q: will this stop the fill-up of the data node at 90% and/or 10G remaining on
 the partition [whichever comes first]?
 
 thanks,
 -prasana



Re: reading a directory children in DFS?

2008-05-20 Thread Hairong Kuang
Your code is trying to list a directory in the local file system. You should
use the DFS FileSystem handle instead:
 Path[] children =
  FileUtil.stat2Paths(dfs.listStatus(parentDirectoryPath));
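A fuller sketch along the same lines (the output path is hypothetical; it assumes
the configuration directory is on the classpath so that FileSystem.get() returns
the HDFS instance rather than the local file system):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ListReduceOutput {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);      // the DFS handle, not RawLocalFileSystem
      Path outputDir = new Path("my-output");    // hypothetical reduce output directory
      for (FileStatus status : fs.listStatus(outputDir)) {
        System.out.println(status.getPath());    // e.g. the part-00000 files
      }
    }
  }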

Hairong



On 5/20/08 8:17 AM, Deyaa Adranale [EMAIL PROTECTED]
wrote:

 hello,
 I have a problem reading the children of a directory in the
 distributed file system of Hadoop:
 when I read the results of the reduce, I know the output folder (which I
 have specified using JobConf), but I don't know the file names inside
 it, and I still don't know how to access them using Java code.
 I have tried this:
 
 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(conf);
 
 String outputDir = 
 Path inDir = new Path();

 if (!fs.exists(inDir))
   throw new Exception("Directory does not exist");
   
 File jInDir = new RawLocalFileSystem().pathToFile(inDir);
 String[] files = jInDir.list();
 for (int i=0; i<files.length; i++) {
 Path inFile = new Path(inDir, files[i]);
 
 
 
 but the files array is null, and I get a NullPointerException when accessing
 files.length.
 
 any suggestions? I have searched on the internet, wiki and the archive,
 but could not find something useful.
 
 
 thanks for the help
 
 Deyaa



Re: About HDFS`s Certification and Authorization?

2008-05-16 Thread Hairong Kuang
Release 0.15 does not have any permission/security control. Release 0.16
supports permission control. An initial design of user authentication is
coming soon; a jira issue regarding it will be opened in the next couple of
weeks. Please contribute if you have any ideas.

Hairong


On 5/16/08 1:32 AM, wangxiaowei [EMAIL PROTECTED] wrote:

 hi, all
   I am now using hadoop-0.15.3. Does its HDFS have the functionality of
 certification and authorization, so that one user can access just one part of
 HDFS and can't access other parts without permission? If it does, how can I
 implement it?
   Thanks a lot.



Re: Balancer not balancing 100%?

2008-05-12 Thread Hairong Kuang
Please check the balancer user guide at
http://issues.apache.org/jira/secure/attachment/12370966/BalancerUserGuide2.pdf.
As stated in the document, a cluster is balanced iff
 |utilization(DNi) - average utilization| < threshold
for each datanode DNi.

When you run the balancer, the default threshold is 10%. If you want the cluster
to end up more balanced, you may use a smaller threshold.
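For example, to aim for every datanode being within 5% of the cluster-wide
average utilization:

  bin/hadoop balancer -threshold 5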

Good luck,
Hairong

On 5/12/08 10:30 AM, Ted Dunning [EMAIL PROTECTED] wrote:

 
 I think the balancer has a pretty lenient feeling about what balanced
 means.
 
 If you want to shave off the last slivers, try the trick of increasing
 replication on each file, one at a time and then decreasing it after 30-60
 seconds.  You can do this at whatever rate your disk space limits you to
 (i.e. If your disk is 80% full, you can double the replication on 1/4 of
 your files without running out of disk).
 
 
 On 5/11/08 11:48 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
 
 Oh, and on top of the above, I just observed that even though bin/hadoop
 balancer exits immediately and reports the cluster is fully balanced, I do
 see
 *very* few blocks (1-2 blocks per node) getting moved every time I run
 balancer.  It feels as if the balancer does actually find some blocks that it
 could move around, moves them, but then quickly gets lazy and just exits
 claiming the cluster is/was already balanced.  I just ran balancer about 10
 times and each time it moved a couple of blocks and then exited.
 
 Makes me want to do ugly stuff like:
 for ((i=1; i <= ...; i++)); do echo $i; bin/hadoop balancer; done
 
 
 ...just to get to the point where all 4 nodes have the same number of blocks
 and thus the same percentage of disk used...
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: Otis Gospodnetic [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Sunday, May 11, 2008 2:36:24 PM
 Subject: Balancer not balancing 100%?
 
 Hi,
 
 I have 4 identical nodes in a Hadoop cluster (all functioning as DNs).  One
 of 
 the 4 nodes is a new node that I recently added.  I ran the balancer a few
 times 
 and it did move some of the blocks from the other 3 nodes to the new node.
 However, the 4 nodes are still not 100% balanced (according to the GUI),
 even
 though running bin/hadoop balancer says the cluster is balanced:
 
 Time Stamp   Iteration#  Bytes Already Moved  Bytes Left To Move
 Bytes Being Moved
 The cluster is balanced. Exiting...
 Balancing took 666.0 milliseconds
 
 
 The 3 old DNs are about 60% full (around 24K blocks), while the 1 new DN is only
 about 50% full (around 21K blocks).  I restarted the NN and re-ran the balancer,
 but got the same output: "The cluster is balanced. Exiting..."
 
 Is this a bug or is it somehow possible for a cluster to be balanced, yet
 have 
 nodes with different number of blocks?
 
 Thanks,
 Otis
 
 



Re: How to re-balance, NN safe mode

2008-05-09 Thread Hairong Kuang
Otis,

I would recommend that you follow these steps:
1. Bring up all 4 DNs (both old and new).
2. Decommission the DN that you want to remove. See
http://wiki.apache.org/hadoop/FAQ#17
3. Run balancer
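A minimal sketch of step 2 (the exclude file path and host name are illustrative):
point dfs.hosts.exclude in hadoop-site.xml at an exclude file, list the datanode to
be retired in that file, and then tell the namenode to re-read it:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/path/to/conf/exclude</value>
  </property>

  echo oldnode.example.com >> /path/to/conf/exclude
  bin/hadoop dfsadmin -refreshNodes

The node then shows as decommissioning in the namenode web UI until all of its
blocks have been re-replicated elsewhere, at which point it is safe to remove.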

Hairong


On 5/8/08 9:11 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Hi,
 
 (I should prefix this by saying that bin/hadoop fsck reported corrupt HDFS
 after I replaced one of the DNs with a new/empty DN)
 
 I've removed 1 old DN and added 1 new DN .  The cluster has 4 nodes total (all
 4 act as DNs) and replication factor of 3.  I'm trying to re-balance the data
 by following http://wiki.apache.org/hadoop/FAQ#6:
 - I stopped all daemons
 - I removed the old DN and added the new DN to conf/slaves
 - I started all daemons
 
 The new DN shows in the JT and NN GUIs and bin/hadoop dfsadmin -report shows
 it.  At this point I expected NN to figure out that it needs to re-balance
 under-replicated blocks and start pushing data to the new DN.  However, no
 data got copied to the new DN.  I pumped the replication factor to 6 and
 restarted all daemons, but still nothing.  I noticed the NN GUI says the NN is
 in safe mode, but it has been stuck there for 10+ minutes now - too long, it
 seems.
 
 I then tried running bin/hadoop balancer, but got this:
 
  
 $ bin/hadoop balancer
 Received an IO exception: org.apache.hadoop.dfs.SafeModeException: Cannot
 create file/system/balancer.id. Name node is in safe mode.
 Safe mode will be turned off automatically.
 at 
 org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:947)
 at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:931)
 ...
 ...
 
 So now I'm wondering what steps one needs to follow when replacing a DN?  Just
 pulling it out and listing a new one in conf/slaves leads to NN getting into
 the permanent(?) safe mode, it seems.
 
 I know I can run bin/hadoop dfsadmin -safemode leave  but is that safe? ;)
 If I do that, will I then be able to run bin/hadoop balancer and get some
 replicas of the old HDFS data on the newly added DN?
 
 Thanks,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 



Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Hairong Kuang
A default replication factor of 3 does not mean that every block's
replication factor in the file system is 3.
In case (1), some blocks have a replication factor of less than 3, so the
average replication factor is less than 3, but there are no missing replicas. In
case (2), some blocks have zero replicas, so only 92.72564% are minimally
replicated. Those missing blocks must have had a replication factor of 1 and
been placed on the removed DN.

Hairong

On 5/9/08 7:16 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Hi,
 
 Here are 2 bin/hadoop fsck / -files -blocks -locations reports:
 
 1) For the old HDFS cluster, reportedly HEALTHY, but with this inconsistency:
 
 http://www.krumpir.com/fsck-old.txt.zip   (  1MB)
 
 Total blocks:                  32264 (avg. block size 11591245 B)
 Minimally replicated blocks:   32264 (100.0 %)  <== looks GOOD, matches Total blocks
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3                <== should have 3 copies of each block
 Average block replication:     2.418051         <== ??? shouldn't this be 3??
 Missing replicas:              0 (0.0 %)        <== if the above is 2.41... how can I have 0 missing replicas?
 
 2) For the cluster with 1 old DN replaced with 1 new DN:
 
 http://www.krumpir.com/fsck-1newDN.txt.zip (  800KB)
 
  Minimally replicated blocks:   29917 (92.72564 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   17124 (53.074635 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 1.8145611
  Missing replicas:  17124 (29.249296 %)
 
 
 
 Any help would be appreciated.
 
 Thanks,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: lohit [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, May 9, 2008 2:47:39 AM
 Subject: Re: Corrupt HDFS and salvaging data
 
 When you say all daemons, do you mean the entire cluster, including the
 namenode?
 According to your explanation, this means that after I removed 1 DN I
 started 
 missing about 30% of the blocks, right?
 No, you would only miss the replica. If all of your blocks have a replication
 factor of 3, then you would miss only the one replica which was on this DN.
 
 It would be good to see the full report;
 could you run hadoop fsck / -files -blocks -locations?
 
 That would give you much more detailed information.
 
 
 - Original Message 
 From: Otis Gospodnetic
 To: core-user@hadoop.apache.org
 Sent: Thursday, May 8, 2008 10:54:53 PM
 Subject: Re: Corrupt HDFS and salvaging data
 
 Lohit,
 
 
 I ran fsck after I replaced 1 DN (with data on it) with 1 blank DN and
 started 
 all daemons.
 I see the fsck report does include this:
 Missing replicas:  17025 (29.727087 %)
 
 According to your explanation, this means that after I removed 1 DN I started
 missing about 30% of the blocks, right?
 Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I
 removed?  But how could that be when I have replication factor of 3?
 
 If I run bin/hadoop balancer with my old DN back in the cluster (and new DN
 removed), I do get the happy "The cluster is balanced" response.  So wouldn't
 that mean that everything is peachy and that if my replication factor is 3
 then 
 when I remove 1 DN, I should have only some portion of blocks
 under-replicated,
 but not *completely* missing from HDFS?
 
 Thanks,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: lohit 
 To: core-user@hadoop.apache.org
 Sent: Friday, May 9, 2008 1:33:56 AM
 Subject: Re: Corrupt HDFS and salvaging data
 
 Hi Otis,
 
 The namenode has location information about all replicas of a block. When you run
 fsck, the namenode checks for those replicas. If all replicas are missing, then fsck
 reports the block as missing; otherwise the block is counted as under-replicated.
 If you specify the -move or -delete option along with fsck, files with such
 missing blocks are moved to /lost+found or deleted, depending on the option.
 At what point did you run the fsck command - was it after the datanodes were
 stopped? When you run namenode -format, it deletes the directories specified in
 dfs.name.dir. If a directory exists, it asks for confirmation.
 
 Thanks,
 Lohit
 
 - Original Message 
 From: Otis Gospodnetic
 To: core-user@hadoop.apache.org
 Sent: Thursday, May 8, 2008 9:00:34 PM
 Subject: Re: Corrupt HDFS and salvaging data
 
 Hi,
 
 Update:
 It seems fsck reports HDFS is corrupt when a significant-enough number of
 block 
 replicas is missing (or something like that).
 fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I
 restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt
 HDFS 
 and started reporting *healthy* HDFS.
 
 

Re: Corrupt HDFS and salvaging data

2008-05-09 Thread Hairong Kuang
The default replication factor takes effect only at file creation time.

If you want to increase the replication factor of existing blocks, you need
to run the command hadoop fs -setrep.

It's better to finish decommission first, remove the old DN, and then
rebalance. Rebalancing moves blocks around but does not replicate blocks.

Hope it helps,
Hairong

On 5/9/08 9:38 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Hi,
 
 A default replication factor of 3 does not mean that every block's
 
 replication factor in the file system is 3.
 
 Hm, and I thought that is exactly what it meant.  What does it mean then?  Or
 are you saying:
 The number of block replicas matches the r.f. that was in place when the block
 was *created* ?
 
 In case (1), some blocks have a replication factor which is less than 3. So
 the average replication factor is less than 3. But no missing replicas.
 
 Makes sense.  Most likely due to the repl. fact. being = 1 at some point.  But
 then why does bin/hadoop balancer tell me that the cluster is balanced?  Does
 it not take into consideration the *current* replication factor?
 
 In case 2, some blocks have zero replicas, so only 92.72564% are minimally
 replicated. Those missing blocks must have a replication factor of 1 and
 were placed on the removed DN.
 
 Makes sense.  So there are two things that need to be done:
 - get the blocks on the about to be removed DN off of that DN, so copies exist
 elsewhere (decommissioning)
 - get the cluster to re-balance, factoring in the *current* replication
 factor. (re-balancing)
 
 Is this correct?
 
 I think that's what your other email said (FAQ #17).  I'm doing that now and
 it seems to be progressing, although I started the balancer immediately after
 running dfsadmin -refreshNodes (it didn't block, so I thought it didn't
 work...).  I hope the fact that decommission and balancer are running
 simultaneously doesn't cause problems...
 
 Thanks!
 Otis
 
 
 On 5/9/08 7:16 AM, Otis Gospodnetic wrote:
 
 Hi,
 
 Here are 2 bin/hadoop fsck / -files -blocks -locations reports:
 
 1) For the old HDFS cluster, reportedly HEALTHY, but with this
 inconsistency:
 
 http://www.krumpir.com/fsck-old.txt.zip   (  1MB)
 
 Total blocks:                  32264 (avg. block size 11591245 B)
 Minimally replicated blocks:   32264 (100.0 %)  <== looks GOOD, matches Total blocks
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3                <== should have 3 copies of each block
 Average block replication:     2.418051         <== ??? shouldn't this be 3??
 Missing replicas:              0 (0.0 %)        <== if the above is 2.41... how can I have 0 missing replicas?
 
 2) For the cluster with 1 old DN replaced with 1 new DN:
 
 http://www.krumpir.com/fsck-1newDN.txt.zip (  800KB)
 
  Minimally replicated blocks:   29917 (92.72564 %)
  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   17124 (53.074635 %)
  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:3
  Average block replication: 1.8145611
  Missing replicas:  17124 (29.249296 %)
 
 
 
 Any help would be appreciated.
 
 Thanks,
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: lohit 
 To: core-user@hadoop.apache.org
 Sent: Friday, May 9, 2008 2:47:39 AM
 Subject: Re: Corrupt HDFS and salvaging data
 
 When you say all daemons, do you mean the entire cluster, including the
 namenode?
 According to your explanation, this means that after I removed 1 DN I
 started 
 missing about 30% of the blocks, right?
 No, you would only miss the replica. If all of your blocks have a replication
 factor of 3, then you would miss only the one replica which was on this DN.
 
 It would be good to see the full report;
 could you run hadoop fsck / -files -blocks -locations?
 
 That would give you much more detailed information.
 
 
 - Original Message 
 From: Otis Gospodnetic
 To: core-user@hadoop.apache.org
 Sent: Thursday, May 8, 2008 10:54:53 PM
 Subject: Re: Corrupt HDFS and salvaging data
 
 Lohit,
 
 
 I run fsck after I replaced 1 DN (with data on it) with 1 blank DN and
 started 
 all daemons.
 I see the fsck report does include this:
 Missing replicas:  17025 (29.727087 %)
 
 According to your explanation, this means that after I removed 1 DN I
 started
 missing about 30% of the blocks, right?
 Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I
 removed?  But how could that be when I have replication factor of 3?
 
 If I run bin/hadoop balancer with my old DN back in the cluster (and new DN
 removed), I do get the happy The cluster is balanced response.  So
 wouldn't
 that mean that everything is peachy and that if my replication factor is 3
 then 
 when I remove 1 DN, I should have only some portion of blocks
 

Re: could only be replicated to 0 nodes, instead of 1

2008-05-08 Thread Hairong Kuang
Could you please go to the dfs webUI and check how many datanodes are up and
how much available space each has?
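For example,

  bin/hadoop dfsadmin -report

lists every datanode along with its capacity and remaining space. "Could only be
replicated to 0 nodes" usually means that no datanode had registered with the
namenode or that none had free space for the block.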

Hairong


On 5/8/08 3:30 AM, jasongs [EMAIL PROTECTED] wrote:

 
 I get the same error when doing a put and my cluster is running ok
 
 i.e. has capacity and all nodes are live.
 Error message is
 org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
 /test/test.txt could only be replicated to 0 nodes, instead of 1
 at
 org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
 at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j
 ava:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
 
 at org.apache.hadoop.ipc.Client.call(Client.java:512)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
 at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j
 ava:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocation
 Handler.java:82)
 at
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandle
 r.java:59)
 at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
 at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient
 .java:2074)
 at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClien
 t.java:1967)
 at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:148
 7)
 at
 org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.jav
 a:1601)
 I would appreciate any help/suggestions
 
 Thanks
 
 
 jerrro wrote:
 
 I am trying to install/configure hadoop on a cluster with several
 computers. I followed exactly the instructions in the hadoop website for
 configuring multiple slaves, and when I run start-all.sh I get no errors -
 both datanode and tasktracker are reported to be running (doing ps awux |
 grep hadoop on the slave nodes returns two java processes). Also, the log
 files are empty - nothing is printed there. Still, when I try to use
 bin/hadoop dfs -put,
 I get the following error:
 
 # bin/hadoop dfs -put w.txt w.txt
 put: java.io.IOException: File /user/scohen/w4.txt could only be
 replicated to 0 nodes, instead of 1
 
 and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it).
 
 I couldn't find much information about this error, but I did manage to see
 somewhere it might mean that there are no datanodes running. But as I
 said, start-all does not give any errors. Any ideas what could be the problem?
 
 Thanks.
 
 Jerr.
 



Re: Where is the files?

2008-05-07 Thread Hairong Kuang
DFS files are mapped into blocks. Blocks are stored under
dfs.data.dir/current.
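For example, with a hadoop-site.xml entry like (the path is illustrative)

  <property>
    <name>dfs.data.dir</name>
    <value>/home/hong/hadoop-data/dfs/data</value>
  </property>

the block files end up under /home/hong/hadoop-data/dfs/data/current. If
dfs.data.dir is not set, it defaults to a dfs/data directory under hadoop.tmp.dir.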

Hairong


On 5/7/08 7:36 AM, hong [EMAIL PROTECTED] wrote:

 Hi All,
 
 I started Hadoop in standalone mode, and put some files onto HDFS. I
 strictly followed the instructions in the Hadoop Quick Start.
 
 HDFS is mapped to a local directory in my local file system, right?
 And where is it?
 
 Thank you in advance!
 
 



Re: Read timed out, Abandoning block blk_-5476242061384228962

2008-05-07 Thread Hairong Kuang
Taking the timeout out is very dangerous. It may cause your application to
hang. You could change the timeout parameter to a larger number. HADOOP-2188
fixed the problem. Check https://issues.apache.org/jira/browse/HADOOP-2188.

Hairong

On 5/7/08 2:36 PM, James Moore [EMAIL PROTECTED] wrote:

 I noticed that there was a hard-coded timeout value of 6000 (ms) in
 src/java/org/apache/hadoop/dfs/DFSClient.java - as an experiment, I
 took that way down and now I'm not noticing the problem.  (Doesn't
 mean it's not there, I just don't feel the pain...)
 
 This feels like a terrible solution^H^H^H^H^H^hack though,
 particularly since I haven't yet taken the time to actually understand
 the code.



Re: Where are passed the JobConf?

2008-04-14 Thread Hairong Kuang
JobConf gets passed to a mapper in Mapper.configure(JobConf job). Check
http://hadoop.apache.org/core/docs/r0.16.1/api/org/apache/hadoop/mapred/MapReduceBase.html#configure(org.apache.hadoop.mapred.JobConf)
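A minimal sketch in the old mapred API, in the spirit of the tutorial's WordCount
v2.0 (the property name wordcount.case.sensitive is only illustrative):

  import java.io.IOException;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class MyMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {

    private boolean caseSensitive;

    // The framework calls configure(JobConf) once per task, before any map() call;
    // this is where the JobConf is handed to the Mapper implementation.
    public void configure(JobConf job) {
      caseSensitive = job.getBoolean("wordcount.case.sensitive", true);
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      String line = caseSensitive ? value.toString() : value.toString().toLowerCase();
      for (String word : line.split("\\s+")) {
        if (word.length() > 0) {
          output.collect(new Text(word), new IntWritable(1));
        }
      }
    }
  }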

Hairong


On 4/13/08 11:44 PM, Steve Han [EMAIL PROTECTED] wrote:

 I am reading the Map/Reduce tutorial on the official Hadoop Core site. It says
 that "Overall, Mapper implementations are passed the JobConf for the job via the
 JobConfigurable.configure(JobConf) method
 (http://hadoop.apache.org/core/docs/r0.16.1/api/org/apache/hadoop/mapred/JobConfigurable.html#configure%28org.apache.hadoop.mapred.JobConf%29)
 and override it to initialize themselves." Where in the code (in WordCount v1.0
 or v2.0) is the JobConf passed to the Mapper implementation? Any idea? Thanks a lot.



Re: HDFS interface

2008-03-12 Thread Hairong Kuang
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
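For those who just want the gist, a minimal sketch in the spirit of that wiki page
(the file name is hypothetical; it assumes the Hadoop configuration directory is on
the classpath so FileSystem.get() talks to HDFS):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      Path file = new Path("example.txt");       // hypothetical file name

      FSDataOutputStream out = fs.create(file);  // write a file into HDFS
      out.writeUTF("hello hdfs");
      out.close();

      FSDataInputStream in = fs.open(file);      // read it back
      System.out.println(in.readUTF());
      in.close();
    }
  }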

Hairong


On 3/12/08 1:21 PM, Arun C Murthy [EMAIL PROTECTED] wrote:

 
 http://hadoop.apache.org/core/docs/r0.16.0/hdfs_user_guide.html
 
 Arun
 
 On Mar 12, 2008, at 1:16 PM, Cagdas Gerede wrote:
 
 I would like to use the HDFS component of Hadoop but am not interested in
 MapReduce.
 All the Hadoop examples I have seen so far use MapReduce classes, and from
 these examples there is no reference to HDFS classes, including the FileSystem
 API of Hadoop
 (http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html).
 Everything seems to happen under the hood.
 
 I was wondering if there is any example source code that is using HDFS
 directly.
 
 
 Thanks,
 
 - CEG
 



Re: HDFS interface

2008-03-12 Thread Hairong Kuang
If you add the configuration directory to the class path, the configuration
files will be automatically loaded.
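A minimal sketch (the paths, jar name and class name are hypothetical):

  java -cp /path/to/hadoop/conf:/path/to/hadoop-core.jar:. MyHdfsClient

With the conf directory on the classpath, new Configuration() finds
hadoop-default.xml and hadoop-site.xml on its own, and FileSystem.get(conf)
returns the HDFS instance instead of the local file system. Running the class
through bin/hadoop (e.g. bin/hadoop jar myjob.jar MyHdfsClient) sets the
classpath up for you.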

Hairong


On 3/12/08 5:32 PM, Cagdas Gerede [EMAIL PROTECTED] wrote:

 I found the solution.  Please let me know if you have a better idea.
 
 I added the following addResource lines.
 
 Configuration conf = new Configuration();
 
 conf.addResource(new Path("location_of_hadoop-default.xml"));
 conf.addResource(new Path("location_of_hadoop-site.xml"));
 
 FileSystem fs = FileSystem.get(conf);
 
 (Would be good to update the wiki page).
 
 - CEG
 
 
 On Wed, Mar 12, 2008 at 5:04 PM, Cagdas Gerede [EMAIL PROTECTED]
 wrote:
 
 I see the following paragraph in the wiki
 (http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample):
 
 "Create a FileSystem
 (http://hadoop.apache.org/core/api/org/apache/hadoop/fs/FileSystem.html)
 instance by passing a new Configuration object. Please note that the
 following example code assumes that the Configuration object will
 automatically load the hadoop-default.xml and hadoop-site.xml configuration
 files. You may need to explicitly add these resource paths if you are not
 running inside of the Hadoop runtime environment."
 
 and
 
 Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
 
 When I do
 
 Path[] apples = fs.globPaths(new Path("*"));
 for (Path apple : apples) {
   System.out.println(apple);
 }
 
 
 It prints out all the local file names.
 
 How do I point my application to a running HDFS instance?
 What does "explicitly add these resource paths if you are not running
 inside of the Hadoop runtime environment" mean?
 
 Thanks,
 
 - CEG
 
 
 
 
 



Re: Does Hadoop Honor Reserved Space?

2008-03-10 Thread Hairong Kuang
I think you have a misunderstanding of the reserved parameter. As I
commented on HADOOP-1463, remember that dfs.datanode.du.reserved is the space for
non-dfs usage, including the space for map/reduce, other applications, fs
meta-data, etc. In your case, since /usr already takes 45GB, it far exceeds
the reserved limit of 1G. You should set the reserved space to 50G.

Hairong


On 3/10/08 4:54 PM, Joydeep Sen Sarma [EMAIL PROTECTED] wrote:

 Filed https://issues.apache.org/jira/browse/HADOOP-2991
 
 -Original Message-
 From: Joydeep Sen Sarma [mailto:[EMAIL PROTECTED]
 Sent: Monday, March 10, 2008 12:56 PM
 To: core-user@hadoop.apache.org; core-user@hadoop.apache.org
 Cc: Pete Wyckoff
 Subject: RE: Does Hadoop Honor Reserved Space?
 
 folks - Jimmy is right - as we have unfortunately hit it as well:
 
 https://issues.apache.org/jira/browse/HADOOP-1463 caused a regression.
 we have left some comments on the bug - but can't reopen it.
 
 this is going to be affecting all 0.15 and 0.16 deployments!
 
 
 -Original Message-
 From: Hairong Kuang [mailto:[EMAIL PROTECTED]
 Sent: Thu 3/6/2008 2:01 PM
 To: core-user@hadoop.apache.org
 Subject: Re: Does Hadoop Honor Reserved Space?
  
 In addition to the version, could you please send us a copy of the
 datanode
 report by running the command bin/hadoop dfsadmin -report?
 
 Thanks,
 Hairong
 
 
 On 3/6/08 11:56 AM, Joydeep Sen Sarma [EMAIL PROTECTED] wrote:
 
 but intermediate data is stored in a different directory from dfs/data
 (something like mapred/local by default, I think).
 
 what version are u running?
 
 
 -Original Message-
 From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
 Sent: Thu 3/6/2008 10:14 AM
 To: core-user@hadoop.apache.org
 Subject: RE: Does Hadoop Honor Reserved Space?
  
 I've run into a similar issue in the past. From what I understand,
 this
 parameter only controls the HDFS space usage. However, the
 intermediate data
 in
 the map reduce job is stored on the local file system (not HDFS) and
 is not
 subject to this configuration.
 
 In the past I have used mapred.local.dir.minspacekill and
 mapred.local.dir.minspacestart to control the amount of space that is
 allowable
 for use by this temporary data.
 
 Not sure if that is the best approach though, so I'd love to hear what
 other
 people have done. In your case, you have a map-red job that will
 consume too
 much space (without setting a limit, you didn't have enough disk
 capacity for
 the job), so looking at mapred.output.compress and
 mapred.compress.map.output
 might be useful to decrease the job's disk requirements.
 
 --Ash
 
 -Original Message-
 From: Jimmy Wan [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 06, 2008 9:56 AM
 To: core-user@hadoop.apache.org
 Subject: Does Hadoop Honor Reserved Space?
 
 I've got 2 datanodes setup with the following configuration parameter:
 <property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
 space free for non dfs use.
  </description>
 </property>
 
 Both are housed on 800GB volumes, so I thought this would keep about
 half
 the volume free for non-HDFS usage.
 
 After some long running jobs last night, both disk volumes were
 completely
 filled. The bulk of the data was in:
 ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
 
 This is running as the user hadoop.
 
 Am I interpreting these parameters incorrectly?
 
 I noticed this issue, but it is marked as closed:
 http://issues.apache.org/jira/browse/HADOOP-2549
 
 
 



Re: Does Hadoop Honor Reserved Space?

2008-03-06 Thread Hairong Kuang
In addition to the version, could you please send us a copy of the datanode
report by running the command bin/hadoop dfsadmin -report?

Thanks,
Hairong


On 3/6/08 11:56 AM, Joydeep Sen Sarma [EMAIL PROTECTED] wrote:

 but intermediate data is stored in a different directory from dfs/data
 (something like mapred/local by default, I think).
 
 what version are u running?
 
 
 -Original Message-
 From: Ashwinder Ahluwalia on behalf of [EMAIL PROTECTED]
 Sent: Thu 3/6/2008 10:14 AM
 To: core-user@hadoop.apache.org
 Subject: RE: Does Hadoop Honor Reserved Space?
  
 I've run into a similar issue in the past. From what I understand, this
 parameter only controls the HDFS space usage. However, the intermediate data
 in
 the map reduce job is stored on the local file system (not HDFS) and is not
 subject to this configuration.
 
 In the past I have used mapred.local.dir.minspacekill and
 mapred.local.dir.minspacestart to control the amount of space that is
 allowable
 for use by this temporary data.
 
 Not sure if that is the best approach though, so I'd love to hear what other
 people have done. In your case, you have a map-red job that will consume too
 much space (without setting a limit, you didn't have enough disk capacity for
 the job), so looking at mapred.output.compress and mapred.compress.map.output
 might be useful to decrease the job's disk requirements.
 
 --Ash
 
 -Original Message-
 From: Jimmy Wan [mailto:[EMAIL PROTECTED]
 Sent: Thursday, March 06, 2008 9:56 AM
 To: core-user@hadoop.apache.org
 Subject: Does Hadoop Honor Reserved Space?
 
 I've got 2 datanodes setup with the following configuration parameter:
 <property>
  <name>dfs.datanode.du.reserved</name>
  <value>429496729600</value>
  <description>Reserved space in bytes per volume. Always leave this much
 space free for non dfs use.
  </description>
 </property>
 
 Both are housed on 800GB volumes, so I thought this would keep about half
 the volume free for non-HDFS usage.
 
 After some long running jobs last night, both disk volumes were completely
 filled. The bulk of the data was in:
 ${my.hadoop.tmp.dir}/hadoop-hadoop/dfs/data
 
 This is running as the user hadoop.
 
 Am I interpreting these parameters incorrectly?
 
 I noticed this issue, but it is marked as closed:
 http://issues.apache.org/jira/browse/HADOOP-2549