RE: Jobtracker could only be replicated to 0 nodes instead of 1
I had the same issue. Can you try disabling the firewall on both the datanode and the resourcemanager using sudo /etc/init.d/iptables stop? Regards, Smita

From: Sindhu Hosamane [mailto:sindh...@gmail.com]
Sent: Saturday, August 16, 2014 2:09 AM
To: user@hadoop.apache.org
Subject: Re: Jobtracker could only be replicated to 0 nodes instead of 1

When I checked hadoop dfsadmin -report I could see that 1 datanode is up, so I assume the datanode is working. I also see all 5 daemons running when I run the jps command. The only error I saw in the namenode logs is that jobtracker.info could only be replicated to 0 nodes instead of 1. In the other logs I found no errors, just some EOF exception warnings. I have attached the logs for reference. Please point me to what should be corrected.
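A minimal sketch of those checks as commands, assuming an init.d-based system (the firewall command comes from Smita's suggestion; the other two are the checks Sindhu describes):

    # On each node: stop the firewall so the datanode can reach the master
    sudo /etc/init.d/iptables stop
    # Confirm all daemons are running
    jps
    # Confirm the namenode actually sees a live datanode
    hadoop dfsadmin -report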
Jobtracker could only be replicated to 0 nodes instead of 1
Hello friends, I got the above error: jobtracker.info could only be replicated to 0 nodes instead of 1. I tried different solutions found on the web:
* Formatted the namenode
* Removed the tmp folder
* Cleaned unnecessary logs just to have more space
But still no success. What other solutions could there be? Your advice would be appreciated. Thank you. Regards, shosaman
Re: Jobtracker could only be replicated to 0 nodes instead of 1
You have set the replication factor to 1, so I am assuming this is a single-node cluster. I would recommend checking the datanode logs to see whether it was able to connect to the namenode successfully. On Fri, Aug 15, 2014 at 1:58 PM, sindhu hosamane sindh...@gmail.com wrote: [...] -- Nitin Pawar
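A sketch of the check Nitin describes, assuming the default log directory under $HADOOP_HOME/logs and the standard log file naming (both vary by install):

    # Scan the datanode log for trouble reaching the namenode
    tail -n 200 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
    # Repeated "Retrying connect to server" lines mean the DN cannot register with the NN
    grep -i "retrying connect" $HADOOP_HOME/logs/hadoop-*-datanode-*.log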
could only be replicated to 0 nodes, instead of 1
I've been running up against the good old fashioned "replicated to 0 nodes" gremlin quite a bit recently. My system (a set of processes interacting with hadoop, and of course hadoop itself) runs for a while (a day or so) and then I get plagued with these errors. This is a very simple system: a single node running pseudo-distributed. Obviously, the replication factor is implicitly 1 and the datanode is the same machine as the namenode. None of the typical culprits seem to explain the situation and I'm not sure what to do. I'm also not sure how I'm getting around it so far; I fiddle desperately for a few hours and things start running again, but that's not really a solution. I've tried stopping and restarting hdfs, but that doesn't seem to improve things. So, to go through the common suspects one by one, as quoted on http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo:

• No DataNode instances being up and running. Action: look at the servers, see if the processes are running. I can interact with hdfs through the command line (doing directory listings, for example). Furthermore, I can see that the relevant java processes are all running (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker).

• The DataNode instances cannot talk to the server, through networking or Hadoop configuration problems. Action: look at the logs of one of the DataNodes. Obviously irrelevant in a single-node scenario. Anyway, like I said, I can perform basic hdfs listings, I just can't upload new data.

• Your DataNode instances have no hard disk space in their configured data directories. Action: look at the dfs.data.dir list in the node configurations, verify that at least one of the directories exists and is writable by the user running the Hadoop processes. Then look at the logs. There's plenty of space, at least 50GB.

• Your DataNode instances have run out of space. Look at the disk capacity via the Namenode web pages. Delete old files. Compress under-used files. Buy more disks for existing servers (if there is room), upgrade the existing servers to bigger drives, or add some more servers. Nope, 50GB free, and I'm only uploading a few KB at a time, maybe a few MB.

• The reserved space for a DN (as set in dfs.datanode.du.reserved) is greater than the remaining free space, so the DN thinks it has no free space. I grepped all the files in the conf directory and couldn't find this parameter, so I don't really know anything about it. At any rate, it seems rather esoteric and I doubt it is related to my problem. Any thoughts on this?

• You may also get this message due to permissions, e.g. if the JT cannot create jobtracker.info on startup. Meh, like I said, the system basically works... and then stops working.

The only explanation that would really make sense in that context is running out of space... which isn't happening. If this were a permission error, or a configuration error, or anything weird like that, then the whole system would never get up and running in the first place. Why would a properly running hadoop system start exhibiting this error without running out of disk space? THAT's the real question on the table here. Any ideas?

Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com
Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy. -- Edwin A. Abbott, Flatland
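That walk through the wiki checklist compresses into a few commands; a sketch only, with paths that depend on the install (the conf directory location and the dfs.data.dir mount are assumptions here):

    # Suspect 1: are the daemons running?
    jps
    # Suspects 1 and 2: can we talk to HDFS at all?
    hadoop fs -ls /
    # Suspects 3 and 4: free space on the dfs.data.dir volume (substitute your own path)
    df -h /var/lib/hadoop-0.20/cache
    # Suspect 5: is a datanode space reservation configured anywhere?
    grep -r "dfs.datanode.du.reserved" conf/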
Re: could only be replicated to 0 nodes, instead of 1
- A datanode is typically kept free with up to 5 blocks' (HDFS block size) worth of space.
- Disk space is also used by mapreduce jobs to store temporary shuffle spills. This is what dfs.datanode.du.reserved is used to configure. The configuration is available in hdfs-site.xml. If you have not configured it, then the reserved space is 0. Besides mapreduce, other files might also take up disk space.

When these errors are thrown, please send the namenode web UI information. It has storage-related information in the cluster summary. That will help debug. On Tue, Sep 4, 2012 at 9:41 AM, Keith Wiley kwi...@keithwiley.com wrote: [...]

--
http://hortonworks.com/download/
Re: could only be replicated to 0 nodes, instead of 1
On Sep 4, 2012, at 10:05, Suresh Srinivas wrote: When these errors are thrown, please send the namenode web UI information. It has storage related information in the cluster summary. That will help debug.

Sure thing. Thanks. Here's what I currently see. It looks like the problem isn't the datanode, but rather the namenode. Would you agree with that assessment?

NameNode 'localhost:9000'
Started: Tue Sep 04 10:06:52 PDT 2012
Version: 0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9
Compiled: Thu Jan 26 11:55:16 PST 2012 by root from Unknown
Upgrades: There are no upgrades in progress.

Cluster Summary
Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.
1639 files and directories, 585 blocks = 2224 total. Heap Size is 39.55 MB / 888.94 MB (4%)
Configured Capacity: 49.21 GB
DFS Used: 9.9 MB
Non DFS Used: 2.68 GB
DFS Remaining: 46.53 GB
DFS Used%: 0.02 %
DFS Remaining%: 94.54 %
Live Nodes: 1
Dead Nodes: 0
Decommissioning Nodes: 0
Number of Under-Replicated Blocks: 5

NameNode Storage:
Storage Directory: /var/lib/hadoop-0.20/cache/hadoop/dfs/name   Type: IMAGE_AND_EDITS   State: Active

Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com
And what if we picked the wrong religion? Every week, we're just making God madder and madder! -- Homer Simpson
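The "Safe mode must be turned off manually" line in that summary is the operative symptom. A sketch of the corresponding command-line checks (0.20-era dfsadmin syntax):

    # Report whether the namenode is in safe mode
    hadoop dfsadmin -safemode get
    # Turn it off manually; only sensible once the low-resource condition is fixed
    hadoop dfsadmin -safemode leave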
Re: could only be replicated to 0 nodes, instead of 1
Keith, assuming that you were seeing the problem when you captured the namenode web UI info, it is not related to what I suspected. This might be a good question for the CDH forums, given this is not an Apache release. Regards, Suresh On Tue, Sep 4, 2012 at 10:20 AM, Keith Wiley kwi...@keithwiley.com wrote: [...]

--
http://hortonworks.com/download/
Re: could only be replicated to 0 nodes, instead of 1
Keith, the NameNode has a resource-checker thread in it by design, to help prevent on-disk metadata corruption in the event of filled-up dfs.namenode.name.dir disks, etc. By default, an NN will lock itself up if the free disk space (among its configured metadata mounts) falls below 100 MB, controlled by dfs.namenode.resource.du.reserved. You can probably set that to 0 if you do not want such an automatic preventive measure. It's not exactly a need, just a check to help avoid accidental data loss due to non-monitoring of disk space. On Tue, Sep 4, 2012 at 11:33 PM, Keith Wiley kwi...@keithwiley.com wrote: I had moved the data directory to the larger disk but left the namenode directory on the smaller disk, figuring it didn't need much room. Moving that to the larger disk seems to have improved the situation... although I'm still surprised the NN needed so much room. Problem is solved for now. Thanks. Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com I used to be with it, but then they changed what it was. Now, what I'm with isn't it, and what's it seems weird and scary to me. -- Abe (Grandpa) Simpson -- Harsh J
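For reference, a sketch of how that knob would appear in hdfs-site.xml. The property name and semantics come from Harsh's message; the value of 0 follows his suggestion to disable the check (the default reservation is 100 MB, and the value is given in bytes):

    <property>
      <name>dfs.namenode.resource.du.reserved</name>
      <!-- 0 disables the NN free-space check; the default reserves 100 MB -->
      <value>0</value>
    </property>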
Re: could only be replicated to 0 nodes, instead of 1
Harsh, thanks, you are right. The problem stems from the tmp directory space not being large enough. After changing the tmp dir to another place, the problem goes away. But I remember one (default) block size in hdfs is 64m, so shouldn't it at least allow one file, whose actual size on the local disk is smaller than 1k, to be uploaded? Thanks again for the advice. On Fri, Jul 15, 2011 at 7:49 PM, Harsh J ha...@cloudera.com wrote: [...]
Re: could only be replicated to 0 nodes, instead of 1
The actual check is done to see if 5 blocks' worth of space remains available. With the default 64 MB block size, that works out to 5 x 64 MB = 320 MB per datanode, which is why a 101 MB /tmp tmpfs rejects even a sub-1k file. On Sat, Jul 16, 2011 at 1:52 PM, Thomas Anderson t.dt.aander...@gmail.com wrote: Harsh, Thanks, you are right. The problem stems from the tmp directory space not being large enough. [...] But I remember one block size (default) in hdfs is 64m, so shouldn't it at least allow one file, whose actual size in local disk is smaller than 1k, to be uploaded? [...]
Re: could only be replicated to 0 nodes, instead of 1
Before posting the question I did indeed follow the wiki pages http://wiki.apache.org/hadoop/FAQ#What_does_.22file_could_only_be_replicated_to_0_nodes.2C_instead_of_1.22_mean.3F and http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment, checking local disk space (*), etc., but none of those suggestions worked. So I would appreciate any advice.

*: df -kh shows
/dev/sda1 9.4G 2.3G 6.7G 25% /
i.e. only 25% of the disk space is used.

On Fri, Jul 15, 2011 at 11:39 AM, Thomas Anderson t.dt.aander...@gmail.com wrote: [...]
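Local df output can be misleading for this error: what matters is the free space each datanode reports to the namenode, which can be near zero even when the root filesystem looks mostly empty. A sketch of how to see the namenode's view (output format varies by version):

    # Per-datanode configured capacity and remaining space, as the NameNode sees it
    hadoop dfsadmin -report
    # The same numbers appear in the Live Nodes table of the NN web UI,
    # e.g. http://server01:50070/dfshealth.jsp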
Re: could only be replicated to 0 nodes, instead of 1
1.) The disk usage (with df -kh) on the namenode (server01):
/dev/sda1 9.4G 2.3G 6.7G 25% /
and the datanodes (server02 ~ server05):
/dev/sda1 9.4G 2.2G 6.8G 25% /

2.) How can I make sure that the datanode is busy? The environment is only for testing, so no other user processes are running at that moment. It is also a fresh installation, so only hadoop-required packages are installed, such as hadoop and the jdk.

3.) fs.block.size is not set in hdfs-site.xml on either the datanodes or the namenode, because the purpose is testing. I thought it would use the default value, which should be 512?

4.) What might be a good way to quickly check whether the network is unstable? I check the health page, e.g. server01:50070/dfshealth.jsp, where live nodes are up and Last Contact varies when checking the page.

Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0

5.) Only the command `hadoop fs -put /tmp/testfile test` is issued, as it is just to test whether the installation works. The file, e.g. testfile, is removed first (hadoop fs -rm test/testfile), then uploaded again with the hadoop put command. The logs are listed below:
namenode: server01: http://pastebin.com/TLpDmmPx
datanodes: server02: http://pastebin.com/pdE5XKfi server03: http://pastebin.com/4aV7ECCV server04: http://pastebin.com/tF7HiRZj server05: http://pastebin.com/5qwSPrvU
Please let me know if more information needs to be provided. I really appreciate your suggestions. Thank you.

On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy brahmared...@huawei.com wrote:
Hi, seeing this exception (could only be replicated to 0 nodes, instead of 1) means a datanode is not available to the Name Node. These are the cases in which a Data Node may not be available to the Name Node:
1) The Data Node disk is full
2) The Data Node is busy with block reports and block scanning
3) The block size is a negative value (dfs.block.size in hdfs-site.xml)
4) A primary datanode goes down while a write is in progress (any network fluctuations between the Name Node and Data Node machines)
5) Whenever we append a partial chunk and call sync, for subsequent partial-chunk appends the client should keep the previous data in its buffer. For example, after appending "a" I call sync, and when I then try to append, the buffer should hold "ab". On the server side, when the chunk is not a multiple of 512, it will compare the CRC for the data present in the block file with the CRC present in the metafile, but while constructing the CRC for the data in the block it always compares up to the initial offset.
For more analysis, please see the data node logs. Warm Regards, Brahma Reddy

-Original Message- From: Thomas Anderson [mailto:t.dt.aander...@gmail.com] Sent: Friday, July 15, 2011 9:09 AM To: hdfs-user@hadoop.apache.org Subject: could only be replicated to 0 nodes, instead of 1 [...]
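A sketch for quickly ruling out causes 3 and 4 from Brahma's list; the host names (server02 ~ server05) match this thread's setup, and the conf path is an assumption about the install layout:

    # Cause 3: verify no negative block size has been configured
    grep -A1 "dfs.block.size" conf/hdfs-site.xml
    # Cause 4: rough check for network flakiness between the namenode and each datanode
    for h in server02 server03 server04 server05; do ping -c 3 $h; done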
Re: could only be replicated to 0 nodes, instead of 1
Thomas, is your /tmp mount point also under / or is it separate? Your dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs, and if they are separately mounted, then what's the available space on that? (It's a bad idea in production to keep things default on /tmp, though, like dfs.name.dir and dfs.data.dir; reconfigure and restart as necessary.) On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson t.dt.aander...@gmail.com wrote: [...]
Re: could only be replicated to 0 nodes, instead of 1
(P.s. I asked that because if you look at your NN's Live Nodes table, the reported space is all 0.) What's the output of: du -sk /tmp/hadoop-user/dfs on all your DNs? On Fri, Jul 15, 2011 at 4:01 PM, Harsh J ha...@cloudera.com wrote: [...]
Re: could only be replicated to 0 nodes, instead of 1
Thomas, your problem might lie simply with the virtual node DNs using /tmp and tmpfs being used for that, which somehow causes the reported free space to go to 0 in reports to the NN (master):

tmpfs 101M 44K 101M 1% /tmp

This causes your trouble: the NN can't choose a suitable DN to write to, because it determines that none has at least a block size worth of space (64MB default) available for writes. You can resolve it as follows:
1. Stop DFS completely.
2. Create a directory under root somewhere (I use Cloudera's distro, and its default configured location for data files comes along as /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and set it as your hadoop.tmp.dir in core-site.xml on all the nodes.
3. Reformat your NameNode (hadoop namenode -format, say Y) and restart DFS.
Things _should_ be OK now. Config example (core-site.xml):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/lib/hadoop-0.20/cache/</value>
</property>

Let us know if this still doesn't get your dev cluster up and running for action :)

On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson t.dt.aander...@gmail.com wrote: When doing the partitioning, I remember only / and swap were specified for all nodes during creation. So I think /tmp is also mounted under /, which should have a size of around 9G. The total size of the hard disk specified is 10G. df -kh shows:

server01:
/dev/sda1 9.4G 2.3G 6.7G 25% /
tmpfs 5.0M 4.0K 5.0M 1% /lib/init/rw
tmpfs 5.0M 0 5.0M 0% /var/run/lock
tmpfs 101M 132K 101M 1% /tmp
udev 247M 0 247M 0% /dev
tmpfs 101M 0 101M 0% /var/run/shm
tmpfs 51M 176K 51M 1% /var/run

server02 ~ server05 (identical on all four):
/dev/sda1 9.4G 2.2G 6.8G 25% /
tmpfs 5.0M 4.0K 5.0M 1% /lib/init/rw
tmpfs 5.0M 0 5.0M 0% /var/run/lock
tmpfs 101M 44K 101M 1% /tmp
udev 247M 0 247M 0% /dev
tmpfs 101M 0 101M 0% /var/run/shm
tmpfs 51M 176K 51M 1% /var/run

In addition, the output of du -sk /tmp/hadoop-user/dfs is:
server02: 8 /tmp/hadoop-user/dfs/
server03: 8 /tmp/hadoop-user/dfs/
server04: 8 /tmp/hadoop-user/dfs/
server05: 8 /tmp/hadoop-user/dfs/

On Fri, Jul 15, 2011 at 7:01 PM, Harsh J ha...@cloudera.com wrote: [...]
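Harsh's three steps as commands; a sketch assuming a 0.20-style install with the scripts in bin/, that core-site.xml has already been edited on every node as shown above, and that the daemon user/group is user:supergroup as elsewhere in this thread (note the reformat wipes any existing HDFS data):

    # 1. Stop DFS completely
    bin/stop-dfs.sh
    # 2. Create the new storage directory on every node, owned by the daemon user
    sudo mkdir -p /var/lib/hadoop-0.20/cache
    sudo chown -R user:supergroup /var/lib/hadoop-0.20/cache   # adjust to your daemon user
    # 3. Reformat the NameNode and restart DFS
    bin/hadoop namenode -format
    bin/start-dfs.sh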
could only be replicated to 0 nodes, instead of 1
I have a fresh hadoop 0.20.2 installed on virtualbox 4.0.8 with jdk 1.6.0_26. The problem is that when trying to put a file to hdfs, it throws the error `org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /path/to/file could only be replicated to 0 nodes, instead of 1'; however, there is no problem creating a folder, as the command ls prints the result:

Found 1 items
drwxr-xr-x - user supergroup 0 2011-07-15 11:09 /user/user/test

I also tried flushing the firewall (removing all iptables restrictions), but the error message is still thrown when uploading (hadoop fs -put /tmp/x test) a file from the local fs. The name node log shows:

2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.ddd.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from aaa.bbb.ccc.ddd.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 140.127.220.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/aaa.bbb.ccc.25:50010

And all datanodes have messages similar to the ones below:

2011-07-15 10:42:46,562 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 0msec
2011-07-15 10:42:47,163 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 3 msecs
2011-07-15 10:42:47,187 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner.
2011-07-15 11:19:42,931 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks got processed in 1 msecs

The command `hadoop fsck /` displays:

Status: HEALTHY
 Total size: 0 B
 Total dirs: 3
 Total files: 0 (Files currently being written: 1)
 Total blocks (validated): 0
 Minimally replicated blocks: 0
 Over-replicated blocks: 0
 Under-replicated blocks: 0
 Mis-replicated blocks: 0
 Default replication factor: 3
 Average block replication: 0.0
 Corrupt blocks: 0
 Missing replicas: 0
 Number of data-nodes: 4

The settings in conf include:

Master node, core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://lab01:9000/</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

Slave nodes, core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://lab01:9000/</value>
</property>
hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

Am I missing any configuration? Or is there any other place I can check? Thanks.
jobtracker.info could only be replicated to 0 nodes, instead of 1
Dear Hadoop Users, I am very new to hadoop; I am just trying to run the tutorials. Currently I am trying to run the Pseudo-Distributed Operation (http://hadoop.apache.org/common/docs/stable/single_node_setup.html). I have found that other users have had this same problem, but none of the workarounds worked for me. I will copy a fragment of the namenode LOG:

2011-07-08 09:34:38,422 INFO org.apache.hadoop.hdfs.StateChange: *BLOCK* NameSystem.processReport: from 127.0.0.1:50010, blocks: 0, processing time: 1 msecs
2011-07-08 09:38:41,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2011-07-08 09:38:42,100 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to place enough replicas, still in need of 1
2011-07-08 09:38:42,101 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call addBlock(/tmp/hadoop-gpabon/mapred/system/jobtracker.info, DFSClient_383801114) from 127.0.0.1:60003: error: java.io.IOException: File /tmp/hadoop-gpabon/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Note that the timestamps of the logs start at 09:34 and the IOException was raised at 09:38. This is because one workaround suggests starting dfs first, waiting some minutes, and then starting the mapred service.

Hadoop Version: 0.20.203.0
Java: jdk1.6.0_26
OS: Linux Debian on VirtualBox, kernel 2.6.39-2-686-pae

Thank you very much in advance for your valuable help. Best regards, Gustavo P.
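A scripted form of that workaround; a sketch (0.20 script names) that replaces the fixed wait with a command that blocks until HDFS actually leaves safe mode, so jobtracker.info can be written:

    # Start HDFS first
    bin/start-dfs.sh
    # Block until the namenode leaves safe mode, instead of sleeping a fixed time
    bin/hadoop dfsadmin -safemode wait
    # Only then start the mapred daemons
    bin/start-mapred.sh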
Question on Error : could only be replicated to 0 nodes, instead of 1
Greetings all, I get the following error at seemingly irregular intervals when I'm trying to do the following:

hadoop fs -put /scratch1/tdp/data/* input

(The data is a few hundred files of wikistats data, about 75GB in total.)

11/03/04 15:55:05 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/pedersen/input/pagecounts-20110129-020001 could only be replicated to 0 nodes, instead of 1

I've searched around on the error message, and have actually found a lot of postings, but they seem to be as irregular as the error itself (both in terms of explanations and fixes):
http://www.mail-archive.com/common-user@hadoop.apache.org/msg00407.html
http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment
http://www.phacai.com/hadoop-error-could-only-be-replicated-to-0-nodes-instead-of-1
http://permalink.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/20198
Is there a best currently understood explanation for this error, and a preferred way to resolve it? We are running in fully distributed mode here... Thanks! Ted -- Ted Pedersen http://www.d.umn.edu/~tpederse
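One arithmetic check worth doing for a bulk load like this: if the replication factor is 3 (the HDFS default; not stated in the message), 75GB of input needs roughly 3 x 75 = 225GB of aggregate DFS space, and the put fails as soon as no datanode can accept another block. A sketch of the corresponding checks (report format varies by version, and the conf path is an assumption):

    # Configured replication factor that the estimate depends on
    grep -A1 "dfs.replication" conf/hdfs-site.xml
    # Watch per-datanode remaining space while the put runs
    hadoop dfsadmin -report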
could only be replicated to 0 nodes, instead of 1
Hello everyone :) Any idea how to troubleshoot this please? I don't get where this comes from.

2010-07-01 08:29:02,721 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /acd/Dispatch/data/user100/historyreport/020920360E55A807CECDFC34621E90C5/last could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        at sun.reflect.GeneratedMethodAccessor68.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
        at org.apache.hadoop.ipc.Client.call(Client.java:740)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        at $Proxy1.addBlock(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

Thank you.
--
http://www.neko-consulting.com
Ego sum quis ego servo
Je suis ce que je protège
I am what I protect
Re: could only be replicated to 0 nodes, instead of 1
Did you check if your Datanode(s) is/are up? The Namenode's web interface reports statistics about live/dead datanodes. The NN is trying to replicate your data to N (in your case, 1) datanodes but it can find none.

On Thu, Jul 1, 2010 at 1:48 PM, Pierre ANCELOT pierre...@gmail.com wrote:
Hello everyone :) Any idea how to troubleshoot this, please? I don't get where this comes from.
2010-07-01 08:29:02,721 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /acd/Dispatch/data/user100/historyreport/020920360E55A807CECDFC34621E90C5/last could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 ...
Thank you.
--
http://www.neko-consulting.com
Ego sum quis ego servo
Je suis ce que je protège
I am what I protect

--
Harsh J
www.harshj.com
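For reference, the same live/dead information is available from the command line; a quick sketch (in Hadoop versions of this era the report includes a "Datanodes available" line near the top, though the exact wording varies by version):

    $ bin/hadoop dfsadmin -report

If the live datanode count there is 0, the NN has no registered datanodes to place replicas on, which is exactly the condition behind "could only be replicated to 0 nodes".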
Re: could only be replicated to 0 nodes, instead of 1
Hi,
Well, actually, I'm configured to have 3 replicas, the default. So this is a first issue. And yes, the datanode is up and responding; this problem only happens from time to time.
Thanks :)

On Thu, Jul 1, 2010 at 10:49 AM, Harsh J qwertyman...@gmail.com wrote:
Did you check if your Datanode(s) is/are up? The Namenode's web interface reports statistics about live/dead datanodes. The NN is trying to replicate your data to N (in your case, 1) datanodes but it can find none.

On Thu, Jul 1, 2010 at 1:48 PM, Pierre ANCELOT pierre...@gmail.com wrote:
Hello everyone :) Any idea how to troubleshoot this, please? I don't get where this comes from.
2010-07-01 08:29:02,721 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /acd/Dispatch/data/user100/historyreport/020920360E55A807CECDFC34621E90C5/last could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 ...
Thank you.

--
Harsh J
www.harshj.com

--
http://www.neko-consulting.com
Ego sum quis ego servo
Je suis ce que je protège
I am what I protect
Re: Does error could only be replicated to 0 nodes, instead of 1 mean no datanodes available?
Hello, here is the output of hadoop fsck /:

Status: HEALTHY
 Total size: 0 B
 Total dirs: 2
 Total files: 0 (Files currently being written: 1)
 Total blocks (validated): 0
 Minimally replicated blocks: 0
 Over-replicated blocks: 0
 Under-replicated blocks: 0
 Mis-replicated blocks: 0
 Default replication factor: 3
 Average block replication: 0.0
 Corrupt blocks: 0
 Missing replicas: 0
 Number of data-nodes: 4
 Number of racks: 1

Currently, I have set the dfs.http.address configuration property in hdfs-site.xml; all other errors are gone, except this error on the primary namenode:

10/05/27 14:21:06 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 ...
10/05/27 14:21:06 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
10/05/27 14:21:06 WARN hdfs.DFSClient: Could not get block locations. Source file /user/alex/check_ssh.sh - Aborting...
put: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
10/05/27 14:21:06 ERROR hdfs.DFSClient: Exception closing file /user/alex/check_ssh.sh : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 ...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
 ...

On 05/27/2010 12:23 AM, Eric Sammer wrote:
Alex:
From the data node / secondary NN exceptions, it appears that nothing can talk to your name node. Take a look in the name node logs and look for where data node registration happens. Is it possible the NN disk is full? My guess is that there's something odd happening with the state on the name node. What does hadoop fsck / look like?

On Wed, May 26, 2010 at 6:53 AM, Alex Luya alexander.l...@gmail.com wrote:
Hello:
I got this error when putting files into hdfs; it seems to be an old issue, and I followed the solution from this link:
http://adityadesai.wordpress.com/2009/02/26/another-problem-with-hadoop-jobjar-could-only-be-replicated-to-0-nodes-instead-of-1io-exception/
but the problem still exists, so I tried to figure it out through the source code:

org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock()

// choose targets for the new block to be allocated.
DatanodeDescriptor targets[] = replicator.chooseTarget(replication, clientNode, null, blockSize);
if (targets.length < this.minReplication) {
  throw new IOException("File " + src + " could only be replicated to " + targets.length + " nodes, instead of " + minReplication);
}

I think DatanodeDescriptor represents a datanode, so here targets.length means the number of datanodes; clearly, it is 0 -- in other words, no datanode is available. But in the web interface (localhost:50070) I can see 4 live nodes (I have 4 nodes only), and hadoop dfsadmin -report shows 4 nodes as well. That is strange. And I got this error message in the secondary namenode:
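As a follow-up on fsck: it can also show where (if anywhere) blocks are actually being placed. A quick sketch using flags available in this era of Hadoop (the report layout varies by version):

    $ bin/hadoop fsck / -files -blocks -locations

This lists each file with its blocks and the datanodes holding each replica, which helps distinguish "no datanodes registered" from "datanodes registered but rejected as write targets".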
Does error could only be replicated to 0 nodes, instead of 1 mean no datanodes available?
Hello:
I got this error when putting files into hdfs; it seems to be an old issue, and I followed the solution from this link:
http://adityadesai.wordpress.com/2009/02/26/another-problem-with-hadoop-jobjar-could-only-be-replicated-to-0-nodes-instead-of-1io-exception/
but the problem still exists, so I tried to figure it out through the source code:

org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock()

// choose targets for the new block to be allocated.
DatanodeDescriptor targets[] = replicator.chooseTarget(replication, clientNode, null, blockSize);
if (targets.length < this.minReplication) {
  throw new IOException("File " + src + " could only be replicated to " + targets.length + " nodes, instead of " + minReplication);
}

I think DatanodeDescriptor represents a datanode, so here targets.length means the number of datanodes; clearly, it is 0 -- in other words, no datanode is available. But in the web interface (localhost:50070) I can see 4 live nodes (I have 4 nodes only), and hadoop dfsadmin -report shows 4 nodes as well. That is strange. And I got this error message in the secondary namenode:

2010-05-26 16:26:39,588 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /home/alex/tmp/dfs/namesecondary from failed checkpoint.
2010-05-26 16:26:39,593 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
2010-05-26 16:26:39,594 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:193)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 ..

and this error message in the datanode:

2010-05-26 16:07:49,039 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.3:50010, storageID=DS-1180479012-192.168.1.3-50010-1274799233678, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Connection reset by peer
 at sun.nio.ch.FileDispatcher.read0(Native Method)
 at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
 at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
 at sun.nio.ch.IOUtil.read(IOUtil.java:206)
 .

It seems like the network ports aren't open, but after scanning with nmap I can confirm that all network ports on the relevant nodes are open. After two days of effort, the result is zero. Can anybody help me troubleshoot? Thank you.

(Following is the relevant info: my cluster configuration, the content of the conf files, the output of hadoop dfsadmin -report, and the Java error message stack.)

My configuration is: ubuntu 10.04 64 bit + jdk1.6.0_20 + hadoop 0.20.2

core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://AlexLuya</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/alex/tmp</value>
hadoop start error - could only be replicated to 0 nodes, instead of 1
Hi,
I am trying to set up a Hadoop cluster. I have a server machine, and I installed XenServer on it. I installed 3 Debian Lenny VMs on the XenServer, and on every VM I installed sun-java6-jre, openssh-server, rsync, and hadoop. VM jack is the namenode; VMs kim and lynne are datanodes. Everything was fine until I ran bin/start-all.sh. I attached three configuration files and the hadoop-cs-namenode-jack.log file, which printed a lot of errors. Two important errors are:

java.lang.IllegalArgumentException: Duplicate metricsName:getProtocolVersion
java.io.IOException: File /home/cs/HadoopInstall/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Thanks.
Dennis

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/cs/HadoopInstall/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://jack:9000</value>
    <description>The name of the default file system. Either the literal string local or a host:port for DFS.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/cr/HadoopInstall/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/cr/HadoopInstall/filesystem/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
  </property>
</configuration>

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
  </property>
</configuration>

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://jack:9001</value>
    <description>The host and port that the MapReduce job tracker runs at. If local, then jobs are run in-process as a single map and reduce task.</description>
  </property>
</configuration>
hadoop start error - could only be replicated to 0 nodes, instead of 1
Sorry, forgot to attach the log file.
could only be replicated to 0 nodes, instead of 1 (java.io.EOFException)
Hi all,
I have just done a fresh install of hadoop-0.20.1 on a small cluster and can't get it to start up. Could someone please help me diagnose where I might be going wrong? Below are snippets of logs from the namenode, a datanode, and a tasktracker. I have successfully formatted the namenode:

09/10/13 15:18:51 INFO common.Storage: Storage directory /hadoop/name has been successfully formatted.

Any advice is greatly appreciated, and please let me know if there is more info I can provide.
Thanks
Tim

The namenode is reporting:
---
2009-10-13 15:00:24,758 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 8020, call addBlock(/hadoop/mapred/system/jobtracker.info, DFSClient_-262825200) from 192.38.28.30:49642: error: java.io.IOException: File /hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /hadoop/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
---

And the datanodes are reporting repeatedly:
---
2009-10-13 15:20:40,773 INFO org.apache.hadoop.ipc.RPC: Server at hdfs-master.local/169.254.97.194:8020 not available yet, Z...
2009-10-13 15:20:42,774 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hdfs-master.local/169.254.97.194:8020. Already tried 0 time(s).
---

The task trackers are reporting:
---
2009-10-13 15:06:27,034 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Call to hdfs-master.local/169.254.97.194:50070 failed on local exception: java.io.EOFException
 at org.apache.hadoop.ipc.Client.wrapException(Client.java:774)
 at org.apache.hadoop.ipc.Client.call(Client.java:742)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
 at org.apache.hadoop.mapred.$Proxy4.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:346)
 at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:383)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:314)
 at org.apache.hadoop.ipc.RPC.waitForProxy(RPC.java:291)
 at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:514)
 at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:934)
 at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2833)
Caused by: java.io.EOFException
 at java.io.DataInputStream.readInt(DataInputStream.java:375)
 at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:508)
 at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
---
could only be replicated to 0 nodes, instead of 1
Hi, All
I just started to use Hadoop a few days ago. I get the error message

WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/count/count/temp1 could only be replicated to 0 nodes, instead of 1

while trying to copy data files to DFS after Hadoop is started. I did all the settings according to the Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster) instructions, and I don't know what's wrong. Besides, during the process, no error message is written to the log files. Also, according to http://localhost.localdomain:50070/dfshealth.jsp, I have one live namenode. Through the browser, I can even see that the first data file is created in DFS, but its size is 0.

Things I've tried:
1. Stop hadoop, re-format DFS, and start hadoop again.
2. Change localhost to 127.0.0.1

But neither of them works. Could anyone help me or give me a hint? Thanks.
Anthony
Re: could only be replicated to 0 nodes, instead of 1
The full error message is:

09/07/02 16:28:09 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/hadoop/count/count/temp1 retries left 1
09/07/02 16:28:12 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/hadoop/count/count/temp1 could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1280)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
 at org.apache.hadoop.ipc.Client.call(Client.java:697)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 at $Proxy0.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy0.addBlock(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2814)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2696)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1996)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
Hadoop error help- file system closed, could only be replicated to 0 nodes, instead of 1
Hi,
I am extremely new to Hadoop and have come across a few errors that I'm not sure how to fix. I am running Hadoop version 0.19.0 from an image through ElasticFox and S3. I am on Windows and use PuTTY as my ssh client. I am trying to run a wordcount with 5 slaves. This is what I do so far:

1. boot up the instance through ElasticFox
2. cd /usr/local/hadoop-0.19.0
3. bin/hadoop namenode -format
4. bin/start-all.sh
5. jps (shows jps, jobtracker, secondarynamenode)
6. bin/stop-all.sh
7. ant examples
8. bin/start-all.sh
9. bin/hadoop jar build/hadoop-0.19.0-examples.jar pi 0 100

Then I get this error trace:

Number of Maps = 0 Samples per Map = 100
Starting Job
09/06/18 17:31:25 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
 at org.apache.hadoop.ipc.Client.call(Client.java:696)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 at $Proxy0.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at $Proxy0.addBlock(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2815)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2697)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)
09/06/18 17:31:25 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar retries left 4
09/06/18 17:31:25 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 ...
09/06/18 17:31:25 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar retries left 3
09/06/18
Re: Hadoop error help- file system closed, could only be replicated to 0 nodes, instead of 1
Hi,
What seems likely from your details is that the datanode is not running. Can you run bin/hadoop dfsadmin -report and find out whether your datanodes are up? Then post your observation, and it would be better if you also post your hadoop-site.xml file details.
Regards,
Ashish.

On Fri, Jun 19, 2009 at 3:16 AM, terrianne.erick...@accenture.com wrote:
Hi, I am extremely new to Hadoop and have come across a few errors that I'm not sure how to fix. I am running Hadoop version 0.19.0 from an image through ElasticFox and S3. I am on Windows and use PuTTY as my ssh client. I am trying to run a wordcount with 5 slaves.
Then I get this error trace:
09/06/18 17:31:25 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1270)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:351)
 ...
09/06/18 17:31:25 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /mnt/hadoop/mapred/system/job_200906181730_0001/job.jar retries left 4
 ...
Could only be replicated to 0 nodes, instead of 1
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1

I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior

Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.

Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.

09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
 at org.apache.hadoop.ipc.Client.call(Client.java:716)
 at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
 at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
 at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
 at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2153)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1745)
 at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1899)
Re: Could only be replicated to 0 nodes, instead of 1
Hi,
I have two suggestions:
i) Choose the right version (Hadoop 0.18 is good).
ii) Replication should be 3, as you have 3 nodes. (Indirectly, see to it that your configuration is correct!)
I'm just suggesting this, as I am also new to Hadoop.
Ashish Pareek

On Thu, May 21, 2009 at 2:41 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior
Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
 ...
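For reference, the replication factor being discussed is the dfs.replication property in the site configuration (it can also be overridden per file at create time). A minimal sketch for a 3-node cluster, reusing the stock property description:

    <property>
      <name>dfs.replication</name>
      <value>3</value>
      <description>Default block replication. The actual number of
      replications can be specified when the file is created.</description>
    </property>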
Re: Could only be replicated to 0 nodes, instead of 1
It does not appear that any datanodes have connected to your namenode. On the datanode machines, look in the hadoop logs directory at the datanode log files. There should be some information there that helps you diagnose the problem. Chapter 4 of my book provides some detail on working with this problem.

On Thu, May 21, 2009 at 4:29 AM, ashish pareek pareek...@gmail.com wrote:
Hi,
I have two suggestions: i) Choose the right version (Hadoop 0.18 is good). ii) Replication should be 3, as you have 3 nodes. (Indirectly, see to it that your configuration is correct!)
I'm just suggesting this, as I am also new to Hadoop.
Ashish Pareek

On Thu, May 21, 2009 at 2:41 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior
Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
 ...
09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
 ...

--
Alpha Chapters of my book on Hadoop are available http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

i) Choose the right version (Hadoop 0.18 is good).

I'm using 0.18.3.

ii) Replication should be 3, as you have 3 nodes. (Indirectly, see to it that your configuration is correct!)

Actually, I'm testing 2x replication on any number of DNs, to see how reliable it is.

I'm just suggesting this, as I am also new to Hadoop.
Ashish Pareek

On Thu, May 21, 2009 at 2:41 PM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior
Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
 ...
Re: Could only be replicated to 0 nodes, instead of 1
I think you should file a jira on this. Most likely this is what is happening:

* Two out of 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as well, since the '# active writes' on it is larger than '2 * avg'.
* Even if there is only one other block being written on the 3rd, that is still greater than (2 * 1/3).

To test this: if you write just one block to an idle cluster, it should succeed. Writing from the client on the 3rd DN succeeds since the local node is always favored.

This particular problem is not that severe on a large cluster, but HDFS should do the sensible thing.

Raghu.

Stas Oskin wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior
Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1123)
 at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
 ...
09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
 ...
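To make the arithmetic above concrete, here is a toy sketch of the load check Raghu describes (illustrative only, with hypothetical names; this is not the actual Hadoop source):

    // A self-contained illustration of the target-selection load check:
    // reject a datanode whose active-writer count is more than twice
    // the cluster-wide average.
    public class WriteLoadCheck {
        static boolean isGoodTarget(int activeWritesOnNode,
                                    int totalActiveWrites,
                                    int numDatanodes) {
            double avgLoad = (double) totalActiveWrites / numDatanodes;
            return activeWritesOnNode <= 2 * avgLoad;
        }

        public static void main(String[] args) {
            // The scenario from this thread: 3 datanodes, one write in
            // flight, and it sits on the only node with free space.
            // avg = 1/3, threshold = 2/3, the node's load of 1 exceeds it.
            System.out.println(isGoodTarget(1, 1, 3)); // prints "false"
        }
    }

This is why even the last datanode with free space can be skipped: with almost no other traffic in a tiny cluster, its single active writer already exceeds twice the average.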
Re: Could only be replicated to 0 nodes, instead of 1
On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
I think you should file a jira on this. Most likely this is what is happening:
* Two out of 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as well, since the '# active writes' on it is larger than '2 * avg'.
* Even if there is only one other block being written on the 3rd, that is still greater than (2 * 1/3).
To test this: if you write just one block to an idle cluster, it should succeed. Writing from the client on the 3rd DN succeeds since the local node is always favored.
This particular problem is not that severe on a large cluster, but HDFS should do the sensible thing.

Hey Raghu,

If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly, meaning the smaller nodes fill up quickly until the balancer can catch up.

Brian

Raghu.
Stas Oskin wrote:
Hi.
I'm testing Hadoop in our lab, and started getting the following message when trying to copy a file:
Could only be replicated to 0 nodes, instead of 1
I have the following setup:
* 3 machines, 2 of them with only 80GB of space, and 1 with 1.5GB
* Two clients are copying files all the time (one of them is the 1.5GB machine)
* The replication is set to 2
* I let the space on the 2 smaller machines run out, to test the behavior
Now, one of the clients (the one located on the 1.5GB machine) works fine, and the other one (the external client) is unable to copy and displays the error + the exception below.
Any idea if this is expected in my scenario? Or how it can be solved?
Thanks in advance.
09/05/21 10:51:03 WARN dfs.DFSClient: NotReplicatedYetException sleeping /test/test.bin retries left 1
09/05/21 10:51:06 WARN dfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.bin could only be replicated to 0 nodes, instead of 1
 ...
09/05/21 10:51:06 WARN dfs.DFSClient: Error Recovery for block null bad datanode[0]
java.io.IOException: Could not get block locations. Aborting...
 ...
Re: Could only be replicated to 0 nodes, instead of 1
Brian Bockelman wrote:
On May 21, 2009, at 2:01 PM, Raghu Angadi wrote:
I think you should file a jira on this. Most likely this is what is happening: ...
Hey Raghu,
If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly, meaning the smaller nodes fill up quickly until the balancer can catch up.

Yes. This would bite whenever a large portion of the nodes cannot accept blocks. In general, it can happen whenever less than half the nodes have any space left.

Raghu.
Re: Could only be replicated to 0 nodes, instead of 1
Hi.

I think you should file a jira on this. Most likely this is what is happening:

Will do - this goes to the DFS section, correct?

* Two out of 3 DNs cannot take any more blocks.
* While picking nodes for a new block, the NN mostly skips the third DN as well, since the '# active writes' on it is larger than '2 * avg'.
* Even if there is only one other block being written on the 3rd, that is still greater than (2 * 1/3).

Frankly, I'm not familiar enough with Hadoop's inner workings to understand this completely, but from what I can digest, the NN doesn't like the 3rd DN because there are too many blocks being written on it, compared to the other servers?

To test this: if you write just one block to an idle cluster, it should succeed.

What exactly is an idle cluster? One that nothing is being written to (including the 3rd DN)?

Writing from the client on the 3rd DN succeeds since the local node is always favored.

Makes sense.

This particular problem is not that severe on a large cluster, but HDFS should do the sensible thing.

Yes, I agree that this is a non-standard situation, but IMHO the best course of action would be to write anyway, but throw a warning. There is one already that appears when there is not enough space for replication, and it explains the matter quite well. A similar one would be great.
Re: Could only be replicated to 0 nodes, instead of 1
Hi. If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly, until the balancer can catch up. A bit off topic: do you run the balancer manually, or do you have some scheduler that does it?
Re: Could only be replicated to 0 nodes, instead of 1
On May 21, 2009, at 3:10 PM, Stas Oskin wrote: Hi. If this analysis is right, I would add that it can happen even on large clusters! I've seen this error on our cluster when we're very full (97%) and very few nodes have any empty space. This usually happens because we have two very large nodes (10x bigger than the rest of the cluster), and HDFS tends to distribute writes randomly -- meaning the smaller nodes fill up quickly, until the balancer can catch up. A bit off topic: do you run the balancer manually, or do you have some scheduler that does it? crontab does it for us, once an hour. We're always importing data, so the cluster is always out of balance. If the previous balancer hasn't exited, the new one will simply exit. The real trick has been to make sure the balancer doesn't get stuck -- a Nagios plugin makes sure that the stdout has been printed to in the last hour or so; otherwise it kills the running balancer. Stuck balancers have been an issue in the past. Brian
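[Editor's note: in case it helps anyone building a similar safety net, here is a tiny illustrative watchdog along the lines Brian describes. The log and pid-file paths are invented conventions, not details from his setup: if the balancer's stdout file has not been touched for over an hour, it kills the recorded process so the next cron run can start fresh.]

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class BalancerWatchdog {
        public static void main(String[] args) throws IOException {
            File log = new File("/var/log/hadoop/balancer.out");  // assumed log path
            long oneHourMs = 60L * 60L * 1000L;
            if (log.exists() && System.currentTimeMillis() - log.lastModified() > oneHourMs) {
                // Assumed pid-file convention; a Unix "kill" is sent to the balancer.
                String pid = new String(
                        Files.readAllBytes(Paths.get("/var/run/hadoop/balancer.pid"))).trim();
                Runtime.getRuntime().exec(new String[]{"kill", pid});
                System.err.println("Killed apparently stuck balancer, pid " + pid);
            }
        }
    }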
Re: Could only be replicated to 0 nodes, instead of 1
The real trick has been to make sure the balancer doesn't get stuck -- a Nagios plugin makes sure that the stdout has been printed to in the last hour or so; otherwise it kills the running balancer. Stuck balancers have been an issue in the past. Thanks for the advice.
Re: Could only be replicated to 0 nodes, instead of 1
I think you should file a jira on this. Most likely this is what is happening: Here it is - hope it's OK: https://issues.apache.org/jira/browse/HADOOP-5886
Re: Could only be replicated to 0 nodes, instead of 1
Stas Oskin wrote: I think you should file a jira on this. Most likely this is what is happening: Here it is - hope it's OK: https://issues.apache.org/jira/browse/HADOOP-5886 Looks good. I will add my earlier post as a comment. You can update the jira with any more tests. Next time, it would be better to include larger stack traces, logs, etc. in subsequent comments rather than in the description. Thanks, Raghu.
Re: could only be replicated to 0 nodes, instead of 1
Hi, I ran into a very similar problem when trying to configure HDFS. The solution was configuring a smaller block size. I wanted to install HDFS for testing purposes only, so I decided to have ~300 MB of storage space on each machine. The block size was set to 128 MB (I used the Cloudera configuration tool). After changing the block size to 1 MB (it could be bigger, but this is not a production environment), everything started to work fine! Regards, Piotr Praczyk
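[Editor's note: for anyone who wants the same workaround per-file rather than cluster-wide, a sketch like the following should work against the FileSystem API of that era, which accepts an explicit block size in create(). The path, buffer size, and replication factor here are illustrative assumptions; cluster-wide you would instead set dfs.block.size in hadoop-site.xml.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SmallBlockWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // 1 MB blocks so each block fits easily in a tiny test datanode;
            // must stay a multiple of the 512-byte checksum chunk.
            long blockSize = 1024L * 1024L;
            FSDataOutputStream out = fs.create(
                    new Path("/test/small-blocks.bin"),  // illustrative path
                    true,       // overwrite
                    4096,       // io buffer size
                    (short) 1,  // replication factor for a single-node setup
                    blockSize);
            out.write(new byte[]{1, 2, 3});
            out.close();
            fs.close();
        }
    }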
Re: could only be replicated to 0 nodes, instead of 1
Hi, if you are getting this in a Windows environment (2003, 64-bit): we faced the same problem. We tried the following steps and it started working. 1) Install Cygwin and ssh. 2) Download the stable Hadoop version - hadoop-0.17.2.1.tar.gz as of 13/Nov/2008. 3) Untar it via Cygwin (tar xvfz hadoop-0.17.2.1.tar.gz); please DO NOT use WinZip to untar. 4) Run the pseudo-distributed example provided in the quickstart (http://hadoop.apache.org/core/docs/current/quickstart.html) - it worked for us. Thanks, Arul and Limin, eBay Inc. jerrro wrote: I am trying to install/configure Hadoop on a cluster with several computers. I followed exactly the instructions on the Hadoop website for configuring multiple slaves, and when I run start-all.sh I get no errors - both datanode and tasktracker are reported to be running (doing ps awux | grep hadoop on the slave nodes returns two java processes). Also, the log files are empty - nothing is printed there. Still, when I try to use bin/hadoop dfs -put, I get the following error: # bin/hadoop dfs -put w.txt w.txt put: java.io.IOException: File /user/scohen/w4.txt could only be replicated to 0 nodes, instead of 1 and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it). I couldn't find much information about this error, but I did manage to see somewhere it might mean that there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what could be the problem? Thanks. Jerr.
Re: could only be replicated to 0 nodes, instead of 1
I get the same error when doing a put, and my cluster is running OK, i.e. it has capacity and all nodes are live. The error message is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
    at org.apache.hadoop.ipc.Client.call(Client.java:512)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
I would appreciate any help/suggestions. Thanks. jerrro wrote: I am trying to install/configure Hadoop on a cluster with several computers. I followed exactly the instructions on the Hadoop website for configuring multiple slaves, and when I run start-all.sh I get no errors - both datanode and tasktracker are reported to be running (doing ps awux | grep hadoop on the slave nodes returns two java processes). Also, the log files are empty - nothing is printed there. Still, when I try to use bin/hadoop dfs -put, I get the following error: # bin/hadoop dfs -put w.txt w.txt put: java.io.IOException: File /user/scohen/w4.txt could only be replicated to 0 nodes, instead of 1 and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it). I couldn't find much information about this error, but I did manage to see somewhere it might mean that there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what could be the problem? Thanks. Jerr.
Re: could only be replicated to 0 nodes, instead of 1
Could you please go to the DFS web UI and check how many datanodes are up and how much available space each has? Hairong On 5/8/08 3:30 AM, jasongs [EMAIL PROTECTED] wrote: I get the same error when doing a put, and my cluster is running OK, i.e. it has capacity and all nodes are live. The error message is:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /test/test.txt could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1127)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:312)
    at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
    at org.apache.hadoop.ipc.Client.call(Client.java:512)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2074)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1967)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:1487)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1601)
I would appreciate any help/suggestions. Thanks. jerrro wrote: I am trying to install/configure Hadoop on a cluster with several computers. I followed exactly the instructions on the Hadoop website for configuring multiple slaves, and when I run start-all.sh I get no errors - both datanode and tasktracker are reported to be running (doing ps awux | grep hadoop on the slave nodes returns two java processes). Also, the log files are empty - nothing is printed there. Still, when I try to use bin/hadoop dfs -put, I get the following error: # bin/hadoop dfs -put w.txt w.txt put: java.io.IOException: File /user/scohen/w4.txt could only be replicated to 0 nodes, instead of 1 and a file of size 0 is created on the DFS (bin/hadoop dfs -ls shows it). I couldn't find much information about this error, but I did manage to see somewhere it might mean that there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what could be the problem? Thanks. Jerr.
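[Editor's note: for anyone who would rather script that check than click through the web UI, here is a rough programmatic equivalent of hadoop dfsadmin -report. It is a sketch that assumes a release of that era, where the DistributedFileSystem class in org.apache.hadoop.dfs exposes getDataNodeStats(); adjust the package and method names for your version.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.dfs.DatanodeInfo;
    import org.apache.hadoop.dfs.DistributedFileSystem;
    import org.apache.hadoop.fs.FileSystem;

    public class DatanodeReport {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            if (!(fs instanceof DistributedFileSystem)) {
                System.err.println("Not talking to HDFS -- check fs.default.name");
                return;
            }
            DistributedFileSystem dfs = (DistributedFileSystem) fs;
            // One entry per datanode the namenode currently knows about; an empty
            // array here matches the "replicated to 0 nodes" situation exactly.
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                System.out.println(dn.getName() + ": " + dn.getRemaining()
                        + " bytes free of " + dn.getCapacity());
            }
        }
    }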
Re: Job.jar could only be replicated to 0 nodes, instead of 1(IO Exception)
Sridhar Raman wrote: I am trying to run K-Means using Hadoop. I first wanted to test it within a single-node cluster. And this was the error I got. What could be the problem?
$ bin/hadoop jar clustering.jar com.company.analytics.clustering.mr.core.KMeansDriver
Iteration 0
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /WORK/temp/hadoop/workspace/hadoop-user/mapred/system/job_200804291904_0001/job.jar could only be replicated to 0 nodes, instead of 1
Check if your datanode is up or not. Amar
    at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1003)
    at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:293)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
    at org.apache.hadoop.ipc.Client.call(Client.java:482)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:184)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:1554)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1500)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1626)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1733)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:49)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:64)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
    at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:83)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:140)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:827)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:815)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:796)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:493)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:753)
    at com.company.analytics.clustering.mr.core.KMeansDriver.runIteration(KMeansDriver.java:136)
    at com.company.analytics.clustering.mr.core.KMeansDriver.runJob(KMeansDriver.java:88)
    at com.company.analytics.clustering.mr.core.KMeansDriver.main(KMeansDriver.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:585)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
Re: could only be replicated to 0 nodes, instead of 1
I had the same error message... Can you describe when and how this error occurs? Jayant Durgad wrote: I am faced with the exact same problem described here; does anybody know how to resolve this?
Re: could only be replicated to 0 nodes, instead of 1
Can you check the datanode and namenode logs and see if all are up and running? I am assuming you are running this on a single host, hence the replication factor of 1. Thanks, Lohit - Original Message From: John Menzer [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Saturday, April 12, 2008 2:04:00 PM Subject: Re: could only be replicated to 0 nodes, instead of 1 I had the same error message... Can you describe when and how this error occurs? Jayant Durgad wrote: I am faced with the exact same problem described here; does anybody know how to resolve this?
Re: could only be replicated to 0 nodes, instead of 1
jerrro wrote: I couldn't find much information about this error, but I did manage to see somewhere that it might mean there are no datanodes running. But as I said, start-all does not give any errors. Any ideas what could be the problem? A clean start-all return does not mean the datanodes are OK. Did you check whether any datanodes are alive? You can check from http://namenode:50070/. Raghu.
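[Editor's note: a minimal probe of that page, for anyone who wants to automate the check. localhost, the default port 50070, and the dfshealth.jsp page name reflect the defaults of that era; the string matching is a crude assumption you may need to adjust for your version.]

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class NameNodeUiProbe {
        public static void main(String[] args) throws Exception {
            // Default namenode web UI address; change host/port for your cluster.
            URL url = new URL("http://localhost:50070/dfshealth.jsp");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            System.out.println("HTTP " + conn.getResponseCode());
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                // Crude scrape: surface the live/dead node counts the status
                // page of that era reported.
                if (line.contains("Live Nodes") || line.contains("Dead Nodes")) {
                    System.out.println(line.trim());
                }
            }
            in.close();
        }
    }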
Re: could only be replicated to 0 nodes, instead of 1
I am faced with the exact same problem described here; does anybody know how to resolve this?