How to solve a DisallowedDatanodeException?
Hi, I'm running a cluster on Amazon and sometimes I'm getting this exception: 2011-10-07 10:36:28,014 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode: ip-10-235-57-112.eu-west-1.compute.internal:50010 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2042) at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:687) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) at org.apache.hadoop.ipc.Client.call(Client.java:740) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy4.register(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:531) at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1208) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1247) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) Since I started getting this exception I'm not able to run any datanode. I have checked all the connections between the nodes and they are OK, and I have also tried to format the namenode, but the problem remains. Do I need to remove the datanode's information? rm -rf ${HOME}/dfs-xvdh/dn I would prefer a solution that doesn't imply formatting or erasing anything... Regards, Raimon Bosch.
Re: Question regarding hdfs synchronously / asynchronously block replication
Thanks a lot!! On Wed, Oct 5, 2011 at 3:51 PM, Eric Fiala e...@fiala.ca wrote: Ronen, On file write HDFS's block replication pipeline is asynchronous - datanode 1 gets a block before passing it on to datanode 2, and so on (limiting network traffic between the client node and the data nodes - the client only writes to one). The ACK for a packet is returned only once all datanodes in the pipeline have copied the block. However, if a failure occurs in the interim on a datanode in the write pipeline, AND the minimum replication threshold has been met (normally 1), the namenode will, in a separate operation, quell the replica deficit. I don't think that's configurable; however, it would be an interesting use case for speeding up writes while trading off some reliability. EF On Wed, Oct 5, 2011 at 1:53 AM, Ronen Itkin ro...@taykey.com wrote: Hi all! My question is regarding HDFS block replication. From the perspective of the client, does the application receive an ACK for a certain packet after it has been written on the first Hadoop data node in the pipeline, or after the packet has been *replicated* to all assigned *replication* nodes? More generally, does Hadoop's HDFS block replication work synchronously or asynchronously? Synchronously -- more replicas = decrease in write performance (the client has to wait until every packet is written to all replication nodes before it receives an ACK). Asynchronously -- more replication has no influence on write performance (the client receives an ACK after the write to the first datanode finishes, and HDFS completes its replication in its own time). Synchronous / asynchronous block replication - is it something configurable? If it is, then how can I do it? Thanks! -- * Ronen Itkin* Taykey | www.taykey.com -- * Ronen Itkin* Taykey | www.taykey.com
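Eric's point that the packet ACK only comes back once the whole pipeline has the data also means the client-visible write latency grows with the replication factor, which can be observed directly. A minimal sketch of such a probe, assuming a reachable cluster; the path, buffer size, block size and payload size below are arbitrary illustration values, not anything taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationWriteProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create(path, overwrite, bufferSize, replication, blockSize)
        // Repeat the run with replication 1, 2, 3 ... and compare timings.
        FSDataOutputStream out = fs.create(new Path("/tmp/replication-probe"), true,
                4096, (short) 3, 64 * 1024 * 1024L);
        long start = System.currentTimeMillis();
        out.write(new byte[8 * 1024 * 1024]); // 8 MB of zeros, purely illustrative
        out.close();                          // returns once the pipeline has acknowledged
        System.out.println("write+close took " + (System.currentTimeMillis() - start) + " ms");
        fs.close();
    }
}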
Re: How to solve a DisallowedDatanodeException?
Raimon - the error org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode Usually indicates that the datanode that is trying to connect to the namenode is either: - listed in the file defined by dfs.hosts.exclude (explicitly excluded) - or - that dfs.hosts (explicitly included) is used and the node is not listed within that file Make sure the datanode is not listed in excludes, and if you are using dfs.hosts, add it to the includes, and run hadoop dfsadmin -refreshNodes You should not have to remove any data on local disc to solve this problem. HTH EF On Fri, Oct 7, 2011 at 4:47 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm running a cluster on amazon and sometimes I'm getting this exception: 2011-10-07 10:36:28,014 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode: ip-10-235-57-112.eu-west-1.compute.internal:50010 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2042) at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:687) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) at org.apache.hadoop.ipc.Client.call(Client.java:740) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy4.register(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:531) at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1208) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1247) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) Since I have this exception I'm not able to run any datanode. I have checked all the connections between the nodes and they are ok, I have tried also to format the namenode but the problem is still remaining. Shall I need to remove the information about the datanode? rm -rf ${HOME}/dfs-xvdh/dn I would prefer a solution that doesn't implies a format or erasing anything... Regards, Raimon Bosch. -- *Eric Fiala* *Fiala Consulting* T: 403.828.1117 E: e...@fiala.ca http://www.fiala.ca
Re: DFSClient: Could not complete file
Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris On Tue, Mar 29, 2011 at 7:53 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hi Chris, One thing we've found helping in ext3 is examining your I/O scheduler. Make sure it's set to deadline, not CFQ. This will help prevent nodes from being overloaded; when du -sk is performed and the node is already overloaded, things quickly roll downhill. Brian On Mar 29, 2011, at 11:44 AM, Chris Curtin wrote: We are narrowing this down. The last few times it hung we found a 'du -sk' process for each our HDFS disks as the top users of CPU. They are also taking a really long time. Searching around I find one example of someone reporting a similar issue with du -sk, but they tied it to XFS. We are using Ext3. Anyone have any other ideas since it appears to be related to the 'du' not coming back? Note that running the command directly finishes in a few seconds. Thanks, Chris On Wed, Mar 16, 2011 at 9:41 AM, Chris Curtin curtin.ch...@gmail.com wrote: Caught something today I missed before: 11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010 11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block blk_-517003810449127046_10039793 11/03/16 09:32:49 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.103:50010 11/03/16 09:34:04 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.120.41.85:34323 remote=/10.120.41.105:50010] 11/03/16 09:34:04 INFO hdfs.DFSClient: Abandoning block blk_2153189599588075377_10039793 11/03/16 09:34:04 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.105:50010 11/03/16 09:34:55 INFO hdfs.DFSClient: Could not complete file /tmp/hadoop/mapred/system/job_201103160851_0014/job.jar retrying... On Wed, Mar 16, 2011 at 9:00 AM, Chris Curtin curtin.ch...@gmail.com wrote: Thanks. Spent a lot of time looking at logs and nothing on the reducers until they start complaining about 'could not complete'. 
Found this in the jobtracker log file: 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3829493505250917008_9959810java.io.IOException: Bad response 1 for block blk_3829493505250917008_9959810 from datanode 10.120.41.103:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2454) 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 bad datanode[2] 10.120.41.103:50010 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 in pipeline 10.120.41.105:50010, 10.120.41.102:50010, 10.120.41.103:50010: bad datanode 10.120.41.103:50010 2011-03-16 02:38:53,133 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /var/hadoop/tmp/2_20110316_pmta_pipe_2_20_50351_2503122/_logs/history/hadnn01.atlis1_1299879680612_job_201103111641_0312_deliv_2_20110316_pmta_pipe*2_20110316_%5B%281%2F3%29+...QUEUED_T retrying... Looking at the logs from the various times this happens, the 'from datanode' in the first message is any of the data nodes (roughly equal in # of times it fails), so I don't think it is one specific node having problems. Any other ideas? Thanks, Chris On Sun, Mar 13, 2011 at 3:45 AM, icebergs hkm...@gmail.com wrote: You should check the bad reducers' logs carefully.There may be more information about it. 2011/3/10 Chris Curtin curtin.ch...@gmail.com Hi, The
Re: How to solve a DisallowedDatanodeException?
My list of dfs.hosts was correct on all the servers. In this case the problem was with Amazon's internal DNS. I had to restart all my nodes to get rid of this problem. After some changes on my cluster (renaming nodes), some nodes had automatically changed their IPs and I had to perform a restart to force a change in the internal IPs as well. 2011/10/7 Eric Fiala e...@fiala.ca [...]
Re: How to solve a DisallowedDatanodeException?
In the internal DNS names, sorry... 2011/10/7 Raimon Bosch raimon.bo...@gmail.com [...]
Using native libraries with hadoop
Hi, I intend to use a native library with Java bindings in a Hadoop job. The problem is that the Java bindings expect a file path as a parameter, and this file should reside on a local file system. What is the best way to solve this problem? (I would not want to modify the code of the native library to accept e.g. a file as a string). Thanks a lot in advance, Vyacheslav
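One common pattern for this situation is to ship the file through the DistributedCache and resolve its task-local path in the mapper's setup(), on the assumption that the binding only needs read access to the file at task run time. A hedged sketch; the cache URI, file name and the NativeBinding call are hypothetical placeholders, not part of Vyacheslav's code:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class NativeFileJob {
    public static class NativeMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String localLibraryFile;

        @Override
        protected void setup(Context context) throws IOException {
            // Files added to the cache are copied to the task node's local disk,
            // so this is a plain local-filesystem path the native binding can use.
            Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            localLibraryFile = cached[0].toString();
            // NativeBinding.load(localLibraryFile); // hypothetical call into the bindings
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "native file job");
        job.setJarByClass(NativeFileJob.class);
        job.setMapperClass(NativeMapper.class);
        // The HDFS location below is a hypothetical example.
        DistributedCache.addCacheFile(new URI("/libs/native-data.bin"), job.getConfiguration());
        // ... input/output paths and key/value classes, then job.waitForCompletion(true)
    }
}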
Re: Adjusting column value size.
Yes, I need all of those ints at the same time. And no, there is no streaming. I have decided to pack 1024 ints into one cell so that each cell would be of size 4 KB. I am already using LZO on my tables. I'll do some experiments once I finish implementing both approaches. I'll add a thread about the results when I am done. Thanks for the advice. Ed. 2011/10/7 Jean-Daniel Cryans jdcry...@apache.org (BCC'd common-user@ since this seems strictly HBase related) Interesting question... And you probably need all those ints at the same time right? No streaming? I'll assume no. So the second solution seems better due to the overhead of storing each cell. Basically, storing one int per cell you would end up storing more keys than values (size-wise). Another thing is that if you pack enough ints together and there's some sort of repetition, you might be able to use LZO compression on that table. I'd love to hear about your experiments once you've done them. J-D On Mon, Oct 3, 2011 at 10:58 PM, edward choi mp2...@gmail.com wrote: Hi, I have a question regarding performance and column value size. I need to store several million integers per row. (Several million is important here.) I was wondering which method would be more beneficial performance-wise. 1) Store each integer in a single column so that when a row is fetched, several million columns are fetched with it, and the user maps each column value into some kind of container (ex: vector, arrayList). 2) Store, for example, a thousand integers in a single column (by concatenating them) so that when a row is fetched, only several thousand columns come with it; the user would have to split each column value into 4-byte pieces and map the resulting integers into some kind of container (ex: vector, arrayList). I am curious which approach would be better. 1) would fetch several million columns but needs no additional processing. 2) would fetch only several thousand columns but needs additional processing. Any advice would be appreciated. Ed
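A minimal sketch of the packing Ed describes (1024 fixed-width 4-byte ints per cell value, roughly 4 KB per cell); the class and constant names are illustrative only:

import java.nio.ByteBuffer;

public class IntPacking {
    static final int INTS_PER_CELL = 1024; // => 4 KB per cell value

    // Pack one batch of ints into a single cell value.
    static byte[] pack(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Split a cell value back into ints on the read side.
    static int[] unpack(byte[] cellValue) {
        ByteBuffer buf = ByteBuffer.wrap(cellValue);
        int[] out = new int[cellValue.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }
}

Fixed-width big-endian encoding keeps the read side trivial (the value length divided by 4 gives the count), and repetition-heavy batches should also compress reasonably under LZO, as J-D suggests.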
hadoop knowledge gaining
Guys, I am able to deploy the first program, word count, using Hadoop. I am interested in exploring more about Hadoop and HBase and don't know the best way to grasp both of them. I have Hadoop in Action, but it uses the older API. I also have the HBase Definitive Guide, which I have not started exploring yet. -Jignesh
Re: How to solve a DisallowedDatanodeException?
Definitely it was an Amazon problem. They were assigning a new internal IP but some of the nodes were still using the old one. I had to force, in all my /etc/hosts files, redirects from the old DNS names to the correct IPs: [NEW_IP] ip-[OLD_IP].eu-west-1.compute.internal [NEW_IP] ip-[OLD_IP] 2011/10/7 Raimon Bosch raimon.bo...@gmail.com [...]
FW: Error running org.apache.hadoop.examples.DBCountPageView
Hi, I am getting the following exception when trying to run the DBCountPageView example obtained from http://search-hadoop.com/c/Map/Reduce:/src/examples/org/apache/hadoop/examples/DBCountPageView.java. I am using a PostgreSQL database. Any help would be greatly appreciated! java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 11/10/06 08:48:37 INFO mapred.JobClient: map 0% reduce 0% 11/10/06 08:48:37 INFO mapred.JobClient: Job complete: job_local_0001 11/10/06 08:48:37 INFO mapred.JobClient: Counters: 0 11/10/06 08:48:37 INFO examples.DBCountPageView: totalPageview=60 11/10/06 08:48:37 INFO examples.DBCountPageView: sumPageview=0 java.lang.RuntimeException: Evaluation was not correct! at org.apache.hadoop.examples.DBCountPageView.run(DBCountPageView.java:439) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.examples.DBCountPageView.main(DBCountPageView.java:452)
Is it possible to run multiple MapReduce against the same HDFS?
I plan to deploy an HDFS cluster which will be shared by multiple MapReduce clusters. I wonder whether this is possible. Will it incur any conflicts among the MapReduce clusters (e.g. different MapReduce clusters trying to use the same temp directory in HDFS)? If it is possible, how should the security parameters be set up (e.g. user identity, file permissions)? Thanks, Gerald
Re: DFSClient: Could not complete file
Hi Chris, You may be hitting HDFS-2379. Can you grep your DN logs for the string BlockReport and see if you see any taking more than 3ms or so? -Todd On Fri, Oct 7, 2011 at 6:31 AM, Chris Curtin curtin.ch...@gmail.com wrote: Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris On Tue, Mar 29, 2011 at 7:53 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hi Chris, One thing we've found helping in ext3 is examining your I/O scheduler. Make sure it's set to deadline, not CFQ. This will help prevent nodes from being overloaded; when du -sk is performed and the node is already overloaded, things quickly roll downhill. Brian On Mar 29, 2011, at 11:44 AM, Chris Curtin wrote: We are narrowing this down. The last few times it hung we found a 'du -sk' process for each our HDFS disks as the top users of CPU. They are also taking a really long time. Searching around I find one example of someone reporting a similar issue with du -sk, but they tied it to XFS. We are using Ext3. Anyone have any other ideas since it appears to be related to the 'du' not coming back? Note that running the command directly finishes in a few seconds. Thanks, Chris On Wed, Mar 16, 2011 at 9:41 AM, Chris Curtin curtin.ch...@gmail.com wrote: Caught something today I missed before: 11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010 11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block blk_-517003810449127046_10039793 11/03/16 09:32:49 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.103:50010 11/03/16 09:34:04 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.120.41.85:34323 remote=/10.120.41.105:50010] 11/03/16 09:34:04 INFO hdfs.DFSClient: Abandoning block blk_2153189599588075377_10039793 11/03/16 09:34:04 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.105:50010 11/03/16 09:34:55 INFO hdfs.DFSClient: Could not complete file /tmp/hadoop/mapred/system/job_201103160851_0014/job.jar retrying... On Wed, Mar 16, 2011 at 9:00 AM, Chris Curtin curtin.ch...@gmail.com wrote: Thanks. Spent a lot of time looking at logs and nothing on the reducers until they start complaining about 'could not complete'. 
Found this in the jobtracker log file: 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3829493505250917008_9959810java.io.IOException: Bad response 1 for block blk_3829493505250917008_9959810 from datanode 10.120.41.103:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2454) 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 bad datanode[2] 10.120.41.103:50010 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 in pipeline 10.120.41.105:50010, 10.120.41.102:50010, 10.120.41.103:50010: bad datanode 10.120.41.103:50010 2011-03-16 02:38:53,133 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /var/hadoop/tmp/2_20110316_pmta_pipe_2_20_50351_2503122/_logs/history/hadnn01.atlis1_1299879680612_job_201103111641_0312_deliv_2_20110316_pmta_pipe*2_20110316_%5B%281%2F3%29+...QUEUED_T retrying... Looking at the logs from the various times this happens, the 'from datanode' in the first message is any of the data nodes (roughly equal in # of times it fails), so I don't think it is one specific node having problems. Any other ideas?
Re: FW: Error running org.apache.hadoop.examples.DBCountPageView
Hi Clovis From the exception, this is clearly due to a type mismatch in the key/value flow between the mapper, combiner and reducer. The reducer/combiner is expecting a key from the mapper of type Text, but instead it is receiving a key of type LongWritable. I didn't get a chance to debug the whole code, but the code at the URL you pasted uses the new MapReduce API, while the trunk on hadoop-0.20.203 and hadoop-0.20.204 uses the old API. I tried running both and they worked well for me. Try this link for the same sample I ran on hadoop-0.20.203: http://www.javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/examples/DBCountPageView.java.html Hope it helps!... On Fri, Oct 7, 2011 at 9:08 PM, Ta, Le (Clovis) ta...@ne.bah.com wrote: [...]
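For reference, this kind of mismatch usually comes down to the driver's declared map output classes disagreeing with what the Mapper actually emits. A hedged illustration of the relevant declarations (this is not the DBCountPageView source, just the general shape):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeMismatchFix {
    static void declareMapOutputTypes(Job job) {
        // These must match what the Mapper writes via context.write(key, value).
        // The exception above says Text was declared but LongWritable was emitted,
        // so either the declaration or the Mapper's output type has to change.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
    }
}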
Any suggestion on why I cannot talk to hdfs over a VPN
I have code to talk to a remote cluster where host = myhost and port = 9000: String connectString = "hdfs://" + host + ":" + port + "/"; try { Configuration config = new Configuration(); config.set("fs.default.name", connectString); m_DFS = FileSystem.get(config); } catch (IOException e) { throw new RuntimeException("Failed to connect on " + connectString + " because " + e.getMessage() + " exception of class " + e.getClass(), e); } The code runs properly if I run it at work, but at home over the VPN it times out. IT assures me that no ports are blocked on the VPN. In the browser, http://myhost:50075/browseDirectory.jsp?dir=/ shows the HDFS file system when I am connected on the VPN. I am on Windows 7 at home and at work, and turning off the firewall does not help. Any bright ideas? -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: DFSClient: Could not complete file
hi Todd, Thanks for the reply. Yes I'm seeing 30,000 ms a couple of times a day, though it looks like 4000 ms is average. Also see 150,000+ and lots of 50,000. Is there anything I can do about this? The bug is still open in JIRA. Thanks, Chris On Fri, Oct 7, 2011 at 2:15 PM, Todd Lipcon t...@cloudera.com wrote: Hi Chris, You may be hitting HDFS-2379. Can you grep your DN logs for the string BlockReport and see if you see any taking more than 3ms or so? -Todd On Fri, Oct 7, 2011 at 6:31 AM, Chris Curtin curtin.ch...@gmail.com wrote: Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris
What should be in the hosts file on a hadoop cluster?
In troubleshooting some issues on our Hadoop cluster on EC2, I keep getting pointed back to properly configuring the /etc/hosts file. But the problem is I've found about 5 different conflicting articles about how to configure the hosts file. So I'm hoping to get a definitive answer to how the hosts file should be configured for a Hadoop cluster on EC2. The first conflicting piece of info is whether 127.0.0.1 should be in the hosts file and, if so, how it's configured. Some people say comment it out, some people say it has to be there, and some people say it has to be there but with localhost.localdomain on the line. So the four possibilities I've seen are: #127.0.0.1 localhost 127.0.0.1 localhost 127.0.0.1 localhost localhost.localdomain 127.0.0.1 localhost.localdomain localhost The next thing is the DNS names of the machine(s) in the Hadoop cluster. It seems like everyone is constantly saying to always use the DNS name, not the IP address, when configuring Hadoop. Though some people say to use the public DNS and others say use the private DNS. Either one gets resolved to the private IP address, but does it really matter which is used? Next, do you put the DNS names in the hosts file? I've seen recommendations that say to put the (public/private) DNS name of the local machine in the hosts file. I've also seen recommendations that say to put the DNS names of all machines in your Hadoop cluster. So it seems like there is a big pile of confusion on the interweb. Could someone set me straight as to what my hosts file on an EC2-deployed Hadoop cluster should contain? Thanks, John C
Search over the index created by hadoop contrib/index??
Hello! I'm building an index using hadoop contrib/index. The index is not distributed, and I want to search over this index. How should I do it? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-over-the-index-created-by-hadoop-contrib-index-tp3404585p3404585.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Simple Hadoop program build with Maven
Hi all, I am migrating from ant builds to maven. So, brand new to Maven and do not yet understand many parts of it. Problem: I have a perfectly working map-reduce program (working by ant build). This program needs an external jar file (json-rpc-1.0.jar). So, when I run the program, I do the following to get a nice output: $ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output/ (note that I include the external jar file by the -libjars option as mentioned in the Hadoop: The Definitive Guide 2nd Edition - page 253). Everything is fine with my ant build. So, now, I move on to Maven. I had some trouble getting my pom.xml right. I am still unsure if it is right, but, it builds successfully (the resulting jar file has the class files of my program). The essential part of my pom.xml has the two following dependencies (a complete pom.xml is at the end of this email). <!-- org.json.* --> <dependency> <groupId>com.metaparadigm</groupId> <artifactId>json-rpc</artifactId> <version>1.0</version> </dependency> <!-- org.apache.hadoop.* --> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-core</artifactId> <version>0.20.2</version> <scope>provided</scope> </dependency> I try to run it like this: $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output Exception in thread "main" java.lang.ClassNotFoundException: -libjars at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:179) $ Then, I thought, maybe it is not necessary to include the classpath. So, I ran with the following command: $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output Exception in thread "main" java.lang.ClassNotFoundException: -libjars at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:179) $ Question: What am I doing wrong? I know, since I am new to Maven, I may be missing some key pieces/concepts. What really happens when one builds the classes, where my java program imports org.json.JSONArray and org.json.JSONObject? This import is just for compilation I suppose and it does not get embedded into the final jar. Am I right? I want to either bundle-up the external jar(s) into a single jar and conveniently run hadoop using that, or, know how to include the external jars in my command-line. This is what I have: - maven 3.0.3 - Mac OSX - Java 1.6.0_26 - Hadoop - CDH 0.20.2-cdh3u0 I have Googled, looked at Tom White's github repo ( https://github.com/cloudera/repository-example/blob/master/pom.xml). The more I Google, the more confused I get. Any help is highly appreciated. Thanks, PD.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.ABC</groupId>
  <artifactId>MyHadoopProgram</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>MyHadoopProgram</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <!-- org.json.* -->
    <dependency>
      <groupId>com.metaparadigm</groupId>
      <artifactId>json-rpc</artifactId>
      <version>1.0</version>
    </dependency>
    <!-- org.apache.hadoop.* -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
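A note on the failing commands above: hadoop jar (RunJar) treats the first argument after the jar as the main class whenever the jar's manifest has no Main-Class entry, so in both runs it tried to load a class literally named "-libjars", which is exactly the ClassNotFoundException shown. The -libjars option itself is only interpreted by GenericOptionsParser, i.e. when the driver goes through ToolRunner; the ant-built jar presumably had a Main-Class entry and a Tool-based driver, which is why the same option worked there. A hedged sketch of such a driver (the job wiring is illustrative, not PD's actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopProgram extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects whatever -libjars / -D generic options set.
        Job job = new Job(getConf(), "my hadoop program");
        job.setJarByClass(MyHadoopProgram.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // mapper/reducer and key/value classes omitted in this sketch
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, -D, -files, ...) before run() sees args.
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopProgram(), args));
    }
}

With a driver like this, the invocation would name the class before the generic options, for example: hadoop jar myHadoopProgram.jar com.ABC.MyHadoopProgram -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output; alternatively, the maven-jar-plugin can be configured to write a Main-Class entry into the manifest so the class name can be omitted, as the ant build presumably did.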