How to solve a DisallowedDatanodeException?
Hi, I'm running a cluster on Amazon and sometimes I'm getting this exception: 2011-10-07 10:36:28,014 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode: ip-10-235-57-112.eu-west-1.compute.internal:50010 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2042) at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:687) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) at org.apache.hadoop.ipc.Client.call(Client.java:740) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy4.register(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:531) at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1208) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1247) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) Since I started getting this exception I'm not able to run any datanode. I have checked all the connections between the nodes and they are OK, and I have also tried to format the namenode, but the problem remains. Do I need to remove the datanode's information? rm -rf ${HOME}/dfs-xvdh/dn I would prefer a solution that doesn't imply formatting or erasing anything... Regards, Raimon Bosch.
Re: Question regarding hdfs synchronously / asynchronously block replication
Thanks a lot!! On Wed, Oct 5, 2011 at 3:51 PM, Eric Fiala e...@fiala.ca wrote: Ronen, On file write HDFS's block replication pipeline is asynchronous - datanode 1 gets a block before passing it on to datanode 2, and so on (limiting network traffic between the client node and the data nodes - the client only writes to one). The ACK for a packet is returned only once all datanodes in the pipeline have copied the block. However, if a failure occurs in the interim on a datanode in the write pipeline, AND the minimum replication threshold has been met (normally 1), the namenode will, in a separate operation, quell the replica deficit. I don't think that's configurable; however, it would be an interesting use case for speeding up writes while trading off some reliability. EF On Wed, Oct 5, 2011 at 1:53 AM, Ronen Itkin ro...@taykey.com wrote: Hi all! My question is regarding HDFS block replication. From the perspective of the client, does the application receive an ACK for a certain packet after it has been written on the first Hadoop data node in the pipeline, or after the packet has been *replicated* to all assigned *replication* nodes? More generally, does Hadoop's HDFS block replication work synchronously or asynchronously? Synchronously -- more replicas = decrease in write performance (the client has to wait until every packet is written to all replication nodes before it receives an ACK). Asynchronously -- more replication has no influence on write performance (the client receives an ACK after the write to the first datanode finishes, and HDFS completes its replication in its own time). Synchronous / asynchronous block replication - is it something configurable? If it is, then how can I do it? Thanks! -- * Ronen Itkin* Taykey | www.taykey.com -- * Ronen Itkin* Taykey | www.taykey.com
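Eric's point that the packet ACK only comes back once the whole pipeline has the data also means the client-visible write latency grows with the replication factor, which can be observed directly. A minimal sketch of such a probe, assuming a reachable cluster; the path, buffer size, block size and payload size below are arbitrary illustration values, not anything taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationWriteProbe {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // create(path, overwrite, bufferSize, replication, blockSize)
        // Repeat the run with replication 1, 2, 3 ... and compare timings.
        FSDataOutputStream out = fs.create(new Path("/tmp/replication-probe"), true,
                4096, (short) 3, 64 * 1024 * 1024L);
        long start = System.currentTimeMillis();
        out.write(new byte[8 * 1024 * 1024]); // 8 MB of zeros, purely illustrative
        out.close();                          // returns once the pipeline has acknowledged
        System.out.println("write+close took " + (System.currentTimeMillis() - start) + " ms");
        fs.close();
    }
}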
Re: How to solve a DisallowedDatanodeException?
Raimon - the error org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode Usually indicates that the datanode that is trying to connect to the namenode is either: - listed in the file defined by dfs.hosts.exclude (explicitly excluded) - or - that dfs.hosts (explicitly included) is used and the node is not listed within that file Make sure the datanode is not listed in excludes, and if you are using dfs.hosts, add it to the includes, and run hadoop dfsadmin -refreshNodes You should not have to remove any data on local disc to solve this problem. HTH EF On Fri, Oct 7, 2011 at 4:47 AM, Raimon Bosch raimon.bo...@gmail.com wrote: Hi, I'm running a cluster on amazon and sometimes I'm getting this exception: 2011-10-07 10:36:28,014 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.protocol.DisallowedDatanodeException: Datanode denied communication with namenode: ip-10-235-57-112.eu-west-1.compute.internal:50010 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:2042) at org.apache.hadoop.hdfs.server.namenode.NameNode.register(NameNode.java:687) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953) at org.apache.hadoop.ipc.Client.call(Client.java:740) at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220) at $Proxy4.register(Unknown Source) at org.apache.hadoop.hdfs.server.datanode.DataNode.register(DataNode.java:531) at org.apache.hadoop.hdfs.server.datanode.DataNode.runDatanodeDaemon(DataNode.java:1208) at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1247) at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1368) Since I have this exception I'm not able to run any datanode. I have checked all the connections between the nodes and they are ok, I have tried also to format the namenode but the problem is still remaining. Shall I need to remove the information about the datanode? rm -rf ${HOME}/dfs-xvdh/dn I would prefer a solution that doesn't implies a format or erasing anything... Regards, Raimon Bosch. -- *Eric Fiala* *Fiala Consulting* T: 403.828.1117 E: e...@fiala.ca http://www.fiala.ca
Re: DFSClient: Could not complete file
Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris On Tue, Mar 29, 2011 at 7:53 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hi Chris, One thing we've found helping in ext3 is examining your I/O scheduler. Make sure it's set to deadline, not CFQ. This will help prevent nodes from being overloaded; when du -sk is performed and the node is already overloaded, things quickly roll downhill. Brian On Mar 29, 2011, at 11:44 AM, Chris Curtin wrote: We are narrowing this down. The last few times it hung we found a 'du -sk' process for each our HDFS disks as the top users of CPU. They are also taking a really long time. Searching around I find one example of someone reporting a similar issue with du -sk, but they tied it to XFS. We are using Ext3. Anyone have any other ideas since it appears to be related to the 'du' not coming back? Note that running the command directly finishes in a few seconds. Thanks, Chris On Wed, Mar 16, 2011 at 9:41 AM, Chris Curtin curtin.ch...@gmail.com wrote: Caught something today I missed before: 11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010 11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block blk_-517003810449127046_10039793 11/03/16 09:32:49 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.103:50010 11/03/16 09:34:04 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.120.41.85:34323 remote=/10.120.41.105:50010] 11/03/16 09:34:04 INFO hdfs.DFSClient: Abandoning block blk_2153189599588075377_10039793 11/03/16 09:34:04 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.105:50010 11/03/16 09:34:55 INFO hdfs.DFSClient: Could not complete file /tmp/hadoop/mapred/system/job_201103160851_0014/job.jar retrying... On Wed, Mar 16, 2011 at 9:00 AM, Chris Curtin curtin.ch...@gmail.com wrote: Thanks. Spent a lot of time looking at logs and nothing on the reducers until they start complaining about 'could not complete'. 
Found this in the jobtracker log file: 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3829493505250917008_9959810java.io.IOException: Bad response 1 for block blk_3829493505250917008_9959810 from datanode 10.120.41.103:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2454) 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 bad datanode[2] 10.120.41.103:50010 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 in pipeline 10.120.41.105:50010, 10.120.41.102:50010, 10.120.41.103:50010: bad datanode 10.120.41.103:50010 2011-03-16 02:38:53,133 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /var/hadoop/tmp/2_20110316_pmta_pipe_2_20_50351_2503122/_logs/history/hadnn01.atlis1_1299879680612_job_201103111641_0312_deliv_2_20110316_pmta_pipe*2_20110316_%5B%281%2F3%29+...QUEUED_T retrying... Looking at the logs from the various times this happens, the 'from datanode' in the first message is any of the data nodes (roughly equal in # of times it fails), so I don't think it is one specific node having problems. Any other ideas? Thanks, Chris On Sun, Mar 13, 2011 at 3:45 AM, icebergs hkm...@gmail.com wrote: You should check the bad reducers' logs carefully.There may be more information about it. 2011/3/10 Chris Curtin curtin.ch...@gmail.com Hi, The
Re: How to solve a DisallowedDatanodeException?
My list of dfs.hosts was correct on all the servers. In this case the problem was with Amazon's internal DNS. I had to restart all my nodes to get rid of this problem. After some changes on my cluster (renaming nodes), some nodes had automatically changed their IPs and I had to perform a restart to force a change in the internal IPs as well. 2011/10/7 Eric Fiala e...@fiala.ca [...]
Re: How to solve a DisallowedDatanodeException?
In the internal DNS names, sorry... 2011/10/7 Raimon Bosch raimon.bo...@gmail.com [...]
Using native libraries with hadoop
Hi, I intend to use a native library with Java bindings in a Hadoop job. The problem is that the Java bindings expect a file path as a parameter, and this file should reside on a local file system. What is the best way to solve this problem? (I would not want to modify the code of the native library to accept e.g. a file as a string). Thanks a lot in advance, Vyacheslav
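One common pattern for this situation is to ship the file through the DistributedCache and resolve its task-local path in the mapper's setup(), on the assumption that the binding only needs read access to the file at task run time. A hedged sketch; the cache URI, file name and the NativeBinding call are hypothetical placeholders, not part of Vyacheslav's code:

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class NativeFileJob {
    public static class NativeMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String localLibraryFile;

        @Override
        protected void setup(Context context) throws IOException {
            // Files added to the cache are copied to the task node's local disk,
            // so this is a plain local-filesystem path the native binding can use.
            Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
            localLibraryFile = cached[0].toString();
            // NativeBinding.load(localLibraryFile); // hypothetical call into the bindings
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "native file job");
        job.setJarByClass(NativeFileJob.class);
        job.setMapperClass(NativeMapper.class);
        // The HDFS location below is a hypothetical example.
        DistributedCache.addCacheFile(new URI("/libs/native-data.bin"), job.getConfiguration());
        // ... input/output paths and key/value classes, then job.waitForCompletion(true)
    }
}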
Re: Adjusting column value size.
Yes, I need all of those ints at the same time. And no, there is no streaming. I have decided to pack 1024 ints into one cell so that each cell would be of size 4 KB. I am already using LZO on my tables. I'll do some experiments once I finish implementing both approaches. I'll add a thread about the results when I am done. Thanks for the advice. Ed. 2011/10/7 Jean-Daniel Cryans jdcry...@apache.org (BCC'd common-user@ since this seems strictly HBase related) Interesting question... And you probably need all those ints at the same time right? No streaming? I'll assume no. So the second solution seems better due to the overhead of storing each cell. Basically, storing one int per cell you would end up storing more keys than values (size-wise). Another thing is that if you pack enough ints together and there's some sort of repetition, you might be able to use LZO compression on that table. I'd love to hear about your experiments once you've done them. J-D On Mon, Oct 3, 2011 at 10:58 PM, edward choi mp2...@gmail.com wrote: Hi, I have a question regarding performance and column value size. I need to store several million integers per row. (Several million is important here.) I was wondering which method would be more beneficial performance-wise. 1) Store each integer in a single column so that when a row is fetched, several million columns are fetched with it, and the user maps each column value into some kind of container (ex: vector, arrayList). 2) Store, for example, a thousand integers in a single column (by concatenating them) so that when a row is fetched, only several thousand columns come with it; the user would have to split each column value into 4-byte pieces and map the resulting integers into some kind of container (ex: vector, arrayList). I am curious which approach would be better. 1) would fetch several million columns but needs no additional processing. 2) would fetch only several thousand columns but needs additional processing. Any advice would be appreciated. Ed
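A minimal sketch of the packing Ed describes (1024 fixed-width 4-byte ints per cell value, roughly 4 KB per cell); the class and constant names are illustrative only:

import java.nio.ByteBuffer;

public class IntPacking {
    static final int INTS_PER_CELL = 1024; // => 4 KB per cell value

    // Pack one batch of ints into a single cell value.
    static byte[] pack(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Split a cell value back into ints on the read side.
    static int[] unpack(byte[] cellValue) {
        ByteBuffer buf = ByteBuffer.wrap(cellValue);
        int[] out = new int[cellValue.length / 4];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }
}

Fixed-width big-endian encoding keeps the read side trivial (the value length divided by 4 gives the count), and repetition-heavy batches should also compress reasonably under LZO, as J-D suggests.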
hadoop knowledge gaining
Guys, I am able to deploy the first program, word count, using Hadoop. I am interested in exploring more about Hadoop and HBase and don't know the best way to grasp both of them. I have Hadoop in Action, but it uses the older API. I also have the HBase Definitive Guide, which I have not started exploring yet. -Jignesh
Re: How to solve a DisallowedDatanodeException?
Definitely it was an Amazon problem. They were assigning a new internal IP but some of the nodes were still using the old one. I had to force, in all my /etc/hosts files, redirects from the old DNS names to the correct IPs: [NEW_IP] ip-[OLD_IP].eu-west-1.compute.internal [NEW_IP] ip-[OLD_IP] 2011/10/7 Raimon Bosch raimon.bo...@gmail.com [...]
FW: Error running org.apache.hadoop.examples.DBCountPageView
Hi, I am getting the following exception when trying to run the DBCountPageView example obtained from http://search-hadoop.com/c/Map/Reduce:/src/examples/org/apache/hadoop/examples/DBCountPageView.java. I am using a PostgreSQL database. Any help would be greatly appreciated! java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:871) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:574) at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80) at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210) 11/10/06 08:48:37 INFO mapred.JobClient: map 0% reduce 0% 11/10/06 08:48:37 INFO mapred.JobClient: Job complete: job_local_0001 11/10/06 08:48:37 INFO mapred.JobClient: Counters: 0 11/10/06 08:48:37 INFO examples.DBCountPageView: totalPageview=60 11/10/06 08:48:37 INFO examples.DBCountPageView: sumPageview=0 java.lang.RuntimeException: Evaluation was not correct! at org.apache.hadoop.examples.DBCountPageView.run(DBCountPageView.java:439) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.examples.DBCountPageView.main(DBCountPageView.java:452)
Is it possible to run multiple MapReduce against the same HDFS?
I plan to deploy an HDFS cluster which will be shared by multiple MapReduce clusters. I wonder whether this is possible. Will it incur any conflicts among the MapReduce clusters (e.g. different MapReduce clusters trying to use the same temp directory in HDFS)? If it is possible, how should the security parameters be set up (e.g. user identity, file permissions)? Thanks, Gerald
Re: DFSClient: Could not complete file
Hi Chris, You may be hitting HDFS-2379. Can you grep your DN logs for the string BlockReport and see if you see any taking more than 3ms or so? -Todd On Fri, Oct 7, 2011 at 6:31 AM, Chris Curtin curtin.ch...@gmail.com wrote: Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris On Tue, Mar 29, 2011 at 7:53 PM, Brian Bockelman bbock...@cse.unl.eduwrote: Hi Chris, One thing we've found helping in ext3 is examining your I/O scheduler. Make sure it's set to deadline, not CFQ. This will help prevent nodes from being overloaded; when du -sk is performed and the node is already overloaded, things quickly roll downhill. Brian On Mar 29, 2011, at 11:44 AM, Chris Curtin wrote: We are narrowing this down. The last few times it hung we found a 'du -sk' process for each our HDFS disks as the top users of CPU. They are also taking a really long time. Searching around I find one example of someone reporting a similar issue with du -sk, but they tied it to XFS. We are using Ext3. Anyone have any other ideas since it appears to be related to the 'du' not coming back? Note that running the command directly finishes in a few seconds. Thanks, Chris On Wed, Mar 16, 2011 at 9:41 AM, Chris Curtin curtin.ch...@gmail.com wrote: Caught something today I missed before: 11/03/16 09:32:49 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.120.41.105:50010 11/03/16 09:32:49 INFO hdfs.DFSClient: Abandoning block blk_-517003810449127046_10039793 11/03/16 09:32:49 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.103:50010 11/03/16 09:34:04 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.120.41.85:34323 remote=/10.120.41.105:50010] 11/03/16 09:34:04 INFO hdfs.DFSClient: Abandoning block blk_2153189599588075377_10039793 11/03/16 09:34:04 INFO hdfs.DFSClient: Waiting to find target node: 10.120.41.105:50010 11/03/16 09:34:55 INFO hdfs.DFSClient: Could not complete file /tmp/hadoop/mapred/system/job_201103160851_0014/job.jar retrying... On Wed, Mar 16, 2011 at 9:00 AM, Chris Curtin curtin.ch...@gmail.com wrote: Thanks. Spent a lot of time looking at logs and nothing on the reducers until they start complaining about 'could not complete'. 
Found this in the jobtracker log file: 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_3829493505250917008_9959810java.io.IOException: Bad response 1 for block blk_3829493505250917008_9959810 from datanode 10.120.41.103:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2454) 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 bad datanode[2] 10.120.41.103:50010 2011-03-16 02:38:47,881 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for block blk_3829493505250917008_9959810 in pipeline 10.120.41.105:50010, 10.120.41.102:50010, 10.120.41.103:50010: bad datanode 10.120.41.103:50010 2011-03-16 02:38:53,133 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /var/hadoop/tmp/2_20110316_pmta_pipe_2_20_50351_2503122/_logs/history/hadnn01.atlis1_1299879680612_job_201103111641_0312_deliv_2_20110316_pmta_pipe*2_20110316_%5B%281%2F3%29+...QUEUED_T retrying... Looking at the logs from the various times this happens, the 'from datanode' in the first message is any of the data nodes (roughly equal in # of times it fails), so I don't think it is one specific node having problems. Any other ideas?
Re: FW: Error running org.apache.hadoop.examples.DBCountPageView
Hi Clovis From the exception, this is clearly due to a type mismatch in the key/value flow between the mapper, combiner and reducer. The reducer/combiner is expecting a key from the mapper of type Text, but instead it is receiving a key of type LongWritable. I didn't get a chance to debug the whole code, but the code at the URL you pasted uses the new MapReduce API, while the trunk on hadoop-0.20.203 and hadoop-0.20.204 uses the old API. I tried running both and they worked well for me. Try this link for the same sample I ran on hadoop-0.20.203: http://www.javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/examples/DBCountPageView.java.html Hope it helps!... On Fri, Oct 7, 2011 at 9:08 PM, Ta, Le (Clovis) ta...@ne.bah.com wrote: [...]
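For reference, this kind of mismatch usually comes down to the driver's declared map output classes disagreeing with what the Mapper actually emits. A hedged illustration of the relevant declarations (this is not the DBCountPageView source, just the general shape):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class TypeMismatchFix {
    static void declareMapOutputTypes(Job job) {
        // These must match what the Mapper writes via context.write(key, value).
        // The exception above says Text was declared but LongWritable was emitted,
        // so either the declaration or the Mapper's output type has to change.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
    }
}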
Any suggestion on why I cannot talk to hdfs over a VPN
I have code to talk to a remote cluster where host = myhost and port = 9000: String connectString = "hdfs://" + host + ":" + port + "/"; try { Configuration config = new Configuration(); config.set("fs.default.name", connectString); m_DFS = FileSystem.get(config); } catch (IOException e) { throw new RuntimeException("Failed to connect on " + connectString + " because " + e.getMessage() + " exception of class " + e.getClass(), e); } The code runs properly if I run it at work, but at home over the VPN it times out. IT assures me that no ports are blocked on the VPN. In the browser, http://myhost:50075/browseDirectory.jsp?dir=/ shows the HDFS file system when I am connected on the VPN. I am on Windows 7 at home and at work, and turning off the firewall does not help. Any bright ideas? -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
Re: DFSClient: Could not complete file
hi Todd, Thanks for the reply. Yes I'm seeing 30,000 ms a couple of times a day, though it looks like 4000 ms is average. Also see 150,000+ and lots of 50,000. Is there anything I can do about this? The bug is still open in JIRA. Thanks, Chris On Fri, Oct 7, 2011 at 2:15 PM, Todd Lipcon t...@cloudera.com wrote: Hi Chris, You may be hitting HDFS-2379. Can you grep your DN logs for the string BlockReport and see if you see any taking more than 3ms or so? -Todd On Fri, Oct 7, 2011 at 6:31 AM, Chris Curtin curtin.ch...@gmail.com wrote: Sorry to bring this back from the dead, but we're having the issues again. This is on a NEW cluster, using Cloudera 0.20.2-cdh3u0 (old was stock Apache 0.20.2). Nothing carried over from the old cluster except data in HDFS (copied from old cluster). Bigger/more machines, more RAM, faster disks etc. And it is back. Confirmed that all the disks setup for HDFS are 'deadline'. Runs fine for few days then hangs again with the 'Could not complete' error in the JobTracker log until we kill the cluster. 2011-09-09 08:04:32,429 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file /log/hadoop/tmp/flow_BYVMTA_family_BYVMTA_72751_8284775/_logs/history/10.120.55.2_1311201333949_job_201107201835_13900_deliv_flow_BYVMTA%2Bflow_BYVMTA*family_B%5B%284%2F5%29+...UNCED%27%2C+ retrying... Found HDFS-148 (https://issues.apache.org/jira/browse/HDFS-148) which looks like what could be happening to us. Anyone found a good workaround? Any other ideas? Also, does the HDFS system try to do 'du' on disks not assigned to it? The HDFS disks are separate from the root and OS disks. Those disks are NOT setup to be 'deadline'. Should that matter? Thanks, Chris
What should be in the hosts file on a hadoop cluster?
In troubleshooting some issues on our Hadoop cluster on EC2, I keep getting pointed back to properly configuring the /etc/hosts file. But the problem is I've found about 5 different conflicting articles about how to configure the hosts file. So I'm hoping to get a definitive answer to how the hosts file should be configured for a Hadoop cluster on EC2. The first conflicting piece of info is whether 127.0.0.1 should be in the hosts file and, if so, how it's configured. Some people say comment it out, some people say it has to be there, and some people say it has to be there but with localhost.localdomain on the line. So the four possibilities I've seen are: #127.0.0.1 localhost 127.0.0.1 localhost 127.0.0.1 localhost localhost.localdomain 127.0.0.1 localhost.localdomain localhost The next thing is the DNS names of the machine(s) in the Hadoop cluster. It seems like everyone is constantly saying to always use the DNS name, not the IP address, when configuring Hadoop. Though some people say to use the public DNS and others say use the private DNS. Either one gets resolved to the private IP address, but does it really matter which is used? Next, do you put the DNS names in the hosts file? I've seen recommendations that say to put the (public/private) DNS name of the local machine in the hosts file. I've also seen recommendations that say to put the DNS names of all machines in your Hadoop cluster. So it seems like there is a big pile of confusion on the interweb. Could someone set me straight as to what my hosts file on an EC2-deployed Hadoop cluster should contain? Thanks, John C
Search over the index created by hadoop contrib/index??
Hello! I'm building an index using hadoop contrib/index. The index is not distributed, and I want to search over this index. How should I do it? -- View this message in context: http://lucene.472066.n3.nabble.com/Search-over-the-index-created-by-hadoop-contrib-index-tp3404585p3404585.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Simple Hadoop program build with Maven
Hi all, I am migrating from ant builds to maven. So, brand new to Maven and do not yet understand many parts of it. Problem: I have a perfectly working map-reduce program (working by ant build). This program needs an external jar file (json-rpc-1.0.jar). So, when I run the program, I do the following to get a nice output: $ hadoop jar jar/myHadoopProgram.jar -libjars ../lib/json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output/ (note that I include the external jar file by the -libjars option as mentioned in the Hadoop: The Definitive Guide 2nd Edition - page 253). Everything is fine with my ant build. So, now, I move on to Maven. I had some trouble getting my pom.xml right. I am still unsure if it is right, but, it builds successfully (the resulting jar file has the class files of my program). The essential part of my pom.xml has the two following dependencies (a complete pom.xml is at the end of this email). <!-- org.json.* --> <dependency> <groupId>com.metaparadigm</groupId> <artifactId>json-rpc</artifactId> <version>1.0</version> </dependency> <!-- org.apache.hadoop.* --> <dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-core</artifactId> <version>0.20.2</version> <scope>provided</scope> </dependency> I try to run it like this: $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar com.ABC.MyHadoopProgram /usr/PD/input/sample22.json /usr/PD/output Exception in thread "main" java.lang.ClassNotFoundException: -libjars at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:179) $ Then, I thought, maybe it is not necessary to include the classpath. So, I ran with the following command: $ hadoop jar ../myHadoopProgram.jar -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output Exception in thread "main" java.lang.ClassNotFoundException: -libjars at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:247) at org.apache.hadoop.util.RunJar.main(RunJar.java:179) $ Question: What am I doing wrong? I know, since I am new to Maven, I may be missing some key pieces/concepts. What really happens when one builds the classes, where my java program imports org.json.JSONArray and org.json.JSONObject? This import is just for compilation I suppose and it does not get embedded into the final jar. Am I right? I want to either bundle-up the external jar(s) into a single jar and conveniently run hadoop using that, or, know how to include the external jars in my command-line. This is what I have: - maven 3.0.3 - Mac OSX - Java 1.6.0_26 - Hadoop - CDH 0.20.2-cdh3u0 I have Googled, looked at Tom White's github repo ( https://github.com/cloudera/repository-example/blob/master/pom.xml). The more I Google, the more confused I get. Any help is highly appreciated. Thanks, PD.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.ABC</groupId>
  <artifactId>MyHadoopProgram</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>MyHadoopProgram</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <dependencies>
    <!-- org.json.* -->
    <dependency>
      <groupId>com.metaparadigm</groupId>
      <artifactId>json-rpc</artifactId>
      <version>1.0</version>
    </dependency>
    <!-- org.apache.hadoop.* -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>0.20.2</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>
</project>
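A note on the failing commands above: hadoop jar (RunJar) treats the first argument after the jar as the main class whenever the jar's manifest has no Main-Class entry, so in both runs it tried to load a class literally named "-libjars", which is exactly the ClassNotFoundException shown. The -libjars option itself is only interpreted by GenericOptionsParser, i.e. when the driver goes through ToolRunner; the ant-built jar presumably had a Main-Class entry and a Tool-based driver, which is why the same option worked there. A hedged sketch of such a driver (the job wiring is illustrative, not PD's actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyHadoopProgram extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects whatever -libjars / -D generic options set.
        Job job = new Job(getConf(), "my hadoop program");
        job.setJarByClass(MyHadoopProgram.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // mapper/reducer and key/value classes omitted in this sketch
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, -D, -files, ...) before run() sees args.
        System.exit(ToolRunner.run(new Configuration(), new MyHadoopProgram(), args));
    }
}

With a driver like this, the invocation would name the class before the generic options, for example: hadoop jar myHadoopProgram.jar com.ABC.MyHadoopProgram -libjars ../json-rpc-1.0.jar /usr/PD/input/sample22.json /usr/PD/output; alternatively, the maven-jar-plugin can be configured to write a Main-Class entry into the manifest so the class name can be omitted, as the ant build presumably did.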