Re: Does the error "could only be replicated to 0 nodes, instead of 1" mean no datanodes are available?
Hello, here is the output of hadoop fsck /:

  Status: HEALTHY
   Total size:                  0 B
   Total dirs:                  2
   Total files:                 0 (Files currently being written: 1)
   Total blocks (validated):    0
   Minimally replicated blocks: 0
   Over-replicated blocks:      0
   Under-replicated blocks:     0
   Mis-replicated blocks:       0
   Default replication factor:  3
   Average block replication:   0.0
   Corrupt blocks:              0
   Missing replicas:            0
   Number of data-nodes:        4
   Number of racks:             1

Currently I have set the dfs.http.address configuration property in hdfs-site.xml; all the other errors are gone, except this error on the primary namenode:

  10/05/27 14:21:06 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException:
      java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        ...
  10/05/27 14:21:06 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
  10/05/27 14:21:06 WARN hdfs.DFSClient: Could not get block locations. Source file /user/alex/check_ssh.sh - Aborting...
  put: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
  10/05/27 14:21:06 ERROR hdfs.DFSClient: Exception closing file /user/alex/check_ssh.sh:
      org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        ...
  org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/alex/check_ssh.sh could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
        ...

On 05/27/2010 12:23 AM, Eric Sammer wrote:
> Alex:
>
> From the data node / secondary NN exceptions, it appears that nothing
> can talk to your name node. Take a look in the name node logs and look
> for where data node registration happens. Is it possible the NN disk is
> full? My guess is that there's something odd happening with the state
> on the name node. What does hadoop fsck / look like?
>
> On Wed, May 26, 2010 at 6:53 AM, Alex Luya <alexander.l...@gmail.com> wrote:
>> Hello:
>>
>> I got this error when putting files into HDFS. It seems to be an old
>> issue, and I followed the solution in this link:
>>
>> http://adityadesai.wordpress.com/2009/02/26/another-problem-with-hadoop-jobjar-could-only-be-replicated-to-0-nodes-instead-of-1io-exception/
>>
>> but the problem still exists, so I tried to figure it out through the
>> source code, in org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock():
>>
>>   // choose targets for the new block to be allocated.
>>   DatanodeDescriptor targets[] = replicator.chooseTarget(replication,
>>                                                          clientNode,
>>                                                          null,
>>                                                          blockSize);
>>   if (targets.length < this.minReplication) {
>>     throw new IOException("File " + src + " could only be replicated to " +
>>                           targets.length + " nodes, instead of " +
>>                           minReplication);
>>   }
>>
>> I think DatanodeDescriptor represents a datanode, so targets.length is
>> the number of datanodes chosen for the new block. Clearly it is 0; in
>> other words, no datanode is available. But in the web interface
>> (localhost:50070) I can see 4 live nodes (I have 4 nodes in total), and
>> hadoop dfsadmin -report also shows 4 nodes. That is strange. And I got
>> this error message in the secondary namenode:
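Since the web UI and dfsadmin -report disagree with what the write path sees, it can also help to ask the NameNode for its live-datanode list directly and check the remaining space on each node, since chooseTarget skips nodes it considers too full. A minimal sketch, assuming the 0.20-era HDFS client API (DistributedFileSystem.getDataNodeStats); the class name ListDatanodes is mine:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.hdfs.DistributedFileSystem;
  import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

  public class ListDatanodes {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
      FileSystem fs = FileSystem.get(conf);
      if (!(fs instanceof DistributedFileSystem)) {
        System.err.println("Not talking to HDFS: " + fs.getUri());
        return;
      }
      // Same data the NameNode uses to report "live" nodes in the web UI.
      for (DatanodeInfo node : ((DistributedFileSystem) fs).getDataNodeStats()) {
        System.out.println(node.getName() + " remaining=" + node.getRemaining() + " bytes");
      }
    }
  }

If this also shows 4 nodes but with little or no remaining space, the nodes are registered yet unusable for new blocks, which would produce exactly the "replicated to 0 nodes" error.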
Re: TaskTracker and DataNodes cannot connect to master node (NoRouteToHostException)
Erik Test wrote:
> I confirmed that the hostname for the machine in the /etc/hosts file
> points to the actual address of the machine and not the local loopback.
> However, I see that the ports reported in the log file are not open in
> iptables. I'm new to configuring iptables (i.e. I made my first
> configuration changes yesterday), so do I configure the port on the
> slave node as an output chain going to the master node?
>
> Erik

I know nothing about iptables either. I do know that a bad /etc/resolv.conf file breaks a lot of Java. Nothing obviously bad stands out from the ant diagnostics call, though it needs more network debugging. The Java version seems OK.
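Before changing iptables rules, it is worth confirming from a slave which master ports are actually reachable. A small sketch; the port list is my assumption (defaults), so substitute the ports from fs.default.name and mapred.job.tracker in your *.xml files:

  import java.net.InetSocketAddress;
  import java.net.Socket;

  public class PortProbe {
    public static void main(String[] args) throws Exception {
      String master = args[0];
      int[] ports = {9000, 9001, 50070, 50030}; // assumed defaults; match your config
      for (int port : ports) {
        Socket s = new Socket();
        try {
          s.connect(new InetSocketAddress(master, port), 2000);
          System.out.println(port + " open");
        } catch (Exception e) {
          // NoRouteToHostException here usually points at a firewall rule.
          System.out.println(port + " blocked: " + e.getMessage());
        } finally {
          s.close();
        }
      }
    }
  }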
RE: Encryption in Hadoop 0.20.1?
Thanks for responding, Ted. I did see that link before, but there weren't enough details there for me to make sense of it. I'm not sure who Owen is ;(

Cheers, Arv

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wed 26/05/2010 10:39 PM
To: common-user@hadoop.apache.org
Subject: Re: Encryption in Hadoop 0.20.1?

Owen should be able to provide more details:
http://markmail.org/thread/d2cmsacn32vdatpl

On Wed, May 26, 2010 at 6:34 PM, Arv Mistry <a...@kindsight.net> wrote:
> Hi,
>
> Can anyone direct me to any documentation/examples on using data
> encryption for map/reduce jobs? And can you both compress and encrypt
> the output?
>
> Thanks for any information in advance!
>
> Cheers, Arv
Re: Encryption in Hadoop 0.20.1?
On Thu, May 27, 2010 at 6:58 AM, Arv Mistry <a...@kindsight.net> wrote:
> Thanks for responding, Ted. I did see that link before, but there
> weren't enough details there for me to make sense of it. I'm not sure
> who Owen is ;(

I'm Owen, although I think I've used at least 5 different email addresses on these lists at various times. *smile*

Since you specify 0.20, you'd probably want to put your keys into HDFS and read them from the tasks. Note that this is *not* secure: other users of your cluster can access your data in HDFS with only a tiny bit of misdirection. (This will be fixed in 0.22, where we are adding strong authentication based on Kerberos.)

The next step would be to define a compression codec that does the encryption. So let's say you define an XorEncryption that does a simple xor with a byte. (Obviously you would use something better than xor; it is just an example!) XorEncryption would need to implement org.apache.hadoop.io.compress.CompressionCodec. You'd also need to add your new class to the list of codecs in the configuration variable io.compression.codecs. For details of how to configure your MapReduce job with compression (or in this case encryption), look at http://bit.ly/9PMHUA.

If XorEncryption returned ".xor" from getDefaultExtension(), then any file ending in .xor would automatically be put through the codec, so input is handled automatically. You need to set some configuration variables to get it applied to the output of MapReduce.

-- Owen
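To make Owen's idea concrete, here is a rough sketch of such a codec. This is my illustration, not code from Hadoop: xor with a constant byte is the deliberately toy cipher from his example, and returning null for the Compressor/Decompressor types assumes callers fall back to the stream methods (I believe TextOutputFormat does; verify for SequenceFile on your version):

  // Illustrative only: XOR with a fixed byte is NOT real encryption.
  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;
  import org.apache.hadoop.io.compress.*;

  public class XorEncryption implements CompressionCodec {
    private static final byte KEY = 0x5A; // toy key; real code would load key material from HDFS

    public CompressionOutputStream createOutputStream(final OutputStream out) {
      return new CompressionOutputStream(out) {
        public void write(int b) throws IOException { out.write((b ^ KEY) & 0xff); }
        public void write(byte[] b, int off, int len) throws IOException {
          for (int i = 0; i < len; i++) write(b[off + i]);
        }
        public void finish() throws IOException {}     // xor needs no trailer
        public void resetState() throws IOException {}
      };
    }

    public CompressionOutputStream createOutputStream(OutputStream out, Compressor c)
        throws IOException {
      return createOutputStream(out); // ignore the pooled compressor
    }

    public CompressionInputStream createInputStream(final InputStream in) throws IOException {
      return new CompressionInputStream(in) {
        public int read() throws IOException {
          int b = in.read();
          return b < 0 ? b : (b ^ KEY) & 0xff;
        }
        public int read(byte[] b, int off, int len) throws IOException {
          int n = in.read(b, off, len);
          for (int i = 0; i < n; i++) b[off + i] ^= KEY;
          return n;
        }
        public void resetState() throws IOException {}
      };
    }

    public CompressionInputStream createInputStream(InputStream in, Decompressor d)
        throws IOException {
      return createInputStream(in);
    }

    // Assumption: null here means callers use the stream methods above instead.
    public Class<? extends Compressor> getCompressorType() { return null; }
    public Compressor createCompressor() { return null; }
    public Class<? extends Decompressor> getDecompressorType() { return null; }
    public Decompressor createDecompressor() { return null; }

    public String getDefaultExtension() { return ".xor"; }
  }

You would then list the class in io.compression.codecs alongside the default codecs, as Owen describes.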
distcp from an S3 URL with slashes
My S3 secret key has a slash in it. After replacing the / with %2F, I can use it as a filesystem URL in something like:

  $ hadoop fs -fs s3n://$KEY:$SECRET@$BUCKET -ls /
  Found 1 items
  drwxrwxrwx  -          0 1969-12-31 16:00 /remote

But when I try a distcp, it crashes with:

  $ hadoop distcp s3n://$KEY:$SECRET@$BUCKET/remote local
  Copy failed: org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException:
      S3 PUT failed for '/' XML Error Message:
      <?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><Message>...

If I leave the slash unencoded, it crashes with:

  $ hadoop distcp s3n://$KEY:$SECRET@$BUCKET/remote local
  java.lang.IllegalArgumentException: Invalid hostname in URI s3n://$KEY:$SECRET@$BUCKET/remote

Is this a bug in Hadoop, or does distcp require special handling of S3 URLs? I am using plain old 0.19.2.

Cheers, Anthony
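One workaround that sidesteps the escaping problem entirely: supply the credentials through configuration rather than the URI. A sketch, assuming the fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey properties the s3n filesystem reads (the bucket name is a placeholder):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class S3List {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Raw values, no %2F escaping needed, since they never enter a URI.
      conf.set("fs.s3n.awsAccessKeyId", System.getenv("KEY"));
      conf.set("fs.s3n.awsSecretAccessKey", System.getenv("SECRET"));
      Path root = new Path("s3n://mybucket/"); // hypothetical bucket name
      FileSystem fs = root.getFileSystem(conf);
      for (FileStatus st : fs.listStatus(root)) {
        System.out.println(st.getPath());
      }
    }
  }

The same two properties can go in hadoop-site.xml, after which "hadoop distcp s3n://$BUCKET/remote local" should work with no key material in the URL at all.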
Re: Regarding Benchmark test for Hadoop
Hey Sonali,

A good start is TeraSort:

  # generate 10m records
  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar teragen 10000000 input-dir
  # run terasort
  hadoop jar $HADOOP_HOME/hadoop-*-examples.jar terasort input-dir output-dir

which will push your setup some. It's not a complete test, but it's a good first test!

Josh Patterson
Solutions Architect
Cloudera

On Thu, May 27, 2010 at 10:22 AM, sonali <sona...@cybage.com> wrote:
> Hi,
>
> I am running benchmarks for testing the performance of a Hadoop
> cluster. I need to know if there are any tests that are crucial or
> most widely used?
>
> regards,
> Sonali Gavhane
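If you want to drive the same two steps from code (e.g. for scripted, repeatable runs), the example classes implement Tool, I believe, so something like this sketch should work; the class name RunTerasort is mine, and org.apache.hadoop.examples.terasort is the package I'd expect in the 0.20 examples jar:

  import org.apache.hadoop.examples.terasort.TeraGen;
  import org.apache.hadoop.examples.terasort.TeraSort;
  import org.apache.hadoop.util.ToolRunner;

  public class RunTerasort {
    public static void main(String[] args) throws Exception {
      // Same arguments as the shell commands above.
      ToolRunner.run(new TeraGen(), new String[] {"10000000", "input-dir"});
      ToolRunner.run(new TeraSort(), new String[] {"input-dir", "output-dir"});
    }
  }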
Re: Encryption in Hadoop 0.20.1?
Owen wrote:
> For details of how to configure your mapreduce job with compression
> (or in this case encryption), look at http://bit.ly/9PMHUA.

Since Arv asked about doing both: in case it's not obvious, compress _first_, then encrypt. Encrypted bytes look random and are essentially incompressible, so encrypting first would make the compression useless. (In fact, compress-then-encrypt is exactly what PGP, GnuPG, etc. do.)

Greg
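A minimal JDK-only sketch of that ordering on the writer side; the all-zero key and the output filename are placeholders, and a real writer must save the generated IV (cipher.getIV()) somewhere for decryption:

  import java.io.FileOutputStream;
  import java.io.OutputStream;
  import java.util.zip.GZIPOutputStream;
  import javax.crypto.Cipher;
  import javax.crypto.CipherOutputStream;
  import javax.crypto.spec.SecretKeySpec;

  public class CompressThenEncrypt {
    public static void main(String[] args) throws Exception {
      Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"));
      // Bytes flow: plaintext -> gzip -> AES -> disk, i.e. compress first.
      OutputStream out = new GZIPOutputStream(
          new CipherOutputStream(new FileOutputStream("part-00000.gz.aes"), cipher));
      out.write("some job output".getBytes());
      out.close();
    }
  }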
Tasktracker appearing from nowhere
I'm getting the following errors:

  WARN org.apache.hadoop.mapred.JobTracker: Serious problem, cannot find record of 'previous' heartbeat for 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'; reinitializing the tasktracker
  INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_201005271529_0004_r_42_1' to tip task_201005271529_0004_r_42, for tracker 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'
  INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_201005271529_0004_m_000112_0' from 'tracker_m351.ra.wink.com:localhost/127.0.0.1:41885'

despite not having m351 in any of the config files except racks.txt. If I take it out of there, I can't start any jobs at all. The question is: what would make a machine be contacted as a tasktracker when it is not in the slaves file or the *.xml files?

Thanks - ;;peter
Cloudera EC2 scripts
Hi,

I was using the beta version of the Cloudera EC2 scripts from a while back, and I think there is a stable version now, but I can't find it. The documentation tells me to go download a Hadoop distribution, but I can't find the Cloudera scripts there. I do see something at hadoop-0.18.3/src/contrib/ec2/bin, but it does not look right. Is it me?

Thank you,
Mark
Passing binary files in maps
Hi,

I need to put a binary file in a map and then emit that map. I do it by encoding the file as a string using Base64, and that works, but I am dealing with pretty large files and I am running out of memory, because I read each complete file into memory. Is there a way to pass streams instead?

Thank you,
Mark
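One common way around the memory blow-up, sketched below: keep the bytes in HDFS, put only the path in the map value, and let the consumer stream the file in fixed-size chunks instead of materializing a Base64 string. This is a generic pattern, not a specific answer from the thread; the chunk handling is a stand-in:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class StreamFromHdfs {
    public static void main(String[] args) throws IOException {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      FSDataInputStream in = fs.open(new Path(args[0])); // path taken from the map value
      byte[] buf = new byte[64 * 1024];                  // bounded memory per chunk
      int n;
      while ((n = in.read(buf)) > 0) {
        System.out.write(buf, 0, n); // stand-in for real per-chunk processing
      }
      in.close();
    }
  }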
Re: Cloudera EC2 scripts
I didn't have any problems using the scripts that are in CDH3 (beta, March 2010) to bring up and tear down Hadoop cluster instances on EC2. I think there were some differences between the documentation and the actual scripts, but it's been a few weeks and I don't have access to my notes right now to see what they were.

--Andrew

On May 27, 2010, at 9:31 PM, Mark Kerzner wrote:
> Hi,
>
> I was using the beta version of the Cloudera EC2 scripts from a while
> back, and I think there is a stable version now, but I can't find it.
> [...]
Re: Cloudera EC2 scripts
That would be fine, but where is the link to get them?

On Fri, May 28, 2010 at 12:10 AM, Andrew Nguyen <andrew-lists-had...@ucsfcti.org> wrote:
> I didn't have any problems using the scripts that are in CDH3 (beta,
> March 2010) to bring up and tear down Hadoop cluster instances on EC2.
> I think there were some differences between the documentation and the
> actual scripts, but it's been a few weeks and I don't have access to
> my notes right now to see what they were.
>
> --Andrew