Re: DFSClient error
Can you run a regular 'hadoop fs' (put or ls or get) command? If yes, how about a wordcount example? '<path>/hadoop jar <path>/hadoop-*examples*.jar wordcount input output'

-----Original Message-----
From: Mohit Anchlia <mohitanch...@gmail.com>
Reply-To: common-user@hadoop.apache.org
Date: Fri, 27 Apr 2012 14:36:49 -0700
To: common-user@hadoop.apache.org
Subject: Re: DFSClient error

I even tried to reduce the number of jobs, but it didn't help. This is what I see:

datanode logs:

Initializing secure datanode resources
Successfully obtained privileged resources (streaming port = ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port = sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
Starting regular datanode initialization
26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value of 143

userlogs:

2012-04-26 19:35:22,801 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is available
2012-04-26 19:35:22,801 INFO org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library loaded
2012-04-26 19:35:22,808 INFO org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.197:50010, add to deadNodes and continue
java.io.EOFException
    at java.io.DataInputStream.readShort(DataInputStream.java:298)
    at org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClient.java:1664)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.java:2383)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:2056)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
    at java.io.DataInputStream.read(DataInputStream.java:132)
    at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:97)
    at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:87)
    at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
    at java.io.InputStream.read(InputStream.java:85)
    at org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
    at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:114)
    at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /125.18.62.204:50010, add to deadNodes and continue
java.io.EOFException

namenode logs:

2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job job_201204261140_0244 added successfully for user 'hadoop' to queue 'default'
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Initializing job_201204261140_0244
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger: USER=hadoop IP=125.18.62.196 OPERATION=SUBMIT_JOB TARGET=job_201204261140_0244 RESULT=SUCCESS
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201204261140_0244
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad connect ack with firstBadLink as 125.18.62.197:50010
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2499580289951080275_22499
2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 125.18.62.197:50010
2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress: jobToken generated and stored with users keys in /data/hadoop/mapreduce/job_201204261140_0244/jobToken
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201204261140_0244 = 73808305. Number of splits = 1
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:
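For readers hitting the same symptom, a minimal programmatic version of the fs-level sanity check suggested at the top of this thread (put a small file, then read it back) might look like the sketch below; it exercises the same datanode write pipeline and read path that are failing in the logs. The test path is made up, and in practice a plain 'hadoop fs -put' / 'hadoop fs -get' is the simpler first check.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsSanityCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path("/tmp/dfs-sanity-check.txt");  // hypothetical test path

        // Write: this has to set up a block pipeline to a datanode on port 50010.
        FSDataOutputStream out = fs.create(p, true);
        out.writeBytes("hello\n");
        out.close();

        // Read: this also has to connect to a datanode, like the failing DFSClient above.
        FSDataInputStream in = fs.open(p);
        byte[] buf = new byte[6];
        in.readFully(buf);
        in.close();
        System.out.println(new String(buf, "UTF-8"));
      }
    }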
Re: The meaning of FileSystem in context of OutputFormat storage
I think what it means is that the output files can be stored in any of the possible implementations of the FileSystem abstract class, depending on the user's requirement. So it could be stored in DistributedFileSystem, LocalFileSystem, etc.

Regards,
John George

-----Original Message-----
From: Jay Vyas <jayunit...@gmail.com>
Reply-To: common-user@hadoop.apache.org
Date: Wed, 25 Apr 2012 10:01:25 -0500
To: common-user@hadoop.apache.org
Subject: The meaning of FileSystem in context of OutputFormat storage

I just saw this line in the javadocs for OutputFormat: "Output files are stored in a FileSystem (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html)." Seems like an odd sentence. What is the implication here -- is this implying anything other than the obvious?

--
Jay Vyas
MMSB/UCHC
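To make that point concrete, here is a small sketch (not from the thread) showing that the concrete FileSystem behind a path is resolved from the path's scheme and the configuration, so the same OutputFormat can end up writing to HDFS, the local disk, and so on. The namenode host and paths are made up, and resolving the hdfs:// path assumes a reachable cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhichFileSystem {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // An hdfs:// path resolves to DistributedFileSystem (hypothetical namenode address)...
        FileSystem dfs = new Path("hdfs://namenode:8020/user/jay/output").getFileSystem(conf);
        // ...while a file:// path resolves to LocalFileSystem.
        FileSystem local = new Path("file:///tmp/output").getFileSystem(conf);

        System.out.println(dfs.getClass().getName());
        System.out.println(local.getClass().getName());
      }
    }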
Re: setting client retry
There are several different types of 'client retries'. The following are some that I know of. My guess is that you meant the following one. If so, it is defined in core-site.xml:

ipc.client.connect.max.retries (default value: 10) - Indicates the number of retries a client will make to establish a server connection.

The other types of retries that I can think of, on the HDFS side:

dfs.client.block.write.retries (default value: 3) - As the name suggests, this is the number of times a DFS client retries a write to the DataNodes.

dfs.client.block.write.locateFollowingBlock.retries (default value: 5) - On certain exceptions, the client might retry when trying to get an additional block from the NN, and this configuration controls that.

There might be more. Feel free to let me know if you meant something else.

Regards,
John George

-----Original Message-----
From: Rita <rmorgan...@gmail.com>
Reply-To: common-user@hadoop.apache.org
Date: Thu, 12 Apr 2012 07:35:43 -0500
To: common-user@hadoop.apache.org
Subject: setting client retry

In the hdfs-site.xml file, what argument do I need to set for client retries? Also, what is the default parameter?

--
--- Get your facts first, then you can distort them as you please. ---
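These properties normally go into core-site.xml / hdfs-site.xml as described above; for completeness, a sketch of overriding them programmatically on a client-side Configuration is below. The numbers are arbitrary examples, not recommendations.

    import org.apache.hadoop.conf.Configuration;

    public class ClientRetryConf {
      public static Configuration withRetries() {
        Configuration conf = new Configuration();
        // IPC-level connection retries (default 10).
        conf.setInt("ipc.client.connect.max.retries", 20);
        // DFS client retries when writing a block to the datanodes (default 3).
        conf.setInt("dfs.client.block.write.retries", 5);
        // Retries when asking the NN for the next block on certain exceptions (default 5).
        conf.setInt("dfs.client.block.write.locateFollowingBlock.retries", 8);
        return conf;
      }
    }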
Re: Hadoopp_ClassPath issue.
Dharin,

I believe the properties you are looking for are the following:

HADOOP_USER_CLASSPATH_FIRST: When defined, this will put the user-suggested classpath at the beginning of the global classpath. So, you would have to do something like 'export HADOOP_USER_CLASSPATH_FIRST=true'. If you are on 2.0 (or 0.23), please refer to bin/hadoop-config.sh for more information. If you are on 1.0 (or 0.20), refer to the hadoop script.

Now, if you want to run an M/R job by passing your own jar and you want that jar to be used first, you want to set the config parameter 'mapreduce.job.user.classpath.first', and then the user-provided jar will be put in before $HADOOP_CLASSPATH.

Hope this makes sense. Also, these will work on 1.0 (or 0.23) and above. Refer:
https://issues.apache.org/jira/browse/MAPREDUCE-3696 (for 2.0, 0.23)
https://issues.apache.org/jira/browse/MAPREDUCE-1938 (1.0, 0.20)

Thanks,
John George

-----Original Message-----
From: dmaniar <dharin.man...@gmail.com>
Reply-To: common-user@hadoop.apache.org
Date: Tue, 10 Apr 2012 21:09:10 -0700
To: core-u...@hadoop.apache.org
Subject: Hadoopp_ClassPath issue.

Hi,

I am new to hadoop and not very familiar with its internal workings. I had some questions about HADOOP_CLASSPATH.

We are currently supposed to use a Hadoop cluster with 4 machines, and its HADOOP_CLASSPATH in hadoop-env.sh is as below:

export HADOOP_CLASSPATH=/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH

Now, /home/user/app/www/WEB-INF/classes has a class called Application.class. From a remote machine I submit a map-reduce job to this cluster, with a jar called MyJar.jar. [This has an Application.class too, but with some modifications.]

When the TaskTracker spawns a child Java process for the Mapper, the classpath I see is as below, in that order. Let's say my hadoop is installed at /home/user/hadoop/:

/home/user/hadoop/jar1,
/home/user/hadoop/jar2,
...
/home/user/hadoop/jarN,
/home/user/hadoop/lib/jar1,
/home/user/hadoop/lib/jar2,
/home/user/hadoop/lib/jarN,
1. /home/user/app/www/WEB-INF/classes,
2. ${mapred.local.dir}/taskTracker/{user}/jobcache/{jobid}/jars/Myjar.jar [note: basically this has the modified class that I need to use for my Map-Reduce job]

Well, it's clear from this classpath that I will end up using the Application.class from the classes folder, which gives me incorrect results.

Now my question is, how do I make sure I reverse the order of 1 and 2? Some pointers that I found were:

1) if MyJar.jar is not changing much, then I can put it in a shared location and modify my hadoop-env.sh to export HADOOP_CLASSPATH=/some/share/location/lib:/home/user/app/www/WEB-INF/classes:$HADOOP_CLASSPATH
2) get rid of /home/user/app/www/WEB-INF/classes from my hadoop-env.sh
3) is there any property that suggests adding it before the classpath?

Any help is greatly appreciated. To summarize: if I have HADOOP_CLASSPATH in hadoop-env.sh already set, then how do I add my application jar before that classpath?

Again, I saw DistributedCache.java [hadoop src] and the code looks like:

public static void addFileToClassPath(Path file, Configuration conf) throws IOException {
  String classpath = conf.get("mapred.job.classpath.files");
  conf.set("mapred.job.classpath.files", classpath == null ? file.toString()
      : classpath + System.getProperty("path.separator") + file.toString());
  ...
}

Basically, new files are added to the end of the existing classpath.

Thanks,
Dharin.
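For reference, a minimal job-driver sketch of John's second suggestion (setting the classpath-first switch from code rather than hadoop-env.sh) might look like this. The property name comes from the reply above; whether it is honored depends on the Hadoop version (see the two MAPREDUCE JIRAs cited), and the job name is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ClasspathFirstDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Ask the framework to place the user's jar ahead of HADOOP_CLASSPATH in task JVMs.
        conf.setBoolean("mapreduce.job.user.classpath.first", true);

        Job job = new Job(conf, "use-modified-Application-class");  // hypothetical job name
        // ... set jar, mapper/reducer, and input/output paths as usual, then:
        // job.waitForCompletion(true);

        // For the client-side classpath itself, the environment variable route is:
        //   export HADOOP_USER_CLASSPATH_FIRST=true
      }
    }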
Re: How do I include the newer version of Commons-lang in my jar?
Have you tried setting 'mapreduce.user.classpath.first'? It allows user jars to be put in the classpath before hadoop jars.

-----Original Message-----
From: Sky USC <sky...@hotmail.com>
Reply-To: common-user@hadoop.apache.org
Date: Mon, 9 Apr 2012 15:46:52 -0500
To: common-user@hadoop.apache.org
Subject: RE: How do I include the newer version of Commons-lang in my jar?

Thanks for the reply. I appreciate your helpfulness. I created the jars by following the instructions at http://blog.mafr.de/2010/07/24/maven-hadoop-job/, so external jars are stored in the lib/ folder within the jar.

Am I summarizing this correctly?
1. If the hadoop version is 0.20.203 or lower - then it is not possible for me to use an external jar such as commons-lang from apache in my application. Any external jars packaged within my jar under the lib directory are not picked up. This seems like a huge limitation to me?
2. If the hadoop version is 0.20.204 to 1.0.x - then setting the HADOOP_USER_CLASSPATH_FIRST=true environment variable before launching 'hadoop jar' might help. I tried this for version 0.20.205 but it didn't work.
3. If the hadoop version is 2.x (formerly 0.23.x) - then this can be set via the API?

Is there a working, testable jar that has these dependencies that I can try, to figure out whether it's my way of packaging the jar or something else?

Thx

From: ha...@cloudera.com
Date: Mon, 9 Apr 2012 13:50:37 +0530
Subject: Re: How do I include the newer version of Commons-lang in my jar?
To: common-user@hadoop.apache.org

The answer is a bit messy. Perhaps you can set the environment variable export HADOOP_USER_CLASSPATH_FIRST=true before you do a 'hadoop jar ...' to launch your job. However, although this approach is present in 0.20.204+ (0.20.205, and 1.0.x), I am not sure if it makes an impact on the tasks as well. I don't see it changing anything but the driver CP. I've not tested it - please let us know if it works in your environment.

In higher versions (2.x or formerly 0.23.x), this is doable from within your job if you set mapreduce.job.user.classpath.first to true inside your job, and ship your replacement jars along. Some versions would also let you set this via JobConf/Job.setUserClassesTakesPrecedence(true/false) API calls.

On Mon, Apr 9, 2012 at 11:14 AM, Sky <sky...@hotmail.com> wrote:

Hi. I am new to Hadoop and I am working on a project on AWS Elastic MapReduce. The problem I am facing is:

* org.apache.commons.lang.time.DateUtils: parseDate() works OK but parseDateStrictly() fails. I think parseDateStrictly might be new in lang 2.5.

I thought I included all dependencies. However, for some reason, during runtime, my app is not picking up the newer commons-lang. Would love some help.

Thx - sky

--
Harsh J
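One quick way to see whether the classpath tweaks discussed above actually took effect inside a task is to log where DateUtils is being loaded from; a small diagnostic sketch (not from the thread) follows. If it prints an older commons-lang jar from under $HADOOP_HOME/lib instead of the jar you shipped, the override is not being applied.

    import org.apache.commons.lang.time.DateUtils;

    public class WhichCommonsLang {
      public static void main(String[] args) {
        // Prints the jar (or directory) the DateUtils class was actually loaded from.
        System.out.println(
            DateUtils.class.getProtectionDomain().getCodeSource().getLocation());
      }
    }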
Re: Hadoop archive
Could you try 0.20.205.0? The HAR issue in branch-20-security was updated by JIRA HADOOP-7539.

-----Original Message-----
From: Jonas Hartwig <jonas.hart...@cision.com>
Reply-To: common-user@hadoop.apache.org
Date: Mon, 17 Oct 2011 02:11:24 -0700
To: common-user@hadoop.apache.org
Subject: Hadoop archive

Hi, I'm new to the community. I'd like to create an archive but I get the error "Exception in archives null". I'm using hadoop 0.20.204.0. The issue was tracked under MAPREDUCE-1399 (https://issues.apache.org/jira/browse/MAPREDUCE-1399) and solved. How do I combine my hadoop version with a new map/reduce release? And how do I get the release using Firefox? I saw something like JIRA, but the Firefox plugin is not working with 7.x.

Regards
Re: a file can be used as a queue?
On 6/13/11 6:23 AM, Joey Echeverria <j...@cloudera.com> wrote:

This feature doesn't currently work. I don't remember the JIRA for it, but there's a ticket which will allow a reader to read from an HDFS file before it's closed. In that case, you implement a queue by having the producer write to the end of the file and the reader read from the beginning of the file. I'm not sure if there will be a way to tell that a file is still being written, so you may need your own end-of-stream marker.

One way to know the end of the stream would be to call getVisibleLength() on the input stream. As long as the writer has flushed (or closed) its stream, the reader should be able to see those bytes. TestWriteRead.java might provide you some clues (hdfs/src/test/hdfs/org/apache/hadoop/hdfs/TestWriteRead.java).

-Joey

On Jun 13, 2011, at 2:55, ltomuno <ltom...@163.com> wrote:

I heard that an HDFS file can be used as a producer-consumer queue. Can a file really be used as a queue? I am very confused.

Regards,
John George
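A rough sketch of the producer/consumer pattern described in this thread, assuming a release where flushed-but-unclosed data is visible to readers (hflush() on newer APIs, sync() on the 0.20 line). The path, the marker line, and the single-process structure are simplifications for illustration only.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsFileQueue {
      static final Path QUEUE = new Path("/tmp/queue.txt");  // hypothetical path

      // Producer: write items and flush so a concurrent reader can see them before close.
      static void produce(FileSystem fs) throws Exception {
        FSDataOutputStream out = fs.create(QUEUE, true);
        out.writeBytes("item-1\n");
        out.hflush();             // use sync() on older 0.20-era APIs
        out.writeBytes("EOF\n");  // application-level end-of-stream marker, as suggested above
        out.close();
      }

      // Consumer: read until the marker. A real consumer would sleep and re-poll when
      // readLine() returns null, or compare its position against the stream's visible
      // length where that API is available.
      static void consume(FileSystem fs) throws Exception {
        FSDataInputStream in = fs.open(QUEUE);
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null && !"EOF".equals(line)) {
          System.out.println("consumed: " + line);
        }
        reader.close();
      }

      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        produce(fs);
        consume(fs);
      }
    }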