Re: Handling bad records
Hi Mohit, A and B refer to two different output files (the multi-part name). The file names will be seq-A* and seq-B*. It's similar to the r in part-r-00000.

On Tue, Feb 28, 2012 at 11:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks, that's helpful. In that example, what are A and B referring to? Are those the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

On Mon, Feb 27, 2012 at 9:53 PM, Harsh J ha...@cloudera.com wrote: Mohit, Use the MultipleOutputs API: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html to have a named output of bad records. There is an example of use detailed on the link.

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the best way to write records to a different file? I am doing XML processing, and during processing I might come across invalid XML. Currently I have it under a try/catch block and write the errors to log4j, but I think it would be better to write them to an output file that contains just the errors.

-- Harsh J

-- Join me at http://hadoopworkshop.eventbrite.com/
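[Editor's note: for readers digging this thread out of the archive, here is a minimal sketch of the pattern Harsh describes, using the old mapred API. The named output "errors", the stand-in XML check, and the record types are illustrative assumptions, not part of the original thread.]

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class XmlMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    try {
      // stand-in for the real XML processing
      out.collect(new Text("parsed"), value);
    } catch (Exception e) {
      // bad records land in files named errors-m-*, separate from the main output
      mos.getCollector("errors", reporter).collect(new Text("bad"), value);
    }
  }

  public void close() throws IOException {
    mos.close(); // required, or the side files may not be flushed
  }
}

// Driver side, before submitting the job:
// MultipleOutputs.addNamedOutput(conf, "errors",
//     TextOutputFormat.class, Text.class, Text.class);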
Re: dfs.block.size
You can use FileSystem.getFileStatus(Path p), which gives you the block size specific to a file.

On Tue, Feb 28, 2012 at 2:50 AM, Kai Voigt k...@123.org wrote: hadoop fsck filename -blocks is something that I think of quickly. http://hadoop.apache.org/common/docs/current/commands_manual.html#fsck has more details. Kai

On 28.02.2012 at 02:30, Mohit Anchlia wrote: How do I verify the block size of a given file? Is there a command?

On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote: dfs.block.size can be set per job. mapred.tasktracker.map.tasks.maximum is per tasktracker. -Joey

On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Can someone please suggest whether parameters like dfs.block.size and mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can they be set per client job configuration?

On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I want to change the block size, can I use Configuration in the mapreduce job and set it when writing to the sequence file, or does it need to be a cluster-wide setting in the .xml files? Also, is there a way to check the block size of a given file?

-- Joseph Echeverria Cloudera, Inc. 443.305.9434 -- Kai Voigt k...@123.org -- Join me at http://hadoopworkshop.eventbrite.com/
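[Editor's note: a minimal sketch covering both halves of this thread — setting dfs.block.size per job and reading a file's block size back. The path and the 256 MB figure are made-up examples.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // dfs.block.size is a client-side, per-file setting: it applies to
    // files written with this configuration; no cluster restart needed.
    conf.setLong("dfs.block.size", 256L * 1024 * 1024); // 256 MB

    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path("/examples/testfile40.seq"));
    System.out.println("block size: " + status.getBlockSize());
  }
}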
Re: Handling bad records
Can multiple output be used with Hadoop Streaming?

On Tue, Feb 28, 2012 at 2:07 PM, madhu phatak phatak@gmail.com wrote: Hi Mohit, A and B refer to two different output files (the multi-part name). The file names will be seq-A* and seq-B*. It's similar to the r in part-r-00000.

On Tue, Feb 28, 2012 at 11:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks, that's helpful. In that example, what are A and B referring to? Are those the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

On Mon, Feb 27, 2012 at 9:53 PM, Harsh J ha...@cloudera.com wrote: Mohit, Use the MultipleOutputs API: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html to have a named output of bad records. There is an example of use detailed on the link.

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the best way to write records to a different file? I am doing XML processing, and during processing I might come across invalid XML. Currently I have it under a try/catch block and write the errors to log4j, but I think it would be better to write them to an output file that contains just the errors.

-- Harsh J -- Join me at http://hadoopworkshop.eventbrite.com/
Re: ClassNotFoundException: -libjars not working?
Hi, -libjars doesn't always work. A better way is to create a runnable jar with all dependencies (if the number of dependencies is small), or you have to keep the jars in the lib folder of Hadoop on all machines.

On Wed, Feb 22, 2012 at 8:13 PM, Ioan Eugen Stan stan.ieu...@gmail.com wrote: Hello, I'm trying to run a map-reduce job and I get ClassNotFoundException, but I have the class submitted with -libjars. What's wrong with how I do things? Please help. I'm running hadoop-0.20.2-cdh3u1, and I have everything on the -libjars line. The job is submitted via a java app like:

exec /usr/lib/jvm/java-6-sun/bin/java -Dproc_jar -Xmx200m -server -Dhadoop.log.dir=/opt/ui/var/log/mailsearch -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hbase -Dhadoop.root.logger=INFO,console -Dhadoop.policy.file=hadoop-policy.xml -classpath '/usr/lib/hadoop/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/apache-log4j-extras-1.1.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u1.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jcl-over-slf4j-1.6.1.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop/lib/jsp-2.1/jsp-api-2.1.jar:/usr/share/mailbox-convertor/lib/*:/usr/lib/hadoop/contrib/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/hadoop/contrib/capacity-scheduler/hadoop-capacity-scheduler-0.20.2-cdh3u1.jar:/usr/lib/hbase/lib/hadoop-lzo-0.4.13.jar:/usr/lib/hbase/hbase.jar:/etc/hbase/conf:/usr/lib/hbase/lib:/usr/lib/zookeeper/zookeeper.jar' org.apache.hadoop.util.RunJar /usr/share/mailbox-convertor/mailbox-convertor-0.1-SNAPSHOT.jar -libjars=/usr/share/mailbox-convertor/lib/antlr-2.7.7.jar,/usr/share/mailbox-convertor/lib/aopalliance-1.0.jar,/usr/share/mailbox-convertor/lib/asm-3.1.jar,/usr/share/mailbox-convertor/lib/backport-util-concurrent-3.1.jar,/usr/share/mailbox-convertor/lib/cglib-2.2.jar,/usr/share/mailbox-convertor/lib/hadoop-ant-3.0-u1.pom,/usr/share/mailbox-convertor/lib/speed4j-0.9.jar,/usr/share/mailbox-convertor/lib/jamm-0.2.2.jar,/usr/share/mailbox-convertor/lib/uuid-3.2.0.jar,/usr/share/mailbox-convertor/lib/high-scale-lib-1.1.1.jar,/usr/share/mailbox-convertor/lib/jsr305-1.3.9.jar,/usr/share/mailbox-convertor/lib/guava-11.0.1.jar,/usr/share/mailbox-convertor/lib/protobuf-java-2.4.0a.jar,/usr/share/mailbox-convertor/lib/concurrentlinkedhashmap-lru-1.1.jar,/usr/share/mailbox-convertor/lib/json-simple-1.1.jar,/usr/share/mailbox-convertor/lib/itext-2.1.5.jar,/usr/share/mailbox-convertor/lib/jmxtools-1.2.1.jar,/usr/share/mailbox-convertor/lib/jersey-client-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-core-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-json-1.4.jar,/usr/share/mailbox-convertor/lib/jersey-server-1.4.jar,/usr/share/mailbox-convertor/lib/jmxri-1.2.1.jar,/usr/share/mailbox-convertor/lib/jaxb-impl-2.1.12.jar,/usr/share/mailbox-convertor/lib/xstream-1.2.2.jar,/usr/share/mailbox-convertor/lib/commons-metrics-1.3.jar,/usr/share/mailbox-convertor/lib/commons-monitoring-2.9.1.jar,/usr/share/mailbox-convertor/lib/
Re: Setting eclipse for map reduce using maven
Hi, Find the Maven definitions for the Hadoop core jars here: http://search.maven.org/#browse|-856937612

On Tue, Feb 21, 2012 at 10:48 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am trying to search for dependencies that would help me get started with developing map reduce in eclipse, and I prefer to use maven for this. Could someone point me in the right direction? -- Join me at http://hadoopworkshop.eventbrite.com/
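[Editor's note: as an illustration, a pom.xml dependency along these lines pulls in the Hadoop core jar. The version shown matches the 0.20.205.0 release discussed elsewhere in this digest; adjust it to whatever your cluster actually runs.]

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>0.20.205.0</version>
</dependency>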
Re: hadoop streaming : need help in using custom key value separator
http://hadoop.apache.org/common/docs/current/streaming.html#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs Read this link; your options below are wrong.

On Tue, Feb 28, 2012 at 1:13 PM, Austin Chungath austi...@gmail.com wrote: When I use more than one reducer in hadoop streaming with my custom separator rather than the tab, it looks like the hadoop shuffling process is not happening as it should. This is the reducer output when I am using '\t' to separate the key/value pairs output from the mapper.

output from reducer 1:
10321,22 23644,37 41231,42 23448,20 12325,39 71234,20

output from reducer 2:
24123,43 33213,46 11321,29 21232,32

The above output is as expected: the first column is the key and the second value is the count. There are 10 unique keys; 6 of them are in the output of the first reducer and the remaining 4 in the second reducer's output. But now I use a custom separator for my mapper's key/value output. Here I am using '*' as the separator: -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2

output from reducer 1:
10321,5 21232,19 24123,16 33213,28 23644,21 41231,12 23448,18 11321,29 12325,24 71234,9

output from reducer 2:
10321,17 21232,13 33213,18 23644,16 41231,30 23448,2 24123,27 12325,15 71234,11

Now both reducers are getting all the keys, and part of the values go to reducer 1 and part to reducer 2. Why is it behaving like this when I use a custom separator? Shouldn't each reducer get a unique key after the shuffling? I am using Hadoop 0.20.205.0, and below is the command that I am using to run hadoop streaming. Are there more options I should specify for hadoop streaming to work properly with a custom separator?

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2 -mapper ./map.py -reducer ./reducer.py -file ./map.py -file ./reducer.py -input /user/inputdata -output /user/outputdata -verbose

Any help is much appreciated. Thanks, Austin
Re: Difference between hdfs dfs and hdfs fs
Hi Mohit, fs is a generic filesystem client that can point to any file system (LocalFileSystem, HDFS, etc.), whereas dfs is specific to HDFS. So when you use fs it can copy from the local file system to HDFS, but when you use dfs the source file has to be on HDFS.

On Tue, Feb 21, 2012 at 10:46 PM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the difference between the hdfs dfs and hdfs fs commands? When I run hdfs dfs -copyFromLocal /assa . and use pig, it can't find the file, but when I use hdfs fs, pig is able to find it. -- Join me at http://hadoopworkshop.eventbrite.com/
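[Editor's note: a quick way to see the distinction is that the generic fs shell resolves paths by URI scheme, so you can point it at any filesystem explicitly. The host, port, and paths below are made-up examples.]

hadoop fs -ls file:///tmp                        # generic shell, local filesystem via URI scheme
hadoop fs -ls hdfs://namenode:54310/user/mohit   # same shell, HDFS via URI scheme
hadoop dfs -ls /user/mohit                       # HDFS-only variant of the same commands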
Re: HDFS problem in hadoop 0.20.203
Hi, Did you format HDFS?

On Tue, Feb 21, 2012 at 7:40 PM, Shi Yu sh...@uchicago.edu wrote: Hi Hadoopers, We are experiencing a strange problem on Hadoop 0.20.203. Our cluster has 58 nodes; everything was started from a fresh HDFS (we deleted all local folders on the datanodes and reformatted the namenode). After running some small jobs, HDFS starts behaving abnormally and the jobs become very slow. The namenode log is crushed by gigabytes of errors like this:

2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.31:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.18:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_4524177823306792294 is added to invalidSet of 10.105.19.32:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.35:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.27:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_2884522252507300332 is added to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.21:50010 is added to blk_-6843171124277753504_2279882 size 124490
2012-02-21 00:00:38,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000313_0/result_stem-m-00313. blk_-6379064588594672168_2279890
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.26:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.29:50010 is added to blk_-977828927900581074_2279887 size 13818
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000364_0/result_stem-m-00364 is closed by DFSClient_attempt_201202202043_0013_m_000364_0
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.23:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.20:50010 is added to blk_5338983375361999760_2279887 size 1476
2012-02-21 00:00:38,633 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000364_0/result_suffix-m-00364. blk_1921685366929756336_2279890
2012-02-21 00:00:38,634 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000279_0/result_suffix-m-00279 is closed by DFSClient_attempt_201202202043_0013_m_000279_0
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.20:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.25:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_495061820035691700 is added to invalidSet of 10.105.19.33:50010
2012-02-21 00:00:38,635 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000284_0/result_stem-m-00284. blk_8796188324642771330_2279891
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.34:50010 is added to blk_-977828927900581074_2279887 size 13818
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /syu/output/naive/iter5_partout1/_temporary/_attempt_201202202043_0013_m_000296_0/result_stem-m-00296. blk_-6800409224007034579_2279891
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.105.19.29:50010 is added to blk_1921685366929756336_2279890 size 1511
2012-02-21 00:00:38,638 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
Re: PathFilter File Glob
Hi, Why not just use:

FileSystem fs = FileSystem.get(conf);
FileStatus[] files = fs.globStatus(new Path(path + filter));

Thanks, -Idris

On Mon, Feb 27, 2012 at 1:06 PM, Harsh J ha...@cloudera.com wrote: Hi Simon, You need to implement your custom PathFilter derivative class, and then set it via your {File}InputFormat class using setInputPathFilter: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#setInputPathFilter(org.apache.hadoop.mapred.JobConf,%20java.lang.Class) (TextInputFormat is a derivative of FileInputFormat, and hence has the same method.) HTH.

2012/2/23 Heeg, Simon s.h...@telekom.de: Hello, I would like to use a PathFilter to filter, with a regular expression, the files read by the TextInputFormat, but I don't know how to apply the filter; I cannot find a setter. Unfortunately Google was not my friend on this issue, and The Definitive Guide does not help much. I am using Hadoop 0.20.2-cdh3u3. -- Harsh J
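[Editor's note: a minimal sketch of the PathFilter route Harsh describes, using the old mapred API. The regular expression and input path are placeholders.]

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class RegexPathFilter implements PathFilter {
  // Accept only files whose names match the pattern.
  public boolean accept(Path path) {
    return path.getName().matches("part-.*\\.log");
  }
}

// In the driver:
// JobConf conf = new JobConf(MyJob.class);
// conf.setInputFormat(TextInputFormat.class);
// FileInputFormat.setInputPaths(conf, new Path("/input"));
// FileInputFormat.setInputPathFilter(conf, RegexPathFilter.class);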
Re: hadoop streaming : need help in using custom key value separator
Thanks Subir. -D stream.mapred.output.field.separator=* is not an available option; my bad. What I should have done is: -D stream.map.output.field.separator=*

On Tue, Feb 28, 2012 at 2:36 PM, Subir S subir.sasiku...@gmail.com wrote: http://hadoop.apache.org/common/docs/current/streaming.html#Customizing+How+Lines+are+Split+into+Key%2FValue+Pairs Read this link; your options below are wrong.

On Tue, Feb 28, 2012 at 1:13 PM, Austin Chungath austi...@gmail.com wrote: When I use more than one reducer in hadoop streaming with my custom separator rather than the tab, it looks like the hadoop shuffling process is not happening as it should. This is the reducer output when I am using '\t' to separate the key/value pairs output from the mapper.

output from reducer 1:
10321,22 23644,37 41231,42 23448,20 12325,39 71234,20

output from reducer 2:
24123,43 33213,46 11321,29 21232,32

The above output is as expected: the first column is the key and the second value is the count. There are 10 unique keys; 6 of them are in the output of the first reducer and the remaining 4 in the second reducer's output. But now I use a custom separator for my mapper's key/value output. Here I am using '*' as the separator: -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2

output from reducer 1:
10321,5 21232,19 24123,16 33213,28 23644,21 41231,12 23448,18 11321,29 12325,24 71234,9

output from reducer 2:
10321,17 21232,13 33213,18 23644,16 41231,30 23448,2 24123,27 12325,15 71234,11

Now both reducers are getting all the keys, and part of the values go to reducer 1 and part to reducer 2. Why is it behaving like this when I use a custom separator? Shouldn't each reducer get a unique key after the shuffling? I am using Hadoop 0.20.205.0, and below is the command that I am using to run hadoop streaming. Are there more options I should specify for hadoop streaming to work properly with a custom separator?

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar -D stream.mapred.output.field.separator=* -D mapred.reduce.tasks=2 -mapper ./map.py -reducer ./reducer.py -file ./map.py -file ./reducer.py -input /user/inputdata -output /user/outputdata -verbose

Any help is much appreciated. Thanks, Austin
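[Editor's note: putting Austin's fix together, the corrected command from the original post would look something like the following; everything except the separator option is unchanged from the command above.]

hadoop jar $HADOOP_PREFIX/contrib/streaming/hadoop-streaming-0.20.205.0.jar \
  -D stream.map.output.field.separator=* \
  -D mapred.reduce.tasks=2 \
  -mapper ./map.py -reducer ./reducer.py \
  -file ./map.py -file ./reducer.py \
  -input /user/inputdata -output /user/outputdata -verbose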
LZO exception decompressing (returned -8)
Hey there, I've been running a cluster for over a year and was getting an LZO decompression exception less than once a month. Suddenly it happens almost once per day. Any ideas what could be causing it? I'm on hadoop 0.20.2. I've thought about moving to Snappy, but I would like to know why this happens more often now. The exception always happens when the reducer gets data from the map, and looks like:

Error: java.lang.InternalError: lzo1x_decompress returned: -8
at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:305)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:76)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1553)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)

Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3783652.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: Need help on hadoop eclipse plugin
So I made the above changes using WinRAR; it embedded those changes inside the jar itself, so I didn't need to extract the jar contents and construct a new jar. I just replaced the old jar with this new jar and restarted eclipse with eclipse -clean. I am now able to run the hadoop eclipse plugin without any error in eclipse helios 3.6.2. However, now I am looking to use the same plugin in IBM RAD 8.0, and I am getting the following error in the .log:

!ENTRY org.eclipse.core.jobs 4 2 2012-02-28 05:26:12.056
!MESSAGE An internal error occurred during: Connecting to DFS lxe9700.
!STACK 0
java.lang.NoClassDefFoundError: org.apache.hadoop.security.UserGroupInformation (initialization failure)
at java.lang.J9VMInternals.initialize(Unknown Source)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(Unknown Source)
at org.apache.hadoop.fs.FileSystem$Cache.get(Unknown Source)
at org.apache.hadoop.fs.FileSystem.get(Unknown Source)
at org.apache.hadoop.fs.FileSystem.get(Unknown Source)
at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(Unknown Source)
at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(Unknown Source)
at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(Unknown Source)
at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(Unknown Source)
at org.eclipse.core.internal.jobs.Worker.run(Unknown Source)

I downloaded the Oracle JDK and changed IBM RAD to use Oracle JDK 1.7, but I am still seeing the above error. Can anyone help me debug this issue? Thanks, Praveenesh

On Tue, Feb 28, 2012 at 1:12 PM, praveenesh kumar praveen...@gmail.com wrote: Hi all, I am trying to use the hadoop eclipse plugin on my windows machine to connect to my remote hadoop cluster. I currently use putty to log in to the cluster, so ssh is enabled and my windows machine can reach my hadoop cluster. I am using hadoop 0.20.205, hadoop-eclipse-plugin-0.20.205.jar, eclipse helios Version 3.6.2, and Oracle JDK 1.7. If I use the original eclipse-plugin jar by putting it inside my $ECLIPSE_HOME/dropins or /plugins folder, I am able to see the Hadoop map-reduce perspective. But after specifying the hadoop NN/JT connections, I see the following error whenever I try to access HDFS: "An internal error occurred during: Connecting to DFS lxe9700. org/apache/commons/configuration/Configuration. 'Connecting to DFS lxe9700' has encountered a problem. An internal error occurred during Connecting to DFS." After looking at the .log file, I see the following lines:

!MESSAGE An internal error occurred during: Connecting to DFS lxe9700.
!STACK 0
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:37)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:34)
at org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at org.apache.hadoop.security.KerberosName.<clinit>(KerberosName.java:83)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:189)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1436)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1337)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:244)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:122)
at org.apache.hadoop.eclipse.server.HadoopServer.getDFS(HadoopServer.java:469)
at org.apache.hadoop.eclipse.dfs.DFSPath.getDFS(DFSPath.java:146)
at org.apache.hadoop.eclipse.dfs.DFSFolder.loadDFSFolderChildren(DFSFolder.java:61)
at org.apache.hadoop.eclipse.dfs.DFSFolder$1.run(DFSFolder.java:178)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:506)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:422)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:410)
at
Re: ClassNotFoundException: -libjars not working?
On 28.02.2012 10:58, madhu phatak wrote: Hi, -libjars doesn't always work. A better way is to create a runnable jar with all dependencies (if the number of dependencies is small), or you have to keep the jars in the lib folder of Hadoop on all machines.

Thanks for the reply Madhu. I adopted the second solution, as explained in [1]. From what I found browsing the net, it seems that -libjars was broken in hadoop version 0.18. I didn't get time to check the code yet. Cloudera's released hadoop sources are packaged a bit oddly and Netbeans doesn't seem to play well with that, and this really affects my will to try to fix the problem. -libjars is a nice feature that permits the use of skinny jars and would help system admins do better packaging. It also allows better control over the classpath. Too bad it didn't work.

[1] http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

Cheers, -- Ioan Eugen Stan http://ieugen.blogspot.com
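[Editor's note: worth flagging for archive readers — -libjars is consumed by GenericOptionsParser, so it is only honored when the job is launched through ToolRunner (typically via hadoop jar), not when RunJar is invoked from a raw java command as in the original post. A minimal sketch of a driver wired up that way; the class and job names are placeholders.]

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // getConf() already contains whatever -libjars/-D options
    // GenericOptionsParser consumed from the command line.
    JobConf conf = new JobConf(getConf(), MyDriver.class);
    conf.setJobName("my-job");
    // ... input/output paths, mapper, reducer ...
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}

// Launched as:
// hadoop jar myjob.jar MyDriver -libjars dep1.jar,dep2.jar <in> <out>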
Re: LZO exception decompressing (returned -8)
Which version of the Hadoop LZO library are you using? It looks like something I'm pretty sure was fixed in a newer version. -Joey

On Feb 28, 2012, at 4:58, Marc Sturlese marc.sturl...@gmail.com wrote: Hey there, I've been running a cluster for over a year and was getting an LZO decompression exception less than once a month. Suddenly it happens almost once per day. Any ideas what could be causing it? I'm on hadoop 0.20.2. I've thought about moving to Snappy, but I would like to know why this happens more often now. The exception always happens when the reducer gets data from the map, and looks like:

Error: java.lang.InternalError: lzo1x_decompress returned: -8
at com.hadoop.compression.lzo.LzoDecompressor.decompressBytesDirect(Native Method)
at com.hadoop.compression.lzo.LzoDecompressor.decompress(LzoDecompressor.java:305)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:76)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1553)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1432)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1285)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1216)

Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3783652.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: LZO exception decompressing (returned -8)
I'm on 0.4.9 (I think it's the latest). -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3783927.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: LZO exception decompressing (returned -8)
Try 0.4.15. You can get it from here: https://github.com/toddlipcon/hadoop-lzo Sent from my iPhone On Feb 28, 2012, at 6:49, Marc Sturlese marc.sturl...@gmail.com wrote: I'm on 0.4.9 (I think it's the latest). -- View this message in context: http://lucene.472066.n3.nabble.com/LZO-exception-decompressing-returned-8-tp3783652p3783927.html Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Re: Handling bad records
Subir, No, not unless you use a specialized streaming library (pydoop, dumbo, etc., for Python, for example).

On Tue, Feb 28, 2012 at 2:19 PM, Subir S subir.sasiku...@gmail.com wrote: Can multiple output be used with Hadoop Streaming?

On Tue, Feb 28, 2012 at 2:07 PM, madhu phatak phatak@gmail.com wrote: Hi Mohit, A and B refer to two different output files (the multi-part name). The file names will be seq-A* and seq-B*. It's similar to the r in part-r-00000.

On Tue, Feb 28, 2012 at 11:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Thanks, that's helpful. In that example, what are A and B referring to? Are those the output file names?

mos.getCollector("seq", "A", reporter).collect(key, new Text("Bye"));
mos.getCollector("seq", "B", reporter).collect(key, new Text("Chau"));

On Mon, Feb 27, 2012 at 9:53 PM, Harsh J ha...@cloudera.com wrote: Mohit, Use the MultipleOutputs API: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html to have a named output of bad records. There is an example of use detailed on the link.

On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia mohitanch...@gmail.com wrote: What's the best way to write records to a different file? I am doing XML processing, and during processing I might come across invalid XML. Currently I have it under a try/catch block and write the errors to log4j, but I think it would be better to write them to an output file that contains just the errors.

-- Join me at http://hadoopworkshop.eventbrite.com/ -- Harsh J
Should splittable Gzip be a core hadoop feature?
Hi, Some time ago I had an idea and implemented it. Normally you can only run a single gzipped input file through a single mapper, and thus only on a single CPU core. What I created makes it possible to process a gzipped file in such a way that it can run on several mappers in parallel. I've put the javadoc I created on my homepage so you can read more about the details: http://howto.basjes.nl/hadoop/javadoc-for-skipseeksplittablegzipcodec

Now the question raised by one of the people reviewing this code was: should this implementation be part of the core Hadoop feature set? The main reason given is that this needs a bit more understanding of what is happening, and as such cannot be enabled by default. I would like to hear from the Hadoop Core/MapReduce users what you think. Should this be
- a part of the default Hadoop feature set, so that anyone can simply enable it by setting the right configuration?
- a separate library?
- a nice idea I had fun building but that no one needs?
- ... ?

-- Best regards / Met vriendelijke groeten, Niels Basjes
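[Editor's note: for context, custom compression codecs in this era of Hadoop are enabled purely through configuration, along these lines. The codec class name below is a hypothetical stand-in for Niels's implementation, not its real package name.]

Configuration conf = new Configuration();
// Append the new codec to the factory's list; the exact class name
// must match whatever the library actually ships.
conf.set("io.compression.codecs",
    "org.apache.hadoop.io.compress.DefaultCodec,"
  + "org.apache.hadoop.io.compress.GzipCodec,"
  + "nl.example.SkipSeekSplittableGzipCodec"); // hypothetical name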
Re: Spilled Records
Hello Dan, The fact that the spilled records are double the output records means the map task produces more than one spill file; these spill files are read, merged, and written back to a single file, so each record is spilled twice. I can't infer anything from the numbers of the two tasks. Could you provide more info, such as what the application is doing? If you like, you can also try our tool Starfish to see what's going on behind the scenes. Thanks, Jie -- Starfish is an intelligent performance tuning tool for Hadoop. Homepage: www.cs.duke.edu/starfish/ Mailing list: http://groups.google.com/group/hadoop-starfish

On Tue, Feb 28, 2012 at 8:25 AM, Daniel Baptista daniel.bapti...@performgroup.com wrote: Hi All, I am trying to improve the performance of my hadoop cluster and would like to get some feedback on a couple of numbers that I am seeing. Below is the output from a single task (1 of 16) that took 3 mins 40 seconds:

FileSystemCounters
FILE_BYTES_READ 214,653,748
HDFS_BYTES_READ 67,108,864
FILE_BYTES_WRITTEN 429,278,388
Map-Reduce Framework
Combine output records 0
Map input records 2,221,478
Spilled Records 4,442,956
Map output bytes 210,196,148
Combine input records 0
Map output records 2,221,478

And another task in the same job (16 of 16) that took 7 minutes and 19 seconds:

FileSystemCounters
FILE_BYTES_READ 199,003,192
HDFS_BYTES_READ 58,434,476
FILE_BYTES_WRITTEN 397,975,310
Map-Reduce Framework
Combine output records 0
Map input records 2,086,789
Spilled Records 4,173,578
Map output bytes 194,813,958
Combine input records 0
Map output records 2,086,789

Can anybody determine anything from these figures? The first task is twice as quick as the second, yet the input and output are comparable (certainly not double). In all of the tasks (in this and other jobs) the spilled records are always double the output records; this can't be 'normal'? Am I clutching at straws (it feels like I am). Thanks in advance, Dan.
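[Editor's note: to make Jie's explanation concrete — spilled records at exactly 2x map output records usually means each record was written once into intermediate spill files and once more during the final merge. Giving the map-side sort buffer enough room so a task spills only once keeps the ratio at 1x. A sketch of the relevant 0.20-era knobs; the values are illustrative, not recommendations.]

JobConf conf = new JobConf(MyJob.class);
// Size (MB) of the in-memory buffer that collects map output before spilling.
conf.set("io.sort.mb", "256");
// Fraction of that buffer that may fill before a background spill starts.
conf.set("io.sort.spill.percent", "0.90");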
RE: Spilled Records
Hi Jie, To be honest I don't think I understand enough of what our job is doing to be able to explain it. Thanks for the response though; I had figured that I was grasping at straws. I have looked at Starfish, however all our jobs are submitted via Apache Pig, so I don't know if it would be much good. Thanks again, Dan.

-----Original Message-----
From: Jie Li [mailto:ji...@cs.duke.edu]
Sent: 28 February 2012 16:35
To: common-user@hadoop.apache.org
Subject: Re: Spilled Records

Hello Dan, The fact that the spilled records are double the output records means the map task produces more than one spill file; these spill files are read, merged, and written back to a single file, so each record is spilled twice. I can't infer anything from the numbers of the two tasks. Could you provide more info, such as what the application is doing? If you like, you can also try our tool Starfish to see what's going on behind the scenes. Thanks, Jie -- Starfish is an intelligent performance tuning tool for Hadoop. Homepage: www.cs.duke.edu/starfish/ Mailing list: http://groups.google.com/group/hadoop-starfish

On Tue, Feb 28, 2012 at 8:25 AM, Daniel Baptista daniel.bapti...@performgroup.com wrote: Hi All, I am trying to improve the performance of my hadoop cluster and would like to get some feedback on a couple of numbers that I am seeing. Below is the output from a single task (1 of 16) that took 3 mins 40 seconds:

FileSystemCounters
FILE_BYTES_READ 214,653,748
HDFS_BYTES_READ 67,108,864
FILE_BYTES_WRITTEN 429,278,388
Map-Reduce Framework
Combine output records 0
Map input records 2,221,478
Spilled Records 4,442,956
Map output bytes 210,196,148
Combine input records 0
Map output records 2,221,478

And another task in the same job (16 of 16) that took 7 minutes and 19 seconds:

FileSystemCounters
FILE_BYTES_READ 199,003,192
HDFS_BYTES_READ 58,434,476
FILE_BYTES_WRITTEN 397,975,310
Map-Reduce Framework
Combine output records 0
Map input records 2,086,789
Spilled Records 4,173,578
Map output bytes 194,813,958
Combine input records 0
Map output records 2,086,789

Can anybody determine anything from these figures? The first task is twice as quick as the second, yet the input and output are comparable (certainly not double). In all of the tasks (in this and other jobs) the spilled records are always double the output records; this can't be 'normal'? Am I clutching at straws (it feels like I am). Thanks in advance, Dan.
Hadoop and Hibernate
All, I am trying to use Hibernate within my reducer and it goeth not well. Has anybody ever successfully done this? I have a java package that contains my Hadoop driver, mapper, and reducer along with a persistence class. I call Hibernate from the cleanup() method in my reducer class. It complains that it cannot find the persistence class. The class is in the same package as the reducer, and this all works outside of Hadoop. The error is thrown when I attempt to begin a transaction.

The error: org.hibernate.MappingException: Unknown entity: qq.mob.depart.EpiState

The code:

protected void cleanup(Context ctx) throws IOException, InterruptedException {
  ...
  org.hibernate.cfg.Configuration cfg = new org.hibernate.cfg.Configuration();
  SessionFactory sessionFactory = cfg.configure("hibernate.cfg.xml").buildSessionFactory();
  cfg.addAnnotatedClass(EpiState.class); // This class is in the same package as the reducer.
  Session session = sessionFactory.openSession();
  Transaction tx = session.getTransaction();
  tx.begin(); // Error is thrown here.
  ...
}

If I create an executable jar file that contains all dependencies required by the MR job, do all said dependencies get distributed to all nodes? If I specify but one reducer, which node in the cluster will the reducer run on? Thanks -- Geoffry Roberts
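[Editor's note: one thing that stands out in the snippet, independent of Hadoop — the entity is registered after the SessionFactory has already been built, so the factory never sees it. If that reading is right, reordering along these lines would address the Unknown entity error; a sketch, not a tested fix.]

org.hibernate.cfg.Configuration cfg = new org.hibernate.cfg.Configuration();
cfg.configure("hibernate.cfg.xml");
cfg.addAnnotatedClass(EpiState.class); // register entities BEFORE building the factory
SessionFactory sessionFactory = cfg.buildSessionFactory();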
Re: Spilled Records
Hi Dan, You might want to post your Pig script to the Pig user mailing list. Previously I did some experiments on Pig and Hive, and I'll also be interested in looking into your script. Yeah, Starfish currently only supports Hadoop job-level tuning; supporting workflows like Pig and Hive is our top priority. We'll let you know once we're ready. Thanks, Jie

On Tue, Feb 28, 2012 at 11:57 AM, Daniel Baptista daniel.bapti...@performgroup.com wrote: Hi Jie, To be honest I don't think I understand enough of what our job is doing to be able to explain it. Thanks for the response though; I had figured that I was grasping at straws. I have looked at Starfish, however all our jobs are submitted via Apache Pig, so I don't know if it would be much good. Thanks again, Dan.

-----Original Message-----
From: Jie Li [mailto:ji...@cs.duke.edu]
Sent: 28 February 2012 16:35
To: common-user@hadoop.apache.org
Subject: Re: Spilled Records

Hello Dan, The fact that the spilled records are double the output records means the map task produces more than one spill file; these spill files are read, merged, and written back to a single file, so each record is spilled twice. I can't infer anything from the numbers of the two tasks. Could you provide more info, such as what the application is doing? If you like, you can also try our tool Starfish to see what's going on behind the scenes. Thanks, Jie -- Starfish is an intelligent performance tuning tool for Hadoop. Homepage: www.cs.duke.edu/starfish/ Mailing list: http://groups.google.com/group/hadoop-starfish

On Tue, Feb 28, 2012 at 8:25 AM, Daniel Baptista daniel.bapti...@performgroup.com wrote: Hi All, I am trying to improve the performance of my hadoop cluster and would like to get some feedback on a couple of numbers that I am seeing. Below is the output from a single task (1 of 16) that took 3 mins 40 seconds:

FileSystemCounters
FILE_BYTES_READ 214,653,748
HDFS_BYTES_READ 67,108,864
FILE_BYTES_WRITTEN 429,278,388
Map-Reduce Framework
Combine output records 0
Map input records 2,221,478
Spilled Records 4,442,956
Map output bytes 210,196,148
Combine input records 0
Map output records 2,221,478

And another task in the same job (16 of 16) that took 7 minutes and 19 seconds:

FileSystemCounters
FILE_BYTES_READ 199,003,192
HDFS_BYTES_READ 58,434,476
FILE_BYTES_WRITTEN 397,975,310
Map-Reduce Framework
Combine output records 0
Map input records 2,086,789
Spilled Records 4,173,578
Map output bytes 194,813,958
Combine input records 0
Map output records 2,086,789

Can anybody determine anything from these figures? The first task is twice as quick as the second, yet the input and output are comparable (certainly not double). In all of the tasks (in this and other jobs) the spilled records are always double the output records; this can't be 'normal'? Am I clutching at straws (it feels like I am). Thanks in advance, Dan.
Re: Hadoop and Hibernate
On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts geoffry.robe...@gmail.com wrote: If I create an executable jar file that contains all dependencies required by the MR job, do all said dependencies get distributed to all nodes?

You can make a single jar and that will be distributed to all of the machines that run the task, but in most cases it is better to use the distributed cache. See http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache

If I specify but one reducer, which node in the cluster will the reducer run on?

The scheduling is done by the JobTracker, and it isn't possible to control the location of the reducers. -- Owen
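[Editor's note: a minimal sketch of the distributed-cache route Owen points to, matching the DistributedCache usage seen elsewhere in this digest; the HDFS path and jar name are placeholders.]

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// In the job driver: the jar must already exist on HDFS; it is then
// shipped to each task's classpath instead of being bundled into one fat job jar.
JobConf conf = new JobConf(MyJob.class);
DistributedCache.addFileToClassPath(new Path("/libs/hibernate-core.jar"), conf);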
Re: Invocation exception
I commented out both the reducer and combiner and I still see the same exception. Could it be because I have 2 jars being added?

On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote: On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote: For some reason I am getting an invocation exception, and I don't see any more details other than this exception. My job is configured as:

JobConf conf = new JobConf(FormMLProcessor.class);
conf.addResource("hdfs-site.xml");
conf.addResource("core-site.xml");
conf.addResource("mapred-site.xml");
conf.set("mapred.reduce.tasks", "0");
conf.setJobName("mlprocessor");
DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(IdentityReducer.class);

Why would you set the Reducer when the number of reducers is set to zero? Not sure if this is the real cause.

conf.setInputFormat(SequenceFileAsTextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
Re: Invocation exception
It looks like adding this line causes the invocation exception. I looked in HDFS and I can see the file at that path:

DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);

I have similar code for another jar:

DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);

but that one works just fine.

On Tue, Feb 28, 2012 at 11:44 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I commented out both the reducer and combiner and I still see the same exception. Could it be because I have 2 jars being added?

On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote: On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote: For some reason I am getting an invocation exception, and I don't see any more details other than this exception. My job is configured as:

JobConf conf = new JobConf(FormMLProcessor.class);
conf.addResource("hdfs-site.xml");
conf.addResource("core-site.xml");
conf.addResource("mapred-site.xml");
conf.set("mapred.reduce.tasks", "0");
conf.setJobName("mlprocessor");
DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(IdentityReducer.class);

Why would you set the Reducer when the number of reducers is set to zero? Not sure if this is the real cause.

conf.setInputFormat(SequenceFileAsTextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);

java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
Re: How to modify hadoop-wordcount example to display File-wise results.
Hi Srilatha, I know this thread is quite old, but I need your help with this. I'm also interested in making some modifications to the hadoop Sort example. Could you please give me pointers on how to rebuild hadoop to reflect changes made in the source? I'm new to hadoop and would really appreciate your assistance.

us latha wrote: Greetings! Am trying to modify the WordCount.java mentioned at Example: WordCount v1.0 (http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Example%3A+WordCount+v1.0) at http://hadoop.apache.org/core/docs/current/mapred_tutorial.html I would like to have the output in the following way:

FileOne word1 itsCount
FileOne word2 itsCount
... (and so on)
FileTwo word1 itsCount
FileTwo wordx itsCount
...
FileThree word1 itsCount
...

Am trying to make the following changes to the code of WordCount.java:
1) private Text filename = new Text(); // Added this to the Map class. Not sure if I would have access to the filename here.
2) (line 18) OutputCollector<Text, Text, IntWritable> output // Changed the argument in the map() function to have another Text field.
3) (line 23) output.collect(filename, word, one); // Trying to change the output format to 'filename word count'.

Am not sure what other changes need to be made to achieve the required output; the filename is not available to the map method. My requirement is to go through all the data available in HDFS and prepare an index file in 'filename word count' format. Could you please throw light on how I can achieve this? Thank you, Srilatha

-- View this message in context: http://old.nabble.com/How-to-modify-hadoop-wordcount-example-to-display-File-wise-results.-tp19826857p33410747.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
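[Editor's note: for what it's worth, in the old (mapred) API used by that tutorial, the input file name is exposed to the mapper through the job configuration property map.input.file. A sketch along those lines, assuming the usual word/one fields from the tutorial's Map class; everything else here is illustrative.]

private Text filename = new Text();

public void configure(JobConf job) {
  // The framework sets map.input.file to the path of the split being processed.
  filename.set(new Path(job.get("map.input.file")).getName());
}

public void map(LongWritable key, Text value,
    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  while (itr.hasMoreTokens()) {
    // Keep OutputCollector's two type parameters; the key carries "file<TAB>word".
    word.set(filename + "\t" + itr.nextToken());
    output.collect(word, one);
  }
}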
100x slower mapreduce compared to pig
I am comparing the runtime of similar logic. The logic is exactly the same, but surprisingly the map reduce job that I submit is 100x slower. For pig I use a UDF, and for hadoop I use a mapper only, with the same logic as the pig version. Even the splits on the admin page are the same. Not sure why it's so slow. I am submitting the job like:

java -classpath .:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar com.services.dp.analytics.hadoop.mapred.FormMLProcessor /examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq /examples/output1/

How should I go about finding the root cause of why it's so slow? Any suggestions would be really appreciated. One of the things I noticed is that on the admin page's map task list I see the status as hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728, but for pig the status is blank.
Re: 100x slower mapreduce compared to pig
It would be great if we could take a look at what you are doing in the UDF vs the Mapper. 100x slower does not make sense for the same job/logic; it's either the Mapper code, or maybe the cluster was busy at the time you scheduled the MapReduce job? Thanks, Prashant

On Tue, Feb 28, 2012 at 4:11 PM, Mohit Anchlia mohitanch...@gmail.com wrote: I am comparing the runtime of similar logic. The logic is exactly the same, but surprisingly the map reduce job that I submit is 100x slower. For pig I use a UDF, and for hadoop I use a mapper only, with the same logic as the pig version. Even the splits on the admin page are the same. Not sure why it's so slow. I am submitting the job like:

java -classpath .:analytics.jar:/hadoop-0.20.2-cdh3u3/lib/*:/root/.mohit/hadoop-0.20.2-cdh3u3/*:common.jar com.services.dp.analytics.hadoop.mapred.FormMLProcessor /examples/testfile40.seq,/examples/testfile41.seq,/examples/testfile42.seq,/examples/testfile43.seq,/examples/testfile44.seq,/examples/testfile45.seq,/examples/testfile46.seq,/examples/testfile47.seq,/examples/testfile48.seq,/examples/testfile49.seq /examples/output1/

How should I go about finding the root cause of why it's so slow? Any suggestions would be really appreciated. One of the things I noticed is that on the admin page's map task list I see the status as hdfs://dsdb1:54310/examples/testfile40.seq:0+134217728, but for pig the status is blank.
[blog post] Accumulo, Nutch, and Gora
Blog post for anyone who's interested. I cover a basic howto for getting Nutch to use Apache Gora to store web crawl data in Accumulo. Let me know if you have any questions. Accumulo, Nutch, and GORA http://www.covert.io/post/18414889381/accumulo-nutch-and-gora --Jason
toward Rack-Awareness approach
Hi Hadoopers, Currently I am running hadoop version 0.20.203 in production with 600 TB in it. I am planning to enable rack awareness in production, but I haven't fully thought it through yet. Plan/questions:

1. I have a script that can resolve datanode/tasktracker IPs to rack names.
2. Add topology.script.file.name to hdfs-site.xml and restart the cluster.
3. After the cluster comes back, my questions start here:
- Do I have to run the balancer or fsck or some command to have those 600 TB redistributed across racks in one go?
- Currently I run the balancer for 2 hrs every day. Can I keep this routine and hope that at some point the data will be nicely redistributed and rack-aware?
- How can we tell that the data in the cluster is now fully rack-aware?
- If I just add the script and run the balancer 2 hrs every day, then before the whole data set becomes rack-aware, the data will be a mix of existing default-rack data (not yet rebalanced) and, probably, rack-aware newly loaded data. Is it OK to have a mix of default-rack and rack-specific data together?
4. Thoughts?

Hope this makes sense. Thanks in advance, Patai
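[Editor's note: for reference, a topology script is just an executable that maps each argument (IP or hostname) to a rack path, one result per line. A minimal sketch of the kind of script step 1 refers to, with made-up subnets and an example install path; step 2's property would point at it.]

#!/bin/bash
# hdfs-site.xml:  <name>topology.script.file.name</name>
#                 <value>/etc/hadoop/rack-map.sh</value>   (example path)
for node in "$@"; do
  case "$node" in
    10.105.19.*) echo "/dc1/rack19" ;;   # made-up subnet-to-rack mapping
    10.105.20.*) echo "/dc1/rack20" ;;
    *)           echo "/default-rack" ;;
  esac
done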
Re: [blog post] Accumulo, Nutch, and Gora
UMMM wow! That's awesome Jason! Thanks so much! Cheers, Chris On Feb 28, 2012, at 5:41 PM, Jason Trost wrote: Blog post for anyone who's interested. I cover a basic howto for getting Nutch to use Apache Gora to store web crawl data in Accumulo. Let me know if you have any questions. Accumulo, Nutch, and GORA http://www.covert.io/post/18414889381/accumulo-nutch-and-gora --Jason ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: [blog post] Accumulo, Nutch, and Gora
Fabulous work! There are obviously a lot of local modifications to be done for nutch + gora + accumulo to work. So feel free to propose these to upstream nutch and gora. It should feel good to run the web crawl, and store the results on accumulo. Cheers, Enis On Tue, Feb 28, 2012 at 6:24 PM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: UMMM wow! That's awesome Jason! Thanks so much! Cheers, Chris On Feb 28, 2012, at 5:41 PM, Jason Trost wrote: Blog post for anyone who's interested. I cover a basic howto for getting Nutch to use Apache Gora to store web crawl data in Accumulo. Let me know if you have any questions. Accumulo, Nutch, and GORA http://www.covert.io/post/18414889381/accumulo-nutch-and-gora --Jason ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Invocation exception
Mohit, If you visit the failed task attempt on the JT Web UI, you can see the complete, informative stack trace on it. It would point the exact line the trouble came up in and what the real error during the configure-phase of task initialization was. A simple attempts page goes like the following (replace job ID and task ID of course): http://host:50030/taskdetails.jsp?jobid=job_201202041249_3964tipid=task_201202041249_3964_m_00 Once there, find and open the All logs link to see stdout, stderr, and syslog of the specific failed task attempt. You'll have more info sifting through this to debug your issue. This is also explained in Tom's book under the title Debugging a Job (p154, Hadoop: The Definitive Guide, 2nd ed.). On Wed, Feb 29, 2012 at 1:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote: It looks like adding this line causes invocation exception. I looked in hdfs and I see that file in that path DistributedCache.*addFileToClassPath*(*new* Path(/jars/common.jar), conf); I have similar code for another jar DistributedCache.*addFileToClassPath*(*new* Path(/jars/analytics.jar), conf); but this works just fine. On Tue, Feb 28, 2012 at 11:44 AM, Mohit Anchlia mohitanch...@gmail.comwrote: I commented reducer and combiner both and still I see the same exception. Could it be because I have 2 jars being added? On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.comwrote: On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote: For some reason I am getting invocation exception and I don't see any more details other than this exception: My job is configured as: JobConf conf = *new* JobConf(FormMLProcessor.*class*); conf.addResource(hdfs-site.xml); conf.addResource(core-site.xml); conf.addResource(mapred-site.xml); conf.set(mapred.reduce.tasks, 0); conf.setJobName(mlprocessor); DistributedCache.*addFileToClassPath*(*new* Path(/jars/analytics.jar), conf); DistributedCache.*addFileToClassPath*(*new* Path(/jars/common.jar), conf); conf.setOutputKeyClass(Text.*class*); conf.setOutputValueClass(Text.*class*); conf.setMapperClass(Map.*class*); conf.setCombinerClass(Reduce.*class*); conf.setReducerClass(IdentityReducer.*class*); Why would you set the Reducer when the number of reducers is set to zero. Not sure if this is the real cause. 
conf.setInputFormat(SequenceFileAsTextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); - java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav -- Harsh J
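Following Subir's point, one way to rule out the combiner and reducer as the source of the reflection failure is to run the job strictly map-only. Below is a minimal sketch of the pared-down driver; FormMLProcessor and Map are the class names from the post, and everything else is the stock org.apache.hadoop.mapred API:

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileAsTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class FormMLProcessor {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(FormMLProcessor.class);
        conf.setJobName("mlprocessor");
        conf.setNumReduceTasks(0);  // map-only: no combiner or reducer is ever invoked

        // Ship the dependency jars already sitting in HDFS to the task classpath
        DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);

        conf.setMapperClass(Map.class);  // Map is the poster's mapper class
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setInputFormat(SequenceFileAsTextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

With setNumReduceTasks(0) the map output goes straight to the output format, so there is no combine or reduce phase left for the Reduce/IdentityReducer classes to break in; if the exception persists, the DistributedCache jars become the prime suspect.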
Re: Browse the filesystem weblink broken after upgrade to 1.0.0: HTTP 404 Problem accessing /browseDirectory.jsp
Hi, Just make sure that the datanode is up by looking into the datanode logs. On Sun, Feb 19, 2012 at 10:52 PM, W.P. McNeill bill...@gmail.com wrote: I am running in pseudo-distributed mode on my Mac and just upgraded from 0.20.203.0 to 1.0.0. The web interface for HDFS, which was working in 0.20.203.0, is broken in 1.0.0. HDFS itself appears to work: a command line like hadoop fs -ls / returns a result, and the namenode web interface at http://localhost:50070/dfshealth.jsp comes up. However, when I click on the Browse the filesystem link on this page I get a 404 error. The error message displayed in the browser reads: Problem accessing /browseDirectory.jsp. Reason: /browseDirectory.jsp The URL in the browser bar at this point is http://0.0.0.0:50070/browseDirectory.jsp?namenodeInfoPort=50070&dir=/ The HTML source of the link on the main namenode page is <a href="/nn_browsedfscontent.jsp">Browse the filesystem</a>. If I change the server location from 0.0.0.0 to localhost in my browser bar I get the same error. I updated my configuration files in the new Hadoop 1.0.0 conf directory to carry over my settings from 0.20.203.0. My conf/slaves file consists of the line localhost. I ran hadoop-daemon.sh start namenode -upgrade once when prompted by errors in the namenode logs. After that, all the namenode and datanode logs contain no errors. For what it's worth, I've verified that the bug occurs on Firefox, Chrome, and Safari. Any ideas on what is wrong or how I should go about further debugging it? -- Join me at http://hadoopworkshop.eventbrite.com/
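The Browse the filesystem link works by redirecting the browser to the embedded HTTP server of a live datanode (port 50075 by default), so it 404s when no datanode has registered with the namenode. A quick way to check from the shell (standard commands for this release; the log path follows the default naming convention):

jps                          # should list both NameNode and DataNode
hadoop dfsadmin -report      # "Datanodes available" should be at least 1
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log   # watch for bind or storage errors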
Re: Invocation exception
Sorry I missed this email. Harsh's answer is apt. Please see the error log for the failed tasks (mapper/reducer) in the JobTracker web UI to find the exact reason. On Tue, Feb 28, 2012 at 10:23 AM, Mohit Anchlia mohitanch...@gmail.com wrote: Does it matter if a reducer is set even when the number of reducers is 0? Is there a way to get a clearer reason? On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote: On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com wrote: For some reason I am getting an invocation exception and I don't see any more details other than this exception. My job is configured as: JobConf conf = new JobConf(FormMLProcessor.class); conf.addResource("hdfs-site.xml"); conf.addResource("core-site.xml"); conf.addResource("mapred-site.xml"); conf.set("mapred.reduce.tasks", "0"); conf.setJobName("mlprocessor"); DistributedCache.addFileToClassPath(new Path("/jars/analytics.jar"), conf); DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(Text.class); conf.setMapperClass(Map.class); conf.setCombinerClass(Reduce.class); conf.setReducerClass(IdentityReducer.class); Why would you set the reducer when the number of reducers is set to zero? Not sure if this is the real cause. conf.setInputFormat(SequenceFileAsTextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); JobClient.runJob(conf); - java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav
Re: namenode null pointer
Hi, This may be an issue with the namenode not being formatted correctly. On Sat, Feb 18, 2012 at 1:50 PM, Ben Cuthbert bencuthb...@ymail.com wrote: All, sometimes when I start up my Hadoop I get the following error: 12/02/17 10:29:56 INFO namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = iMac.local/192.168.0.191 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.203.0 STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203-r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011 / 12/02/17 10:29:56 WARN impl.MetricsSystemImpl: Metrics system not started: Cannot locate configuration: tried hadoop-metrics2-namenode.properties, hadoop-metrics2.properties 2012-02-17 10:29:56.994 java[4065:1903] Unable to load realm info from SCDynamicStore 12/02/17 10:29:57 INFO util.GSet: VM type = 64-bit 12/02/17 10:29:57 INFO util.GSet: 2% max memory = 17.77875 MB 12/02/17 10:29:57 INFO util.GSet: capacity = 2^21 = 2097152 entries 12/02/17 10:29:57 INFO util.GSet: recommended=2097152, actual=2097152 12/02/17 10:29:57 INFO namenode.FSNamesystem: fsOwner=scottsue 12/02/17 10:29:57 INFO namenode.FSNamesystem: supergroup=supergroup 12/02/17 10:29:57 INFO namenode.FSNamesystem: isPermissionEnabled=true 12/02/17 10:29:57 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100 12/02/17 10:29:57 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 12/02/17 10:29:57 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 12/02/17 10:29:57 INFO namenode.NameNode: Caching file names occuring more than 10 times 12/02/17 10:29:57 INFO common.Storage: Number of files = 190 12/02/17 10:29:57 INFO common.Storage: Number of files under construction = 0 12/02/17 10:29:57 INFO common.Storage: Image file of size 26377 loaded in 0 seconds. 12/02/17 10:29:57 ERROR namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1113) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1125) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1028) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1009) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:827) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:365) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:353) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:434) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162) 12/02/17 10:29:57 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at iMac.local/192.168.0.191 / -- Join me at http://hadoopworkshop.eventbrite.com/
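The stack trace shows the failure happening while replaying the edit log (FSEditLog.loadFSEdits), which usually points at corrupt metadata rather than a bad config. If the data is expendable (say, a development pseudo-cluster), reformatting is the blunt fix. A sketch, assuming the default dfs.name.dir location under /tmp; note that the format step erases all HDFS metadata, so back up the directory first on any cluster whose data matters:

stop-all.sh
# back up the current image and edits first; the path below is an
# assumption -- use whatever dfs.name.dir actually points to
cp -r /tmp/hadoop-$USER/dfs/name /tmp/namenode-backup
hadoop namenode -format
start-all.sh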
Re: What determines task attempt list URLs?
Hi, It's better to use hostnames rather than IP addresses. If you use hostnames, the task attempt URLs will contain the hostname rather than localhost. On Fri, Feb 17, 2012 at 10:52 PM, Keith Wiley kwi...@keithwiley.com wrote: What property or setup parameter determines the URLs displayed on the task attempts webpage of the job/task trackers? My cluster seems to be configured such that all URLs for higher pages (the top cluster admin page, the individual job overview page, and the map/reduce task list page) show URLs by IP address, but the lowest page (the task attempt list for a single task) shows the URLs for the Machine and Task Logs columns by localhost, not by IP address (although the Counters column still uses the IP address just like URLs on all the higher pages). The localhost links obviously don't work (the cluster is not on the local machine, it's on Tier 3)...unless I just happen to have a cluster also running on my local machine; then the links work but obviously they go to my local machine and thus describe a completely unrelated Hadoop cluster!!! It goes without saying, that's ridiculous. So to get it to work, I have to manually copy/paste the IP address into the URLs every time I want to view those pages...which makes it incredibly tedious to view the task logs. I've asked this a few times now and have gotten no response. Does no one have any idea how to properly configure Hadoop to get around this? I've experimented with the mapred-site.xml mapred.job.tracker and mapred.task.tracker.http.address properties to no avail. What's going on here? Desperate Keith Wiley kwi...@keithwiley.com keithwiley.com music.keithwiley.com I used to be with it, but then they changed what it was. Now, what I'm with isn't it, and what's it seems weird and scary to me. -- Abe (Grandpa) Simpson -- Join me at http://hadoopworkshop.eventbrite.com/
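Concretely, that means putting a resolvable hostname (not an IP, and never 0.0.0.0) in mapred-site.xml and in the masters/slaves files. A sketch, where jt.example.com is a placeholder for the real JobTracker host and 9001 is the conventional port:

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>jt.example.com:9001</value>
</property>

It is also worth checking each worker's /etc/hosts: if the machine's own hostname maps to 127.0.0.1 there, the TaskTracker can end up advertising itself as localhost in exactly the way Keith describes.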
Re: namenode null pointer
So the filesystem has been corrupted? Regards Ben On 29 Feb 2012, at 05:51, madhu phatak wrote: Hi, This may be an issue with the namenode not being formatted correctly. On Sat, Feb 18, 2012 at 1:50 PM, Ben Cuthbert bencuthb...@ymail.com wrote: All, sometimes when I start up my Hadoop I get the following error: 12/02/17 10:29:56 INFO namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host = iMac.local/192.168.0.191 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.203.0 STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203-r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011 / 12/02/17 10:29:56 WARN impl.MetricsSystemImpl: Metrics system not started: Cannot locate configuration: tried hadoop-metrics2-namenode.properties, hadoop-metrics2.properties 2012-02-17 10:29:56.994 java[4065:1903] Unable to load realm info from SCDynamicStore 12/02/17 10:29:57 INFO util.GSet: VM type = 64-bit 12/02/17 10:29:57 INFO util.GSet: 2% max memory = 17.77875 MB 12/02/17 10:29:57 INFO util.GSet: capacity = 2^21 = 2097152 entries 12/02/17 10:29:57 INFO util.GSet: recommended=2097152, actual=2097152 12/02/17 10:29:57 INFO namenode.FSNamesystem: fsOwner=scottsue 12/02/17 10:29:57 INFO namenode.FSNamesystem: supergroup=supergroup 12/02/17 10:29:57 INFO namenode.FSNamesystem: isPermissionEnabled=true 12/02/17 10:29:57 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100 12/02/17 10:29:57 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s) 12/02/17 10:29:57 INFO namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean 12/02/17 10:29:57 INFO namenode.NameNode: Caching file names occuring more than 10 times 12/02/17 10:29:57 INFO common.Storage: Number of files = 190 12/02/17 10:29:57 INFO common.Storage: Number of files under construction = 0 12/02/17 10:29:57 INFO common.Storage: Image file of size 26377 loaded in 0 seconds. 12/02/17 10:29:57 ERROR namenode.NameNode: java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1113) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1125) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1028) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:205) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:613) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1009) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:827) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:365) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:97) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:379) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:353) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:254) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:434) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162) 12/02/17 10:29:57 INFO namenode.NameNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down NameNode at iMac.local/192.168.0.191 / -- Join me at http://hadoopworkshop.eventbrite.com/