RE: How to configure nodemanager.health-checker.script.path
Hi

The health script itself must execute successfully. If you want the health check to fail, make the script print a line starting with ERROR to the console. A health script may fail for reasons unrelated to node health, such as a syntax error or a command not found (IOException), so the NodeManager does not treat a non-zero exit code as ill health. For the health script to work, do not add "exit -1":

#!/bin/bash
echo ERROR disk full

Thanks & Regards
Rohith Sharma K S

From: Anfernee Xu [mailto:anfernee...@gmail.com]
Sent: 19 March 2014 10:32
To: user
Subject: How to configure nodemanager.health-checker.script.path

Hello,

I'm running MR with the 2.2.0 release. I noticed we can configure nodemanager.health-checker.script.path in yarn-site.xml to customize NM health checking, so I added the properties below to yarn-site.xml:

<property>
  <name>yarn.nodemanager.health-checker.script.path</name>
  <value>/scratch/software/hadoop2/hadoop-dc/node_health.sh</value>
</property>
<property>
  <name>yarn.nodemanager.health-checker.interval-ms</name>
  <value>1</value>
</property>

To get a feel for this, /scratch/software/hadoop2/hadoop-dc/node_health.sh simply prints an ERROR message, as below:

#!/bin/bash
echo ERROR disk full
exit -1

But it does not seem to work; the node is still in the healthy state. Did I miss something? Thanks for your help.

--
--Anfernee
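For reference, a slightly fuller script in the same vein — a minimal sketch, relying on the NodeManager convention that any stdout line beginning with ERROR marks the node unhealthy, while the exit code is ignored as long as the script itself runs; the /data mount point and the 90% threshold are arbitrary choices for illustration:

#!/bin/bash
# Report unhealthy when the data mount is nearly full.
# Exit 0 either way: a non-zero exit is treated as a script failure, not ill health.
USED=$(df -P /data | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$USED" -ge 90 ]; then
  echo "ERROR disk almost full: ${USED}% used on /data"
fi
exit 0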
RE: Benchmark Failure
This seems to be a known issue which has already been logged. Please check the following JIRA; hopefully you are facing the same issue: https://issues.apache.org/jira/browse/HDFS-4929

Thanks & Regards
Brahma Reddy Battula

From: Lixiang Ao [aolixi...@gmail.com]
Sent: Tuesday, March 18, 2014 10:34 AM
To: user@hadoop.apache.org
Subject: Re: Benchmark Failure

The version is release 2.2.0.

On 18 March 2014 at 00:26, Lixiang Ao aolixi...@gmail.com wrote:

Hi all,

I'm running the jobclient tests (on a single node). Other tests like TestDFSIO and mrbench succeed, but nnbench fails. I get a lot of exceptions without any explanation (see below). Could anyone tell me what might have gone wrong? Thanks!

14/03/17 23:54:22 INFO hdfs.NNBench: Waiting in barrier for: 112819 ms
14/03/17 23:54:23 INFO mapreduce.Job: Job job_local2133868569_0001 running in uber mode : false
14/03/17 23:54:23 INFO mapreduce.Job: map 0% reduce 0%
14/03/17 23:54:28 INFO mapred.LocalJobRunner: hdfs://0.0.0.0:9000/benchmarks/NNBench-aolx-PC/control/NNBench_Controlfile_10:0+125 map
14/03/17 23:54:29 INFO mapreduce.Job: map 6% reduce 0%
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: Create/Write/Close
(repeated; 1000 exceptions in total)
. . .
results:

File System Counters
  FILE: Number of bytes read=18769411
  FILE: Number of bytes written=21398315
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=11185
  HDFS: Number of bytes written=19540
  HDFS: Number of read operations=325
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=13210
Map-Reduce Framework
  Map input records=12
  Map output records=95
  Map output bytes=1829
  Map output materialized bytes=2091
  Input split bytes=1538
  Combine input records=0
  Combine output records=0
  Reduce input groups=8
  Reduce shuffle bytes=0
  Reduce input records=95
  Reduce output records=8
  Spilled Records=214
  Shuffled Maps =0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=211
  CPU time spent (ms)=0
  Physical memory (bytes) snapshot=0
  Virtual memory (bytes) snapshot=0
  Total committed heap usage (bytes)=4401004544
File Input Format Counters
  Bytes Read=1490
File Output Format Counters
  Bytes Written=170

14/03/17 23:56:18 INFO hdfs.NNBench: -- NNBench -- :
14/03/17 23:56:18 INFO hdfs.NNBench:               Version: NameNode Benchmark 0.4
14/03/17 23:56:18 INFO hdfs.NNBench:           Date & time: 2014-03-17 23:56:18,619
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench:        Test Operation: create_write
14/03/17 23:56:18 INFO hdfs.NNBench:            Start time: 2014-03-17 23:56:15,521
14/03/17 23:56:18 INFO hdfs.NNBench:           Maps to run: 12
14/03/17 23:56:18 INFO hdfs.NNBench:        Reduces to run: 6
14/03/17 23:56:18 INFO hdfs.NNBench:    Block Size (bytes): 1
14/03/17 23:56:18 INFO hdfs.NNBench:        Bytes to write: 0
14/03/17 23:56:18 INFO hdfs.NNBench:    Bytes per checksum: 1
14/03/17 23:56:18 INFO hdfs.NNBench:       Number of files: 1000
14/03/17 23:56:18 INFO hdfs.NNBench:    Replication factor: 3
14/03/17 23:56:18 INFO hdfs.NNBench: Successful file operations: 0
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: # maps that missed the barrier: 11
14/03/17 23:56:18 INFO hdfs.NNBench:          # exceptions: 1000
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: TPS: Create/Write/Close: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: Infinity
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:    Avg Lat (ms): Close: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: AL Total #1: 0
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: AL Total #2: 0
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: TPS Total (ms): 1131
14/03/17 23:56:18 INFO hdfs.NNBench: RAW DATA: Longest Map
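For anyone trying to reproduce this, the run above appears to correspond to an invocation along these lines — a sketch from memory of the 2.x jobclient test jar, so the jar name and flag spellings may differ in your build:

hadoop jar hadoop-mapreduce-client-jobclient-2.2.0-tests.jar nnbench \
  -operation create_write \
  -maps 12 -reduces 6 \
  -blockSize 1 -bytesToWrite 0 \
  -numberOfFiles 1000 \
  -replicationFactorPerFile 3 \
  -baseDir /benchmarks/NNBench-`hostname -s`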
NodeHealthReport local-dirs turned bad
Hi

I have one node in unhealthy status:

Total Vmem allocated for Containers: 4.20 GB
Vmem enforcement enabled: false
Total Pmem allocated for Container: 2 GB
Pmem enforcement enabled: false
NodeHealthyStatus: false
LastNodeHealthTime: Wed Mar 19 13:31:24 EET 2014
NodeHealthReport: 1/1 local-dirs turned bad: /hadoop/yarn/local; 1/1 log-dirs turned bad: /hadoop/yarn/log

Node Manager Version: 2.2.0.2.0.6.0-101 from b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum 82bd166aa0ada92b44f8a154836b92 on 2014-01-09T05:24Z
Hadoop Version: 2.2.0.2.0.6.0-101 from b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum 704f1e463ebc4fb89353011407e965 on 2014-01-09T05:18Z

I tried:
Deleting /hadoop/* and running namenode -format again
Restarting the nodemanager

but it is still unhealthy. Is there any guideline for what I should do?

--
Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
RE: NodeHealthReport local-dirs turned bad
Hi

There is no relation to the NameNode format.

Is the NodeManager started with the default configuration? If not, is any NodeManager health script configured?

The suspects would be:
1. /hadoop does not have the right permissions, or
2. the disk is full.

Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Margusja [mailto:mar...@roo.ee]
Sent: 19 March 2014 17:04
To: user@hadoop.apache.org
Subject: NodeHealthReport local-dirs turned bad

Hi

I have one node in unhealthy status:

NodeHealthyStatus: false
LastNodeHealthTime: Wed Mar 19 13:31:24 EET 2014
NodeHealthReport: 1/1 local-dirs turned bad: /hadoop/yarn/local; 1/1 log-dirs turned bad: /hadoop/yarn/log

I tried deleting /hadoop/* and running namenode -format again, and restarting the nodemanager, but it is still unhealthy. Is there any guideline for what I should do?

--
Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
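If permissions are the suspect, a quick check along these lines usually settles it — a sketch; the yarn:hadoop owner is an assumption, so substitute whichever user your NodeManager actually runs as:

# The dirs must exist, be writable by the NodeManager user, and have free space
ls -ld /hadoop/yarn/local /hadoop/yarn/log
df -h /hadoop
# Fix ownership if it is wrong (user/group names are assumptions):
chown -R yarn:hadoop /hadoop/yarn/local /hadoop/yarn/log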
Webinar On Hadoop!
Hi,

Want to know Big Data / Hadoop? If yes, join us for a free webinar by industry experts at the link below.

FREE webinar on Hadoop, hosted by: Manoj, Research Director
Join us for a webinar on Mar 19, 2014 at 8:00 PM IST.
Register now! https://attendee.gotowebinar.com/register/54180991637732354

Discussion Topics:
* What is Big Data?
* Challenges in Big Data
* What is Hadoop?
* Opportunities in Hadoop / Big Data

For further details visit us at www.soapttrainings.com

Best Regards,
Kumar Vivek | Director
M +91-7675824584 | si...@soapt.com
www.soapttrainings.com http://soapttrainings.com/index.php?action=1#hadoop | www.openbravo.com
#2, 38/A, Above Docomo Office, Madhapur, Hyderabad
Re: NodeHealthReport local-dirs turned bad
Thanks, got it to work. In my init script I used the wrong user. It was a permissions problem, like Rohith said.

Best regards,
Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 19/03/14 14:08, Rohith Sharma K S wrote:

Hi

There is no relation to the NameNode format.

Is the NodeManager started with the default configuration? If not, is any NodeManager health script configured?

The suspects would be:
1. /hadoop does not have the right permissions, or
2. the disk is full.

Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Margusja [mailto:mar...@roo.ee]
Sent: 19 March 2014 17:04
To: user@hadoop.apache.org
Subject: NodeHealthReport local-dirs turned bad

[original message quoted above, snipped]
Need FileName with Content
Hi,

I have a folder named INPUT. Inside INPUT there are 5 resumes.

hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
Found 5 items
-rw-r--r-- 1 hduser supergroup 5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
-rw-r--r-- 1 hduser supergroup 6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
hduser@localhost:~/Ranjini$

I have to process the folder and its content. I need output of the form

filename word occurrence
vinitha  java 4
sony     oracle 3

But I am not getting the filename. As the input files' content is merged, the file name does not come out correct. Please help me fix this issue. I have given my code below:

import java.io.IOException;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      FSDataInputStream fs = null;
      FileSystem hdfs = null;
      String line = value.toString();
      int i = 0, k = 0;
      try {
        Configuration configuration = new Configuration();
        configuration.set("fs.default.name", "hdfs://localhost:4440/");
        Path srcPath = new Path("/user/hduser/INPUT/");
        hdfs = FileSystem.get(configuration);
        FileStatus[] status = hdfs.listStatus(srcPath);
        fs = hdfs.open(srcPath);
        BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(srcPath)));
        String[] splited = line.split("\\s+");
        for (i = 0; i < splited.length; i++) {
          String sp[] = splited[i].split(",");
          for (k = 0; k < sp.length; k++) {
            if (!sp[k].isEmpty()) {
              StringTokenizer tokenizer = new StringTokenizer(sp[k]);
              if ((sp[k].equalsIgnoreCase("C"))) {
                while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  output.collect(word, one);
                }
              }
              if ((sp[k].equalsIgnoreCase("JAVA"))) {
                while (tokenizer.hasMoreTokens()) {
                  word.set(tokenizer.nextToken());
                  output.collect(word, one);
                }
              }
            }
          }
        }
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Please help. Thanks in advance.
Ranjini
Re: I am about to lose all my data please help
Thanks for your help, but I still could not solve my problem.

On Tue, Mar 18, 2014 at 10:13 AM, Stanley Shi s...@gopivotal.com wrote:

Ah yes, I overlooked this. Then please check whether the files are there or not: ls /home/hadoop/project/hadoop-data/dfs/name

Regards,
Stanley Shi,

On Tue, Mar 18, 2014 at 2:06 PM, Azuryy Yu azury...@gmail.com wrote:

I don't think this is the case, because there is:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/project/hadoop-data</value>
</property>

On Tue, Mar 18, 2014 at 1:55 PM, Stanley Shi s...@gopivotal.com wrote:

One possible reason is that you didn't set the namenode working directory; by default it's in the /tmp folder, and the /tmp folder might get deleted by the OS without any notification. If this is the case, I am afraid you have lost all your namenode data.

<property>
  <name>dfs.name.dir</name>
  <value>${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>

Regards,
Stanley Shi,

On Sun, Mar 16, 2014 at 5:29 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:

Hi,

What is the location of the namenode's fsimage and edit logs? And how much memory does the NameNode have? Did you work with a Secondary NameNode or a Standby NameNode for checkpointing? Where are your HDFS blocks located; are those still safe?

With this information at hand, one might be able to fix your setup, but do not format the old namenode before everything is working with a fresh one. Grab a copy of the maintenance guide: http://shop.oreilly.com/product/0636920025085.do?sortby=publicationDate which helps in solving this type of problem as well.

Best wishes
Mirko

2014-03-16 9:07 GMT+00:00 Fatih Haltas fatih.hal...@nyu.edu:

Dear All,

I have just restarted the machines of my Hadoop cluster. Now I am trying to restart the Hadoop cluster again, but I am getting an error on namenode restart. I am afraid of losing my data, as it had been running properly for more than 3 months. Currently, I believe that if I do a namenode format it will work again; however, the data will be lost. Is there any way to solve this without losing the data?

I will really appreciate any help. Thanks.

=====
Here are the logs:

2014-02-26 16:02:39,698 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ADUAE042-LAP-V/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.4
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
2014-02-26 16:02:40,005 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2014-02-26 16:02:40,019 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2014-02-26 16:02:40,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2014-02-26 16:02:40,021 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2014-02-26 16:02:40,169 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2014-02-26 16:02:40,193 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm registered.
2014-02-26 16:02:40,194 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source NameNode registered.
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: VM type = 64-bit
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: 2% max memory = 17.77875 MB
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: capacity = 2^21 = 2097152 entries
2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: recommended=2097152, actual=2097152
2014-02-26 16:02:40,273 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop
2014-02-26 16:02:40,273 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2014-02-26 16:02:40,274 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2014-02-26 16:02:40,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: dfs.block.invalidate.limit=100
2014-02-26 16:02:40,279 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2014-02-26 16:02:40,724 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStateMBean and NameNodeMXBean
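For anyone hitting the same failure mode Stanley describes, a persistent setting in hdfs-site.xml avoids it — a sketch only; the paths are placeholders, and the property is dfs.name.dir on Hadoop 1.x (dfs.namenode.name.dir on 2.x):

<property>
  <name>dfs.name.dir</name>
  <!-- comma-separated list; each directory keeps a full copy of the fsimage -->
  <value>/home/hadoop/project/hadoop-data/dfs/name,/mnt/backup/dfs/name</value>
</property>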
Doubt
Hi all,

Is it possible to install MongoDB on the same VM that hosts Hadoop?

--
amiable harsha
question about yarn webapp
I can get Server Stacks from the web UI, but I don't know which code handles that function. How does the web app get the stack information from the JVM?
Re: Doubt
Certainly it is, and it's quite common, especially if you have some high-performance machines: they can run as MapReduce slaves and also double as Mongo hosts. The problem, of course, would be that when running MapReduce jobs you might have very slow network bandwidth at times, and if your front end needs fast response times from the Mongo instances all the time, you could be in trouble.

On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar praveen...@gmail.com wrote:

Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable.

Regards
Prav

On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

Hi all,
Is it possible to install MongoDB on the same VM that hosts Hadoop?
--
amiable harsha

--
Jay Vyas
http://jayunit100.blogspot.com
JSR 203 NIO 2 for HDFS
Hi,

I'm working on a minimal implementation of JSR 203 to provide access to HDFS (1.2.1) for a GUI tool needed in my company. Some features already work, such as creating a directory, deleting something, and listing the files in a directory.

I would like to know if someone has already worked on something like this. Maybe a FOSS project has already done it?

Anyway, if someone wants to help me with this task, it's here: g...@github.com:damiencarol/jsr203-hadoop.git

Regards,
Damien CAROL
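For context, client code against such a provider would be plain JSR 203 usage, roughly as below — a sketch; the "hdfs" URI scheme and the namenode address are assumptions about how the provider registers itself:

import java.net.URI;
import java.nio.file.*;
import java.util.Collections;

public class Jsr203HdfsDemo {
  public static void main(String[] args) throws Exception {
    // Look up the provider by URI scheme (assumes it registers "hdfs")
    FileSystem fs = FileSystems.newFileSystem(
        URI.create("hdfs://localhost:8020/"), Collections.<String, Object>emptyMap());
    Path dir = fs.getPath("/user/demo/new-dir");
    Files.createDirectory(dir);                      // create a directory
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir.getParent())) {
      for (Path p : stream) System.out.println(p);   // list files
    }
    Files.deleteIfExists(dir);                       // delete
  }
}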
Re: Is Hadoop's ToolRunner thread-safe?
Any thoughts on this? Can anyone confirm or deny that it's an issue?

On Mon, Mar 17, 2014 at 11:43 AM, Something Something mailinglist...@gmail.com wrote:

I would like to trigger a few Hadoop jobs simultaneously. I've created a pool of threads using Executors.newFixedThreadPool. The idea is that if the pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time using 'ToolRunner.run'. In my testing, I noticed that these 2 threads keep stepping on each other. When I looked under the hood, I noticed that ToolRunner creates a GenericOptionsParser, which in turn calls a static method, 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName', which uses an instance variable called 'argName'. This doesn't look thread-safe to me, and I believe it is the root cause of the issues I am running into. Any thoughts?
Is Hadoop's ToolRunner thread-safe?
I would like to trigger a few Hadoop jobs simultaneously. I've created a pool of threads using Executors.newFixedThreadPool. The idea is that if the pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time using 'ToolRunner.run'. In my testing, I noticed that these 2 threads keep stepping on each other. When I looked under the hood, I noticed that ToolRunner creates a GenericOptionsParser, which in turn calls a static method, 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName', which uses an instance variable called 'argName'. This doesn't look thread-safe to me, and I believe it is the root cause of the issues I am running into. Any thoughts?
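One commonly suggested workaround, given that the suspect state lives in a shared builder used during option parsing, is to serialize only the parsing step and let the jobs themselves run concurrently — a sketch under that assumption; MyTool is a hypothetical Tool implementation standing in for your job:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;

public class ParallelJobs {
  // One lock to serialize only the (suspect) generic-option parsing step
  private static final Object PARSER_LOCK = new Object();

  static int runWithSerializedParse(Tool tool, String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] remaining;
    synchronized (PARSER_LOCK) {
      // This is the step ToolRunner.run performs internally
      GenericOptionsParser parser = new GenericOptionsParser(conf, args);
      remaining = parser.getRemainingArgs();
    }
    tool.setConf(conf);
    return tool.run(remaining); // the jobs themselves still run concurrently
  }

  public static void main(final String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    for (int i = 0; i < 2; i++) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            runWithSerializedParse(new MyTool(), args); // MyTool: your Tool impl
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
  }
}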
Class loading in Hadoop and HBase
Hi all,

I'm running with Hadoop 1.0.4 and HBase 0.94.12 bundled (OSGi) versions I built. Most issues I encountered are related to class loaders. One of the patterns I noticed in both projects is:

ClassLoader cl = Thread.currentThread().getContextClassLoader();
if (cl == null) {
  cl = Clazz.class.getClassLoader();
}

where Clazz is the class containing this code. I was wondering about this choice... Why not go the other way around:

ClassLoader cl = Clazz.class.getClassLoader();
if (cl == null) {
  cl = Thread.currentThread().getContextClassLoader();
}

And on a more general note, why not always use Configuration (and let its class loader be this.getClass().getClassLoader()) to load classes? That would surely help with integration into modularity frameworks.

Thanks,
Amit.
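To make the Configuration suggestion concrete: Configuration exposes a pluggable class loader, so class resolution can be pinned to a bundle's own loader instead of the thread context one — a sketch; org.example.MyMapper is a placeholder class name:

import org.apache.hadoop.conf.Configuration;

public class ConfClassLoading {
  public static void main(String[] args) throws ClassNotFoundException {
    Configuration conf = new Configuration();
    // Pin lookups to this bundle's class loader rather than the context loader
    conf.setClassLoader(ConfClassLoading.class.getClassLoader());
    Class<?> mapperClass = conf.getClassByName("org.example.MyMapper");
    System.out.println("Loaded: " + mapperClass.getName());
  }
}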
Re: Doubt
Thanks Jay and Praveen. I want to use both separately; I don't want to use MongoDB in place of HBase.

On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas jayunit...@gmail.com wrote:

Certainly it is, and it's quite common, especially if you have some high-performance machines: they can run as MapReduce slaves and also double as Mongo hosts. The problem, of course, would be that when running MapReduce jobs you might have very slow network bandwidth at times, and if your front end needs fast response times from the Mongo instances all the time, you could be in trouble.

On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar praveen...@gmail.com wrote:

Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable.

Regards
Prav

On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

Hi all,
Is it possible to install MongoDB on the same VM that hosts Hadoop?
--
amiable harsha

--
Jay Vyas
http://jayunit100.blogspot.com

--
amiable harsha
Re: Doubt
Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable.

Regards
Prav

On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

Hi all,
Is it possible to install MongoDB on the same VM that hosts Hadoop?
--
amiable harsha
The reduce copier failed
Hi

In the middle of a map-reduce job I get:

map 20% reduce 6%
...
The reduce copier failed
map 20% reduce 0%
map 20% reduce 1%
map 20% reduce 2%
map 20% reduce 3%

Does that imply a *retry* process? Or should I be worried about that message?

Regards,
Mahmood
Re: question about yarn webapp
Hello,

This is hadoop-common functionality. See the StacksServlet class code:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java#L1044

On Wed, Mar 19, 2014 at 9:17 PM, 赵璨 asoq...@gmail.com wrote:

I can get Server Stacks from the web UI, but I don't know which code handles that function. How does the web app get the stack information from the JVM?

--
Harsh J
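Mechanically, such a servlet just asks the JVM for all live thread stacks. A minimal illustration of the idea (not the exact Hadoop code, which I believe routes through ReflectionUtils.printThreadInfo):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackDump {
  public static void main(String[] args) {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // true, true => include lock and synchronizer details where supported
    for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
      System.out.print(info); // ThreadInfo.toString() includes the stack trace
    }
  }
}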
Re: The reduce copier failed
While it does mean a retry, if the job eventually fails (after all the finite retries fail as well), then you have a problem to investigate. If the job eventually succeeded, then this may have been a transient issue. Worth investigating either way.

On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:

Hi

In the middle of a map-reduce job I get:

map 20% reduce 6%
...
The reduce copier failed
map 20% reduce 0%
map 20% reduce 1%
map 20% reduce 2%
map 20% reduce 3%

Does that imply a *retry* process? Or should I be worried about that message?

Regards,
Mahmood

--
Harsh J
Re: Need FileName with Content
You want to do a word count for each file, but the code gives you a word count across all the files, right?

Change:

word.set(tokenizer.nextToken());
output.collect(word, one);

to:

word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);

Regards,
Stanley Shi,

On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote:

Hi,

I have a folder named INPUT. Inside INPUT there are 5 resumes. I have to process the folder and its content. I need output of the form

filename word occurrence
vinitha java 4
sony oracle 3

But I am not getting the filename; as the input files' content is merged, the file name does not come out correct.

[code snipped; see the original message above]

Please help. Thanks in advance.
Ranjini
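A note on where filename would come from: with the old mapred API used here, the map task can recover the current file from its input split — a short sketch, assuming TextInputFormat so the cast to org.apache.hadoop.mapred.FileSplit holds:

// Inside map(...): the Reporter hands back the split this task is reading
FileSplit split = (FileSplit) reporter.getInputSplit();
String filename = split.getPath().getName(); // e.g. "vinitha.txt"
word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);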
how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 Data Nodes. However, I don't have enough space on one of the nodes to cater for other projects' logs. So I decommissioned this node from the Data Node list, but I could not reclaim the space from it. Is there a way to get this node to release space?

[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
 Total size: 7186453688 B
 Total dirs: 11
 Total files: 62
 Total symlinks: 0
 Total blocks (validated): 105 (avg. block size 68442416 B)
 Minimally replicated blocks: 105 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 105 (100.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 2.0
 Corrupt blocks: 0
 Missing replicas: 105 (33.32 %)
 Number of data-nodes: 2
 Number of racks: 1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds

The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations
P + 61 2 8576 5771
M + 61 4 1463 7424
E troung.p...@team.telstra.com
W www.telstra.com
RE: how to free up space of the old Data Node
Please check my inline comments below...

From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node

> Hi
> I have a 3-node Hadoop cluster in which I created 3 Data Nodes. However, I don't have enough space on one of the nodes to cater for other projects' logs. So I decommissioned this node from the Data Node list, but I could not reclaim the space from it.

Is your replication factor 3? If it is 3, then as you have 3 datanodes, ideally the disk space occupied by all nodes should be the same (47G should be present on all the DNs). And if RF=3, decommissioning will not succeed as you have only 3 DNs; you would need to add another DN to the cluster, and only then would decommissioning succeed. Hence, please mention the replication factor of the file.

> Is there a way to get this node to release space?

There are ways, but you need to mention why only this node's disk is full and not the others'. Is it because this node has less space compared to the other nodes? If RF=3, then make RF=2 (decrease the replication factor) and then decommission this node.

> [root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
> 47G /data/dfs/dn/current

Try to give the following output as well:

sudo -u hdfs hadoop fsck /
sudo -u hdfs hadoop dfsadmin -report

> $ sudo -u hdfs hadoop fsck /data
> [fsck output as in the original message above, snipped]

Thanks & Regards
Brahma Reddy Battula
NullPointerException in OfflineImageViewer
I want to access and study the Hadoop cluster's metadata, which is stored in the fsimage file on the namenode machine. I learned that the OfflineImageViewer is used to do so, but when I try it I get an exception:

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
        at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: $HADOOP_HOME is deprecated.) or is it something else?
NullPointerException in OfflineImageViewer
I want to access and study the Hadoop cluster's metadata, which is stored in the fsimage file on the namenode machine. I learned that the OfflineImageViewer is used to do so, but when I try it I get an exception:

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
        at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: $HADOOP_HOME is deprecated.) or is it something else? I am using the hadoop-1.2.1 version.
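For what it's worth, the $HADOOP_HOME warning is harmless. A more likely suspect is that "-i fsimage" resolves to a file that doesn't exist in the current directory; pointing -i at the actual image under the namenode's storage directory is worth a try — a sketch, with the dfs.name.dir path as a placeholder:

# Copy the image out of the live name directory first (path is an example)
cp /path/to/dfs/name/current/fsimage /tmp/fsimage
bin/hadoop oiv -i /tmp/fsimage -o /tmp/fsimage.txt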
RE: how to free up space of the old Data Node
Thanks for the reply.

This Hadoop cluster is our POC, and the node has less space compared to the other two nodes.

How do I change the Replication Factor (RF) from 3 down to 2? Is this controlled by the parameter dfs.datanode.handler.count?

Thanks and Regards,
Truong Phan
P + 61 2 8576 5771
M + 61 4 1463 7424
E troung.p...@team.telstra.com
W www.telstra.com

From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

[quoted message as above, snipped]
RE: how to free up space of the old Data Node
You can change the replication factor using the following command:

hdfs dfs -setrep [-R] <rep> <path>

Once this is done, you can re-commission the datanode; then all the over-replicated blocks will be removed. If they are not removed, restart the datanode.

Regards,
Vinayakumar B

From: Phan, Truong Q [mailto:troung.p...@team.telstra.com]
Sent: 20 March 2014 10:28
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

[quoted message as above, snipped]
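As a concrete usage of the command above — dropping everything under /data to 2 replicas; adding -w makes the command block until re-replication actually completes:

hdfs dfs -setrep -R 2 /data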
Hadoop MapReduce Streaming - how to change the final output file name with the desired name rather than in partition like: part-0000*
Hi

Could you please provide me with an alternative link that explains how to change the final output file name to a desired name rather than partitions like part-*? Can I have a sample of Python code to run MapReduce Streaming with custom output file names?

A helper on the Avro mailing list gave me the link below, but:
1) None of the links on that page work.
2) The page explains the Java method, not the Python method, and I am looking for the Python method.

http://wiki.apache.org/hadoop/FAQ#How_do_I_change_final_output_file_name_with_the_desired_name_rather_than_in_partitions_like_part-0.2C_part-1.3F

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport & Routing Engineering | Networks | Telstra Operations
P + 61 2 8576 5771
M + 61 4 1463 7424
E troung.p...@team.telstra.com
W www.telstra.com
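Not a direct answer, but the simplest language-agnostic route is to let streaming write its part-* files and rename them afterwards — a sketch; the streaming jar location varies by distribution, and mapper.py/reducer.py are placeholders for your own scripts:

# Run the streaming job (jar path is distribution-specific)
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input /user/me/in -output /user/me/out \
  -mapper mapper.py -reducer reducer.py \
  -file mapper.py -file reducer.py

# Then rename the output partition to the desired name
hadoop fs -mv /user/me/out/part-00000 /user/me/out/result.txt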
RE: how to free up space of the old Data Node
Hi Battula,

I hope Battula is your first name. :P

Here are the outputs of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014

Status: CORRUPT
 Total size: 7325542923 B
 Total dirs: 138
 Total files: 383
 Total symlinks: 0 (Files currently being written: 2)
 Total blocks (validated): 424 (avg. block size 17277223 B)
  CORRUPT FILES: 3
  MISSING BLOCKS: 3
  MISSING SIZE: 791 B
  CORRUPT BLOCKS: 3
 Minimally replicated blocks: 421 (99.29245 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 417 (98.34906 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 3
 Average block replication: 1.976415
 Corrupt blocks: 3
 Missing replicas: 417 (33.147854 %)
 Number of data-nodes: 2
 Number of racks: 1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds

The filesystem under path '/' is CORRUPT

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-------------------------------------------------
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014

Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014

Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014

Thanks and Regards,
Truong Phan
P + 61 2 8576 5771
M + 61 4 1463 7424
E troung.p...@team.telstra.com
W www.telstra.com

From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

[quoted message as above, snipped]
RE: how to free up space of the old Data Node
Please check my inline comments below...

From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 10:28 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

> Thanks for the reply.
> This Hadoop cluster is our POC, and the node has less space compared to the other two nodes.
> How do I change the Replication Factor (RF) from 3 down to 2?

hadoop fs -setrep -w 2 -R /location/of/the/dir/or/file

> Is this controlled by this parameter (dfs.datanode.handler.count)?

No.

Thanks & Regards
Brahma Reddy Battula

[earlier quoted messages snipped]
RE: how to free up space of the old Data Node
Following Node is down..Please have look on datanode logs and try to make it up...Before going for further action..(like decreasing the replication factor..) Dead datanodes: Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com) From: Phan, Truong Q [troung.p...@team.telstra.com] Sent: Thursday, March 20, 2014 10:39 AM To: user@hadoop.apache.org Subject: RE: how to free up space of the old Data Node Hi Battula, I hope Battula is your first name. :P Here are the output of your suggested commands: [root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck / DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070 FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 16:04:35 EST 2014 Status: CORRUPT Total size:7325542923 B Total dirs:138 Total files: 383 Total symlinks:0 (Files currently being written: 2) Total blocks (validated): 424 (avg. block size 17277223 B) CORRUPT FILES:3 MISSING BLOCKS: 3 MISSING SIZE: 791 B CORRUPT BLOCKS: 3 Minimally replicated blocks: 421 (99.29245 %) Over-replicated blocks:0 (0.0 %) Under-replicated blocks: 417 (98.34906 %) Mis-replicated blocks: 0 (0.0 %) Default replication factor:3 Average block replication: 1.976415 Corrupt blocks:3 Missing replicas: 417 (33.147854 %) Number of data-nodes: 2 Number of racks: 1 FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds The filesystem under path '/' is CORRUPT [root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop dfsadmin -report DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. Configured Capacity: 1100387597518 (1.00 TB) Present Capacity: 727189155840 (677.25 GB) DFS Remaining: 712401227776 (663.48 GB) DFS Used: 14787928064 (13.77 GB) DFS Used%: 2.03% Under replicated blocks: 420 Blocks with corrupt replicas: 0 Missing blocks: 3 - Datanodes available: 2 (3 total, 1 dead) Live datanodes: Name: 172.18.127.248:50010 (bpdevdmsdbs02) Hostname: bpdevdmsdbs02 Rack: /default Decommission Status : Normal Configured Capacity: 550193798759 (512.41 GB) DFS Used: 7394033664 (6.89 GB) Non DFS Used: 169131224679 (157.52 GB) DFS Remaining: 373668540416 (348.01 GB) DFS Used%: 1.34% DFS Remaining%: 67.92% Last contact: Thu Mar 20 16:05:44 EST 2014 Name: 172.18.127.245:50010 (bpdevdmsdbs01) Hostname: bpdevdmsdbs01 Rack: /default Decommission Status : Normal Configured Capacity: 550193798759 (512.41 GB) DFS Used: 7393894400 (6.89 GB) Non DFS Used: 204067216999 (190.05 GB) DFS Remaining: 338732687360 (315.47 GB) DFS Used%: 1.34% DFS Remaining%: 61.57% Last contact: Thu Mar 20 16:05:44 EST 2014 Dead datanodes: Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com) Hostname: nsda3dmsrpt02.internal.bigpond.com Rack: /default Decommission Status : Normal Configured Capacity: 0 (0 B) DFS Used: 0 (0 B) Non DFS Used: 0 (0 B) DFS Remaining: 0 (0 B) DFS Used%: 100.00% DFS Remaining%: 0.00% Last contact: Wed Mar 19 11:44:44 EST 2014 Thanks and Regards, Truong Phan P+ 61 2 8576 5771 M + 61 4 1463 7424 Etroung.p...@team.telstra.com W www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com] Sent: Thursday, 20 March 2014 3:27 PM To: user@hadoop.apache.org Subject: RE: how to free up space of the old Data Node Please check my inline comments which are in blue color... 
From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node

Hi,

I have a 3-node Hadoop cluster in which I created 3 Data Nodes. However, one of the nodes does not have enough space to cater for other projects' logs, so I decommissioned that node from the Data Node list, but I could not reclaim the space from it.

>> Is your replication factor 3? If it is 3, then since you have 3 datanodes, the disk space occupied should ideally be the same on all nodes (the 47G should be present on all the DNs). Also, if RF=3, the decommission will not succeed while you have only 3 DNs; you need to add another DN to the cluster before the decommission can complete. Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?

>> There are ways, but you need to explain why only this node's disk is full and not the others'. Is it because this node has less space than the other nodes? If RF=3, then make RF=2 (decrease the replication factor).
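For reference: lowering dfs.replication in the configuration only affects files created afterwards; files that already exist keep their old factor until it is changed explicitly, e.g. with hadoop fs -setrep -w 2 <path>. The same can be done programmatically. Below is a minimal sketch using the FileSystem API, assuming only the files directly under one directory need changing; the class name and path are illustrative, not from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch: drop the replication factor of the files directly
// under one directory to 2. The namenode then marks the surplus
// replicas as over-replicated and the datanodes delete them, which is
// what actually frees the disk space.
public class LowerReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/user/hduser");   // illustrative path
        for (FileStatus status : fs.listStatus(dir)) {
            if (!status.isDir()) {             // recurse yourself if subdirectories matter
                fs.setReplication(status.getPath(), (short) 2);
            }
        }
        fs.close();
    }
}

Separately, the fsck output above reports the filesystem as CORRUPT with 3 missing blocks. The affected paths can be listed with hadoop fsck / -list-corruptfileblocks, and if those blocks are truly unrecoverable (for instance, if their only replicas were on the dead node), hadoop fsck / -delete removes the broken files so the filesystem can return to HEALTHY.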
RE: NullpointerException in OffLineImageViewer
Seems to be an issue; please file a jira: https://issues.apache.org/jira

From: c.agra...@outlook.com [c.agra...@outlook.com] on behalf of Chetan Agrawal [cagrawa...@gmail.com]
Sent: Wednesday, March 19, 2014 12:50 PM
To: hadoop users
Subject: NullpointerException in OffLineImageViewer

I want to access and study the Hadoop cluster's metadata, which is stored in the fsimage file on the namenode machine. I came to know that the OfflineImageViewer is used to do so, but when I try it I get an exception:

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: $HADOOP_HOME is deprecated.) or is it something else? I am using Hadoop 1.2.1.
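Two things worth ruling out before digging deeper: the "$HADOOP_HOME is deprecated" warning is benign in 1.x and unrelated to the crash, and -i must point at an actual copy of the image file (in 1.x it lives at ${dfs.name.dir}/current/fsimage on the namenode), copied into the working directory. A hypothetical pre-flight check, plain Java and nothing Hadoop-specific:

import java.io.File;

// Verify that the file passed to "hadoop oiv -i" actually exists and is
// a regular file; a wrong path is the easiest way to get an unhelpful
// failure out of the tool.
public class CheckImageFile {
    public static void main(String[] args) {
        File image = new File(args.length > 0 ? args[0] : "fsimage");
        if (image.isFile()) {
            System.out.println(image.getAbsolutePath() + " exists, " + image.length() + " bytes");
        } else {
            System.out.println(image.getAbsolutePath() + " does not exist or is not a regular file");
        }
    }
}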
Re: Need FileName with Content
Hi,
If we use the code below:
===
word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);
===
the output is wrong, because it shows:

filename word occurrence
vinitha java 4
vinitha oracle 3
sony java 4
sony oracle 3

Here vinitha does not have the word "oracle", and similarly sony does not have the word "java". The file name is being merged across all the files' words. I need the output given below:

filename word occurrence
vinitha java 4
vinitha C++ 3
sony ETL 4
sony oracle 3

I need the file name along with the words of that particular file only; no merging should happen. Please help me out with this issue. Thanks in advance.
Ranjini

On Thu, Mar 20, 2014 at 10:56 AM, Ranjini Rathinam ranjinibe...@gmail.com wrote:

---------- Forwarded message ----------
From: Stanley Shi s...@gopivotal.com
Date: Thu, Mar 20, 2014 at 7:39 AM
Subject: Re: Need FileName with Content
To: user@hadoop.apache.org

You want to do a word count for each file, but the code gives you a word count for all the files together, right?
=====
word.set(tokenizer.nextToken());
output.collect(word, one);
====
change it to:
word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);

Regards,
Stanley Shi

On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote:

Hi,
I have a folder named INPUT. Inside INPUT there are 5 resumes.

hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
Found 5 items
-rw-r--r-- 1 hduser supergroup 5438 2014-03-18 15:20 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
-rw-r--r-- 1 hduser supergroup 6022 2014-03-18 15:22 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/vinitha.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/sony.txt
-rw-r--r-- 1 hduser supergroup 3517 2014-03-18 15:21 /user/hduser/INPUT/ravi.txt
hduser@localhost:~/Ranjini$

I have to process the folder and its contents. I need output like:

filename word occurrence
vinitha java 4
sony oracle 3

But I am not getting the filename. As the input file contents are merged, the file name is not coming out correctly. Please help to fix this issue.
I have given my code below:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {

    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            FSDataInputStream fs = null;
            FileSystem hdfs = null;
            String line = value.toString();
            int i = 0, k = 0;
            try {
                Configuration configuration = new Configuration();
                configuration.set("fs.default.name", "hdfs://localhost:4440/");
                // These handles re-list and re-open the whole input directory
                // inside map(), but are never used below; they do not tell the
                // mapper which file the current record came from.
                Path srcPath = new Path("/user/hduser/INPUT/");
                hdfs = FileSystem.get(configuration);
                FileStatus[] status = hdfs.listStatus(srcPath);
                fs = hdfs.open(srcPath);
                BufferedReader br = new BufferedReader(new InputStreamReader(hdfs.open(srcPath)));
                String[] splited = line.split("\\s+");
                for (i = 0; i < splited.length; i++) {
                    String[] sp = splited[i].split(",");
                    for (k = 0; k < sp.length; k++) {
                        if (!sp[k].isEmpty()) {
                            StringTokenizer tokenizer = new StringTokenizer(sp[k]);
                            if (sp[k].equalsIgnoreCase("C")) {
                                while (tokenizer.hasMoreTokens()) {
                                    word.set(tokenizer.nextToken());
                                    output.collect(word, one);
                                }
                            }
                            if (sp[k].equalsIgnoreCase("JAVA")) {
                                while (tokenizer.hasMoreTokens()) {
                                    word.set(tokenizer.nextToken());
                                    output.collect(word, one);
                                }
                            }
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            // Standard word-count sum over all values for this key.
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }
}
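For what it's worth, the usual way to get the current file's name in an old-API (org.apache.hadoop.mapred) mapper is from the input split, rather than by re-listing the input directory inside map(). A minimal sketch along the lines of Stanley's suggestion; the class name and the space separator are illustrative, not part of the original code:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameWordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        // The split this record came from identifies its source file, so
        // words from different files can never be mixed up.
        FileSplit split = (FileSplit) reporter.getInputSplit();
        String filename = split.getPath().getName();   // e.g. "vinitha.txt"
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(filename + " " + tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}

With the standard summing reducer, each "filename word" key is then counted within its own file only, which is exactly the per-file output asked for above.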