RE: How to configure nodemanager.health-checker.script.path

2014-03-19 Thread Rohith Sharma K S
Hi

The health script itself should execute successfully (exit 0). If you want the health 
check to report failure, print a line containing ERROR to the console instead. This is 
because the health script may fail for incidental reasons such as a syntax error, a 
command not found (IOException), or several other causes, and such script failures are 
not treated as node ill-health.

For the health script to work, do not add exit -1; just print the ERROR line:

#!/bin/bash
echo ERROR disk full
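
For example, a health script that actually checks something would follow the same rule: 
always exit 0, and only print an ERROR line when the node should be flagged. (The mount 
point and the 90% threshold below are just an illustration, not part of the original advice.)

#!/bin/bash
# flag the node as unhealthy when the root filesystem is more than 90% full
used=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$used" -gt 90 ]; then
  echo "ERROR disk full (${used}% used)"
fi
exit 0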

Thanks & Regards
Rohith Sharma K S

From: Anfernee Xu [mailto:anfernee...@gmail.com]
Sent: 19 March 2014 10:32
To: user
Subject: How to configure nodemanager.health-checker.script.path

Hello,

I'm running MR with the 2.2.0 release. I noticed we can configure 
nodemanager.health-checker.script.path in yarn-site.xml to customize NM 
health checking, so I added the properties below to yarn-site.xml:

 <property>
   <name>yarn.nodemanager.health-checker.script.path</name>
   <value>/scratch/software/hadoop2/hadoop-dc/node_health.sh</value>
 </property>

 <property>
   <name>yarn.nodemanager.health-checker.interval-ms</name>
   <value>1</value>
 </property>

To get a feel for this, 
/scratch/software/hadoop2/hadoop-dc/node_health.sh simply prints an ERROR 
message, as below:

#!/bin/bash
echo ERROR disk full
exit -1

But it does not seem to work; the node is still in a healthy state. Am I missing 
something?

Thanks for your help.
--
--Anfernee


RE: Benchmark Failure

2014-03-19 Thread Brahma Reddy Battula
This seems to be a known issue which is already logged. Please check the following 
JIRA for the same; hopefully you are facing the same issue...



https://issues.apache.org/jira/browse/HDFS-4929







Thanks & Regards



Brahma Reddy Battula




From: Lixiang Ao [aolixi...@gmail.com]
Sent: Tuesday, March 18, 2014 10:34 AM
To: user@hadoop.apache.org
Subject: Re: Benchmark Failure


the version is release 2.2.0

On 2014-03-18 at 12:26 AM, Lixiang Ao aolixi...@gmail.com wrote:
Hi all,

I'm running the jobclient tests (on a single node); other tests like TestDFSIO and 
mrbench succeed, but nnbench fails.

I got a lot of exceptions, but without any explanation (see below).

Could anyone tell me what might have gone wrong?

Thanks!
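
For reference, a typical nnbench invocation with the parameters reported in the log 
below looks roughly like this (the jar location varies per install, so treat it as a 
sketch):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar \
  nnbench -operation create_write -maps 12 -reduces 6 -numberOfFiles 1000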


14/03/17 23:54:22 INFO hdfs.NNBench: Waiting in barrier for: 112819 ms
14/03/17 23:54:23 INFO mapreduce.Job: Job job_local2133868569_0001 running in 
uber mode : false
14/03/17 23:54:23 INFO mapreduce.Job:  map 0% reduce 0%
14/03/17 23:54:28 INFO mapred.LocalJobRunner: 
hdfs://0.0.0.0:9000/benchmarks/NNBench-aolx-PC/control/NNBench_Controlfile_10:0+125
  map
14/03/17 23:54:29 INFO mapreduce.Job:  map 6% reduce 0%
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
14/03/17 23:56:15 INFO hdfs.NNBench: Exception recorded in op: 
Create/Write/Close
(1000 Exceptions)
.
.
.
results:

File System Counters
FILE: Number of bytes read=18769411
FILE: Number of bytes written=21398315
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=11185
HDFS: Number of bytes written=19540
HDFS: Number of read operations=325
HDFS: Number of large read operations=0
HDFS: Number of write operations=13210
Map-Reduce Framework
Map input records=12
Map output records=95
Map output bytes=1829
Map output materialized bytes=2091
Input split bytes=1538
Combine input records=0
Combine output records=0
Reduce input groups=8
Reduce shuffle bytes=0
Reduce input records=95
Reduce output records=8
Spilled Records=214
Shuffled Maps =0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=211
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=4401004544
File Input Format Counters
Bytes Read=1490
File Output Format Counters
Bytes Written=170
14/03/17 23:56:18 INFO hdfs.NNBench: -- NNBench -- :
14/03/17 23:56:18 INFO hdfs.NNBench:Version: 
NameNode Benchmark 0.4
14/03/17 23:56:18 INFO hdfs.NNBench: Date & time: 
2014-03-17 23:56:18,619
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: Test Operation: 
create_write
14/03/17 23:56:18 INFO hdfs.NNBench: Start time: 
2014-03-17 23:56:15,521
14/03/17 23:56:18 INFO hdfs.NNBench:Maps to run: 12
14/03/17 23:56:18 INFO hdfs.NNBench: Reduces to run: 6
14/03/17 23:56:18 INFO hdfs.NNBench: Block Size (bytes): 1
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes to write: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Bytes per checksum: 1
14/03/17 23:56:18 INFO hdfs.NNBench:Number of files: 
1000
14/03/17 23:56:18 INFO hdfs.NNBench: Replication factor: 3
14/03/17 23:56:18 INFO hdfs.NNBench: Successful file operations: 0
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench: # maps that missed the barrier: 11
14/03/17 23:56:18 INFO hdfs.NNBench:   # exceptions: 
1000
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench:TPS: Create/Write/Close: 0
14/03/17 23:56:18 INFO hdfs.NNBench: Avg exec time (ms): Create/Write/Close: 
Infinity
14/03/17 23:56:18 INFO hdfs.NNBench: Avg Lat (ms): Create/Write: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:Avg Lat (ms): Close: NaN
14/03/17 23:56:18 INFO hdfs.NNBench:
14/03/17 23:56:18 INFO hdfs.NNBench:  RAW DATA: AL Total #1: 0
14/03/17 23:56:18 INFO hdfs.NNBench:  RAW DATA: AL Total #2: 0
14/03/17 23:56:18 INFO hdfs.NNBench:   RAW DATA: TPS Total (ms): 
1131
14/03/17 23:56:18 INFO hdfs.NNBench:RAW DATA: Longest Map 

NodeHealthReport local-dirs turned bad

2014-03-19 Thread Margusja

Hi

I have one node in unhealthy status:




Total Vmem allocated for Containers 4.20 GB
Vmem enforcement enabled    false
Total Pmem allocated for Container  2 GB
Pmem enforcement enabled    false
NodeHealthyStatus   false
LastNodeHealthTime  Wed Mar 19 13:31:24 EET 2014
NodeHealthReport 	1/1 local-dirs turned bad: /hadoop/yarn/local;1/1 
log-dirs turned bad: /hadoop/yarn/log
Node Manager Version: 	2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum 
82bd166aa0ada92b44f8a154836b92 on 2014-01-09T05:24Z
Hadoop Version: 	2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum 
704f1e463ebc4fb89353011407e965 on 2014-01-09T05:18Z




I tried:
Deleted /hadoop/* and did namenode -format again.
Restarted the nodemanager, but it is still in unhealthy mode.

Is there any guideline what I should do?

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)



RE: NodeHealthReport local-dirs turned bad

2014-03-19 Thread Rohith Sharma K S
Hi

This has no relation to the NameNode format.

Is the NodeManager started with the default configuration? If not, is any NodeManager 
health script configured?

The suspects can be (a couple of quick checks are sketched below):
1. /hadoop does not have the right permissions, or
2. the disk is full.
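
A quick way to check both suspects on the node (paths taken from the report above; the 
expected owner depends on which user runs the NodeManager):

ls -ld /hadoop/yarn/local /hadoop/yarn/log   # must exist and be writable by the NodeManager user
df -h /hadoop                                # local-dirs/log-dirs are marked bad when the disk fills up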

Thanks & Regards
Rohith Sharma K S


-Original Message-
From: Margusja [mailto:mar...@roo.ee] 
Sent: 19 March 2014 17:04
To: user@hadoop.apache.org
Subject: NodeHealthReport local-dirs turned bad

Hi

I have one node in unhealthy status:




Total Vmem allocated for Containers 4.20 GB
Vmem enforcement enabled    false
Total Pmem allocated for Container  2 GB
Pmem enforcement enabled    false
NodeHealthyStatus   false
LastNodeHealthTime  Wed Mar 19 13:31:24 EET 2014
NodeHealthReport1/1 local-dirs turned bad: /hadoop/yarn/local;1/1 
log-dirs turned bad: /hadoop/yarn/log
Node Manager Version:   2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
82bd166aa0ada92b44f8a154836b92 on 2014-01-09T05:24Z
Hadoop Version: 2.2.0.2.0.6.0-101 from 
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
704f1e463ebc4fb89353011407e965 on 2014-01-09T05:18Z



I tried:
Deleted /hadoop/* and did namenode -format again Restarted nodemanager but 
still in unhealthy mode.

Is there any guideline what I should do?

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)



Webinar On Hadoop!

2014-03-19 Thread Vivek Kumar
Hi ,

Want to know Big Data / Hadoop? If yes, join us for a free webinar by
industry experts at the link below.


*FREE webinar on Hadoop, Hosted by : Manoj , Research Director*

*Join us for a webinar on Mar 19, 2014 at 8:00 PM IST.*

*Register now!*

https://attendee.gotowebinar.com/register/54180991637732354

*Discussion Topics? *

*What is Big Data ?

*Challenges in Big Data

*What is Hadoop ?

*Opportunities in Hadoop / Big Data


For further details visit us at www.soapttrainings.com

Best Regards,
*Kumar Vivek |* Director

M +91-7675824584| si...@soapt.com
www.soapttrainings.com http://soapttrainings.com/index.php?action=1#hadoop|
www.openbravo.com
 #2, 38/A, Above Docomo Office, Madhapur, Hyderabad


Re: NodeHealthReport local-dirs turned bad

2014-03-19 Thread Margusja
Thanks, got it to work. In my init script I used the wrong user. It was a permissions 
problem, like Rohith said.
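
For anyone hitting the same thing, the fix boils down to re-owning the YARN local and 
log dirs for the user that actually runs the NodeManager; the user and group below are 
only an example:

chown -R yarn:hadoop /hadoop/yarn/local /hadoop/yarn/log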


Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)

On 19/03/14 14:08, Rohith Sharma K S wrote:

Hi

There is no relation to NameNode format

Does NodeManger is started with default configuration? If no , any NodeManger 
health script is configured?

Suspect can be
 1. /hadoop does not have permission or
 2. disk is full

Thanks & Regards
Rohith Sharma K S


-Original Message-
From: Margusja [mailto:mar...@roo.ee]
Sent: 19 March 2014 17:04
To: user@hadoop.apache.org
Subject: NodeHealthReport local-dirs turned bad

Hi

I have one node in unhealthy status:




Total Vmem allocated for Containers 4.20 GB
Vmem enforcement enabled    false
Total Pmem allocated for Container  2 GB
Pmem enforcement enabled    false
NodeHealthyStatus   false
LastNodeHealthTime  Wed Mar 19 13:31:24 EET 2014
NodeHealthReport1/1 local-dirs turned bad: /hadoop/yarn/local;1/1
log-dirs turned bad: /hadoop/yarn/log
Node Manager Version:   2.2.0.2.0.6.0-101 from
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
82bd166aa0ada92b44f8a154836b92 on 2014-01-09T05:24Z
Hadoop Version: 2.2.0.2.0.6.0-101 from
b07b2906c36defd389c8b5bd22bebc1bead8115b by jenkins source checksum
704f1e463ebc4fb89353011407e965 on 2014-01-09T05:18Z



I tried:
Deleted /hadoop/* and did namenode -format again Restarted nodemanager but 
still in unhealthy mode.

Is there any guideline what I should do?

--
Best regards, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)





Need FileName with Content

2014-03-19 Thread Ranjini Rathinam
Hi,

I have a folder named INPUT.

Inside INPUT there are 5 resumes.

hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
Found 5 items
-rw-r--r--   1 hduser supergroup   5438 2014-03-18 15:20
/user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
-rw-r--r--   1 hduser supergroup   6022 2014-03-18 15:22
/user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
-rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
/user/hduser/INPUT/vinitha.txt
-rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
/user/hduser/INPUT/sony.txt
-rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
/user/hduser/INPUT/ravi.txt
hduser@localhost:~/Ranjini$

I have to process the folder and its content.

I need output as:

filename   word    occurrence
vinitha    java    4
sony       oracle  3



But I am not getting the filename. As the input file contents are merged, the
file name does not come out correctly.


Please help me fix this issue. I have given my code below.


 import java.io.IOException;
 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.util.*;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.*;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.mapred.lib.*;
 import org.apache.hadoop.util.*;

 public class WordCount {
 public static class Map extends MapReduceBase implements
 Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();
  public void map(LongWritable key, Text value, OutputCollector<Text,
 IntWritable> output, Reporter reporter) throws IOException {
   FSDataInputStream fs = null;
   FileSystem hdfs = null;
   String line = value.toString();
   int i = 0, k = 0;
   try {
    Configuration configuration = new Configuration();
    configuration.set("fs.default.name", "hdfs://localhost:4440/");

    Path srcPath = new Path("/user/hduser/INPUT/");

    hdfs = FileSystem.get(configuration);
    FileStatus[] status = hdfs.listStatus(srcPath);
    fs = hdfs.open(srcPath);
    BufferedReader br = new BufferedReader(new
 InputStreamReader(hdfs.open(srcPath)));

    String[] splited = line.split("\\s+");
    for (i = 0; i < splited.length; i++) {
     String sp[] = splited[i].split(",");
     for (k = 0; k < sp.length; k++) {

      if (!sp[k].isEmpty()) {
       StringTokenizer tokenizer = new StringTokenizer(sp[k]);
       if ((sp[k].equalsIgnoreCase("C"))) {
        while (tokenizer.hasMoreTokens()) {
         word.set(tokenizer.nextToken());
         output.collect(word, one);
        }
       }
       if ((sp[k].equalsIgnoreCase("JAVA"))) {
        while (tokenizer.hasMoreTokens()) {
         word.set(tokenizer.nextToken());
         output.collect(word, one);
        }
       }
      }
     }
    }
   } catch (IOException e) {
    e.printStackTrace();
   }
  }
 }
 public static class Reduce extends MapReduceBase implements
 Reducer<Text, IntWritable, Text, IntWritable> {
  public void reduce(Text key, Iterator<IntWritable> values,
 OutputCollector<Text, IntWritable> output, Reporter reporter) throws
 IOException {
   int sum = 0;
   while (values.hasNext()) {
    sum += values.next().get();
   }
   output.collect(key, new IntWritable(sum));
  }
 }
 public static void main(String[] args) throws Exception {

  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");
  conf.setOutputKeyClass(Text.class);
  conf.setOutputValueClass(IntWritable.class);
  conf.setMapperClass(Map.class);
  conf.setCombinerClass(Reduce.class);
  conf.setReducerClass(Reduce.class);
  conf.setInputFormat(TextInputFormat.class);
  conf.setOutputFormat(TextOutputFormat.class);
  FileInputFormat.setInputPaths(conf, new Path(args[0]));
  FileOutputFormat.setOutputPath(conf, new Path(args[1]));
  JobClient.runJob(conf);
 }
 }



Please help

Thanks in advance.

Ranjini


Re: I am about to lose all my data please help

2014-03-19 Thread Fatih Haltas
Thanks for your help, but I still could not solve my problem.


On Tue, Mar 18, 2014 at 10:13 AM, Stanley Shi s...@gopivotal.com wrote:

 Ah yes, I overlooked this. Then please check whether the files are there or not:
 ls /home/hadoop/project/hadoop-data/dfs/name

 Regards,
 *Stanley Shi,*



 On Tue, Mar 18, 2014 at 2:06 PM, Azuryy Yu azury...@gmail.com wrote:

 I don't think this is the case, because there is:
   <property>
     <name>hadoop.tmp.dir</name>
     <value>/home/hadoop/project/hadoop-data</value>
   </property>


 On Tue, Mar 18, 2014 at 1:55 PM, Stanley Shi s...@gopivotal.com wrote:

 one possible reason is that you didn't set the namenode working
 directory, by default it's in /tmp folder; and the /tmp folder might
 get deleted by the OS without any notification. If this is the case, I am
 afraid you have lost all your namenode data.

 <property>
   <name>dfs.name.dir</name>
   <value>${hadoop.tmp.dir}/dfs/name</value>
   <description>Determines where on the local filesystem the DFS name node
   should store the name table(fsimage).  If this is a comma-delimited list
   of directories then the name table is replicated in all of the
   directories, for redundancy.</description>
 </property>


 Regards,
 *Stanley Shi,*



 On Sun, Mar 16, 2014 at 5:29 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:

 Hi,

 what is the location of the namenodes fsimage and editlogs?
 And how much memory has the NameNode.

 Did you work with a Secondary NameNode or a Standby NameNode for
 checkpointing?

 Where are your HDFS blocks located, are those still safe?

 With this information at hand, one might be able to fix your setup, but
 do not format the old namenode before
 all is working with a fresh one.

 Grab a copy of the maintenance guide:
 http://shop.oreilly.com/product/0636920025085.do?sortby=publicationDate
 which helps solving such type of problems as well.

 Best wishes
 Mirko


 2014-03-16 9:07 GMT+00:00 Fatih Haltas fatih.hal...@nyu.edu:

 Dear All,

 I have just restarted the machines of my hadoop cluster. Now, I am trying
 to restart the hadoop cluster again, but I am getting an error on namenode restart. I
 am afraid of losing my data, as it was running properly for more than 3
 months. Currently, I believe that if I do namenode formatting, it will work
 again; however, the data will be lost. Is there any way to solve this without
 losing the data?

 I will really appreciate any help.

 Thanks.


 =
 Here is the logs;
 
 2014-02-26 16:02:39,698 INFO
 org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = ADUAE042-LAP-V/127.0.0.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.0.4
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
 1393290; compiled by 'hortonfo' on Wed Oct  3 05:13:58 UTC 2012
 /
 2014-02-26 16:02:40,005 INFO
 org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
 hadoop-metrics2.properties
 2014-02-26 16:02:40,019 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2014-02-26 16:02:40,021 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
 period at 10 second(s).
 2014-02-26 16:02:40,021 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system
 started
 2014-02-26 16:02:40,169 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2014-02-26 16:02:40,193 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source jvm
 registered.
 2014-02-26 16:02:40,194 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 NameNode registered.
 2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: VM type
   = 64-bit
 2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet: 2% max
 memory = 17.77875 MB
 2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet:
 capacity  = 2^21 = 2097152 entries
 2014-02-26 16:02:40,242 INFO org.apache.hadoop.hdfs.util.GSet:
 recommended=2097152, actual=2097152
 2014-02-26 16:02:40,273 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop
 2014-02-26 16:02:40,273 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
 2014-02-26 16:02:40,274 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 isPermissionEnabled=true
 2014-02-26 16:02:40,279 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 dfs.block.invalidate.limit=100
 2014-02-26 16:02:40,279 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
 isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
 accessTokenLifetime=0 min(s)
 2014-02-26 16:02:40,724 INFO
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
 FSNamesystemStateMBean and NameNodeMXBean
 

Doubt

2014-03-19 Thread sri harsha
Hi all,
Is it possible to install MongoDB on the same VM that hosts hadoop?

-- 
amiable harsha


question about yarn webapp

2014-03-19 Thread 赵璨
   I can get Server Stacks from the web UI, but I don't know which code handles
that function. How does the web app get the stack information from the JVM?


Re: Doubt

2014-03-19 Thread Jay Vyas
Certainly it is, and it is quite common, especially if you have some
high-performance machines: they can run as mapreduce slaves and also double
as mongo hosts. The problem, of course, is that when running mapreduce jobs
you might have very slow network bandwidth at times, and if your front end
needs fast response times from the mongo instances all the time, you could
be in trouble.



On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar praveen...@gmail.com wrote:

 Why not ? Its just a matter of installing 2 different packages.
 Depends on what do you want to use it for, you need to take care of few
 things, but as far as installation is concerned, it should be easily doable.

 Regards
 Prav


 On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

 Hi all,
 is it possible to install Mongodb on the same VM which consists hadoop?

 --
 amiable harsha





-- 
Jay Vyas
http://jayunit100.blogspot.com


JSR 203 NIO 2 for HDFS

2014-03-19 Thread Damien Carol

Hi,

I'm working on a minimal implementation of JSR 203 to provide access to 
HDFS (1.2.1) for a GUI tool needed at my company.


Some features already work, such as creating a directory, deleting something, and 
listing the files in a directory.
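
To give an idea of what this enables, below is a sketch of how client code would 
typically consume such a JSR 203 provider through the standard java.nio.file API, 
assuming the provider registers the hdfs URI scheme (host, port and paths are 
illustrative):

import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class Jsr203HdfsDemo {
    public static void main(String[] args) throws Exception {
        URI uri = URI.create("hdfs://namenode:8020/");
        // the provider on the classpath is looked up by the URI scheme
        try (FileSystem fs = FileSystems.newFileSystem(uri, Collections.<String, Object>emptyMap())) {
            Path dir = fs.getPath("/tmp/jsr203-demo");
            Files.createDirectories(dir);                      // create a directory
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(fs.getPath("/tmp"))) {
                for (Path p : stream) {
                    System.out.println(p);                     // list files in a directory
                }
            }
            Files.deleteIfExists(dir);                         // delete something
        }
    }
}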


I would like to know if someone has already worked on something like that. Maybe a 
FOSS project has already done this?


Anyway if someone want to help me in this task, it's here :

g...@github.com:damiencarol/jsr203-hadoop.git

Regards,

Damien CAROL



Re: Is Hadoop's ToolRunner thread-safe?

2014-03-19 Thread Something Something
Any thoughts on this? Confirm or deny that it's an issue, maybe?


On Mon, Mar 17, 2014 at 11:43 AM, Something Something 
mailinglist...@gmail.com wrote:

 I would like to trigger a few Hadoop jobs simultaneously.  I've created a
 pool of threads using Executors.newFixedThreadPool.  Idea is that if the
 pool size is 2, my code will trigger 2 Hadoop jobs at the same exact time
 using 'ToolRunner.run'.  In my testing, I noticed that these 2 threads
 keep stepping on each other.

 When I looked under the hood, I noticed that ToolRunner creates
 GenericOptionsParser which in turn calls a static method
 'buildGeneralOptions'.  This method uses 'OptionBuilder.withArgName'
 which uses an instance variable called, 'argName'.  This doesn't look
 thread safe to me and I believe is the root cause of issues I am running
 into.

 Any thoughts?



Is Hadoop's ToolRunner thread-safe?

2014-03-19 Thread Something Something
I would like to trigger a few Hadoop jobs simultaneously.  I've created a
pool of threads using Executors.newFixedThreadPool.  Idea is that if the
pool size is 2, my code will trigger 2 Hadoop jobs at the same exact time
using 'ToolRunner.run'.  In my testing, I noticed that these 2 threads keep
stepping on each other.

When I looked under the hood, I noticed that ToolRunner creates
GenericOptionsParser which in turn calls a static method
'buildGeneralOptions'.  This method uses 'OptionBuilder.withArgName' which
uses an instance variable called, 'argName'.  This doesn't look thread safe
to me and I believe is the root cause of issues I am running into.

Any thoughts?
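
A self-contained sketch of the pattern described above (not the original code; the 
no-op Tool below is hypothetical, only there to make the example runnable):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConcurrentToolRunner {

    // Trivial Tool so the example compiles; a real job would build and submit a Job here.
    static class NoOpJob extends Configured implements Tool {
        public int run(String[] args) throws Exception {
            System.out.println("running in " + Thread.currentThread().getName());
            return 0;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 2; i++) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        // Each call goes through GenericOptionsParser.buildGeneralOptions(),
                        // which uses the static OptionBuilder mentioned above.
                        ToolRunner.run(new Configuration(), new NoOpJob(), new String[0]);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
    }
}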


Class loading in Hadoop and HBase

2014-03-19 Thread Amit Sela
Hi all,
I'm running with Hadoop 1.0.4 and HBase 0.94.12 bundled (OSGi) versions I
built.
Most issues I encountered are related to class loaders.

One of the patterns I noticed in both projects is:

ClassLoader cl = Thread.currentThread().getContextClassLoader();
  if(cl == null) {
  cl = Clazz.class.getClassLoader();
}

Where Clazz is the Class containing this code.

I was wondering about this choice... Why not go the other way around:

ClassLoader cl = Clazz.class.getClassLoader();
  if(cl == null) {
  cl = Thread.currentThread().getContextClassLoader();
}

And on a more general note, why not always use Configuration (and let its
cl be this.getClass().getClassLoader()) to load classes?

That would surely help in integration with modularity frameworks.

Thanks,
Amit.


Re: Doubt

2014-03-19 Thread sri harsha
Thanks Jay and Praveen,
I want to use both separately; I don't want to use MongoDB in place of
HBase.


On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas jayunit...@gmail.com wrote:

 Certainly it is , and quite common especially if you have some high
 performance machines : they  can run as mapreduce slaves and also double as
 mongo hosts.  The problem would of course be that when running mapreduce
 jobs you might have very slow network bandwidth at times, and if your front
 end needs fast response times all the time from mongo instances you could
 be in trouble.



 On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar 
  praveen...@gmail.com wrote:

 Why not ? Its just a matter of installing 2 different packages.
 Depends on what do you want to use it for, you need to take care of few
 things, but as far as installation is concerned, it should be easily doable.

 Regards
 Prav


 On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

 Hi all,
 is it possible to install Mongodb on the same VM which consists hadoop?

 --
 amiable harsha





 --
 Jay Vyas
 http://jayunit100.blogspot.com




-- 
amiable harsha


Re: Doubt

2014-03-19 Thread praveenesh kumar
Why not ? Its just a matter of installing 2 different packages.
Depends on what do you want to use it for, you need to take care of few
things, but as far as installation is concerned, it should be easily doable.

Regards
Prav


On Wed, Mar 19, 2014 at 3:41 PM, sri harsha rsharsh...@gmail.com wrote:

 Hi all,
 is it possible to install Mongodb on the same VM which consists hadoop?

 --
 amiable harsha



The reduce copier failed

2014-03-19 Thread Mahmood Naderan
Hi
In the middle of a map-reduce job I get

map 20% reduce 6%
...
The reduce copier failed

map 20% reduce 0%
map 20% reduce 1%

map 20% reduce 2%
map 20% reduce 3%
 

Does that imply a *retry* process, or should I be worried about that message?


Regards,
Mahmood

Re: question about yarn webapp

2014-03-19 Thread Harsh J
Hello,

This is a hadoop-common functionality. See the StacksServlet class
code: 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java#L1044
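
The underlying mechanism is the JVM's management API; a minimal sketch (not Hadoop's 
exact code) of how a servlet can obtain the stack information it then writes into the 
HTTP response:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackDumpDemo {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.println(info.toString()); // thread name, state and stack frames
        }
    }
}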

On Wed, Mar 19, 2014 at 9:17 PM, 赵璨 asoq...@gmail.com wrote:
I can get Server Stacks from web ui. But I don't know which code handle
 the function, how can the web app get the stacks information from jvm?



-- 
Harsh J


Re: The reduce copier failed

2014-03-19 Thread Harsh J
While it does mean a retry, if the job eventually fails (after finite
retries all fail as well), then you have a problem to investigate. If
the job eventually succeeded, then this may have been a transient
issue. Worth investigating either way.

On Thu, Mar 20, 2014 at 12:57 AM, Mahmood Naderan nt_mahm...@yahoo.com wrote:
 Hi
 In the middle of a map-reduce job I get

 map 20% reduce 6%
 ...
 The reduce copier failed
 
 map 20% reduce 0%
 map 20% reduce 1%
 map 20% reduce 2%
 map 20% reduce 3%


 Does that imply a *retry* process? Or I have to be worried about that
 message?

 Regards,
 Mahmood



-- 
Harsh J


Re: Need FileName with Content

2014-03-19 Thread Stanley Shi
You want to do a word count for each file, but the code gives you a word
count across all the files, right?

=
word.set(tokenizer.nextToken());
  output.collect(word, one);
==
change it to:
word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);
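
The mapper does not have a filename variable yet; with the old mapred API used here it 
can be taken from the input split, for example (the variable name is only illustrative):

String filename = ((FileSplit) reporter.getInputSplit()).getPath().getName();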




Regards,
*Stanley Shi,*



On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote:

 Hi,

 I have folder named INPUT.

 Inside INPUT i have 5 resume are there.

 hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
 Found 5 items
 -rw-r--r--   1 hduser supergroup   5438 2014-03-18 15:20
 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
 -rw-r--r--   1 hduser supergroup   6022 2014-03-18 15:22
 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/vinitha.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/sony.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/ravi.txt
 hduser@localhost:~/Ranjini$

 I have to process the folder and the content .

 I need ouput has

 filename   word   occurance
 vinitha   java   4
 sony  oracle  3



 But iam not getting the filename.  Has the input file content are merged
 file name is not getting correct .


 please help in this issue to fix.  I have given by code below


  import java.io.IOException;
  import java.util.*;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.conf.*;
  import org.apache.hadoop.io.*;
  import org.apache.hadoop.mapred.*;
  import org.apache.hadoop.util.*;
 import java.io.File;
 import java.io.FileReader;
 import java.io.FileWriter;
 import java.io.IOException;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.conf.*;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.util.*;
 import org.apache.hadoop.mapred.lib.*;

  public class WordCount {
 public static class Map extends MapReduceBase implements
 Mapper<LongWritable, Text, Text, IntWritable> {
  private final static IntWritable one = new IntWritable(1);
   private Text word = new Text();
   public void map(LongWritable key, Text value, OutputCollector<Text,
 IntWritable> output, Reporter reporter) throws IOException {
FSDataInputStream fs=null;
FileSystem hdfs = null;
String line = value.toString();
  int i=0,k=0;
   try{
Configuration configuration = new Configuration();
   configuration.set("fs.default.name", "hdfs://localhost:4440/");

Path srcPath = new Path("/user/hduser/INPUT/");

hdfs = FileSystem.get(configuration);
FileStatus[] status = hdfs.listStatus(srcPath);
fs=hdfs.open(srcPath);
BufferedReader br=new BufferedReader(new
 InputStreamReader(hdfs.open(srcPath)));

 String[] splited = line.split("\\s+");
 for( i=0;i<splited.length;i++)
  {
  String sp[]=splited[i].split(",");
  for( k=0;k<sp.length;k++)
  {

if(!sp[k].isEmpty()){
 StringTokenizer tokenizer = new StringTokenizer(sp[k]);
 if((sp[k].equalsIgnoreCase("C"))){
 while (tokenizer.hasMoreTokens()) {
   word.set(tokenizer.nextToken());
   output.collect(word, one);
 }
 }
 if((sp[k].equalsIgnoreCase("JAVA"))){
 while (tokenizer.hasMoreTokens()) {
   word.set(tokenizer.nextToken());
   output.collect(word, one);
 }
 }
   }
 }
 }
  } catch (IOException e) {
 e.printStackTrace();
  }
 }
 }
 public static class Reduce extends MapReduceBase implements
 Reducer<Text, IntWritable, Text, IntWritable> {
   public void reduce(Text key, Iterator<IntWritable> values,
 OutputCollector<Text, IntWritable> output, Reporter reporter) throws
 IOException {
 int sum = 0;
 while (values.hasNext()) {
   sum += values.next().get();
 }
 output.collect(key, new IntWritable(sum));
   }
 }
 public static void main(String[] args) throws Exception {


   JobConf conf = new JobConf(WordCount.class);
   conf.setJobName("wordcount");
   conf.setOutputKeyClass(Text.class);
   conf.setOutputValueClass(IntWritable.class);
   conf.setMapperClass(Map.class);
   conf.setCombinerClass(Reduce.class);
   conf.setReducerClass(Reduce.class);
   conf.setInputFormat(TextInputFormat.class);
   conf.setOutputFormat(TextOutputFormat.class);
   FileInputFormat.setInputPaths(conf, new Path(args[0]));
   FileOutputFormat.setOutputPath(conf, new Path(args[1]));
   JobClient.runJob(conf);
 }
  }



 Please help

 Thanks in advance.

 Ranjini





how to free up space of the old Data Node

2014-03-19 Thread Phan, Truong Q
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space on one of the nodes to cater for other projects' 
logs. So I decommissioned this node from the Data Node list, but I could not 
reclaim the space from it.
Is there a way to get this node to release the space?

[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0
Total blocks (validated):  105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   105 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.0
Corrupt blocks:0
Missing replicas:  105 (33.32 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF442C.0F3C8CF0]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.com

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

2014-03-19 Thread Brahma Reddy Battula
Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node

Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ 
log. So I decommissioned this node from a Data node list but I could not 
re-claimed the space from it.
 Is your replication factor 3? If it is 3, then since you have 3 datanodes, the 
 disk space occupied by all of them should ideally be the same (the 47G should be 
 present on all the DNs).
And if RF=3, decommissioning will not succeed because you have only 3 DNs; you 
would need to add another DN to the cluster before decommissioning can succeed.
Hence, please mention the replication factor of the files.

Is there a way to get this node to release space?
 There are ways, but you need to mention why only this node's disk is 
 full and not the other nodes'. Is it because this node has less space 
 compared to the other nodes?
 If RF=3, then make RF=2 (decrease the replication factor) and then 
 decommission this node; a typical sequence is sketched below.
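 A typical decommission sequence, for reference (file locations are illustrative and 
 depend on the install):
 1. Add the node's hostname to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml.
 2. Run: hdfs dfsadmin -refreshNodes
 3. Wait until the node shows as Decommissioned in the NameNode web UI or in 
    hdfs dfsadmin -report, then stop the DataNode and delete its data directories to 
    reclaim the space.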
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current
 try to give the following output also
  sudo -u hdfs hadoop fsck /
 sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0
Total blocks (validated):  105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   105 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.0
Corrupt blocks:0
Missing replicas:  105 (33.32 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[Telstra]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




NullPointerException in offLineImageViewer

2014-03-19 Thread Chetan Agrawal
I want to access and study the hadoop cluster's metadata, which is stored in the 
fsimage file on the namenode machine. I came to know that the OfflineImageViewer is 
used to do so, but when I try to use it I get an exception:

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.
Exception in thread "main" java.lang.NullPointerException
   at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
   at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: 
$HADOOP_HOME is deprecated.) or is there something else?
  

NullpointerException in OffLineImageViewer

2014-03-19 Thread Chetan Agrawal
I want to access and study the hadoop cluster's metadata, which is stored in the 
fsimage file on the namenode machine. I came to know that the OfflineImageViewer is 
used to do so, but when I try to use it I get an exception:

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.
Exception in thread "main" java.lang.NullPointerException
   at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
   at org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: 
$HADOOP_HOME is deprecated.) or is there something else? I am using the hadoop-1.2.1 version.

RE: how to free up space of the old Data Node

2014-03-19 Thread Phan, Truong Q
Thanks for the reply.
This Hadoop cluster is our POC, and that node has less space compared to the other 
two nodes.
How do I change the Replication Factor (RF) from 3 down to 2?
Is this controlled by this parameter (dfs.datanode.handler.count)?

Thanks and Regards,
Truong Phan


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.orgmailto:user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' 
log. So I decommissioned this node from a Data node list but I could not 
re-claimed the space from it.
 is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally 
 disk space occupied by all nodes should be same(47G, should be present in 
 all the DN's)..
And if you RF=3,Decommission will not be success as you've only 3 DN's..you 
need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
 ways are there,,but you need to mention, why only this node disk is 
 full..why not other nodes..?  is it because,this node is having less space 
 compared to other nodes
 If RF=3, then make RF=2(decrease the replication factor)..then do 
 decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current
 try to give the following output also
  sudo -u hdfs hadoop fsck /
 sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0
Total blocks (validated):  105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   105 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.0
Corrupt blocks:0
Missing replicas:  105 (33.32 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF4455.2B5B0CD0]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.commailto:troung.p...@team.telstra.com
W  www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

2014-03-19 Thread Vinayakumar B
You can change the replication factor using the following command
hdfs dfs -setrep [-R] <rep> <path>
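For example, to set everything under /data (a path chosen only for illustration) to two 
replicas and wait for the change to finish:
hdfs dfs -setrep -R -w 2 /data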

Once this is done, you can re-commission the datanode, then all the 
overreplicated blocks will be removed.
If not removed, restart the datanode.

Regards,
Vinayakumar B

From: Phan, Truong Q [mailto:troung.p...@team.telstra.com]
Sent: 20 March 2014 10:28
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compare to the other 
two nodes.
How do I change the Replication Factore (RF) from 3 down to 2?
Is this controlled by this parameter (dfs.datanode.handler.count)?

Thanks and Regards,
Truong Phan


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.commailto:troung.p...@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.orgmailto:user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.orgmailto:user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' 
log. So I decommissioned this node from a Data node list but I could not 
re-claimed the space from it.
 is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally 
 disk space occupied by all nodes should be same(47G, should be present in 
 all the DN's)..
And if you RF=3,Decommission will not be success as you've only 3 DN's..you 
need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
 ways are there,,but you need to mention, why only this node disk is 
 full..why not other nodes..?  is it because,this node is having less space 
 compared to other nodes
 If RF=3, then make RF=2(decrease the replication factor)..then do 
 decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current
 try to give the following output also
  sudo -u hdfs hadoop fsck /
 sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0
Total blocks (validated):  105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   105 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.0
Corrupt blocks:0
Missing replicas:  105 (33.32 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[Telstra]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.commailto:troung.p...@team.telstra.com
W  www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




Hadoop MapReduce Streaming - how to change the final output file name with the desired name rather than in partition like: part-0000*

2014-03-19 Thread Phan, Truong Q
Hi

Could you please provide me with an alternative link that explains how to 
change the final output file name to a desired name rather than the 
partition names like part-*?
Can I have sample Python code to run MapReduce Streaming with custom 
output file names?

One helper from the Avro mailing list gave me the link below, but:

1)  none of the links on that page are working, and

2)  the page explains the Java method, not the Python method, and I am 
looking for the Python method.

http://wiki.apache.org/hadoop/FAQ#How_do_I_change_final_output_file_name_with_the_desired_name_rather_than_in_partitions_like_part-0.2C_part-1.3F
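
One common workaround for streaming jobs (not from the linked FAQ; paths below are only 
illustrative) is to keep the part-* names the framework produces and rename them from 
the driver script once the job has finished:

hadoop fs -mv /user/me/output/part-00000 /user/me/output/report.tsv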


Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[cid:image001.gif@01CF4456.0B517E00]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.com

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

2014-03-19 Thread Phan, Truong Q
Hi Battula,

I hope Battula is your first name. :P
Here are the output of your suggested commands:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 
16:04:35 EST 2014

Status: CORRUPT
Total size:7325542923 B
Total dirs:138
Total files:   383
Total symlinks:0 (Files currently being written: 2)
Total blocks (validated):  424 (avg. block size 17277223 B)
  
  CORRUPT FILES:3
  MISSING BLOCKS:   3
  MISSING SIZE: 791 B
  CORRUPT BLOCKS:   3
  
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   417 (98.34906 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 1.976415
Corrupt blocks:3
Missing replicas:  417 (33.147854 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop 
dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014




Thanks and Regards,
Truong Phan


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.orgmailto:user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space in one of the node to cater other projects' 
log. So I decommissioned this node from a Data node list but I could not 
re-claimed the space from it.
 is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally 
 disk space occupied by all nodes should be same(47G, should be present in 
 all the DN's)..
And if you RF=3,Decommission will not be success as you've only 3 DN's..you 
need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
 ways are there,,but you need to mention, why only this node disk is 
 full..why not other nodes..?  is it because,this node is having less space 
 compared to other nodes
 If RF=3, then make RF=2(decrease the replication factor)..then do 
 decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current
 try to give the following output also
  sudo -u hdfs hadoop fsck /
 sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0

RE: how to free up space of the old Data Node

2014-03-19 Thread Brahma Reddy Battula
Please check my inline comments which are in blue color...






From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 10:28 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Thanks for the reply.
This Hadoop cluster is our POC and the node has less space compare to the other 
two nodes.
How do I change the Replication Factore (RF) from 3 down to 2?
hadoop fs -setrep -w 2 -R /<location of the dir or file>   (here 2 is the new replication factor)
Is this controlled by this parameter (dfs.datanode.handler.count)?
No

Thanks and Regards,
Truong Phan


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.com
W  www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx



From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.orgmailto:user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have 3 nodes Hadoop cluster in which I created 3 Data Nodes.
However, I don’t have enough space in one of the node to cater other projects’ 
log. So I decommissioned this node from a Data node list but I could not 
re-claimed the space from it.
 is your Replication is 3..? If it is 3 and as you have 3 datanodes,ideally 
 disk space occupied by all nodes should be same(47G, should be present in 
 all the DN's)..
And if you RF=3,Decommission will not be success as you've only 3 DN's..you 
need to add another DN to cluster,,then only decommission will be success..
Hence please mention the replication factor of the file..

Is there a way to get this node to release space?
 ways are there,,but you need to mention, why only this node disk is 
 full..why not other nodes..?  is it because,this node is having less space 
 compared to other nodes
 If RF=3, then make RF=2(decrease the replication factor)..then do 
 decommission of this node
[root@nsda3dmsrpt02] /data/dfs/dn# du -sh /data/dfs/dn/*
47G /data/dfs/dn/current
 try to give the following output also
  sudo -u hdfs hadoop fsck /
 sudo -u hdfs hadoop dfsadmin -report

$ sudo -u hdfs hadoop fsck /data
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Status: HEALTHY
Total size:7186453688 B
Total dirs:11
Total files:   62
Total symlinks:0
Total blocks (validated):  105 (avg. block size 68442416 B)
Minimally replicated blocks:   105 (100.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   105 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.0
Corrupt blocks:0
Missing replicas:  105 (33.32 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 13:30:03 EST 2014 in 22 milliseconds


The filesystem under path '/data' is HEALTHY

Thanks and Regards,
Truong Phan
Senior Technology Specialist
Database Engineering
Transport  Routing Engineering | Networks | Telstra Operations

[Telstra]


P+ 61 2 8576 5771
M   + 61 4 1463 7424
Etroung.p...@team.telstra.commailto:troung.p...@team.telstra.com
W  www.telstra.comhttps://email-cn.huawei.com/owa/UrlBlockedError.aspx

Love the movies? Telstra takes you there with $10 movie tickets, just to say 
thanks. Available now at telstra.com/movieshttp://www.telstra.com/movies

This communication may contain confidential or copyright information of Telstra 
Corporation Limited (ABN 33 051 775 556). If you are not an intended recipient, 
you must not keep, forward, copy, use, save or rely on this communication, and 
any such action is unauthorised and prohibited. If you have received this 
communication in error, please reply to this email to notify the sender of its 
incorrect delivery, and then delete both it and your reply.




RE: how to free up space of the old Data Node

2014-03-19 Thread Brahma Reddy Battula
The following node is down. Please have a look at the DataNode logs and try to bring it 
back up before taking further action (like decreasing the replication factor); a rough 
sketch of restarting it follows below.


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
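A rough sketch of what bringing it back up usually involves on that host (the log
location and daemon script path are assumptions about this install; adjust to however
this cluster was set up):

  # on nsda3dmsrpt02: find out why the DataNode stopped
  tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log

  # once the cause is fixed (e.g. the dfs.data.dir disk was full), start it again;
  # packaged installs may use a service script instead of hadoop-daemon.sh
  $HADOOP_HOME/bin/hadoop-daemon.sh start datanode

  # confirm it reports in as a live node
  sudo -u hdfs hadoop dfsadmin -report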






From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 10:39 AM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node

Hi Battula,

I hope Battula is your first name. :P
Here is the output of the commands you suggested:

[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://nsda3dmsrpt02.internal.bigpond.com:50070
FSCK started by hdfs (auth:SIMPLE) from /172.18.126.99 for path / at Thu Mar 20 
16:04:35 EST 2014

Status: CORRUPT
Total size:7325542923 B
Total dirs:138
Total files:   383
Total symlinks:0 (Files currently being written: 2)
Total blocks (validated):  424 (avg. block size 17277223 B)
  
  CORRUPT FILES:3
  MISSING BLOCKS:   3
  MISSING SIZE: 791 B
  CORRUPT BLOCKS:   3
  
Minimally replicated blocks:   421 (99.29245 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   417 (98.34906 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 1.976415
Corrupt blocks:3
Missing replicas:  417 (33.147854 %)
Number of data-nodes:  2
Number of racks:   1
FSCK ended at Thu Mar 20 16:04:35 EST 2014 in 105 milliseconds


The filesystem under path '/' is CORRUPT
[root@nsda3dmsrpt02] /usr/lib/hadoop-0.20-mapreduce# sudo -u hdfs hadoop 
dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Configured Capacity: 1100387597518 (1.00 TB)
Present Capacity: 727189155840 (677.25 GB)
DFS Remaining: 712401227776 (663.48 GB)
DFS Used: 14787928064 (13.77 GB)
DFS Used%: 2.03%
Under replicated blocks: 420
Blocks with corrupt replicas: 0
Missing blocks: 3

-
Datanodes available: 2 (3 total, 1 dead)

Live datanodes:
Name: 172.18.127.248:50010 (bpdevdmsdbs02)
Hostname: bpdevdmsdbs02
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7394033664 (6.89 GB)
Non DFS Used: 169131224679 (157.52 GB)
DFS Remaining: 373668540416 (348.01 GB)
DFS Used%: 1.34%
DFS Remaining%: 67.92%
Last contact: Thu Mar 20 16:05:44 EST 2014


Name: 172.18.127.245:50010 (bpdevdmsdbs01)
Hostname: bpdevdmsdbs01
Rack: /default
Decommission Status : Normal
Configured Capacity: 550193798759 (512.41 GB)
DFS Used: 7393894400 (6.89 GB)
Non DFS Used: 204067216999 (190.05 GB)
DFS Remaining: 338732687360 (315.47 GB)
DFS Used%: 1.34%
DFS Remaining%: 61.57%
Last contact: Thu Mar 20 16:05:44 EST 2014


Dead datanodes:
Name: 172.18.126.99:50010 (nsda3dmsrpt02.internal.bigpond.com)
Hostname: nsda3dmsrpt02.internal.bigpond.com
Rack: /default
Decommission Status : Normal
Configured Capacity: 0 (0 B)
DFS Used: 0 (0 B)
Non DFS Used: 0 (0 B)
DFS Remaining: 0 (0 B)
DFS Used%: 100.00%
DFS Remaining%: 0.00%
Last contact: Wed Mar 19 11:44:44 EST 2014
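For reference, fsck can also list exactly which files those 3 missing blocks belong to;
since their only replicas appear to live on the dead node, bringing that DataNode back
(as suggested above) is what recovers them, rather than deleting anything:

  # list the files whose blocks currently have no live replica
  sudo -u hdfs hadoop fsck / -list-corruptfileblocks

  # or show per-file block and replica detail
  sudo -u hdfs hadoop fsck / -files -blocks -locations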




Thanks and Regards,
Truong Phan


P+ 61 2 8576 5771
M   + 61 4 1463 7424
E   troung.p...@team.telstra.com
W  www.telstra.com



From: Brahma Reddy Battula [mailto:brahmareddy.batt...@huawei.com]
Sent: Thursday, 20 March 2014 3:27 PM
To: user@hadoop.apache.org
Subject: RE: how to free up space of the old Data Node


Please check my inline comments which are in blue color...




From: Phan, Truong Q [troung.p...@team.telstra.com]
Sent: Thursday, March 20, 2014 8:04 AM
To: user@hadoop.apache.org
Subject: how to free up space of the old Data Node
Hi

I have a 3-node Hadoop cluster in which I created 3 Data Nodes.
However, I don't have enough space on one of the nodes to cater for other projects' 
logs. So I decommissioned this node from the Data Node list, but I could not 
reclaim the space from it.
 Is your replication factor 3? If it is 3 and you have 3 datanodes, ideally the 
 disk space occupied by all the nodes should be the same (47G should be present on 
 all the DNs).
And if RF=3, decommissioning will not succeed as you have only 3 DNs; you 
need to add another DN to the cluster, and only then will the decommission succeed.
Hence please mention the replication factor of the file.

Is there a way to get this node to release space?
 There are ways, but you need to mention why only this node's disk is 
 full and not the others'. Is it because this node has less space 
 compared to the other nodes?
 If RF=3, then make RF=2 (decrease the

RE: NullpointerException in OffLineImageViewer

2014-03-19 Thread Brahma Reddy Battula
This seems to be an issue; please file a JIRA.



https://issues.apache.org/jira






From: c.agra...@outlook.com [c.agra...@outlook.com] on behalf of Chetan Agrawal 
[cagrawa...@gmail.com]
Sent: Wednesday, March 19, 2014 12:50 PM
To: hadoop users
Subject: NullpointerException in OffLineImageViewer

I want to access and study the Hadoop cluster's metadata, which is stored in the 
fsimage file on the NameNode machine. I came to know that the OfflineImageViewer is 
used to do so, but when I try it I get an exception.

/usr/hadoop/hadoop-1.2.1# bin/hadoop oiv -i fsimage -o fsimage.txt
Warning: $HADOOP_HOME is deprecated.

Exception in thread "main" java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.go(OfflineImageViewer.java:141)
at 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer.main(OfflineImageViewer.java:261)

I am not able to solve this error. Is it due to that warning (Warning: 
$HADOOP_HOME is deprecated.) or is it something else?
I am using Hadoop 1.2.1.
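One thing worth double-checking before filing a JIRA: oiv reads a plain local file, so
-i must point at a real copy of the fsimage (the $HADOOP_HOME deprecation warning on its
own is harmless). A minimal sketch, assuming dfs.name.dir is /usr/hadoop/namenode, which
is only an example path:

  # copy the current image out of the NameNode's storage directory
  cp /usr/hadoop/namenode/current/fsimage /tmp/fsimage.copy

  # dump it with an explicit processor; Indented is easier to read than the default Ls
  bin/hadoop oiv -p Indented -i /tmp/fsimage.copy -o /tmp/fsimage.txt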


Re: Need FileName with Content

2014-03-19 Thread Ranjini Rathinam
Hi,

If we give the below code,
===
word.set(filename + " " + tokenizer.nextToken());
output.collect(word, one);
==

The output is wrong, because it shows the following:

filename   word     occurrence
vinitha    java     4
vinitha    oracle   3
sony       java     4
sony       oracle   3


Here vinitha does not contain the word oracle, and similarly sony does not contain
the word java. The file name is getting merged across all the words.

I need the output as given below:

 filename   word     occurrence

vinitha    java     4
vinitha    C++      3
sony       ETL      4
sony       oracle   3


 I need the fileName along with the words from that particular file only; no
merging should happen.

Please help me out for this issue.

Please help.

Thanks in advance.

Ranjini
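Not a drop-in replacement for the code quoted below, but a minimal sketch of the
approach Stanley suggests, in the same old mapred API: take the file name from the
input split instead of re-listing /user/hduser/INPUT inside the mapper, and make the
(fileName, word) pair the map output key so counts never mix across files. The class
name PerFileWordCountMapper is made up for the example.

 import java.io.IOException;
 import java.util.StringTokenizer;

 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.FileSplit;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;

 // Emits ("<fileName> <word>", 1) so the reducer sums occurrences per file.
 public class PerFileWordCountMapper extends MapReduceBase
     implements Mapper<LongWritable, Text, Text, IntWritable> {

   private final static IntWritable ONE = new IntWritable(1);
   private final Text outKey = new Text();

   public void map(LongWritable key, Text value,
                   OutputCollector<Text, IntWritable> output, Reporter reporter)
       throws IOException {
     // The split already knows which file this record came from;
     // no need to open the INPUT directory again inside the mapper.
     String fileName = ((FileSplit) reporter.getInputSplit()).getPath().getName();

     StringTokenizer tokenizer = new StringTokenizer(value.toString(), " \t,");
     while (tokenizer.hasMoreTokens()) {
       outKey.set(fileName + " " + tokenizer.nextToken());
       output.collect(outKey, ONE);
     }
   }
 }

Paired with the usual summing reducer, each output line then reads like
"vinitha java 4" for that one file only.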




On Thu, Mar 20, 2014 at 10:56 AM, Ranjini Rathinam
ranjinibe...@gmail.com wrote:



 -- Forwarded message --
 From: Stanley Shi s...@gopivotal.com
 Date: Thu, Mar 20, 2014 at 7:39 AM
 Subject: Re: Need FileName with Content
 To: user@hadoop.apache.org


 You want to do a word count for each file, but the code gives you a word
 count for all the files, right?

 =
  word.set(tokenizer.nextToken());
   output.collect(word, one);
 ==
 change it to:
 word.set(filename + " " + tokenizer.nextToken());
 output.collect(word, one);




  Regards,
 Stanley Shi



 On Wed, Mar 19, 2014 at 8:50 PM, Ranjini Rathinam 
 ranjinibe...@gmail.com wrote:

 Hi,

 I have a folder named INPUT.

 Inside INPUT there are 5 resumes.

 hduser@localhost:~/Ranjini$ hadoop fs -ls /user/hduser/INPUT
 Found 5 items
 -rw-r--r--   1 hduser supergroup   5438 2014-03-18 15:20
 /user/hduser/INPUT/Rakesh Chowdary_Microstrategy.txt
 -rw-r--r--   1 hduser supergroup   6022 2014-03-18 15:22
 /user/hduser/INPUT/Ramarao Devineni_Microstrategy.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/vinitha.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/sony.txt
 -rw-r--r--   1 hduser supergroup   3517 2014-03-18 15:21
 /user/hduser/INPUT/ravi.txt
 hduser@localhost:~/Ranjini$

 I have to process the folder and its contents.

 I need the output as:

 filename   word     occurrence
 vinitha    java     4
 sony       oracle   3



 But I am not getting the filename. As the input file contents are merged,
 the file name is not coming out correctly.


 Please help me fix this issue. I have given my code below.


 import java.io.IOException;
 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.util.*;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.conf.*;
 import org.apache.hadoop.io.*;
 import org.apache.hadoop.mapred.*;
 import org.apache.hadoop.util.*;
 import java.io.File;
 import java.io.FileReader;
 import java.io.FileWriter;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.mapred.lib.*;

 public class WordCount {
   public static class Map extends MapReduceBase implements
       Mapper<LongWritable, Text, Text, IntWritable> {
     private final static IntWritable one = new IntWritable(1);
     private Text word = new Text();

     public void map(LongWritable key, Text value,
         OutputCollector<Text, IntWritable> output, Reporter reporter)
         throws IOException {
       FSDataInputStream fs = null;
       FileSystem hdfs = null;
       String line = value.toString();
       int i = 0, k = 0;
       try {
         Configuration configuration = new Configuration();
         configuration.set("fs.default.name", "hdfs://localhost:4440/");

         // Note: this re-opens the whole INPUT directory on every map() call,
         // which is why the file names end up merged across all the words.
         Path srcPath = new Path("/user/hduser/INPUT/");

         hdfs = FileSystem.get(configuration);
         FileStatus[] status = hdfs.listStatus(srcPath);
         fs = hdfs.open(srcPath);
         BufferedReader br = new BufferedReader(new
             InputStreamReader(hdfs.open(srcPath)));

         String[] splited = line.split("\\s+");
         for (i = 0; i < splited.length; i++) {
           String sp[] = splited[i].split(",");
           for (k = 0; k < sp.length; k++) {

             if (!sp[k].isEmpty()) {
               StringTokenizer tokenizer = new StringTokenizer(sp[k]);
               if (sp[k].equalsIgnoreCase("C")) {
                 while (tokenizer.hasMoreTokens()) {
                   word.set(tokenizer.nextToken());
                   output.collect(word, one);
                 }
               }
               if (sp[k].equalsIgnoreCase("JAVA")) {
                 while (tokenizer.hasMoreTokens()) {
                   word.set(tokenizer.nextToken());
                   output.collect(word, one);
                 }
               }
             }
           }
         }
       } catch (IOException e) {
         e.printStackTrace();
       }
     }
   }

   public static class Reduce extends MapReduceBase implements
       Reducer<Text, IntWritable, Text, IntWritable> {
     public void reduce(Text key, Iterator<IntWritable> values,
         OutputCollector<Text, IntWritable> output, Reporter reporter) throws
         IOException {