Re: Hadoop cluster monitoring

2014-04-15 Thread Arun Murthy
Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and
monitor their Hadoop clusters. Ambari uses Ganglia/Nagios as the underlying
technology and has a much better UI, etc.

hth,
Arun
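
For the HPROF part of the question quoted below, a minimal sketch of one common approach on
Hadoop 1.x: enable per-task profiling through job properties. The property names are the stock
mapred.* ones; the jar name, main class, attempt ranges and HPROF options are placeholders, and
the job has to parse generic options (e.g. via ToolRunner) for the -D flags to take effect:

  # profile map/reduce attempts 0-2 of a single run; profile.out ends up with the task logs
  hadoop jar yourjob.jar YourMainClass \
    -D mapred.task.profile=true \
    -D mapred.task.profile.maps=0-2 \
    -D mapred.task.profile.reduces=0-2 \
    -D mapred.task.profile.params="-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s" \
    input output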


On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao
raoshashidhar...@gmail.com wrote:

 Hi,

 Can somebody please help me clarify how a Hadoop cluster is monitored
 and profiled in a real production environment?

 What are the tools, and are there any links? I have heard of Ganglia and HPROF.

 For HPROF, can somebody share some experience of how to configure
 HPROF for use with Hadoop?

 Regards
 Shashi




-- 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Hadoop cluster monitoring

2014-04-15 Thread Shashidhar Rao
Thanks Arun Murthy


On Tue, Apr 15, 2014 at 11:32 AM, Arun Murthy a...@hortonworks.com wrote:

 Lots of folks use Apache Ambari (http://ambari.apache.org/) to deploy and
 monitor their Hadoop cluster. Ambari uses Ganglia/Nagios as underlying
 technology and has much better UI etc.

 hth,
 Arun


 On Mon, Apr 14, 2014 at 9:08 PM, Shashidhar Rao 
 raoshashidhar...@gmail.com wrote:

 Hi,

 Can somebody please help me in clarifying how hadoop cluster is monitored
 and profiled in real production environment.

 What are the tools and links if any. I heard Ganglia and HPROF.

 For HPROF , can somebody share some experience of how to configure to use
 HPROF to use with Hadoop

 Regards
 Shashi




 --

  --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/



Hadoop NoClassDefFoundError

2014-04-15 Thread laozh...@sina.cn






Hello everyone: I am new to Hadoop, and I am reading Hadoop in Action. When I
tried to run a demo from this book, I got a problem and could not find an answer
on the net. Can you help me with this?
below is the error info:
 $ hadoop jar myjob.jar MyJob input output
Exception in thread "main" java.lang.NoClassDefFoundError: MyJob (wrong name: myjob/MyJob)
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

and this is the command with which I compiled the .java file; I compiled on Win7 and ran
on Ubuntu.

below is MyJob.java
package myjob;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, MyJob.class);
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setJobName("MyJob");
        job.setJarByClass(MyJob.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");
        JobClient.runJob(job);
        return 0;
    }

    public static class MapClass extends MapReduceBase implements
            Mapper<Text, Text, Text, Text> {

        @Override
        public void map(Text key, Text value, OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    public static class Reduce extends MapReduceBase implements
            Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0)
                    csv += ",";
                csv += values.next().toString();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new MyJob(), args);
        System.exit(res);
    }
}

Thank you for your kind help!

[screenshot attachment: 2014-04-15_150135.png]

Re: Offline image viewer - account for edits ?

2014-04-15 Thread Mingjiang Shi
I think you are right, because the offline image viewer only takes the
fsimage file as input.


On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com wrote:

 Hi,

 Is it correct to say that the offline image viewer does not accounts for
 any edits that are not yet merged into the fsimage?

 Thanks,





-- 
Cheers
-MJ


Re: Hadoop NoClassDefFoundError

2014-04-15 Thread Azuryy Yu
Please use: hadoop jar myjob.jar myjob.MyJob input output
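
The "wrong name: myjob/MyJob" part of the error means the class was compiled into the package
myjob, so the jar needs to contain myjob/MyJob.class and the fully qualified name has to be used
on the command line. A minimal sketch of building the jar that way (directory names are
assumptions; adjust to your layout):

  mkdir -p classes
  javac -classpath "$(hadoop classpath)" -d classes MyJob.java   # writes classes/myjob/MyJob.class
  jar cf myjob.jar -C classes .
  hadoop jar myjob.jar myjob.MyJob input output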


On Tue, Apr 15, 2014 at 3:06 PM, laozh...@sina.cn laozh...@sina.cn wrote:

 Hello EveryOne:
 I am new to hadoop,and i am reading Hadoop in action.
 When i tried to run a demo from this book,I got a problem and could not
 find answer from the net. Can you help me on this ?

 below is the error info :

  $ hadoop jar myjob.jar MyJob input output
 Exception in thread main java.lang.NoClassDefFoundError: MyJob (wrong
 name: myjob/MyJob)
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

 and this is the command that i compile the .java , I compiled in Win7 and
 ran on ubuntu .


 below is MyJob.java

 package myjob;

 import java.io.IOException;
 import java.util.Iterator;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.KeyValueTextInputFormat;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reducer;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.TextOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 public class MyJob extends Configured implements Tool{

 @Override
 public int run(String[] args) throws Exception {
 Configuration conf = getConf();
 JobConf job = new JobConf(conf,MyJob.class);
 Path in = new Path(args[0]);
 Path out = new Path(args[1]);
 FileInputFormat.setInputPaths(job, in);
 FileOutputFormat.setOutputPath(job, out);
 job.setJobName("MyJob");
 job.setJarByClass(MyJob.class);
 job.setMapperClass(MapClass.class);
 job.setReducerClass(Reduce.class);

 job.setInputFormat(KeyValueTextInputFormat.class);
 job.setOutputFormat(TextOutputFormat.class);
 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(Text.class);
 job.set("key.value.separator.in.input.line", ",");
 JobClient.runJob(job);
 return 0;
 }

 public static class MapClass extends MapReduceBase implements
 Mapper<Text,Text,Text,Text> {

 @Override
 public void map(Text key, Text value, OutputCollector<Text, Text> output,
 Reporter reporter) throws IOException {
 output.collect(value, key);
 }
 }

 public static class Reduce extends MapReduceBase implements
 Reducer<Text,Text,Text,Text> {

 @Override
 public void reduce(Text key, Iterator<Text> values,
 OutputCollector<Text, Text> output, Reporter reporter)
 throws IOException {
 String csv = "";
 while(values.hasNext()){
 if(csv.length() > 0)
 csv += ",";
 csv += values.next().toString();
 }
 }
 }

 public static void main(String[] args) throws Exception {
 int res = ToolRunner.run(new Configuration(), new MyJob(), args);
 System.exit(res);
 }
 }
 --
 Thank you for your kindly help !



About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode

2014-04-15 Thread Anacristing
Hi,


I'm trying to setup Hadoop(version 2.2.0) on Windows(32-bit) with 
cygwin(version 1.7.5).
I export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_51 in hadoop-env.sh 
and the classpath is
/home/Administrator/hadoop-2.2.0/etc/hadoop:
/home/Administrator/hadoop-2.2.0/share/hadoop/common/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/common/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/yarn/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/yarn/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/*:
/contrib/capacity-scheduler/*.jar



when I execute bin/hdfs namenode -format, I get Could not find the main 
class: org.apache.hadoop.hdfs.server.namenode.NameNode


Anybody know why?


Thanks!

Re: About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode

2014-04-15 Thread Shengjun Xin
try to use bin/hadoop classpath to check whether the classpath is what you
set
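
A quick sketch of that check, using the paths posted above (adjust to your install):

  cd /home/Administrator/hadoop-2.2.0
  bin/hadoop classpath | tr ':' '\n' | grep hdfs
  # if the share/hadoop/hdfs entries are missing, something (hadoop-env.sh, HADOOP_CONF_DIR,
  # or an exported HADOOP_CLASSPATH) is probably overriding the classpath you set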


On Tue, Apr 15, 2014 at 4:16 PM, Anacristing 99403...@qq.com wrote:

 Hi,

 I'm trying to setup Hadoop(version 2.2.0) on Windows(32-bit) with
 cygwin(version 1.7.5).
 I export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_51 in hadoop-env.sh
 and the classpath is
 /home/Administrator/hadoop-2.2.0/etc/hadoop:
 /home/Administrator/hadoop-2.2.0/share/hadoop/common/lib/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/common/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs:
 /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/yarn/lib/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/yarn/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/*:
 /contrib/capacity-scheduler/*.jar

 when I execute bin/hdfs namenode -format, I get Could not find the
 main class: org.apache.hadoop.hdfs.server.namenode.NameNode

 Anybody know why?

 Thanks!




-- 
Regards
Shengjun


hadoop eclipse plugin compile path

2014-04-15 Thread Alex Lee
Trying to use the below command to generate the hadoop eclipse plugin, but it seems the
directory /usr/local/hadoop-2.2.0 is not correct. I just used Ambari to
install Hadoop.

 $ant jar  -Dversion=2.2.0 -Declipse.home=/usr/local/eclipse 
-Dhadoop.home=/usr/local/hadoop-2.2.0
 
error log
BUILD FAILED
/usr/local/hadoop2x-eclipse-plugin/src/contrib/eclipse-plugin/build.xml:76: 
/usr/local/hadoop-2.2.0/share/hadoop/mapreduce does not exist.

May I know where the install path for Hadoop is?
 
Any suggestion, thanks.
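
A small sketch for locating the install on an Ambari-managed node; the /usr/lib/hadoop* paths are
an assumption based on typical package layouts of that era, so verify on your own machines:

  which hadoop                    # the launcher script on the PATH
  readlink -f "$(which hadoop)"   # follow symlinks to the real location
  hadoop version                  # also prints which jar the command was run from
  ls -d /usr/lib/hadoop* /etc/hadoop/conf 2>/dev/null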
 
 
  

Re: Setting debug log level for individual daemons

2014-04-15 Thread Gordon Wang
Put the following line in the log4j setting file.

log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console
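
For context, a sketch of where that line goes; only the resourcemanager line comes from this
thread, so keep whatever your existing $HADOOP_CONF_DIR/log4j.properties already defines (the
daemon picks the change up on restart):

  # $HADOOP_CONF_DIR/log4j.properties
  # ... existing root logger and appender definitions stay as they are ...
  # raise only the ResourceManager classes to DEBUG, leaving other daemons at their defaults
  log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console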


On Tue, Apr 15, 2014 at 8:33 AM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:

 Hi,
 How do we set log level to debug for lets say only Resource manager
 and not the other hadoop daemons ?

 --
 Thanks,
 Ashwin





-- 
Regards
Gordon Wang


Re: Re: Hadoop NoClassDefFoundError

2014-04-15 Thread laozh...@sina.cn






Thank you for your advice. When I use your command, I get the below error info:
$ hadoop jar myjob.jar myjob.MyJob input output
Exception in thread "main" java.lang.ClassNotFoundException: myjob.MyJob
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
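
That usually means the jar itself does not contain the class under the package path. A quick
check, and a rebuild sketch with assumed directory names:

  jar tf myjob.jar | grep MyJob   # should show myjob/MyJob.class, not MyJob.class at the top level
  # if it does not, recompile into a package-aware directory and re-create the jar:
  mkdir -p classes
  javac -classpath "$(hadoop classpath)" -d classes MyJob.java
  jar cf myjob.jar -C classes .
  hadoop jar myjob.jar myjob.MyJob input output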




From: Azuryy Yu
Date: 2014-04-15 16:14
To: user@hadoop.apache.org
Subject: Re: Hadoop NoClassDefFoundError

Please use: hadoop jar myjob.jar myjob.MyJob input output


On Tue, Apr 15, 2014 at 3:06 PM, laozh...@sina.cn laozh...@sina.cn wrote:


Hello EveryOne:    I am new to hadoop,and i am reading Hadoop in action.When i 
tried to run a demo from this book,I got a problem and could not find answer 
from the net. Can you help me on this ?

below is the error info :


 $ hadoop jar myjob.jar MyJob input output
Exception in thread main java.lang.NoClassDefFoundError: MyJob (wrong name: 
myjob/MyJob)
at java.lang.ClassLoader.defineClass1(Native Method)

at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)

at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)

at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)


and this is the command that i compile the .java , I compiled in Win7 and ran 
on ubuntu .


below is MyJob.java
package myjob;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

@Override
public int run(String[] args) throws Exception {
Configuration conf = getConf();
JobConf job = new JobConf(conf, MyJob.class);
Path in = new Path(args[0]);
Path out = new Path(args[1]);
FileInputFormat.setInputPaths(job, in);
FileOutputFormat.setOutputPath(job, out);
job.setJobName("MyJob");
job.setJarByClass(MyJob.class);
job.setMapperClass(MapClass.class);
job.setReducerClass(Reduce.class);

job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.set("key.value.separator.in.input.line", ",");
JobClient.runJob(job);
return 0;
}

public static class MapClass extends MapReduceBase implements
Mapper<Text,Text,Text,Text> {

@Override
public void map(Text key, Text value, OutputCollector<Text, Text> output,
Reporter reporter) throws IOException {
output.collect(value, key);
}
}

public static class Reduce extends MapReduceBase implements
Reducer<Text,Text,Text,Text> {

@Override
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output, Reporter reporter)
throws IOException {
String csv = "";
while (values.hasNext()) {
if (csv.length() > 0)


memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread leiwang...@gmail.com
I can fix this by changing the heap size.
But what confuses me is that when I change the reducer number from 24 to 84,
this error does not occur.

Any insight on this?

Thanks
Lei
Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
at 
org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at 
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at 
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at 
org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
at 
org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
at 
org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
at 
org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
at 
org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at 
org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)


leiwang...@gmail.com


unsubscribe

2014-04-15 Thread Levin Ding
Pls unsubscribe me. Thx.
On 2013-3-16, at 3:03 AM, kishore raju hadoop1...@gmail.com wrote:

 HI,
 
  We are having an issue where multiple Task Trackers are running out of 
 memory. I have collected HeapDump on those TaskTrackers to analyze further. 
 They are currently running with 1GB Heap. we are planning to bump it to 2GB, 
 Is there a way that we can find  which Job is causing this OOM on TT's ?
 
 Any help is appreciated.
 
 
 -Thanks
  kishore 



Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread Thomas Bentsen
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - it's
just distributed between more reducers.

/th



On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote:
 I can fix this by changing heap size.
 But what confuse me is that when i change the reducer number from 24
 to 84, there's no this error.
 
 
 Any insight on this?
 
 
 Thanks
 Lei
 Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
   at java.util.Arrays.copyOf(Arrays.java:2786)
   at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
   at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
   at 
 org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
   at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
   at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
   at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
   at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
   at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
   at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
   at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
   at 
 org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
   at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
   at 
 org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
   at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
   at 
 org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
 
 __
 leiwang...@gmail.com




Re: HDFS file system size issue

2014-04-15 Thread Saumitra Shahapure
Hi Rahman,

These are few lines from hadoop fsck / -blocks -files -locations

/mnt/hadoop/hive/warehouse/user.db/table1/000255_0 44323326 bytes, 1
block(s):  OK
0. blk_-7919979022650423857_446500 len=44323326 repl=3 [ip1:50010,
ip2:50010, ip3:50010]

/mnt/hadoop/hive/warehouse/user.db/table1/000256_0 44566965 bytes, 1
block(s):  OK
0. blk_-576894812882540_446288 len=44566965 repl=3 [ip1:50010,
ip2:50010, ip4:50010]


Biswa may have guessed replication factor from fsck summary that I posted
earlier. I am posting it again for today's run:

Status: HEALTHY
 Total size:58143055251 B
 Total dirs:307
 Total files:   5093
 Total blocks (validated):  3903 (avg. block size 14897016 B)
 Minimally replicated blocks:   3903 (100.0 %)
 Over-replicated blocks:0 (0.0 %)
 Under-replicated blocks:   92 (2.357161 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor:2
 Average block replication: 3.1401486
 Corrupt blocks:0
 Missing replicas:  92 (0.75065273 %)
 Number of data-nodes:  9
 Number of racks:   1
FSCK ended at Tue Apr 15 13:20:25 UTC 2014 in 655 milliseconds


The filesystem under path '/' is HEALTHY

I have not overridden dfs.datanode.du.reserved. It defaults to 0.

$ less $HADOOP_HOME/conf/hdfs-site.xml |grep -A3 'dfs.datanode.du.reserved'
$ less $HADOOP_HOME/src/hdfs/hdfs-default.xml |grep -A3
'dfs.datanode.du.reserved'
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes per volume. Always leave this much
space free for non dfs use.
  </description>

Below is du -h on every node. FYI, my dfs.data.dir is /mnt/hadoop/dfs/data
and all hadoop/hive logs are dumped in /mnt/logs in various directories.
All machines have 400GB for /mnt.

$for i in `echo $dfs_slaves`; do  ssh $i 'du -sh /mnt/hadoop; du -sh
/mnt/hadoop/dfs/data; du -sh /mnt/logs;'; done


225G/mnt/hadoop
224G/mnt/hadoop/dfs/data
61M /mnt/logs

281G/mnt/hadoop
281G/mnt/hadoop/dfs/data
63M /mnt/logs

139G/mnt/hadoop
139G/mnt/hadoop/dfs/data
68M /mnt/logs

135G/mnt/hadoop
134G/mnt/hadoop/dfs/data
92M /mnt/logs

165G/mnt/hadoop
164G/mnt/hadoop/dfs/data
75M /mnt/logs

137G/mnt/hadoop
137G/mnt/hadoop/dfs/data
95M /mnt/logs

160G/mnt/hadoop
160G/mnt/hadoop/dfs/data
74M /mnt/logs

180G/mnt/hadoop
122G/mnt/hadoop/dfs/data
23M /mnt/logs

139G/mnt/hadoop
138G/mnt/hadoop/dfs/data
76M /mnt/logs



All these numbers are for today, and may differ bit from yesterday.

Today hadoop dfs -dus is 58GB and namenode is reporting DFS Used as 1.46TB.

Pardon me for cluttering the mail with a lot of copy-pastes; hope it's still
readable,

-- Saumitra S. Shahapure


On Tue, Apr 15, 2014 at 2:57 AM, Abdelrahman Shettia 
ashet...@hortonworks.com wrote:

 Hi Biswa,

 Are you sure that the replication factor of the files are three? Please
 run a ‘hadoop fsck / -blocks -files -locations’ and see the replication
 factor for each file.  Also, post the configuration of <name>dfs.datanode.du.reserved</name>
 and please check the real space presented by a
 DataNode by running ‘du -h’

 Thanks,
 Rahman

 On Apr 14, 2014, at 2:07 PM, Saumitra saumitra.offic...@gmail.com wrote:

 Hello,

 Biswanath, looks like we have confusion in calculation, 1TB would be equal
 to 1024GB, not 114GB.


 Sandeep, I checked log directory size as well. Log directories are hardly
 in few GBs, I have configured log4j properties so that logs won’t be too
 large.

 In our slave machines, we have 450GB disk partition for hadoop logs and
DFS. Over there the logs directory is < 10GB and the rest of the space is occupied by
 DFS. 10GB partition is for /.

 Let me quote my confusion point once again:

  Basically I wanted to point out discrepancy in name node status page and 
 hadoop
 dfs -dus. In my case, earlier one reports DFS usage as 1TB and later
 one reports it to be 35GB. What are the factors that can cause this
 difference? And why is just 35GB data causing DFS to hit its limits?



 I am talking about name node status page on 50070 port. Here is the
 screenshot of my name node status page

 [screenshot attachment: Screen Shot 2014-04-15 at 2.07.19 am.png]

 As I understand, 'DFS used’ is the space taken by DFS, non-DFS used is
 spaces taken by non-DFS data like logs or other local files from users.
 Namenode shows that DFS used is ~1TB but hadoop dfs -dus shows it to be
 ~38GB.



 On 14-Apr-2014, at 12:33 pm, Sandeep Nemuri nhsande...@gmail.com wrote:

  Please check your logs directory usage.



 On Mon, Apr 14, 2014 at 12:08 PM, Biswajit Nayak 
 biswajit.na...@inmobi.com wrote:

 Whats the replication factor you have? I believe it should be 3. hadoop
 dus shows that disk usage without replication. While name node ui page
 gives with replication.

 38gb * 3 =114gb ~ 1TB

 ~Biswa
 -oThe important thing is not to stop questioning o-


 On Mon, Apr 14, 2014 at 9:38 AM, Saumitra 

Re: Offline image viewer - account for edits ?

2014-04-15 Thread Akira AJISAKA

If you want to parse the edits, please use the Offline Edits Viewer.
http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html
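
A minimal usage sketch for both viewers; the segment/image file names below are placeholders for
whatever sits in your NameNode's current directory:

  # dump an edits segment to XML with the Offline Edits Viewer
  hdfs oev -p xml -i edits_0000000000000000001-0000000000000000042 -o edits.xml
  # and the checkpointed namespace with the Offline Image Viewer, for comparison
  hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml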

Thanks,
Akira

(2014/04/15 16:41), Mingjiang Shi wrote:

I think you are right because the the offline image viewer only takes
the fsimage file as input.


On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com
mailto:manojsamelt...@gmail.com wrote:

Hi,

Is it correct to say that the offline image viewer does not accounts
for any edits that are not yet merged into the fsimage?

Thanks,





--
Cheers
-MJ




Re: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread leiwang...@gmail.com
Thanks Thomas. 

Another question: I have no idea what "Failed to merge in memory" means. Is
the 'merge' the shuffle phase on the reducer side? Why is it in memory?
Apart from the two methods (increasing the reducer number and increasing the heap size),
are there any other alternatives to fix this issue?

Thanks a lot.




leiwang...@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote:
 I can fix this by changing heap size.
 But what confuse me is that when i change the reducer number from 24
 to 84, there's no this error.
 
 
 Any insight on this?
 
 
 Thanks
 Lei
 Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2786)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
 at java.io.DataOutputStream.write(DataOutputStream.java:90)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
 at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
 at 
 org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
 at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
 at 
 org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
 at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
 at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
 at 
 org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
 
 __
 leiwang...@gmail.com
 
 


Re: Update interval of default counters

2014-04-15 Thread Akira AJISAKA

Moved to user@hadoop.apache.org.

You can configure the interval by setting
mapreduce.client.progressmonitor.pollinterval parameter.
The default value is 1000 ms.

For more details, please see 
http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml.
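
For example, a mapred-site.xml entry lowering the interval to 500 ms would look like the sketch
below (the value is only an illustration; finer polling means more frequent client requests):

  <property>
    <name>mapreduce.client.progressmonitor.pollinterval</name>
    <value>500</value>
  </property>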


Regards,
Akira

(2014/04/15 15:29), Dharmesh Kakadia wrote:

Hi,

What is the update interval of inbuilt framework counters? Is that
configurable?
I am trying to collect very fine grained information about the job
execution and using counters for that. It would be great if someone can
point me to documentation/code for it. Thanks in advance.

Thanks,
Dharmesh





RE: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread German Florez-Larrahondo
Lei

A good explanation of this can be found in Hadoop: The Definitive Guide by
Tom White.

Here is a link to an excerpt that explains a bit of the behavior on the reduce side and some
possible tweaks to control it.

 

https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
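
As a hedged sketch of the kind of knobs that chapter covers for the reduce-side in-memory merge,
using the Hadoop 1.x (mapred.*) names; the values shown are the usual defaults and are offered
only as a starting point to tune down if the in-memory merge is what runs out of heap:

  <!-- fraction of the reducer heap used to buffer map outputs during the copy phase -->
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.70</value>
  </property>
  <!-- buffer usage threshold at which in-memory outputs are merged and spilled to disk -->
  <property>
    <name>mapred.job.shuffle.merge.percent</name>
    <value>0.66</value>
  </property>
  <!-- number of in-memory map outputs that triggers a merge regardless of size -->
  <property>
    <name>mapred.inmem.merge.threshold</name>
    <value>1000</value>
  </property>

Lowering the buffer percent (or the merge thresholds) trades some shuffle speed for a smaller
in-memory footprint, which is a third option besides more reducers or a bigger heap.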

 

 

 

From: leiwang...@gmail.com [mailto:leiwang...@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of 
reducer?

 

Thanks Thomas. 

 

Anohter question.  I have no idea what is Failed to merge in memory.  Does 
the 'merge' is the shuffle phase in reducer side?  Why it is in memory?

Except the two methods(increase reducer number and increase heap size),  is 
there any other alternatives to fix this issue? 

 

Thanks a lot.

 

 

  _  

leiwang...@gmail.com

 

From: Thomas Bentsen mailto:t...@bentzn.com 

Date: 2014-04-15 21:53

To: user mailto:user@hadoop.apache.org 

Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?

When you increase the number of reducers they each have less to work

with provided the data is distributed evenly between them - in this case

about one third of the original work.

It is eessentially the same thing as increasing the heap size - it's

just distributed between more reducers.

 

/th

 

 

 

On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote:

 I can fix this by changing heap size.

 But what confuse me is that when i change the reducer number from 24

 to 84, there's no this error.

 

 

 Any insight on this?

 

 

 Thanks

 Lei

 Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space

 at java.util.Arrays.copyOf(Arrays.java:2786)

 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)

 at java.io.DataOutputStream.write(DataOutputStream.java:90)

 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)

 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)

 at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)

 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)

 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)

 at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)

 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)

 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)

 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)

 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)

 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)

 at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)

 at 
 org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)

 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)

 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)

 at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)

 at 
 org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)

 at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)

 at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)

 at 
 org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)

 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)

 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)

 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)

 

 __

 leiwang...@gmail.com

 

 



Re: RE: memoryjava.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread leiwang...@gmail.com
Thanks, let me take a careful look at it. 



leiwang...@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of 
reducer?
Lei
A good explanation of this can be found on the Hadoop The Definitive Guide by 
Tom White. 
Here is an excerpt that explains a bit the behavior at the reduce side and some 
possible tweaks to control it. 
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwang...@gmail.com [mailto:leiwang...@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of 
reducer?
 
Thanks Thomas. 
 
Anohter question.  I have no idea what is Failed to merge in memory.  Does 
the 'merge' is the shuffle phase in reducer side?  Why it is in memory?
Except the two methods(increase reducer number and increase heap size),  is 
there any other alternatives to fix this issue? 
 
Thanks a lot.
 
 


leiwang...@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers they each have less to work
with provided the data is distributed evenly between them - in this case
about one third of the original work.
It is eessentially the same thing as increasing the heap size - it's
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote:
 I can fix this by changing heap size.
 But what confuse me is that when i change the reducer number from 24
 to 84, there's no this error.
 
 
 Any insight on this?
 
 
 Thanks
 Lei
 Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2786)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
 at java.io.DataOutputStream.write(DataOutputStream.java:90)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
 at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
 at 
 org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
 at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
 at 
 org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1145)
 at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1456)
 at 
 org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
 at 
 org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:201)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
 
 __
 leiwang...@gmail.com
 
 


Re: Offline image viewer - account for edits ?

2014-04-15 Thread Manoj Samel
So, is it correct to say that if one wants to get the latest state of the
Name node, the information from imageviewer and from edits viewer has to be
combined somehow ?

Thanks,


On Tue, Apr 15, 2014 at 7:26 AM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:

 If you want to parse the edits, please use the Offline Edits Viewer.
 http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/
 hadoop-hdfs/HdfsEditsViewer.html

 Thanks,
 Akira


 (2014/04/15 16:41), Mingjiang Shi wrote:

 I think you are right because the the offline image viewer only takes
 the fsimage file as input.


 On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel manojsamelt...@gmail.com
 mailto:manojsamelt...@gmail.com wrote:

 Hi,

 Is it correct to say that the offline image viewer does not accounts
 for any edits that are not yet merged into the fsimage?

 Thanks,





 --
 Cheers
 -MJ





Re: Setting debug log level for individual daemons

2014-04-15 Thread Ashwin Shankar
Thanks Gordon and Stanley, but this would require us to bounce the process.
Is there a way to change log levels without bouncing the process ?
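
One runtime possibility, assuming the stock hadoop daemonlog utility shipped with your release: it
talks to the daemon's HTTP servlet and changes a logger's level in the running process, without a
restart (the change is not persisted). Host, port and logger name below are placeholders:

  hadoop daemonlog -getlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager
  hadoop daemonlog -setlevel rm-host:8088 org.apache.hadoop.yarn.server.resourcemanager DEBUG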



On Tue, Apr 15, 2014 at 3:23 AM, Gordon Wang gw...@gopivotal.com wrote:

 Put the following line in the log4j setting file.

 log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console


 On Tue, Apr 15, 2014 at 8:33 AM, Ashwin Shankar ashwinshanka...@gmail.com
  wrote:

 Hi,
 How do we set log level to debug for lets say only Resource manager
 and not the other hadoop daemons ?

 --
 Thanks,
 Ashwin





 --
 Regards
 Gordon Wang




-- 
Thanks,
Ashwin


Find the task and its datanode which is taking the most time in a cluster

2014-04-15 Thread Shashidhar Rao
Hi,

Can somebody please help me find the task, and the datanode it runs on, that
has failed or is taking the most time to execute in a large cluster where
thousands of mappers and reducers are running?

Regards
Shashi


Compiling from Source

2014-04-15 Thread Justin Mrkva
I’m using the guide at 
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html
 to try to compile the native Hadoop libraries because I’m running a 64 bit OS 
and it keeps complaining that the native libraries can’t be found.

After running the third command (mvn clean install assembly:assembly -Pnative) 
I get the output shown in the gist at 
https://gist.github.com/anonymous/dd8e1833d09b48bdb813

I’m installing Hadoop 2.4.0 on CentOS 6.5 64-bit. The operating system is a 
clean install and is running nothing but Hadoop.

Where should I go from here? There are so many packages used in Hadoop that I 
have no idea where to begin, and Maven gives no indication of what the actual 
error is.
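
For what it's worth, a sketch of the usual suspects on a fresh CentOS 6 box; the package list is
an assumption about the native-build prerequisites (protoc is expected to be exactly 2.5.0 for
this release line and generally has to be built from source on CentOS 6), and -e makes Maven
print the underlying error:

  sudo yum install -y gcc gcc-c++ make cmake zlib-devel openssl-devel
  protoc --version                 # must report libprotoc 2.5.0 and be on the PATH
  mvn clean package -Pdist,native -DskipTests -Dtar -e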


Warning: $HADOOP_HOME is deprecated

2014-04-15 Thread Radhe Radhe
Hello All,
I have configured Apache Hadoop 1.2.0 and set the $HADOOP_HOME env. variable.
I keep getting: Warning: $HADOOP_HOME is deprecated
Solution (after googling): I replaced HADOOP_HOME with HADOOP_PREFIX and the
warning disappeared.
Does that mean HADOOP_HOME is replaced by HADOOP_PREFIX? If yes, in which
version did this change? I tried googling but could not find the release number.
Is HADOOP_PREFIX the correct env. variable that should be used for all latest
Apache Hadoop releases, including Apache Hadoop 2 (YARN)?

Thanks, -RR

Re: HDFS file system size issue

2014-04-15 Thread Abdelrahman Shettia
Hi Saumitra,

It looks like over-replicated blocks are not the root cause of the issue that
the cluster is experiencing. I can only think of a misconfigured
dfs.data.dir parameter. Can you ensure that each one of the data
directories is using only one partition (mount) and there is no other data
directory sharing the same partition (mount)?
The rule should be one data directory per partition (mount). Also, please
check inside the dfs.data.dir for third-party files/directories. Hope
this helps.


Thanks
-Rahman
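
For reference, a sketch of the layout described above, one DataNode data directory per dedicated
mount, in hdfs-site.xml (the mount points are placeholders; the property is dfs.datanode.data.dir
on newer releases):

  <property>
    <name>dfs.data.dir</name>
    <value>/mnt1/hadoop/dfs/data,/mnt2/hadoop/dfs/data</value>
  </property>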


On Tue, Apr 15, 2014 at 6:54 AM, Saumitra Shahapure 
saumitra.offic...@gmail.com wrote:

 Hi Rahman,

 These are few lines from hadoop fsck / -blocks -files -locations

 /mnt/hadoop/hive/warehouse/user.db/table1/000255_0 44323326 bytes, 1
 block(s):  OK
 0. blk_-7919979022650423857_446500 len=44323326 repl=3 [ip1:50010,
 ip2:50010, ip3:50010]

 /mnt/hadoop/hive/warehouse/user.db/table1/000256_0 44566965 bytes, 1
 block(s):  OK
 0. blk_-576894812882540_446288 len=44566965 repl=3 [ip1:50010,
 ip2:50010, ip4:50010]


 Biswa may have guessed replication factor from fsck summary that I posted
 earlier. I am posting it again for today's run:

 Status: HEALTHY
  Total size:58143055251 B
  Total dirs:307
  Total files:   5093
  Total blocks (validated):  3903 (avg. block size 14897016 B)
  Minimally replicated blocks:   3903 (100.0 %)

  Over-replicated blocks:0 (0.0 %)
  Under-replicated blocks:   92 (2.357161 %)

  Mis-replicated blocks: 0 (0.0 %)
  Default replication factor:2
  Average block replication: 3.1401486
  Corrupt blocks:0
  Missing replicas:  92 (0.75065273 %)

  Number of data-nodes:  9
  Number of racks:   1
 FSCK ended at Tue Apr 15 13:20:25 UTC 2014 in 655 milliseconds


 The filesystem under path '/' is HEALTHY

 I have not overridden dfs.datanode.du.reserved. It defaults to 0.

 $ less $HADOOP_HOME/conf/hdfs-site.xml |grep -A3 'dfs.datanode.du.reserved'
 $ less $HADOOP_HOME/src/hdfs/hdfs-default.xml |grep -A3
 'dfs.datanode.du.reserved'
   <name>dfs.datanode.du.reserved</name>
   <value>0</value>
   <description>Reserved space in bytes per volume. Always leave this much
 space free for non dfs use.
   </description>

 Below is du -h on every node. FYI, my dfs.data.dir is /mnt/hadoop/dfs/data
 and all hadoop/hive logs are dumped in /mnt/logs in various directories.
 All machines have 400GB for /mnt.

 $for i in `echo $dfs_slaves`; do  ssh $i 'du -sh /mnt/hadoop; du -sh
 /mnt/hadoop/dfs/data; du -sh /mnt/logs;'; done


 225G/mnt/hadoop
 224G/mnt/hadoop/dfs/data
 61M /mnt/logs

 281G/mnt/hadoop
 281G/mnt/hadoop/dfs/data
 63M /mnt/logs

 139G/mnt/hadoop
 139G/mnt/hadoop/dfs/data
 68M /mnt/logs

 135G/mnt/hadoop
 134G/mnt/hadoop/dfs/data
 92M /mnt/logs

 165G/mnt/hadoop
 164G/mnt/hadoop/dfs/data
 75M /mnt/logs

 137G/mnt/hadoop
 137G/mnt/hadoop/dfs/data
 95M /mnt/logs

 160G/mnt/hadoop
 160G/mnt/hadoop/dfs/data
 74M /mnt/logs

 180G/mnt/hadoop
 122G/mnt/hadoop/dfs/data
 23M /mnt/logs

 139G/mnt/hadoop
 138G/mnt/hadoop/dfs/data
 76M /mnt/logs



 All these numbers are for today, and may differ bit from yesterday.

 Today hadoop dfs -dus is 58GB and namenode is reporting DFS Used as 1.46TB.

 Pardon me for making the mail dirty by lot of copy-pastes, hope it's still
 readable,

 -- Saumitra S. Shahapure


 On Tue, Apr 15, 2014 at 2:57 AM, Abdelrahman Shettia 
 ashet...@hortonworks.com wrote:

 Hi Biswa,

 Are you sure that the replication factor of the files are three? Please
 run a 'hadoop fsck / -blocks -files -locations' and see the replication
  factor for each file.  Also, post the configuration of <name>dfs.datanode.du.reserved</name>
  and please check the real space presented by a
 DataNode by running 'du -h'

 Thanks,
 Rahman

 On Apr 14, 2014, at 2:07 PM, Saumitra saumitra.offic...@gmail.com
 wrote:

 Hello,

 Biswanath, looks like we have confusion in calculation, 1TB would be
 equal to 1024GB, not 114GB.


 Sandeep, I checked the log directory size as well. The log directories are
 only a few GB; I have configured the log4j properties so that the logs won't
 grow too large.

 On our slave machines we have a 450GB disk partition for Hadoop logs and
 DFS. There the logs directory is under 10GB and the rest of the space is
 occupied by DFS. A separate 10GB partition is for /.

 Let me quote my confusion point once again:

  Basically I wanted to point out the discrepancy between the name node status
 page and hadoop dfs -dus. In my case, the former reports DFS usage as 1TB and
 the latter reports it as 35GB. What are the factors that can cause this
 difference? And why is just 35GB of data causing DFS to hit its limits?



 I am talking about the name node status page on port 50070. Here is a
 screenshot of my name node status page:

 [attached image: Screen Shot 2014-04-15 at 2.07.19 am.png]

 As I understand it, 'DFS Used' is the space taken by DFS, and 'Non DFS Used' is
 space on the data disks occupied by files outside of DFS.

Re: Find the task and it's datanode which is taking the most time in a cluster

2014-04-15 Thread Abdelrahman Shettia
Hi Shashi,

I am assuming that you are running Hadoop 1.x. There is an option to see
the failed tasks on the JobTracker UI. Please replace [jobtrackerhost] in the
following link with the actual host, open it, and look for the task
failures.

http://[jobtrackerhost]:50030/machines.jsp?type=active
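
If the UI is too noisy with thousands of tasks, a rough command-line sketch
against the Hadoop 1.x *mapred* client API can also dump the map tasks that
have not completed, together with their diagnostics. The job id below is a
placeholder, and the node a task ran on still has to be read off its task
attempt page:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.TaskReport;

// Hedged sketch: list the map tasks of one job that are not at 100% progress,
// plus any diagnostics the framework recorded (failures show up here).
// Reduce tasks can be inspected the same way via getReduceTaskReports().
public class SlowTaskFinder {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf()); // picks up cluster config from the classpath
    JobID jobId = JobID.forName(args[0]);            // e.g. job_201404150001_0042 (placeholder)
    for (TaskReport r : client.getMapTaskReports(jobId)) {
      if (r.getProgress() < 1.0f) {
        System.out.println(r.getTaskID() + " progress=" + r.getProgress()
            + " state=" + r.getState());
        for (String diag : r.getDiagnostics()) {
          System.out.println("  " + diag);
        }
      }
    }
  }
}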


Thanks
-Rahman


On Tue, Apr 15, 2014 at 11:11 AM, Shashidhar Rao raoshashidhar...@gmail.com
 wrote:

 Hi,

 Can somebody please help me find the task, and the datanode it ran on, that
 has failed or is taking the most time to execute in a large cluster,
 considering that thousands of mappers and reducers are running.

 Regards
 Shashi




Apache Hadoop 2.x installation *environment variables*

2014-04-15 Thread Radhe Radhe
Hello All,
For an Apache Hadoop 2.x (YARN) installation, which *environment variables* are
REALLY needed?
By referring to various blogs I am getting a mix:
HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_PREFIX, YARN_HOME
HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_PREFIX, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, YARN_CONF_DIR, MAPRED_CONF_DIR, YARN_CLASSPATH
HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME
HADOOP_COMMON_HOME, HADOOP_CONF_DIR, HADOOP_HDFS_HOME, HADOOP_HOME, HADOOP_MAPRED_HOME, YARN_HOME
From the Apache Hadoop site:
$HADOOP_COMMON_HOME, $HADOOP_CONF_DIR, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME, $HADOOP_YARN_HOME, $YARN_CONF_DIR (the same as $HADOOP_CONF_DIR)
Thanks,
-RR
  

RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
Thanks John for your comments,
I believe MRv2 has support for both the old *mapred* APIs and the new *mapreduce*
APIs.
I see it this way:
[1.]  One may have the binaries, i.e. the jar file, of the M\R program that used
the old *mapred* APIs. This will work directly on MRv2 (YARN).
[2.]  One may have the source code, i.e. the Java programs, of the M\R program
that used the old *mapred* APIs. For this I need to recompile and generate the
binaries, i.e. the jar file. Do I have to change the old *org.apache.hadoop.mapred*
APIs to the new *org.apache.hadoop.mapreduce* APIs, or are no code changes needed?
-RR
 Date: Mon, 14 Apr 2014 10:37:56 -0400
 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with 
 old *mapred* APIs and new *mapreduce* APIs in Hadoop
 From: john.meag...@gmail.com
 To: user@hadoop.apache.org
 
 Also, Source Compatibility also means ONLY a recompile is needed.
 No code changes should be needed.
 
 On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com wrote:
  Source Compatibility = you need to recompile and use the new version
  as part of the compilation
 
  Binary Compatibility = you can take something compiled against the old
  version and run it on the new version
 
  On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
  radhe.krishna.ra...@live.com wrote:
  Hello People,
 
  As per the Apache site
  http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
 
  Binary Compatibility
  
  First, we ensure binary compatibility to the applications that use old
  mapred APIs. This means that applications which were built against MRv1
  mapred APIs can run directly on YARN without recompilation, merely by
  pointing them to an Apache Hadoop 2.x cluster via configuration.
 
  Source Compatibility
  
  We cannot ensure complete binary compatibility with the applications that
  use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, 
  we
  ensure source compatibility for mapreduce APIs that break binary
  compatibility. In other words, users should recompile their applications
  that use mapreduce APIs against MRv2 jars. One notable binary
  incompatibility break is Counter and CounterGroup.
 
  For Binary Compatibility I understand that if I had built an MR job with
  the old *mapred* APIs then it can be run directly on YARN without any
  changes.
 
  Can anybody explain what we mean by Source Compatibility here, and also
  a use case where one will need it?
 
  Does that mean code changes: if I already have MR job source code written
  with the old *mapred* APIs and I need to make some changes to it to run,
  then do I need to use the new *mapreduce* API and generate new binaries?
 
  Thanks,
  -RR
 
 
  

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
1. If you have binaries that were compiled against the MRv1 *mapred* libs,
they should just work with MRv2.
2. If you have source code that refers to the MRv1 *mapred* libs, it should
be compilable without code changes. Of course, you're free to change your
code.
3. If you have binaries that were compiled against the MRv1 *mapreduce* libs,
they may not be executable directly with MRv2, but you should be able to
compile the code against the MRv2 *mapreduce* libs without code changes, and
execute it.
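
To make #3 concrete, here is a minimal sketch (class and counter names are
illustrative, not from the compatibility page) of code written against the
new *mapreduce* API. It is source compatible, so it recompiles against the
MRv2 jars without edits, but a jar of it built against the MRv1 *mapreduce*
jars can hit the documented Counter/CounterGroup binary break at runtime:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineCountMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Counter is the noted binary-incompatibility point between MRv1 and MRv2,
    // so this call is what forces the recompile; the source itself is unchanged.
    context.getCounter("demo", "lines").increment(1);
    context.write(new Text("lines"), new LongWritable(1));
  }
}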

- Zhijie


On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe
radhe.krishna.ra...@live.com wrote:

 Thanks John for your comments,

 I believe MRv2 has support for both the old *mapred* APIs and new
 *mapreduce* APIs.

 I see this way:
 [1.]  One may have binaries i.e. jar file of the M\R program that used old
 *mapred* APIs
 This will work directly on MRv2(YARN).

 [2.]  One may have the source code i.e. Java Programs of the M\R program
 that used old *mapred* APIs
 For this I need to recompile and generate the binaries i.e. jar file.
 Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
 org.apache.hadoop.mapreduce* APIs? or No code changes are needed?

 -RR

  Date: Mon, 14 Apr 2014 10:37:56 -0400
  Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
 with old *mapred* APIs and new *mapreduce* APIs in Hadoop
  From: john.meag...@gmail.com
  To: user@hadoop.apache.org

 
  Also, Source Compatibility also means ONLY a recompile is needed.
  No code changes should be needed.
 
  On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com
 wrote:
   Source Compatibility = you need to recompile and use the new version
   as part of the compilation
  
   Binary Compatibility = you can take something compiled against the old
   version and run it on the new version
  
   On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
   radhe.krishna.ra...@live.com wrote:
   Hello People,
  
   As per the Apache site
  
 http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
  
   Binary Compatibility
   
   First, we ensure binary compatibility to the applications that use old
   mapred APIs. This means that applications which were built against
 MRv1
   mapred APIs can run directly on YARN without recompilation, merely by
   pointing them to an Apache Hadoop 2.x cluster via configuration.
  
   Source Compatibility
   
   We cannot ensure complete binary compatibility with the applications
 that
   use mapreduce APIs, as these APIs have evolved a lot since MRv1.
 However, we
   ensure source compatibility for mapreduce APIs that break binary
   compatibility. In other words, users should recompile their
 applications
   that use mapreduce APIs against MRv2 jars. One notable binary
   incompatibility break is Counter and CounterGroup.
  
   For Binary Compatibility I understand that if I had build a MR job
 with
   old *mapred* APIs then they can be run directly on YARN without and
 changes.
  
   Can anybody explain what do we mean by Source Compatibility here
 and also
   a use case where one will need it?
  
   Does that mean code changes if I already have a MR job source code
 written
   with with old *mapred* APIs and I need to make some changes to it to
 run in
   then I need to use the new mapreduce* API and generate the new
 binaries?
  
   Thanks,
   -RR
  
  




-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
Thanks Zhijie for the explanation.
Regarding #3: if I have ONLY the binaries, i.e. the jar file (compiled\built against
the old MRv1 *mapred* APIs), then how can I compile it, since I don't have the source
code, i.e. the Java files? All I can do with the binaries, i.e. the jar file, is
execute them.
-RR
Date: Tue, 15 Apr 2014 13:03:53 -0700
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old 
*mapred* APIs and new *mapreduce* APIs in Hadoop
From: zs...@hortonworks.com
To: user@hadoop.apache.org

1. If you have the binaries that were compiled against MRv1 mapred libs, it
should just work with MRv2.
2. If you have the source code that refers to MRv1 mapred libs, it should be
compilable without code changes. Of course, you're free to change your code.
3. If you have the binaries that were compiled against MRv1 mapreduce libs, it
may not be executable directly with MRv2, but you should be able to compile it
against MRv2 mapreduce libs without code changes, and execute it.

- Zhijie

On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe radhe.krishna.ra...@live.com 
wrote:




Thanks John for your comments,
I believe MRv2 has support for both the old *mapred* APIs and new *mapreduce* 
APIs.
I see this way:[1.]  One may have binaries i.e. jar file of the M\R program 
that used old *mapred* APIs
This will work directly on MRv2(YARN).
[2.]  One may have the source code i.e. Java Programs of the M\R program that 
used old *mapred* APIs
For this I need to recompile and generate the binaries i.e. jar file. Do I have 
to change the old *org.apache.hadoop.mapred* APIs to new 
*org.apache.hadoop.mapreduce* APIs? or No code changes are needed?

-RR
 Date: Mon, 14 Apr 2014 10:37:56 -0400
 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with 
 old *mapred* APIs and new *mapreduce* APIs in Hadoop

 From: john.meag...@gmail.com
 To: user@hadoop.apache.org
 

 Also, Source Compatibility also means ONLY a recompile is needed.
 No code changes should be needed.
 
 On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com wrote:

  Source Compatibility = you need to recompile and use the new version
  as part of the compilation
 
  Binary Compatibility = you can take something compiled against the old
  version and run it on the new version

 
  On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
  radhe.krishna.ra...@live.com wrote:
  Hello People,

 
  As per the Apache site
  http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

 
  Binary Compatibility
  
  First, we ensure binary compatibility to the applications that use old
  mapred APIs. This means that applications which were built against MRv1

  mapred APIs can run directly on YARN without recompilation, merely by
  pointing them to an Apache Hadoop 2.x cluster via configuration.
 
  Source Compatibility

  
  We cannot ensure complete binary compatibility with the applications that
  use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, 
  we

  ensure source compatibility for mapreduce APIs that break binary
  compatibility. In other words, users should recompile their applications
  that use mapreduce APIs against MRv2 jars. One notable binary

  incompatibility break is Counter and CounterGroup.
 
  For Binary Compatibility I understand that if I had build a MR job with
  old *mapred* APIs then they can be run directly on YARN without and 
  changes.

 
  Can anybody explain what do we mean by Source Compatibility here and also
  a use case where one will need it?
 
  Does that mean code changes if I already have a MR job source code written

  with with old *mapred* APIs and I need to make some changes to it to run in
  then I need to use the new mapreduce* API and generate the new  binaries?
 
  Thanks,

  -RR
 
 
  


-- 
Zhijie ShenHortonworks Inc.http://hortonworks.com/






Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
bq. Regarding #3 if I have ONLY the binaries i.e. jar file (compiled\build
against old MRv1 mapred APIS)

Which APIs are you talking about, *mapred* or *mapreduce*? In #3, I was
talking about *mapreduce*. If that is the case, you may unfortunately be in
trouble, because the *mapreduce* APIs have evolved so much in MRv2 that it's
difficult to ensure binary compatibility. Anyway, you should still try
your luck, as your binaries may not use the incompatible APIs. On the other
hand, if you meant the *mapred* APIs instead, your binaries should just work.

- Zhijie


On Tue, Apr 15, 2014 at 1:35 PM, Radhe Radhe
radhe.krishna.ra...@live.com wrote:

 Thanks Zhijie for the explanation.

 Regarding #3 if I have ONLY the binaries i.e. jar file (compiled\build
 against old MRv1 *mapred* APIS) then how can I compile it since I don't
 have the source code i.e. Java files. All I can do with binaries i.e. jar
 file is execute it.

 -RR
 --
 Date: Tue, 15 Apr 2014 13:03:53 -0700

 Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
 with old *mapred* APIs and new *mapreduce* APIs in Hadoop
 From: zs...@hortonworks.com
 To: user@hadoop.apache.org


 1. If you have the binaries that were compiled against MRv1 *mapred* libs, it
 should just work with MRv2.
 2. If you have the source code that refers to MRv1 *mapred* libs, it
 should be compilable without code changes. Of course, you're free to change
 your code.
 3. If you have the binaries that were compiled against MRv1 *mapreduce* libs,
 it may not be executable directly with MRv2, but you should be able to compile
 it against MRv2 *mapreduce* libs without code changes, and execute it.

 - Zhijie


 On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe 
 radhe.krishna.ra...@live.com wrote:

 Thanks John for your comments,

 I believe MRv2 has support for both the old *mapred* APIs and new
 *mapreduce* APIs.

 I see this way:
 [1.]  One may have binaries i.e. jar file of the M\R program that used old
 *mapred* APIs
 This will work directly on MRv2(YARN).

 [2.]  One may have the source code i.e. Java Programs of the M\R program
 that used old *mapred* APIs
 For this I need to recompile and generate the binaries i.e. jar file.
 Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
 org.apache.hadoop.mapreduce* APIs? or No code changes are needed?

 -RR

  Date: Mon, 14 Apr 2014 10:37:56 -0400
  Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
 with old *mapred* APIs and new *mapreduce* APIs in Hadoop
  From: john.meag...@gmail.com
  To: user@hadoop.apache.org

 
  Also, Source Compatibility also means ONLY a recompile is needed.
  No code changes should be needed.
 
  On Mon, Apr 14, 2014 at 10:37 AM, John Meagher john.meag...@gmail.com
 wrote:
   Source Compatibility = you need to recompile and use the new version
   as part of the compilation
  
   Binary Compatibility = you can take something compiled against the old
   version and run it on the new version
  
   On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
   radhe.krishna.ra...@live.com wrote:
   Hello People,
  
   As per the Apache site
  
 http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
  
   Binary Compatibility
   
   First, we ensure binary compatibility to the applications that use old
   mapred APIs. This means that applications which were built against
 MRv1
   mapred APIs can run directly on YARN without recompilation, merely by
   pointing them to an Apache Hadoop 2.x cluster via configuration.
  
   Source Compatibility
   
   We cannot ensure complete binary compatibility with the applications
 that
   use mapreduce APIs, as these APIs have evolved a lot since MRv1.
 However, we
   ensure source compatibility for mapreduce APIs that break binary
   compatibility. In other words, users should recompile their
 applications
   that use mapreduce APIs against MRv2 jars. One notable binary
   incompatibility break is Counter and CounterGroup.
  
   For Binary Compatibility I understand that if I had build a MR job
 with
   old *mapred* APIs then they can be run directly on YARN without and
 changes.
  
   Can anybody explain what do we mean by Source Compatibility here
 and also
   a use case where one will need it?
  
   Does that mean code changes if I already have a MR job source code
 written
   with with old *mapred* APIs and I need to make some changes to it to
 run in
   then I need to use the new mapreduce* API and generate the new
 binaries?
  
   Thanks,
   -RR
  
  




 --
 Zhijie Shen
 Hortonworks Inc.
 http://hortonworks.com/


Re: About Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode

2014-04-15 Thread Anacristing
It's the same




-- Original --
From: Shengjun Xin s...@gopivotal.com
Date: Tue, Apr 15, 2014 04:43 PM
To: user@hadoop.apache.org

Subject:  Re: About Could not find the main class: 
org.apache.hadoop.hdfs.server.namenode.NameNode



try to use bin/hadoop classpath to check whether the classpath is what you set



On Tue, Apr 15, 2014 at 4:16 PM, Anacristing 99403...@qq.com wrote:
 Hi,


I'm trying to set up Hadoop (version 2.2.0) on Windows (32-bit) with
Cygwin (version 1.7.5).
I export JAVA_HOME=/cygdrive/c/Java/jdk1.7.0_51 in hadoop-env.sh,
and the classpath is:
/home/Administrator/hadoop-2.2.0/etc/hadoop:
/home/Administrator/hadoop-2.2.0/share/hadoop/common/lib/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/common/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/hdfs/*:
 /home/Administrator/hadoop-2.2.0/share/hadoop/yarn/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/yarn/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:
/home/Administrator/hadoop-2.2.0/share/hadoop/mapreduce/*:
 /contrib/capacity-scheduler/*.jar



when I execute bin/hdfs namenode -format, I get "Could not find the main
class: org.apache.hadoop.hdfs.server.namenode.NameNode".
 

Anybody know why?


Thanks!






-- 
Regards 

Shengjun

Re: Compiling from Source

2014-04-15 Thread Shengjun Xin
I think you can use the command 'mvn package -Pnative,dist -DskipTests' in
the source code root directory to build the binaries.


On Wed, Apr 16, 2014 at 2:31 AM, Justin Mrkva m...@justinmrkva.com wrote:

 I'm using the guide at
 http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleCluster.html
 to try to compile the native Hadoop libraries, because I'm running a 64-bit
 OS and it keeps complaining that the native libraries can't be found.

 After running the third command (mvn clean install assembly:assembly
 -Pnative) I get the output shown in the gist at
 https://gist.github.com/anonymous/dd8e1833d09b48bdb813

 I'm installing Hadoop 2.4.0 on CentOS 6.5 64-bit. The operating system is
 a clean install and is running nothing but Hadoop.

 Where should I go from here? There are so many packages used in Hadoop
 that I have no idea where to begin, and Maven gives no indication of what
 the actual error is.




-- 
Regards
Shengjun


Re: Offline image viewer - account for edits ?

2014-04-15 Thread Akira AJISAKA

Yes, I think you are right.

(2014/04/16 1:20), Manoj Samel wrote:

So, is it correct to say that if one wants to get the latest state of
the NameNode, the information from the image viewer and from the edits viewer
has to be combined somehow?

Thanks,


On Tue, Apr 15, 2014 at 7:26 AM, Akira AJISAKA
ajisa...@oss.nttdata.co.jp wrote:

If you want to parse the edits, please use the Offline Edits Viewer.

http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsEditsViewer.html

Thanks,
Akira


(2014/04/15 16:41), Mingjiang Shi wrote:

I think you are right, because the offline image viewer only
takes the fsimage file as input.


On Tue, Apr 15, 2014 at 9:29 AM, Manoj Samel
manojsamelt...@gmail.com wrote:

 Hi,

  Is it correct to say that the offline image viewer does not account
  for any edits that are not yet merged into the fsimage?

 Thanks,





--
Cheers
-MJ







Re: Re: Hadoop NoClassDefFoundError

2014-04-15 Thread Stanley Shi
Can you do an unzip -l myjob.jar to see if your jar file has the correct
directory hierarchy?

Regards,
*Stanley Shi,*



On Tue, Apr 15, 2014 at 6:53 PM, laozh...@sina.cn laozh...@sina.cn wrote:

 Thank you for your advice. When I use your command, I get the error
 below:
 $ hadoop jar myjob.jar myjob.MyJob input output
 Exception in thread "main" java.lang.ClassNotFoundException: myjob.MyJob
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

 --


 *From:* Azuryy Yu azury...@gmail.com
 *Date:* 2014-04-15 16:14
 *To:* user@hadoop.apache.org
 *Subject:* Re: Hadoop NoClassDefFoundError
 Please use: hadoop jar myjob.jar myjob.MyJob input output


 On Tue, Apr 15, 2014 at 3:06 PM, laozh...@sina.cn laozh...@sina.cn wrote:

 Hello EveryOne:
 I am new to hadoop,and i am reading Hadoop in action.
 When i tried to run a demo from this book,I got a problem and could not
 find answer from the net. Can you help me on this ?

 below is the error info :

   $ hadoop jar myjob.jar MyJob input output
 Exception in thread main java.lang.NoClassDefFoundError: MyJob (wrong
 name: myjob/MyJob)
 at java.lang.ClassLoader.defineClass1(Native Method)
 at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
 at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

 and this is the command with which I compile the .java file; I compiled on Win7 and
 ran on Ubuntu.


 below is MyJob.java

 package myjob;

 import java.io.IOException;
 import java.util.Iterator;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.FileOutputFormat;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.KeyValueTextInputFormat;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reducer;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.TextOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 public class MyJob extends Configured implements Tool {

     @Override
     public int run(String[] args) throws Exception {
         Configuration conf = getConf();
         JobConf job = new JobConf(conf, MyJob.class);
         Path in = new Path(args[0]);
         Path out = new Path(args[1]);
         FileInputFormat.setInputPaths(job, in);
         FileOutputFormat.setOutputPath(job, out);
         job.setJobName("MyJob");
         job.setJarByClass(MyJob.class);
         job.setMapperClass(MapClass.class);
         job.setReducerClass(Reduce.class);

         job.setInputFormat(KeyValueTextInputFormat.class);
         job.setOutputFormat(TextOutputFormat.class);
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(Text.class);
         job.set("key.value.separator.in.input.line", ",");
         JobClient.runJob(job);
         return 0;
     }

     public static class MapClass extends MapReduceBase implements Mapper<Text, Text, Text, Text> {

         @Override
         public void map(Text key, Text value, OutputCollector<Text, Text> output,
                 Reporter reporter) throws IOException {
             // invert key and value
             output.collect(value, key);
         }
     }

     public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

         @Override
         public void reduce(Text key, Iterator<Text> values,
                 OutputCollector<Text, Text> output, Reporter reporter)
                 throws IOException {
             String csv = "";
             while (values.hasNext()) {
                 if (csv.length() > 0)
                     csv += ",";
                 csv += values.next().toString();
             }
             // emit the comma-joined values for this key
             output.collect(key, new Text(csv));
         }
     }

     public static void main(String[] args) throws Exception {
         int res = ToolRunner.run(new Configuration(), new MyJob(), args);
         System.exit(res);
     }
 }
 --
 Thank you for your kind help!



[inline image attachment: 2014-04-15_15013(04-15-18-51-38).png]

Re: Setting debug log level for individual daemons

2014-04-15 Thread Stanley Shi
Is this what you are looking for?
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/CommandsManual.html#daemonlog
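
Since the follow-up below asks about changing the level without bouncing the
process: the daemonlog command described on that page just drives the
/logLevel servlet on the daemon's HTTP port, so a quick sketch like the one
below changes the level in the running JVM until its next restart. The
host:port and logger name are assumptions - point it at the HTTP address of
the daemon you actually want to change:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hedged sketch: set one logger to DEBUG on a running daemon through its
// /logLevel servlet (the same endpoint "hadoop daemonlog -setlevel" talks to).
public class SetDaemonLogLevel {
  public static void main(String[] args) throws Exception {
    String hostPort = args.length > 0 ? args[0] : "rm-host:8088";      // daemon HTTP address (assumed)
    String logger = "org.apache.hadoop.yarn.server.resourcemanager";   // logger to change (assumed)
    String url = "http://" + hostPort + "/logLevel?log="
        + URLEncoder.encode(logger, "UTF-8") + "&level=DEBUG";

    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setRequestMethod("GET");
    BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);   // the servlet echoes the effective log level
    }
    in.close();
  }
}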

Regards,
*Stanley Shi,*



On Wed, Apr 16, 2014 at 2:06 AM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:

 Thanks Gordon and Stanley, but this would require us to bounce the process.
 Is there a way to change log levels without bouncing the process?



 On Tue, Apr 15, 2014 at 3:23 AM, Gordon Wang gw...@gopivotal.com wrote:

 Put the following line in the log4j settings file:

 log4j.logger.org.apache.hadoop.yarn.server.resourcemanager=DEBUG,console


 On Tue, Apr 15, 2014 at 8:33 AM, Ashwin Shankar 
 ashwinshanka...@gmail.com wrote:

 Hi,
 How do we set log level to debug for lets say only Resource manager
 and not the other hadoop daemons ?

 --
 Thanks,
 Ashwin





 --
 Regards
 Gordon Wang




 --
 Thanks,
 Ashwin





Re: Re: java.lang.OutOfMemoryError related with number of reducer?

2014-04-15 Thread leiwang...@gmail.com
Hi German and Thomas,

It seems I found the data that causes the error, but I still don't know the
exact reason.

I just do a group with Pig Latin:

domain_device_group = GROUP data_filter BY (custid, domain, level, device); 
domain_device = FOREACH domain_device_group { 
distinct_ip = DISTINCT data_filter.ip; 
distinct_userid = DISTINCT data_filter.userid; 
GENERATE group.custid, group.domain, group.level, group.device, 
COUNT_STAR(data_filter), COUNT_STAR(distinct_ip), COUNT_STAR(distinct_userid); 
} 
STORE domain_device INTO '$outputdir/$batchdate/data/domain_device' USING 
PigStorage('\t');

The group key (custid, domain, level, device) is significantly skewed: about
42% (58,621,533 / 138,455,355) of the records have the same key, and only the
reducer that handles this key failed.
But from
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
I still have no idea why it causes an OOM. It doesn't say how a skewed key
will be handled, nor how different keys in the same reducer will be merged.
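
For reference, one more thing I plan to try is shrinking the reduce-side
in-memory merge so the skewed key spills to disk instead of blowing the heap.
This is only a hedged sketch with the Hadoop 1.x shuffle property names; the
values are guesses, and under Pig the same keys would just be passed as job
properties:

import org.apache.hadoop.mapred.JobConf;

public class ShuffleTuning {
  public static JobConf tuneReduceMerge(JobConf job) {
    // fraction of the reducer heap used to buffer map outputs during the shuffle
    job.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
    // start the in-memory merge once the buffer is half full...
    job.setFloat("mapred.job.shuffle.merge.percent", 0.50f);
    // ...or after this many map outputs have been fetched, whichever comes first
    job.setInt("mapred.inmem.merge.threshold", 500);
    // keep no map outputs in memory once the reduce itself starts (the default)
    job.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);
    return job;
  }
}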


leiwang...@gmail.com
 
From: leiwang...@gmail.com
Date: 2014-04-15 23:35
To: user; th; german.fl
Subject: Re: RE: memoryjava.lang.OutOfMemoryError related with number of 
reducer?
Thanks, let me take a careful look at it. 



leiwang...@gmail.com
 
From: German Florez-Larrahondo
Date: 2014-04-15 23:27
To: user; 'th'
Subject: RE: Re: memoryjava.lang.OutOfMemoryError related with number of 
reducer?
Lei
A good explanation of this can be found in Hadoop: The Definitive Guide by
Tom White.
Here is an excerpt that explains a bit of the behavior at the reduce side and
some possible tweaks to control it.
 
https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
 
 
 
From: leiwang...@gmail.com [mailto:leiwang...@gmail.com] 
Sent: Tuesday, April 15, 2014 9:29 AM
To: user; th
Subject: Re: Re: memoryjava.lang.OutOfMemoryError related with number of 
reducer?
 
Thanks Thomas. 
 
Another question: I have no idea what "Failed to merge in memory" means. Is
the 'merge' the shuffle phase on the reducer side? Why is it in memory?
Apart from the two methods (increasing the reducer number and increasing the
heap size), are there any other alternatives to fix this issue?
 
Thanks a lot.
 
 


leiwang...@gmail.com
 
From: Thomas Bentsen
Date: 2014-04-15 21:53
To: user
Subject: Re: memoryjava.lang.OutOfMemoryError related with number of reducer?
When you increase the number of reducers, they each have less to work
with, provided the data is distributed evenly between them - in this case
about one third of the original work.
It is essentially the same thing as increasing the heap size - the work is
just distributed between more reducers.
 
/th
 
 
 
On Tue, 2014-04-15 at 20:41 +0800, leiwang...@gmail.com wrote:
 I can fix this by changing the heap size.
 But what confuses me is that when I change the reducer number from 24
 to 84, this error does not occur.
 
 
 Any insight on this?
 
 
 Thanks
 Lei
 Failed to merge in memoryjava.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2786)
 at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
 at java.io.DataOutputStream.write(DataOutputStream.java:90)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:384)
 at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
 at org.apache.pig.data.utils.SedesHelper.writeChararray(SedesHelper.java:66)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:543)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:604)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:447)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
 at 
 org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
 at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
 at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
 at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
 at 
 org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:100)
 at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:84)
 at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:188)
 at