A general doubt about what the Hadoop project offers.

2016-09-26 Thread Manuel Enrique Puebla Martínez


Hello:

  I'm getting to know Hadoop and I have the following question.

It is a general question about what the Hadoop project offers.

"The Apache Hadoop software library is a framework that allows for the
distributed processing of large data sets across clusters of computers using
simple programming models." ... Now, does Hadoop also provide the hardware?
That is, if I want to run processing that involves high RAM consumption, does
the Hadoop project only provide the software, which I then install on my own
hardware, or does it also provide hardware capacity free of charge?

Regards


Re: Doubt in DoubleWritable

2015-11-23 Thread unmesha sreeveni
Please try this:

for (DoubleArrayWritable avalue : values) {
    Writable[] value = avalue.get();
    // DoubleWritable[] value = new DoubleWritable[6];
    // for (int k = 0; k < 6; k++) {
    //     value[k] = new DoubleWritable(wvalue[k]);
    // }
    // parse accordingly
    if (Double.parseDouble(value[1].toString()) != 0) {
        total_records_Temp = total_records_Temp + 1;
        sumvalueTemp = sumvalueTemp + Double.parseDouble(value[0].toString());
    }
    if (Double.parseDouble(value[3].toString()) != 0) {
        total_records_Dewpoint = total_records_Dewpoint + 1;
        sumvalueDewpoint = sumvalueDewpoint + Double.parseDouble(value[2].toString());
    }
    if (Double.parseDouble(value[5].toString()) != 0) {
        total_records_Windspeed = total_records_Windspeed + 1;
        sumvalueWindspeed = sumvalueWindspeed + Double.parseDouble(value[4].toString());
    }
}
​Attaching the code​


-- 
*Thanks & Regards *


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Centre for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/
// Application to compute average temperature, dew point and wind speed values
// from the weather dataset (driver class).
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.conf.Configuration;

public class MapReduce {

	public static void main(String[] args) throws Exception {
		if (args.length != 2) {
			System.err.println("Usage: MapReduce <input path> <output path>");
			System.exit(-1);
		}

		/*
		 * Job job = new Job(); job.setJarByClass(MaxTemperature.class);
		 * job.setJobName("Max temperature");
		 */

		Configuration conf = new Configuration();
		FileSystem fs = FileSystem.get(conf);
		Job job = Job.getInstance(conf, "AverageTempValues");

		/*
		 * Delete the output directory so the same directory can be reused.
		 */
		Path dest = new Path(args[1]);
		if(fs.exists(dest)){
			fs.delete(dest, true);
		}
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.setNumReduceTasks(2);

		job.setMapperClass(NewMapper.class);
		job.setReducerClass(NewReducer.class);

		
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DoubleArrayWritable.class);

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
// Mapper: parses each weather record and emits a Text key (year + section + month)
// with a DoubleArrayWritable of readings and validity flags.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewMapper extends
		Mapper<LongWritable, Text, Text, DoubleArrayWritable> {

	public void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {

		String Str = value.toString();
		String[] Mylist = new String[1000];
		int i = 0;

		for (String retval : Str.split("\\s+")) {
			System.out.println(retval);
			Mylist[i++] = retval;

		}
		String Val = Mylist[2];
		String Year = Val.substring(0, 4);
		String Month = Val.substring(5, 6);
		String[] Section = Val.split("_");

		String section_string = "0";
		if (Section[1].matches("^(0|1|2|3|4|5)$")) {
			section_string = "4";
		} else if (Section[1].matches("^(6|7|8|9|10|11)$")) {
			section_string = "1";
		} else if (Section[1].matches("^(12|13|14|15|16|17)$")) {
			section_string = "2";
		} else if (Section[1].matches("^(18|19|20|21|22|23)$")) {
			section_string = "3";
		}

		// Each array element must be instantiated before calling set(),
		// otherwise the calls below throw a NullPointerException.
		DoubleWritable[] array = new DoubleWritable[6];
		for (int j = 0; j < 6; j++) {
			array[j] = new DoubleWritable();
		}
		DoubleArrayWritable output = new DoubleArrayWritable();
		array[0].set(Double.parseDouble(Mylist[3]));
		array[2].set(Double.parseDouble(Mylist[4]));
		array[4].set(Double.parseDouble(Mylist[12]));
		// A reading of 999.9 is treated as missing; the adjacent slot records
		// a validity flag (0 = missing, 1 = present).
		for (int j = 0; j < 6; j = j + 2) {
			if (999.9 == array[j].get()) {
				array[j + 1].set(0);
			} else {
				array[j + 1].set(1);
			}
		}
		output.set(array);
		context.write(new Text(Year + section_string + Month), output);
	}
}
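Note: the attached classes use a DoubleArrayWritable type whose definition was not
included in the thread. A minimal sketch, assuming the usual ArrayWritable-subclass
pattern (this is an illustration, not the poster's actual class):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DoubleWritable;

public class DoubleArrayWritable extends ArrayWritable {
	public DoubleArrayWritable() {
		super(DoubleWritable.class); // the element class is needed for deserialization
	}

	public DoubleArrayWritable(DoubleWritable[] values) {
		super(DoubleWritable.class, values);
	}
}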
// Reducer: accumulates per-key sums and counts of the valid readings to compute averages.
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NewReducer extends
		Reducer<Text, DoubleArrayWritable, Text, DoubleArrayWritable> {

	@Override
	public void reduce(Text key, Iterable<DoubleArrayWritable> values,
			Context context) throws IOException, InterruptedException {
		double sumvalueTemp = 0;
		double sumvalueDewpoint = 0;
		double sumvalueWindspeed = 0;
		double total_records_Temp = 0;
		double total_records_Dewpoint = 0;
		double total_records_Windspeed = 0;
		double average_Temp = Integer.MIN_VALUE;
		double average_Dewpoint = Integer.MIN_VALUE;
		double average_Windspeed = Integer.MIN_VALUE;
		DoubleWritable[] temp = new DoubleWritable[3];
		DoubleArrayWritable output = new DoubleArrayWritable();
		for (DoubleArrayWritable avalue : values)

Re: Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document

2014-09-28 Thread Ulul

Hi

A developer should answer that, but a quick look at an edits file with od 
suggests that records are not fixed length. So maybe the likelihood of 
the situation you suggest is so low that there is no need to check more 
than the file size.


Ulul

On 28/09/2014 11:17, Giridhar Addepalli wrote:

Hi All,

I am going through Quorum Journal Design document.

It is mentioned in Section 2.8 - In Accept Recovery RPC section
"
If the current on-disk log is missing, or a *different length* than 
the proposed recovery, the JN downloads the log from the provided URI, 
replacing any current copy of the log segment.

"

I can see that the code follows the above design:

Source :: Journal.java
 

  public synchronized void acceptRecovery(RequestInfo reqInfo,
  SegmentStateProto segment, URL fromUrl)
  throws IOException {

  
  if (currentSegment == null ||
currentSegment.getEndTxId() != segment.getEndTxId()) {
  
  } else {
  LOG.info("Skipping download of log " +
  TextFormat.shortDebugString(segment) +
  ": already have up-to-date logs");
  }
  
  }


My question is: what if the on-disk log is present and is of the *same 
length* as the proposed recovery?


If the JournalNode skips the download because the logs are of the same 
length, then we could end up in a situation where finalized log 
segments contain different data!


This could happen if we follow example 2.10.6

As per that example, we write transactions (151-153) on JN1; 
then, when recovery proceeds with only JN2 & JN3, let us assume that we 
again write *different transactions* as (151-153). Then, after the 
crash, when we run recovery, JN1 will skip downloading the correct segment 
from JN2/JN3 because it thinks it already has the correct segment (as per the 
code pasted above). This will result in a situation where the finalized segment 
(edits_151-153) on JN1 is different from the finalized segment 
edits_151-153 on JN2/JN3.


Please let me know if I have gone wrong somewhere, and whether this situation 
is taken care of.


Thanks,
Giridhar.




Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document

2014-09-28 Thread Giridhar Addepalli
Hi All,

I am going through Quorum Journal Design document.

It is mentioned in Section 2.8 - In Accept Recovery RPC section
"
If the current on-disk log is missing, or a *different length* than the
proposed recovery, the JN downloads the log from the provided URI,
replacing any current copy of the log segment.
"

I can see that the code follows the above design:

Source :: Journal.java
 

  public synchronized void acceptRecovery(RequestInfo reqInfo,
  SegmentStateProto segment, URL fromUrl)
  throws IOException {

  
  if (currentSegment == null ||
currentSegment.getEndTxId() != segment.getEndTxId()) {
  
  } else {
  LOG.info("Skipping download of log " +
  TextFormat.shortDebugString(segment) +
  ": already have up-to-date logs");
  }
  
  }


My question is: what if the on-disk log is present and is of the *same length* as
the proposed recovery?

If the JournalNode skips the download because the logs are of the same length,
then we could end up in a situation where finalized log segments contain
different data!

This could happen if we follow example 2.10.6

As per that example, we write transactions (151-153) on JN1;
then, when recovery proceeds with only JN2 & JN3, let us assume that we again
write *different transactions* as (151-153). Then, after the crash, when we run
recovery, JN1 will skip downloading the correct segment from JN2/JN3 because it
thinks it already has the correct segment (as per the code pasted above).
This will result in a situation where the finalized segment (edits_151-153)
on JN1 is different from the finalized segment edits_151-153 on JN2/JN3.

Please let me know if I have gone wrong somewhere, and whether this situation is
taken care of.

Thanks,
Giridhar.


Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
bq. Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built
against the old MRv1 mapred APIs)

Which APIs are you talking about, *mapred* or *mapreduce*? In #3, I was
talking about *mapreduce*. If this is the case, you may unfortunately be in
trouble, because MRv2 has evolved so much in the *mapreduce* APIs that
it's difficult to ensure binary compatibility. Anyway, you should still try
your luck, as your binaries may not use the incompatible APIs. On the other
hand, if you meant the *mapred* APIs instead, your binaries should just work.

- Zhijie


On Tue, Apr 15, 2014 at 1:35 PM, Radhe Radhe
wrote:

> Thanks Zhijie for the explanation.
>
> Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built
> against the old MRv1 *mapred* APIs), then how can I compile it, since I don't
> have the source code, i.e. the Java files? All I can do with the binaries,
> i.e. the jar file, is execute them.
>
> -RR
> --
> Date: Tue, 15 Apr 2014 13:03:53 -0700
>
> Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> From: zs...@hortonworks.com
> To: user@hadoop.apache.org
>
>
> 1. If you have the binaries that were compiled against MRv1 *mapred* libs, it 
> should just work with MRv2.
> 2. If you have the source code that refers to MRv1 *mapred* libs, it
> should be compilable without code changes. Of course, you're free to change
> your code.
> 3. If you have the binaries that were compiled against MRv1 *mapreduce* libs,
> it may not be executable directly with MRv2, but you should be able to compile
> it against MRv2 *mapreduce* libs without code changes, and execute it.
>
> - Zhijie
>
>
> On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe <
> radhe.krishna.ra...@live.com> wrote:
>
> Thanks John for your comments,
>
> I believe MRv2 has support for both the old *mapred* APIs and new
> *mapreduce* APIs.
>
> I see this way:
> [1.]  One may have binaries i.e. jar file of the M\R program that used old
> *mapred* APIs
> This will work directly on MRv2(YARN).
>
> [2.]  One may have the source code i.e. Java Programs of the M\R program
> that used old *mapred* APIs
> For this I need to recompile and generate the binaries i.e. jar file.
> Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
> org.apache.hadoop.mapreduce* APIs? or No code changes are needed?
>
> -RR
>
> > Date: Mon, 14 Apr 2014 10:37:56 -0400
> > Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> > From: john.meag...@gmail.com
> > To: user@hadoop.apache.org
>
> >
> > Also, "Source Compatibility" also means ONLY a recompile is needed.
> > No code changes should be needed.
> >
> > On Mon, Apr 14, 2014 at 10:37 AM, John Meagher 
> wrote:
> > > Source Compatibility = you need to recompile and use the new version
> > > as part of the compilation
> > >
> > > Binary Compatibility = you can take something compiled against the old
> > > version and run it on the new version
> > >
> > > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> > >  wrote:
> > >> Hello People,
> > >>
> > >> As per the Apache site
> > >>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
> > >>
> > >> Binary Compatibility
> > >> 
> > >> First, we ensure binary compatibility to the applications that use old
> > >> mapred APIs. This means that applications which were built against
> MRv1
> > >> mapred APIs can run directly on YARN without recompilation, merely by
> > >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> > >>
> > >> Source Compatibility
> > >> 
> > >> We cannot ensure complete binary compatibility with the applications
> that
> > >> use mapreduce APIs, as these APIs have evolved a lot since MRv1.
> However, we
> > >> ensure source compatibility for mapreduce APIs that break binary
> > >> compatibility. In other words, users should recompile their
> applications
> > >> that use mapreduce APIs against MRv2 jars. One notable binary
> > >> incompatibility break is Counter and CounterGroup.
> > >>
> > >> For "Binary Compatibility" I understand that if I had build a MR job
> with
> > >> old *mapred* APIs then they can be run directly on YARN without and
>

RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
Thanks Zhijie for the explanation.

Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built against 
the old MRv1 mapred APIs), then how can I compile it, since I don't have the source 
code, i.e. the Java files? All I can do with the binaries, i.e. the jar file, is 
execute them.

-RR
Date: Tue, 15 Apr 2014 13:03:53 -0700
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old 
*mapred* APIs and new *mapreduce* APIs in Hadoop
From: zs...@hortonworks.com
To: user@hadoop.apache.org

1. If you have the binaries that were compiled against MRv1 mapred libs, it 
should just work with MRv2.
2. If you have the source code that refers to MRv1 mapred libs, it should be 
compilable without code changes. Of course, you're free to change your code.
3. If you have the binaries that were compiled against MRv1 mapreduce libs, it 
may not be executable directly with MRv2, but you should be able to compile it 
against MRv2 mapreduce libs without code changes, and execute it.

- Zhijie

On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe  
wrote:




Thanks John for your comments.

I believe MRv2 has support for both the old *mapred* APIs and the new *mapreduce* 
APIs.

I see it this way:
[1.] One may have the binaries, i.e. the jar file, of the M/R program that used the 
old *mapred* APIs. This will work directly on MRv2 (YARN).
[2.] One may have the source code, i.e. the Java programs, of the M/R program that 
used the old *mapred* APIs. For this I need to recompile and generate the binaries, 
i.e. the jar file. Do I have to change the old *org.apache.hadoop.mapred* APIs to 
the new *org.apache.hadoop.mapreduce* APIs, or are no code changes needed?

-RR
> Date: Mon, 14 Apr 2014 10:37:56 -0400
> Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with 
> old *mapred* APIs and new *mapreduce* APIs in Hadoop

> From: john.meag...@gmail.com
> To: user@hadoop.apache.org
> 

> Also, "Source Compatibility" also means ONLY a recompile is needed.
> No code changes should be needed.
> 
> On Mon, Apr 14, 2014 at 10:37 AM, John Meagher  wrote:

> > Source Compatibility = you need to recompile and use the new version
> > as part of the compilation
> >
> > Binary Compatibility = you can take something compiled against the old
> > version and run it on the new version

> >
> > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> >  wrote:
> >> Hello People,

> >>
> >> As per the Apache site
> >> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

> >>
> >> Binary Compatibility
> >> 
> >> First, we ensure binary compatibility to the applications that use old
> >> mapred APIs. This means that applications which were built against MRv1

> >> mapred APIs can run directly on YARN without recompilation, merely by
> >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> >>
> >> Source Compatibility

> >> 
> >> We cannot ensure complete binary compatibility with the applications that
> >> use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, 
> >> we

> >> ensure source compatibility for mapreduce APIs that break binary
> >> compatibility. In other words, users should recompile their applications
> >> that use mapreduce APIs against MRv2 jars. One notable binary

> >> incompatibility break is Counter and CounterGroup.
> >>
> >> For "Binary Compatibility" I understand that if I had build a MR job with
> >> old *mapred* APIs then they can be run directly on YARN without and 
> >> changes.

> >>
> >> Can anybody explain what do we mean by "Source Compatibility" here and also
> >> a use case where one will need it?
> >>
> >> Does that mean code changes if I already have a MR job source code written

> >> with with old *mapred* APIs and I need to make some changes to it to run in
> >> then I need to use the new "mapreduce* API and generate the new  binaries?
> >>
> >> Thanks,

> >> -RR
> >>
> >>
  


-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/






Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Zhijie Shen
1. If you have the binaries that were compiled against MRv1 *mapred* libs,
it should just work with MRv2.
2. If you have the source code that refers to MRv1 *mapred* libs, it should
be compilable without code changes. Of course, you're free to change your
code.
3. If you have the binaries that were compiled against MRv1 *mapreduce* libs,
it may not be executable directly with MRv2, but you should be able to compile
it against MRv2 *mapreduce* libs without code changes, and execute it.

- Zhijie


On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe
wrote:

> Thanks John for your comments,
>
> I believe MRv2 has support for both the old *mapred* APIs and new
> *mapreduce* APIs.
>
> I see this way:
> [1.]  One may have binaries i.e. jar file of the M\R program that used old
> *mapred* APIs
> This will work directly on MRv2(YARN).
>
> [2.]  One may have the source code i.e. Java Programs of the M\R program
> that used old *mapred* APIs
> For this I need to recompile and generate the binaries i.e. jar file.
> Do I have to change the old *org.apache.hadoop.mapred* APIs to new *
> org.apache.hadoop.mapreduce* APIs? or No code changes are needed?
>
> -RR
>
> > Date: Mon, 14 Apr 2014 10:37:56 -0400
> > Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility
> with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> > From: john.meag...@gmail.com
> > To: user@hadoop.apache.org
>
> >
> > Also, "Source Compatibility" also means ONLY a recompile is needed.
> > No code changes should be needed.
> >
> > On Mon, Apr 14, 2014 at 10:37 AM, John Meagher 
> wrote:
> > > Source Compatibility = you need to recompile and use the new version
> > > as part of the compilation
> > >
> > > Binary Compatibility = you can take something compiled against the old
> > > version and run it on the new version
> > >
> > > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> > >  wrote:
> > >> Hello People,
> > >>
> > >> As per the Apache site
> > >>
> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
> > >>
> > >> Binary Compatibility
> > >> 
> > >> First, we ensure binary compatibility to the applications that use old
> > >> mapred APIs. This means that applications which were built against
> MRv1
> > >> mapred APIs can run directly on YARN without recompilation, merely by
> > >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> > >>
> > >> Source Compatibility
> > >> 
> > >> We cannot ensure complete binary compatibility with the applications
> that
> > >> use mapreduce APIs, as these APIs have evolved a lot since MRv1.
> However, we
> > >> ensure source compatibility for mapreduce APIs that break binary
> > >> compatibility. In other words, users should recompile their
> applications
> > >> that use mapreduce APIs against MRv2 jars. One notable binary
> > >> incompatibility break is Counter and CounterGroup.
> > >>
> > >> For "Binary Compatibility" I understand that if I had build a MR job
> with
> > >> old *mapred* APIs then they can be run directly on YARN without and
> changes.
> > >>
> > >> Can anybody explain what do we mean by "Source Compatibility" here
> and also
> > >> a use case where one will need it?
> > >>
> > >> Does that mean code changes if I already have a MR job source code
> written
> > >> with with old *mapred* APIs and I need to make some changes to it to
> run in
> > >> then I need to use the new "mapreduce* API and generate the new
> binaries?
> > >>
> > >> Thanks,
> > >> -RR
> > >>
> > >>
>



-- 
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/



RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-15 Thread Radhe Radhe
Thanks John for your comments.

I believe MRv2 has support for both the old *mapred* APIs and the new *mapreduce* 
APIs.

I see it this way:
[1.] One may have the binaries, i.e. the jar file, of the M/R program that used the 
old *mapred* APIs. This will work directly on MRv2 (YARN).
[2.] One may have the source code, i.e. the Java programs, of the M/R program that 
used the old *mapred* APIs. For this I need to recompile and generate the binaries, 
i.e. the jar file. Do I have to change the old *org.apache.hadoop.mapred* APIs to 
the new *org.apache.hadoop.mapreduce* APIs, or are no code changes needed?
-RR
> Date: Mon, 14 Apr 2014 10:37:56 -0400
> Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with 
> old *mapred* APIs and new *mapreduce* APIs in Hadoop
> From: john.meag...@gmail.com
> To: user@hadoop.apache.org
> 
> Also, "Source Compatibility" also means ONLY a recompile is needed.
> No code changes should be needed.
> 
> On Mon, Apr 14, 2014 at 10:37 AM, John Meagher  wrote:
> > Source Compatibility = you need to recompile and use the new version
> > as part of the compilation
> >
> > Binary Compatibility = you can take something compiled against the old
> > version and run it on the new version
> >
> > On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
> >  wrote:
> >> Hello People,
> >>
> >> As per the Apache site
> >> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
> >>
> >> Binary Compatibility
> >> 
> >> First, we ensure binary compatibility to the applications that use old
> >> mapred APIs. This means that applications which were built against MRv1
> >> mapred APIs can run directly on YARN without recompilation, merely by
> >> pointing them to an Apache Hadoop 2.x cluster via configuration.
> >>
> >> Source Compatibility
> >> 
> >> We cannot ensure complete binary compatibility with the applications that
> >> use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, 
> >> we
> >> ensure source compatibility for mapreduce APIs that break binary
> >> compatibility. In other words, users should recompile their applications
> >> that use mapreduce APIs against MRv2 jars. One notable binary
> >> incompatibility break is Counter and CounterGroup.
> >>
> >> For "Binary Compatibility" I understand that if I had build a MR job with
> >> old *mapred* APIs then they can be run directly on YARN without and 
> >> changes.
> >>
> >> Can anybody explain what do we mean by "Source Compatibility" here and also
> >> a use case where one will need it?
> >>
> >> Does that mean code changes if I already have a MR job source code written
> >> with with old *mapred* APIs and I need to make some changes to it to run in
> >> then I need to use the new "mapreduce* API and generate the new  binaries?
> >>
> >> Thanks,
> >> -RR
> >>
> >>
  

Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Also, "Source Compatibility" also means ONLY a recompile is needed.
No code changes should be needed.

On Mon, Apr 14, 2014 at 10:37 AM, John Meagher  wrote:
> Source Compatibility = you need to recompile and use the new version
> as part of the compilation
>
> Binary Compatibility = you can take something compiled against the old
> version and run it on the new version
>
> On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
>  wrote:
>> Hello People,
>>
>> As per the Apache site
>> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
>>
>> Binary Compatibility
>> 
>> First, we ensure binary compatibility to the applications that use old
>> mapred APIs. This means that applications which were built against MRv1
>> mapred APIs can run directly on YARN without recompilation, merely by
>> pointing them to an Apache Hadoop 2.x cluster via configuration.
>>
>> Source Compatibility
>> 
>> We cannot ensure complete binary compatibility with the applications that
>> use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we
>> ensure source compatibility for mapreduce APIs that break binary
>> compatibility. In other words, users should recompile their applications
>> that use mapreduce APIs against MRv2 jars. One notable binary
>> incompatibility break is Counter and CounterGroup.
>>
>> For "Binary Compatibility" I understand that if I had build a MR job with
>> old *mapred* APIs then they can be run directly on YARN without and changes.
>>
>> Can anybody explain what do we mean by "Source Compatibility" here and also
>> a use case where one will need it?
>>
>> Does that mean code changes if I already have a MR job source code written
>> with with old *mapred* APIs and I need to make some changes to it to run in
>> then I need to use the new "mapreduce* API and generate the new  binaries?
>>
>> Thanks,
>> -RR
>>
>>


Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread John Meagher
Source Compatibility = you need to recompile and use the new version
as part of the compilation

Binary Compatibility = you can take something compiled against the old
version and run it on the new version
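
For reference, a minimal side-by-side sketch of the two API generations being
discussed (the class names here are illustrative, not taken from any posted code):
the old org.apache.hadoop.mapred interface style versus the new
org.apache.hadoop.mapreduce abstract-class style.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Old MRv1-era "mapred" API: interface-based, uses OutputCollector/Reporter.
class OldApiMapper extends org.apache.hadoop.mapred.MapReduceBase
        implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    org.apache.hadoop.mapred.OutputCollector<Text, IntWritable> output,
                    org.apache.hadoop.mapred.Reporter reporter) throws IOException {
        output.collect(new Text(value.toString()), new IntWritable(1));
    }
}

// New "mapreduce" API: abstract class, uses a Context object instead.
class NewApiMapper
        extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(value.toString()), new IntWritable(1));
    }
}

Binaries built against the first style are what the binary-compatibility guarantee
covers; code written in the second style is what may need the recompile described
above.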

On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe
 wrote:
> Hello People,
>
> As per the Apache site
> http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html
>
> Binary Compatibility
> 
> First, we ensure binary compatibility to the applications that use old
> mapred APIs. This means that applications which were built against MRv1
> mapred APIs can run directly on YARN without recompilation, merely by
> pointing them to an Apache Hadoop 2.x cluster via configuration.
>
> Source Compatibility
> 
> We cannot ensure complete binary compatibility with the applications that
> use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we
> ensure source compatibility for mapreduce APIs that break binary
> compatibility. In other words, users should recompile their applications
> that use mapreduce APIs against MRv2 jars. One notable binary
> incompatibility break is Counter and CounterGroup.
>
> For "Binary Compatibility" I understand that if I had build a MR job with
> old *mapred* APIs then they can be run directly on YARN without and changes.
>
> Can anybody explain what do we mean by "Source Compatibility" here and also
> a use case where one will need it?
>
> Does that mean code changes if I already have a MR job source code written
> with with old *mapred* APIs and I need to make some changes to it to run in
> then I need to use the new "mapreduce* API and generate the new  binaries?
>
> Thanks,
> -RR
>
>


Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop

2014-04-14 Thread Radhe Radhe
Hello People,

As per the Apache site
http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

Binary Compatibility
First, we ensure binary compatibility to the applications that use old mapred APIs.
This means that applications which were built against MRv1 mapred APIs can run
directly on YARN without recompilation, merely by pointing them to an Apache
Hadoop 2.x cluster via configuration.

Source Compatibility
We cannot ensure complete binary compatibility with the applications that use
mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we ensure
source compatibility for mapreduce APIs that break binary compatibility. In other
words, users should recompile their applications that use mapreduce APIs against
MRv2 jars. One notable binary incompatibility break is Counter and CounterGroup.

For "Binary Compatibility" I understand that if I had built an MR job with the old
*mapred* APIs, then it can be run directly on YARN without any changes.

Can anybody explain what we mean by "Source Compatibility" here, and also a use
case where one will need it?

Does that mean code changes? If I already have MR job source code written with the
old *mapred* APIs and I need to make some changes to it to run, do I need to use
the new *mapreduce* API and generate new binaries?

Thanks,
-RR

  

Re: Doubt

2014-03-19 Thread praveenesh kumar
Why not? It's just a matter of installing 2 different packages.
Depending on what you want to use it for, you may need to take care of a few
things, but as far as installation is concerned, it should be easily doable.

Regards
Prav


On Wed, Mar 19, 2014 at 3:41 PM, sri harsha  wrote:

> Hi all,
> is it possible to install Mongodb on the same VM which consists hadoop?
>
> --
> amiable harsha
>


Re: Doubt

2014-03-19 Thread sri harsha
Thanks Jay and Praveen,
I want to use both separately; I don't want to use MongoDB in place of
HBase.


On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas  wrote:

> Certainly it is , and quite common especially if you have some high
> performance machines : they  can run as mapreduce slaves and also double as
> mongo hosts.  The problem would of course be that when running mapreduce
> jobs you might have very slow network bandwidth at times, and if your front
> end needs fast response times all the time from mongo instances you could
> be in trouble.
>
>
>
> On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar 
> wrote:
>
>> Why not ? Its just a matter of installing 2 different packages.
>> Depends on what do you want to use it for, you need to take care of few
>> things, but as far as installation is concerned, it should be easily doable.
>>
>> Regards
>> Prav
>>
>>
>> On Wed, Mar 19, 2014 at 3:41 PM, sri harsha  wrote:
>>
>>> Hi all,
>>> is it possible to install Mongodb on the same VM which consists hadoop?
>>>
>>> --
>>> amiable harsha
>>>
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>



-- 
amiable harsha


Re: Doubt

2014-03-19 Thread Jay Vyas
Certainly it is, and quite common, especially if you have some high-performance
machines: they can run as mapreduce slaves and also double as mongo hosts.
The problem would of course be that when running mapreduce jobs you might have
very slow network bandwidth at times, and if your front end needs fast response
times from the mongo instances all the time, you could be in trouble.



On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar wrote:

> Why not ? Its just a matter of installing 2 different packages.
> Depends on what do you want to use it for, you need to take care of few
> things, but as far as installation is concerned, it should be easily doable.
>
> Regards
> Prav
>
>
> On Wed, Mar 19, 2014 at 3:41 PM, sri harsha  wrote:
>
>> Hi all,
>> is it possible to install Mongodb on the same VM which consists hadoop?
>>
>> --
>> amiable harsha
>>
>
>


-- 
Jay Vyas
http://jayunit100.blogspot.com


Doubt

2014-03-19 Thread sri harsha
Hi all,
Is it possible to install MongoDB on the same VM that hosts Hadoop?

-- 
amiable harsha


Re: doubt

2014-01-19 Thread Justin Black
I've installed a hadoop single node cluster on a VirtualBox machine running
ubuntu 12.04LTS (64-bit) with 512MB RAM and 8GB HD. I haven't seen any
errors in my testing yet. Is 1GB RAM required? Will I run into issues when
I expand the cluster?


On Sat, Jan 18, 2014 at 11:24 PM, Alexander Pivovarov
wrote:

> it' enough. hadoop uses only 1GB RAM by default.
>
>
> On Sat, Jan 18, 2014 at 10:11 PM, sri harsha  wrote:
>
>> Hi ,
>> i want to install 4 node cluster in 64-bit LINUX. 4GB RAM 500HD is enough
>> for this or shall i need to expand ?
>> please suggest about my query.
>>
>> than x
>>
>> --
>> amiable harsha
>>
>
>


-- 
-jblack


Re: doubt

2014-01-18 Thread Alexander Pivovarov
It's enough. Hadoop uses only 1GB RAM by default.


On Sat, Jan 18, 2014 at 10:11 PM, sri harsha  wrote:

> Hi ,
> i want to install 4 node cluster in 64-bit LINUX. 4GB RAM 500HD is enough
> for this or shall i need to expand ?
> please suggest about my query.
>
> than x
>
> --
> amiable harsha
>


doubt

2014-01-18 Thread sri harsha
Hi,
I want to install a 4-node cluster on 64-bit Linux. Is 4GB RAM and a 500GB HD
enough for this, or will I need to expand?
Please advise.

Thanks

-- 
amiable harsha


Re: Basic Doubt in Hadoop

2013-04-17 Thread maisnam ns
@Bejoy
Adding a little bit here: the output of a map task is first written to a memory
buffer, and when its contents reach a threshold a background thread will spill
the contents to disk.

Niranjan Singh
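
As a side note, the buffer and threshold described above are configurable. A small
illustrative sketch, assuming the Hadoop 2.x property names (earlier releases used
io.sort.mb and io.sort.spill.percent); the values shown are arbitrary examples, not
recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpillTuningExample {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Size of the in-memory map-side sort buffer, in MB.
		conf.setInt("mapreduce.task.io.sort.mb", 200);
		// Fraction of the buffer that can fill before a background thread
		// starts spilling its contents to disk.
		conf.setFloat("mapreduce.map.sort.spill.percent", 0.90f);
		Job job = Job.getInstance(conf, "spill-tuning-example");
		// ... set mapper, reducer, input and output paths as usual ...
	}
}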


On Wed, Apr 17, 2013 at 1:06 PM, Ramesh R Nair wrote:

> Hi Bejoy,
>
>Regarding the output of Map phase, does  Hadoop store it in local fs or
> in HDFS.
>I believe it is in the former. Correct me if I am wrong.
>
> Regards
> Ramesh
>
>
> On Wed, Apr 17, 2013 at 10:30 AM,  wrote:
>
>> The data is in HDFS in case of WordCount MR sample.
>>
>> In hdfs, you have the metadata in NameNode and actual data as blocks
>> replicated across DataNodes.
>>
>> In case of reducer, If a reducer is running on a particular node then you
>> have one replica of the blocks in the same node (If there is no space
>> issues) and rest replicas on other nodes.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> --
>> *From: * Raj Hadoop 
>> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
>> *To: *user@hadoop.apache.org
>> *ReplyTo: * user@hadoop.apache.org
>> *Subject: *Basic Doubt in Hadoop
>>
>> Hi,
>>
>> I am new to Hadoop. I started reading the standard Wordcount program. I
>> got this basic doubt in Hadoop.
>>
>> After the Map - Reduce is done, where is the output generated?  Does the
>> reducer ouput sit on individual DataNodes ? Please advise.
>>
>>
>> Thanks,
>> Raj
>>
>
>


Re: Basic Doubt in Hadoop

2013-04-17 Thread bejoy . hadoop

You are correct; map outputs are stored on the local file system (LFS), not in HDFS.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Ramesh R Nair 
Date: Wed, 17 Apr 2013 13:06:32 
To: ; 
Subject: Re: Basic Doubt in Hadoop

Hi Bejoy,

   Regarding the output of Map phase, does  Hadoop store it in local fs or
in HDFS.
   I believe it is in the former. Correct me if I am wrong.

Regards
Ramesh


On Wed, Apr 17, 2013 at 10:30 AM,  wrote:

> The data is in HDFS in case of WordCount MR sample.
>
> In hdfs, you have the metadata in NameNode and actual data as blocks
> replicated across DataNodes.
>
> In case of reducer, If a reducer is running on a particular node then you
> have one replica of the blocks in the same node (If there is no space
> issues) and rest replicas on other nodes.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Raj Hadoop 
> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Basic Doubt in Hadoop
>
> Hi,
>
> I am new to Hadoop. I started reading the standard Wordcount program. I
> got this basic doubt in Hadoop.
>
> After the Map - Reduce is done, where is the output generated?  Does the
> reducer ouput sit on individual DataNodes ? Please advise.
>
>
> Thanks,
> Raj
>



Re: Basic Doubt in Hadoop

2013-04-17 Thread Ramesh R Nair
Hi Bejoy,

   Regarding the output of the Map phase, does Hadoop store it in the local fs or
in HDFS?
   I believe it is the former. Correct me if I am wrong.

Regards
Ramesh


On Wed, Apr 17, 2013 at 10:30 AM,  wrote:

> The data is in HDFS in case of WordCount MR sample.
>
> In hdfs, you have the metadata in NameNode and actual data as blocks
> replicated across DataNodes.
>
> In case of reducer, If a reducer is running on a particular node then you
> have one replica of the blocks in the same node (If there is no space
> issues) and rest replicas on other nodes.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Raj Hadoop 
> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Basic Doubt in Hadoop
>
> Hi,
>
> I am new to Hadoop. I started reading the standard Wordcount program. I
> got this basic doubt in Hadoop.
>
> After the Map - Reduce is done, where is the output generated?  Does the
> reducer ouput sit on individual DataNodes ? Please advise.
>
>
> Thanks,
> Raj
>


Re: Basic Doubt in Hadoop

2013-04-16 Thread bejoy . hadoop
The data is in HDFS in the case of the WordCount MR sample.

In HDFS, you have the metadata in the NameNode and the actual data as blocks
replicated across DataNodes.

In the case of the reducer: if a reducer is running on a particular node, then you
have one replica of the blocks on that same node (if there are no space issues) and
the rest of the replicas on other nodes.
Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Raj Hadoop 
Date: Tue, 16 Apr 2013 21:49:34 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Basic Doubt in Hadoop

Hi,

I am new to Hadoop. I started reading the standard Wordcount program. I got 
this basic doubt in Hadoop.

After the Map - Reduce is done, where is the output generated?  Does the 
reducer ouput sit on individual DataNodes ? Please advise.



Thanks,
Raj



Basic Doubt in Hadoop

2013-04-16 Thread Raj Hadoop
Hi,

I am new to Hadoop. I started reading the standard WordCount program, and I have 
this basic doubt about Hadoop.

After the Map-Reduce job is done, where is the output generated? Does the 
reducer output sit on individual DataNodes? Please advise.



Thanks,
Raj


Re: fundamental doubt

2012-11-21 Thread jamal sasha
got it.
thanks for clarification


On Wed, Nov 21, 2012 at 3:03 PM, Bejoy KS  wrote:

> **
> Hi Jamal
>
> It is performed at a frame work level map emits key value pairs and the
> framework collects and groups all the values corresponding to a key from
> all the map tasks. Now the reducer takes the input as a key and a
> collection of values only. The reduce method signature defines it.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> --
> *From: * jamal sasha 
> *Date: *Wed, 21 Nov 2012 14:50:51 -0500
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *fundamental doubt
>
> Hi..
> I guess i am asking alot of fundamental questions but i thank you guys for
> taking out time to explain my doubts.
> So i am able to write map reduce jobs but here is my mydoubt
> As of now i am writing mappers which emit key and a value
> This key value is then captured at reducer end and then i process the key
> and value there.
> Let's say i want to calculate the average...
> Key1 value1
> Key2 value 2
> Key 1 value 3
>
> So the output is something like
> Key1 average of value  1 and value 3
> Key2 average 2 = value 2
>
> Right now in reducer i have to create a dictionary with key as original
> keys and value is a list.
> Data = defaultdict(list) == // python usrr
> But i thought that
> Mapper takes in the key value pairs and outputs key: ( v1,v2)and
> Reducer takes in this key and list of values and returns
> Key , new value..
>
> So why is the input of reducer the simple output of mapper and not the
> list of all the values to a particular key or did i  understood something.
> Am i making any sense ??
>


Re: fundamental doubt

2012-11-21 Thread Bejoy KS
Hi Jamal

It is performed at the framework level: the map emits key-value pairs, and the 
framework collects and groups all the values corresponding to a key from all 
the map tasks. The reducer then takes as its input a key and a collection of 
values. The reduce method signature defines it.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: jamal sasha 
Date: Wed, 21 Nov 2012 14:50:51 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: fundamental doubt

Hi..
I guess i am asking alot of fundamental questions but i thank you guys for
taking out time to explain my doubts.
So i am able to write map reduce jobs but here is my mydoubt
As of now i am writing mappers which emit key and a value
This key value is then captured at reducer end and then i process the key
and value there.
Let's say i want to calculate the average...
Key1 value1
Key2 value 2
Key 1 value 3

So the output is something like
Key1 average of value  1 and value 3
Key2 average 2 = value 2

Right now in reducer i have to create a dictionary with key as original
keys and value is a list.
Data = defaultdict(list) == // python usrr
But i thought that
Mapper takes in the key value pairs and outputs key: ( v1,v2)and
Reducer takes in this key and list of values and returns
Key , new value..

So why is the input of reducer the simple output of mapper and not the list
of all the values to a particular key or did i  understood something.
Am i making any sense ??



Re: fundamental doubt

2012-11-21 Thread Mohammad Tariq
Hello Jamal,

 For efficient processing, all the values associated with the same key
get sorted and go to the same reducer. As a result, the reducer gets a key and a
list of values as its input. To me your assumption seems correct.

Regards,
Mohammad Tariq



On Thu, Nov 22, 2012 at 1:20 AM, jamal sasha  wrote:

> Hi..
> I guess i am asking alot of fundamental questions but i thank you guys for
> taking out time to explain my doubts.
> So i am able to write map reduce jobs but here is my mydoubt
> As of now i am writing mappers which emit key and a value
> This key value is then captured at reducer end and then i process the key
> and value there.
> Let's say i want to calculate the average...
> Key1 value1
> Key2 value 2
> Key 1 value 3
>
> So the output is something like
> Key1 average of value  1 and value 3
> Key2 average 2 = value 2
>
> Right now in reducer i have to create a dictionary with key as original
> keys and value is a list.
> Data = defaultdict(list) == // python usrr
> But i thought that
> Mapper takes in the key value pairs and outputs key: ( v1,v2)and
> Reducer takes in this key and list of values and returns
> Key , new value..
>
> So why is the input of reducer the simple output of mapper and not the
> list of all the values to a particular key or did i  understood something.
> Am i making any sense ??


fundamental doubt

2012-11-21 Thread jamal sasha
Hi,
I guess I am asking a lot of fundamental questions, but I thank you guys for
taking the time to explain my doubts.
I am able to write map reduce jobs, but here is my doubt.
As of now I am writing mappers which emit a key and a value.
These key-value pairs are then captured at the reducer end, and I process the key
and value there.
Let's say I want to calculate the average...
Key1 value1
Key2 value 2
Key1 value 3

So the output is something like
Key1 average of value 1 and value 3
Key2 average 2 = value 2

Right now in the reducer I have to create a dictionary with the original keys as
keys and a list as the value:
Data = defaultdict(list)  // python user
But I thought that
the Mapper takes in key-value pairs and outputs key: (v1, v2), and
the Reducer takes in this key and list of values and returns
key, new value.

So why is the input of the reducer the simple output of the mapper and not the
list of all the values for a particular key, or did I misunderstand something?
Am I making any sense??
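
A minimal new-API sketch of the per-key averaging asked about above (the class name
and the DoubleWritable value type are illustrative assumptions, not code from this
thread): the framework hands the reducer each key together with all of its grouped
values, so no per-key dictionary is needed.

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class AverageReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
	@Override
	protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
			throws IOException, InterruptedException {
		double sum = 0;
		long count = 0;
		for (DoubleWritable v : values) { // all values for this key, already grouped
			sum += v.get();
			count++;
		}
		if (count > 0) {
			context.write(key, new DoubleWritable(sum / count));
		}
	}
}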


Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Mahesh Balija
Hi Rams,

   A mapper will accept a single key-value pair as input and can emit
0 or more key-value pairs, based on what you want to do in the mapper function
(I mean based on your business logic in the mapper function).
   The framework will then aggregate the list of values
associated with a given key and send the key and the list of values to the
reducer function.

Best,
Mahesh Balija.

On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan <
ramasubramanian.naraya...@gmail.com> wrote:

> Hi,
>
> Which of the following is correct w.r.t mapper.
>
> (a) It accepts a single key-value pair as input and can emit any number of
> key-value pairs as output, including zero.
> (b) It accepts a single key-value pair as input and emits a single key and
> list of corresponding values as output
>
>
> regards,
> Rams
>


Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Harsh J
The answer (a) is correct, in general.
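
To make option (a) concrete, a small illustrative sketch (the identifiers are made
up for this example): a word-count style mapper receives one key-value pair per
input line and emits zero pairs for an empty line or N pairs for a line with N
tokens.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	private static final IntWritable ONE = new IntWritable(1);
	private final Text word = new Text();

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		// An empty line emits nothing; a line with N tokens emits N pairs.
		StringTokenizer it = new StringTokenizer(value.toString());
		while (it.hasMoreTokens()) {
			word.set(it.nextToken());
			context.write(word, ONE);
		}
	}
}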

On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan
 wrote:
> Hi,
>
> Which of the following is correct w.r.t mapper.
>
> (a) It accepts a single key-value pair as input and can emit any number of
> key-value pairs as output, including zero.
> (b) It accepts a single key-value pair as input and emits a single key and
> list of corresponding values as output
>
>
> regards,
> Rams



-- 
Harsh J


Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Ramasubramanian Narayanan
Hi,

Which of the following is correct w.r.t. the mapper?

(a) It accepts a single key-value pair as input and can emit any number of
key-value pairs as output, including zero.
(b) It accepts a single key-value pair as input and emits a single key and
list of corresponding values as output


regards,
Rams


Re: Amateur doubt about Terasort

2012-09-26 Thread Harsh J
Please do not mail general@ with user/dev questions. Use the user@
alias for it in future.

The IdentityMapper and IdentityReducer are what TeraSort uses ("it is
not needed / hadoop does sort by default" -> it uses the default
mapper/reducer).
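
To illustrate the point, a sketch (not the actual TeraSort driver, which adds its
own input/output formats and a total-order partitioner): a job that never calls
setMapperClass or setReducerClass runs the identity Mapper and Reducer, so the only
real work is the framework's shuffle, which sorts records by key between them.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SortOnlyJob {
	public static void main(String[] args) throws Exception {
		Job job = Job.getInstance(new Configuration(), "sort-only");
		job.setJarByClass(SortOnlyJob.class);
		job.setInputFormatClass(KeyValueTextInputFormat.class); // tab-separated key/value per line
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		// No setMapperClass/setReducerClass calls: the identity defaults pass records
		// through, and the shuffle between them sorts by key.
		FileInputFormat.addInputPath(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}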

On Wed, Sep 26, 2012 at 10:08 PM, Nitin Khandelwal  wrote:
> HI,
>
> I was trying to understand the TeraSort code, but didn't find any
> Mapper/Reducer. On googling I came to know that it's not needed (Hadoop does
> sort by default). But I am not very clear about how it works. Can anyone
> please brief me on how TeraSort works, or share any link to a document on
> the same.
>
> Thanks in advance,
> Nitin



-- 
Harsh J