A general doubt about what the Hadoop project offers.
Hello,

I'm getting started with Hadoop and I have the following question. It is a general doubt about what the Hadoop project offers:

"The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models."

Now, does Hadoop also provide the hardware? That is, if I want to do processing that involves high RAM consumption, does the Hadoop project only provide the software, so that I would then install it on my own hardware, or does it also provide hardware capacity for free?

Regards
Re: Doubt in DoubleWritable
Please try this:

    for (DoubleArrayWritable avalue : values) {
        Writable[] value = avalue.get();
        // DoubleWritable[] value = new DoubleWritable[6];
        // for (int k = 0; k < 6; k++) {
        //     value[k] = DoubleWritable(wvalue[k]);
        // }
        // parse accordingly
        if (Double.parseDouble(value[1].toString()) != 0) {
            total_records_Temp = total_records_Temp + 1;
            sumvalueTemp = sumvalueTemp + Double.parseDouble(value[0].toString());
        }
        if (Double.parseDouble(value[3].toString()) != 0) {
            total_records_Dewpoint = total_records_Dewpoint + 1;
            sumvalueDewpoint = sumvalueDewpoint + Double.parseDouble(value[2].toString());
        }
        if (Double.parseDouble(value[5].toString()) != 0) {
            total_records_Windspeed = total_records_Windspeed + 1;
            sumvalueWindspeed = sumvalueWindspeed + Double.parseDouble(value[4].toString());
        }
    }

Attaching the code.

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Centre for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/

// cc MaxTemperature Application to find the maximum temperature in the weather dataset
// vv MaxTemperature
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapReduce {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        /*
         * Job job = new Job();
         * job.setJarByClass(MaxTemperature.class);
         * job.setJobName("Max temperature");
         */
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Job job = Job.getInstance(conf, "AverageTempValues");

        /* Delete the output directory so the same dir can be reused */
        Path dest = new Path(args[1]);
        if (fs.exists(dest)) {
            fs.delete(dest, true);
        }

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setNumReduceTasks(2);
        job.setMapperClass(NewMapper.class);
        job.setReducerClass(NewReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleArrayWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
// ^^ MaxTemperature

// cc MaxTemperatureMapper Mapper for maximum temperature example
// vv MaxTemperatureMapper
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewMapper extends Mapper<LongWritable, Text, Text, DoubleArrayWritable> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String Str = value.toString();
        String[] Mylist = new String[1000];
        int i = 0;
        for (String retval : Str.split("\\s+")) {
            System.out.println(retval);
            Mylist[i++] = retval;
        }
        String Val = Mylist[2];
        String Year = Val.substring(0, 4);
        String Month = Val.substring(5, 6);
        String[] Section = Val.split("_");
        String section_string = "0";
        if (Section[1].matches("^(0|1|2|3|4|5)$")) {
            section_string = "4";
        } else if (Section[1].matches("^(6|7|8|9|10|11)$")) {
            section_string = "1";
        } else if (Section[1].matches("^(12|13|14|15|16|17)$")) {
            section_string = "2";
        } else if (Section[1].matches("^(18|19|20|21|22|23)$")) {
            section_string = "3";
        }

        DoubleWritable[] array = new DoubleWritable[6];
        // Each element must be constructed before calling set(), otherwise
        // array[0].set(...) throws a NullPointerException.
        for (int j = 0; j < 6; j++) {
            array[j] = new DoubleWritable();
        }
        DoubleArrayWritable output = new DoubleArrayWritable();
        array[0].set(Double.parseDouble(Mylist[3]));
        array[2].set(Double.parseDouble(Mylist[4]));
        array[4].set(Double.parseDouble(Mylist[12]));
        for (int j = 0; j < 6; j = j + 2) {
            if (999.9 == array[j].get()) {
                array[j + 1].set(0);
            } else {
                array[j + 1].set(1);
            }
        }
        output.set(array);
        context.write(new Text(Year + section_string + Month), output);
    }
}
// ^^ MaxTemperatureMapper

// cc MaxTemperatureReducer Reducer for maximum temperature example
// vv MaxTemperatureReducer
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

public class NewReducer extends Reducer<Text, DoubleArrayWritable, Text, DoubleArrayWritable> {

    @Override
    public void reduce(Text key, Iterable<DoubleArrayWritable> values, Context context)
            throws IOException, InterruptedException {
        double sumvalueTemp = 0;
        double sumvalueDewpoint = 0;
        double sumvalueWindspeed = 0;
        double total_records_Temp = 0;
        double total_records_Dewpoint = 0;
        double total_records_Windspeed = 0;
        double average_Temp = Integer.MIN_VALUE;
        double average_Dewpoint = Integer.MIN_VALUE;
        double average_Windspeed = Integer.MIN_VALUE;
        DoubleWritable[] temp = new DoubleWritable[3];
        DoubleArrayWritable output = new DoubleArrayWritable();
        for (DoubleArrayWritable avalue : values)
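The accumulation the reducer performs can be exercised without a cluster. Below is a hypothetical standalone sketch: plain double[] records stand in for DoubleArrayWritable (which is not shown in the thread and is assumed to wrap six DoubleWritables), with the same layout the mapper emits — the reading at each even index and its validity flag (1 = present, 0 = the 999.9 sentinel) at the following odd index.

```java
public class AverageSketch {

    // Returns {avgTemp, avgDewpoint, avgWindspeed}; NaN when a field has
    // no valid records. records[i] = {temp, flag, dewpoint, flag, wind, flag}.
    static double[] averages(double[][] records) {
        double[] sum = new double[3];
        double[] count = new double[3];
        for (double[] value : records) {
            for (int f = 0; f < 3; f++) {
                if (value[2 * f + 1] != 0) {   // validity flag set
                    count[f]++;
                    sum[f] += value[2 * f];
                }
            }
        }
        double[] avg = new double[3];
        for (int f = 0; f < 3; f++) {
            avg[f] = count[f] == 0 ? Double.NaN : sum[f] / count[f];
        }
        return avg;
    }

    public static void main(String[] args) {
        double[][] records = {
            {10.0, 1, 5.0, 1, 2.0, 1},
            {20.0, 1, 999.9, 0, 4.0, 1},   // dew point missing in this record
        };
        double[] avg = averages(records);
        System.out.println(avg[0] + " " + avg[1] + " " + avg[2]); // prints "15.0 5.0 3.0"
    }
}
```

The sentinel record contributes to the temperature and wind-speed averages but is excluded from the dew-point average, which is the effect the flag columns are there for.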
Re: Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document
Hi,

A developer should answer that, but a quick look at an edits file with od suggests that records are not fixed-length. So maybe the likelihood of the situation you suggest is so low that there is no need to check more than the file size.

Ulul

On 28/09/2014 11:17, Giridhar Addepalli wrote:
> Hi All,
> I am going through the Quorum Journal Design document.
> [...]
Doubt Regarding QJM protocol - example 2.10.6 of Quorum-Journal Design document
Hi All,

I am going through the Quorum Journal Design document. It is mentioned in Section 2.8, in the Accept Recovery RPC section:

"If the current on-disk log is missing, or a *different length* than the proposed recovery, the JN downloads the log from the provided URI, replacing any current copy of the log segment."

I can see that the code follows the above design. Source: Journal.java

    public synchronized void acceptRecovery(RequestInfo reqInfo,
            SegmentStateProto segment, URL fromUrl) throws IOException {
        if (currentSegment == null
                || currentSegment.getEndTxId() != segment.getEndTxId()) {
        } else {
            LOG.info("Skipping download of log " +
                TextFormat.shortDebugString(segment) +
                ": already have up-to-date logs");
        }
    }

My question is: what if the on-disk log is present and is of the *same length* as the proposed recovery? If the JournalNode is skipping the download because the logs are of the same length, then we could end up in a situation where finalized log segments contain different data!

This could happen if we follow example 2.10.6. As per that example, we write transactions (151-153) on JN1; then, when recovery proceeded with only JN2 & JN3, let us assume that we again write *different transactions* as (151-153). Then, after the crash, when we run recovery, JN1 will skip downloading the correct segment from JN2/JN3, as it thinks it has the correct segment (as per the code pasted above). This will result in a situation where the finalized segment (edits_151-153) on JN1 is different from the finalized segment edits_151-153 on JN2/JN3.

Please let me know if I have gone wrong somewhere, and whether this situation is taken care of.

Thanks,
Giridhar.
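The scenario described above can be sketched as a toy simulation. This is hypothetical code, not Hadoop's: lists of strings stand in for log segments, and the skip condition mirrors the quoted Journal.java check (equal length standing in for equal endTxId).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class QjmSkipDemo {

    static Map<String, List<String>> journals = new HashMap<>();

    // Mirrors the skip condition quoted from Journal.java: download the
    // proposed segment only when the local copy is missing or has a
    // different length.
    static void acceptRecovery(String jn, List<String> proposed) {
        List<String> current = journals.get(jn);
        if (current == null || current.size() != proposed.size()) {
            journals.put(jn, new ArrayList<>(proposed)); // download from fromUrl
        }
        // else: "already have up-to-date logs" -- download skipped
    }

    public static void main(String[] args) {
        // JN1 holds stale transactions 151-153 written before the crash;
        // JN2/JN3 hold the different transactions written afterwards.
        journals.put("JN1", Arrays.asList("txA-151", "txA-152", "txA-153"));
        journals.put("JN2", Arrays.asList("txB-151", "txB-152", "txB-153"));
        journals.put("JN3", Arrays.asList("txB-151", "txB-152", "txB-153"));
        List<String> proposed = journals.get("JN2");
        for (String jn : new String[] {"JN1", "JN2", "JN3"}) {
            acceptRecovery(jn, proposed);
        }
        // JN1 skipped the download because the lengths matched, so its
        // finalized segment now differs from JN2/JN3.
        System.out.println(journals.get("JN1").equals(journals.get("JN2"))); // prints "false"
    }
}
```

Under these assumptions the divergence the poster worries about does occur; whatever rules out the scenario (if anything) has to live outside this length check.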
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
bq. Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built against the old MRv1 mapred APIs)

Which APIs are you talking about, *mapred* or *mapreduce*? In #3, I was talking about *mapreduce*. If that is the case, you may unfortunately be in trouble, because MRv2 has evolved so much in the *mapreduce* APIs that it is difficult to ensure binary compatibility. Anyway, you should still try your luck, as your binaries may not use the incompatible APIs. On the other hand, if you meant the *mapred* APIs instead, your binaries should just work.

- Zhijie

On Tue, Apr 15, 2014 at 1:35 PM, Radhe Radhe wrote:
> Thanks Zhijie for the explanation.
>
> Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built
> against the old MRv1 *mapred* APIs), then how can I compile it, since I don't
> have the source code, i.e. the Java files? All I can do with the binaries,
> i.e. the jar file, is execute them.
>
> -RR
> [...]
RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Thanks Zhijie for the explanation.

Regarding #3, if I have ONLY the binaries, i.e. the jar file (compiled/built against the old MRv1 *mapred* APIs), then how can I compile it, since I don't have the source code, i.e. the Java files? All I can do with the binaries, i.e. the jar file, is execute them.

-RR

Date: Tue, 15 Apr 2014 13:03:53 -0700
Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
From: zs...@hortonworks.com
To: user@hadoop.apache.org

1. If you have binaries that were compiled against the MRv1 *mapred* libs, they should just work with MRv2.
2. If you have source code that refers to the MRv1 *mapred* libs, it should be compilable without code changes. Of course, you're free to change your code.
3. If you have binaries that were compiled against the MRv1 *mapreduce* libs, they may not be executable directly with MRv2, but you should be able to compile them against the MRv2 *mapreduce* libs without code changes, and execute them.

- Zhijie

On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe wrote:
> [...]
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
1. If you have binaries that were compiled against the MRv1 *mapred* libs, they should just work with MRv2.
2. If you have source code that refers to the MRv1 *mapred* libs, it should be compilable without code changes. Of course, you're free to change your code.
3. If you have binaries that were compiled against the MRv1 *mapreduce* libs, they may not be executable directly with MRv2, but you should be able to compile them against the MRv2 *mapreduce* libs without code changes, and execute them.

- Zhijie

On Tue, Apr 15, 2014 at 12:44 PM, Radhe Radhe wrote:
> Thanks John for your comments,
>
> I believe MRv2 has support for both the old *mapred* APIs and the new
> *mapreduce* APIs.
>
> I see it this way:
> [1.] One may have the binaries, i.e. the jar file, of the M\R program that
> used the old *mapred* APIs.
> This will work directly on MRv2 (YARN).
>
> [2.] One may have the source code, i.e. the Java programs, of the M\R
> program that used the old *mapred* APIs.
> For this I need to recompile and generate the binaries, i.e. the jar file.
> Do I have to change the old *org.apache.hadoop.mapred* APIs to the new
> *org.apache.hadoop.mapreduce* APIs, or are no code changes needed?
>
> -RR
> [...]

--
Zhijie Shen
Hortonworks Inc.
http://hortonworks.com/

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
RE: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Thanks John for your comments,

I believe MRv2 has support for both the old *mapred* APIs and the new *mapreduce* APIs.

I see it this way:
[1.] One may have the binaries, i.e. the jar file, of the M\R program that used the old *mapred* APIs.
This will work directly on MRv2 (YARN).

[2.] One may have the source code, i.e. the Java programs, of the M\R program that used the old *mapred* APIs.
For this I need to recompile and generate the binaries, i.e. the jar file.
Do I have to change the old *org.apache.hadoop.mapred* APIs to the new *org.apache.hadoop.mapreduce* APIs, or are no code changes needed?

-RR

> Date: Mon, 14 Apr 2014 10:37:56 -0400
> Subject: Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
> From: john.meag...@gmail.com
> To: user@hadoop.apache.org
>
> Also, "Source Compatibility" means ONLY a recompile is needed.
> No code changes should be needed.
>
> On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote:
> > Source Compatibility = you need to recompile and use the new version
> > as part of the compilation
> >
> > Binary Compatibility = you can take something compiled against the old
> > version and run it on the new version
> >
> > [...]
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Also, "Source Compatibility" means ONLY a recompile is needed. No code changes should be needed.

On Mon, Apr 14, 2014 at 10:37 AM, John Meagher wrote:
> Source Compatibility = you need to recompile and use the new version
> as part of the compilation
>
> Binary Compatibility = you can take something compiled against the old
> version and run it on the new version
>
> On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe wrote:
>> [...]
Re: Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Source Compatibility = you need to recompile and use the new version as part of the compilation

Binary Compatibility = you can take something compiled against the old version and run it on the new version

On Mon, Apr 14, 2014 at 9:19 AM, Radhe Radhe wrote:
> [...]
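The distinction above can be illustrated with a self-contained sketch. The types below are hypothetical stand-ins, not the real Hadoop API: a class-to-interface change (like the Counter/CounterGroup change the compatibility docs call out) is source compatible, because the same source compiles against either shape, but binary incompatible, because the compiled bytecode differs.

```java
public class CompatDemo {

    // MRv1-style: the contract as a concrete class.
    static class CounterAsClass {
        private long value;
        long getValue() { return value; }
        void increment(long n) { value += n; }
    }

    // MRv2-style: the same contract as an interface.
    interface CounterAsInterface {
        long getValue();
        void increment(long n);
    }

    static class CounterImpl implements CounterAsInterface {
        private long value;
        public long getValue() { return value; }
        public void increment(long n) { value += n; }
    }

    public static void main(String[] args) {
        // The calling *source* looks the same for both shapes, so a
        // recompile works against either version (source compatibility)...
        CounterAsClass c1 = new CounterAsClass();
        c1.increment(3);
        CounterAsInterface c2 = new CounterImpl();
        c2.increment(3);
        // ...but the compiled bytecode differs (invokevirtual for the class
        // vs invokeinterface for the interface), so a jar built against the
        // class version fails at run time against the interface version
        // (binary incompatibility) until it is recompiled.
        System.out.println(c1.getValue() + " " + c2.getValue()); // prints "3 3"
    }
}
```

In other words: "source compatibility" = your .java files still compile against the new jars; "binary compatibility" = your old .class files still link against them.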
Doubt regarding Binary Compatibility\Source Compatibility with old *mapred* APIs and new *mapreduce* APIs in Hadoop
Hello People,

As per the Apache site
http://hadoop.apache.org/docs/r2.3.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html

Binary Compatibility

First, we ensure binary compatibility to the applications that use old mapred APIs. This means that applications which were built against MRv1 mapred APIs can run directly on YARN without recompilation, merely by pointing them to an Apache Hadoop 2.x cluster via configuration.

Source Compatibility

We cannot ensure complete binary compatibility with the applications that use mapreduce APIs, as these APIs have evolved a lot since MRv1. However, we ensure source compatibility for mapreduce APIs that break binary compatibility. In other words, users should recompile their applications that use mapreduce APIs against MRv2 jars. One notable binary incompatibility break is Counter and CounterGroup.

For "Binary Compatibility" I understand that if I have built an MR job with the old *mapred* APIs, then it can be run directly on YARN without any changes.

Can anybody explain what we mean by "Source Compatibility" here, and also a use case where one will need it?

Does that mean code changes? If I already have MR job source code written with the old *mapred* APIs and I need to make some changes to it to run it, then do I need to use the new *mapreduce* API and generate new binaries?

Thanks,
-RR
Re: Doubt
Why not? It's just a matter of installing 2 different packages. Depending on what you want to use it for, you need to take care of a few things, but as far as installation is concerned, it should be easily doable.

Regards
Prav

On Wed, Mar 19, 2014 at 3:41 PM, sri harsha wrote:
> Hi all,
> is it possible to install MongoDB on the same VM that contains Hadoop?
>
> --
> amiable harsha
Re: Doubt
Thanks Jay and Praveen. I want to use both separately; I don't want to use MongoDB in place of HBase.

On Wed, Mar 19, 2014 at 9:25 PM, Jay Vyas wrote:
> Certainly it is, and quite common, especially if you have some
> high-performance machines: they can run as mapreduce slaves and also double
> as mongo hosts. The problem would of course be that when running mapreduce
> jobs you might have very slow network bandwidth at times, and if your front
> end needs fast response times all the time from the mongo instances, you
> could be in trouble.
>
> [...]

--
amiable harsha
Re: Doubt
Certainly it is, and quite common, especially if you have some high-performance machines: they can run as mapreduce slaves and also double as mongo hosts. The problem would of course be that when running mapreduce jobs you might have very slow network bandwidth at times, and if your front end needs fast response times all the time from the mongo instances you could be in trouble.

On Wed, Mar 19, 2014 at 11:50 AM, praveenesh kumar wrote:
> Why not? It's just a matter of installing 2 different packages.
> Depending on what you want to use it for, you need to take care of a few
> things, but as far as installation is concerned, it should be easily doable.
>
> Regards
> Prav
>
> On Wed, Mar 19, 2014 at 3:41 PM, sri harsha wrote:
>
>> Hi all,
>> is it possible to install MongoDB on the same VM that hosts Hadoop?
>>
>> --
>> amiable harsha
>>

--
Jay Vyas
http://jayunit100.blogspot.com
Doubt
Hi all,
is it possible to install MongoDB on the same VM that hosts Hadoop?

--
amiable harsha
Re: doubt
I've installed a Hadoop single-node cluster on a VirtualBox machine running Ubuntu 12.04 LTS (64-bit) with 512 MB RAM and an 8 GB disk. I haven't seen any errors in my testing yet. Is 1 GB of RAM required? Will I run into issues when I expand the cluster?

On Sat, Jan 18, 2014 at 11:24 PM, Alexander Pivovarov wrote:
> It's enough. Hadoop uses only 1 GB of RAM by default.
>
> On Sat, Jan 18, 2014 at 10:11 PM, sri harsha wrote:
>
>> Hi,
>> I want to install a 4-node cluster on 64-bit Linux. Is 4 GB RAM and a
>> 500 GB hard disk enough for this, or shall I need to expand?
>> Please advise.
>>
>> Thanks
>>
>> --
>> amiable harsha
>>

--
-jblack
Re: doubt
It's enough. Hadoop uses only 1 GB of RAM by default.

On Sat, Jan 18, 2014 at 10:11 PM, sri harsha wrote:
> Hi,
> I want to install a 4-node cluster on 64-bit Linux. Is 4 GB RAM and a
> 500 GB hard disk enough for this, or shall I need to expand?
> Please advise.
>
> Thanks
>
> --
> amiable harsha
>
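For reference, that 1 GB default comes from the daemon heap setting in hadoop-env.sh. A sketch with the stock value (the exact path of the file varies by distribution and Hadoop version, so treat this as illustrative):

```sh
# conf/hadoop-env.sh -- maximum heap for each Hadoop daemon, in MB.
# 1000 (~1 GB) is the shipped default; raise it as real workloads arrive.
export HADOOP_HEAPSIZE=1000
```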
doubt
Hi,
I want to install a 4-node cluster on 64-bit Linux. Is 4 GB RAM and a 500 GB hard disk enough for this, or shall I need to expand?
Please advise.

Thanks

--
amiable harsha
Re: Basic Doubt in Hadoop
@Bejoy Adding a little bit here: the output of a map task is first written to an in-memory buffer, and when its contents reach a threshold, a background thread spills the contents to disk.

Niranjan Singh

On Wed, Apr 17, 2013 at 1:06 PM, Ramesh R Nair wrote:
> Hi Bejoy,
>
> Regarding the output of the map phase, does Hadoop store it in the local
> fs or in HDFS? I believe it is the former. Correct me if I am wrong.
>
> Regards
> Ramesh
>
> On Wed, Apr 17, 2013 at 10:30 AM, wrote:
>
>> The data is in HDFS in the case of the WordCount MR sample.
>>
>> In HDFS, you have the metadata in the NameNode and the actual data as
>> blocks replicated across DataNodes.
>>
>> In the case of the reducer, if a reducer is running on a particular node
>> then you have one replica of the blocks on the same node (if there are
>> no space issues) and the rest of the replicas on other nodes.
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> --
>> *From: * Raj Hadoop
>> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
>> *To: *user@hadoop.apache.org
>> *ReplyTo: * user@hadoop.apache.org
>> *Subject: *Basic Doubt in Hadoop
>>
>> Hi,
>>
>> I am new to Hadoop. I started reading the standard WordCount program and
>> got this basic doubt in Hadoop.
>>
>> After the MapReduce job is done, where is the output generated? Does the
>> reducer output sit on individual DataNodes? Please advise.
>>
>> Thanks,
>> Raj
>>
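The buffer and threshold Niranjan mentions are tunable. A hedged sketch of the relevant MRv2 properties (names as in Hadoop 2.x; the values shown are the usual shipped defaults):

```xml
<!-- mapred-site.xml -->
<property>
  <!-- Size, in MB, of the in-memory buffer map output is collected in. -->
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>
</property>
<property>
  <!-- Fraction of that buffer at which a background thread starts
       spilling its contents to local disk. -->
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
</property>
```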
Re: Basic Doubt in Hadoop
You are correct, map outputs are stored in the LFS (local file system), not in HDFS.

Regards
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Ramesh R Nair
Date: Wed, 17 Apr 2013 13:06:32
To: ;
Subject: Re: Basic Doubt in Hadoop

Hi Bejoy,

Regarding the output of the map phase, does Hadoop store it in the local fs or in HDFS? I believe it is the former. Correct me if I am wrong.

Regards
Ramesh

On Wed, Apr 17, 2013 at 10:30 AM, wrote:
> The data is in HDFS in the case of the WordCount MR sample.
>
> In HDFS, you have the metadata in the NameNode and the actual data as
> blocks replicated across DataNodes.
>
> In the case of the reducer, if a reducer is running on a particular node
> then you have one replica of the blocks on the same node (if there are
> no space issues) and the rest of the replicas on other nodes.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Raj Hadoop
> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Basic Doubt in Hadoop
>
> Hi,
>
> I am new to Hadoop. I started reading the standard WordCount program and
> got this basic doubt in Hadoop.
>
> After the MapReduce job is done, where is the output generated? Does the
> reducer output sit on individual DataNodes? Please advise.
>
> Thanks,
> Raj
>
Re: Basic Doubt in Hadoop
Hi Bejoy,

Regarding the output of the map phase, does Hadoop store it in the local fs or in HDFS? I believe it is the former. Correct me if I am wrong.

Regards
Ramesh

On Wed, Apr 17, 2013 at 10:30 AM, wrote:
> The data is in HDFS in the case of the WordCount MR sample.
>
> In HDFS, you have the metadata in the NameNode and the actual data as
> blocks replicated across DataNodes.
>
> In the case of the reducer, if a reducer is running on a particular node
> then you have one replica of the blocks on the same node (if there are
> no space issues) and the rest of the replicas on other nodes.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> --
> *From: * Raj Hadoop
> *Date: *Tue, 16 Apr 2013 21:49:34 -0700 (PDT)
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *Basic Doubt in Hadoop
>
> Hi,
>
> I am new to Hadoop. I started reading the standard WordCount program and
> got this basic doubt in Hadoop.
>
> After the MapReduce job is done, where is the output generated? Does the
> reducer output sit on individual DataNodes? Please advise.
>
> Thanks,
> Raj
>
Re: Basic Doubt in Hadoop
The data is in HDFS in the case of the WordCount MR sample.

In HDFS, you have the metadata in the NameNode and the actual data as blocks replicated across DataNodes.

In the case of the reducer, if a reducer is running on a particular node then you have one replica of the blocks on the same node (if there are no space issues) and the rest of the replicas on other nodes.

Regards
Bejoy KS

Sent from remote device, Please excuse typos

-----Original Message-----
From: Raj Hadoop
Date: Tue, 16 Apr 2013 21:49:34
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Basic Doubt in Hadoop

Hi,

I am new to Hadoop. I started reading the standard WordCount program and got this basic doubt in Hadoop.

After the MapReduce job is done, where is the output generated? Does the reducer output sit on individual DataNodes? Please advise.

Thanks,
Raj
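To make the output layout concrete: each reduce task writes its own part-r-NNNNN file into the job's HDFS output directory, and keys are routed to reduce tasks by a partitioner. A toy Python sketch of that routing (hypothetical keys; crc32 stands in for Hadoop's HashPartitioner purely to keep the sketch deterministic):

```python
import zlib

# Keys are routed to reducers by partition = f(key) % numReducers, so each
# reduce task owns exactly one output partition file (part-r-00000, ...).
NUM_REDUCERS = 3

def partition(key, num_reducers=NUM_REDUCERS):
    # Stand-in for Hadoop's default HashPartitioner; crc32 is used here
    # only so the toy example gives the same answer on every run.
    return zlib.crc32(key.encode()) % num_reducers

keys = ["apple", "banana", "cherry", "date", "fig"]
parts = {}
for k in keys:
    parts.setdefault("part-r-%05d" % partition(k), []).append(k)

for fname in sorted(parts):
    print(fname, parts[fname])
```

Each such part file is then written to HDFS from the node the reducer ran on, which is why (as Bejoy notes) the first replica usually lands on that node.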
Basic Doubt in Hadoop
Hi,

I am new to Hadoop. I started reading the standard WordCount program and got this basic doubt in Hadoop.

After the MapReduce job is done, where is the output generated? Does the reducer output sit on individual DataNodes? Please advise.

Thanks,
Raj
Re: fundamental doubt
Got it, thanks for the clarification.

On Wed, Nov 21, 2012 at 3:03 PM, Bejoy KS wrote:
> Hi Jamal
>
> It is performed at the framework level: map emits key-value pairs and the
> framework collects and groups all the values corresponding to a key from
> all the map tasks. Now the reducer takes as its input a key and a
> collection of values only. The reduce method signature defines it.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> --
> *From: * jamal sasha
> *Date: *Wed, 21 Nov 2012 14:50:51 -0500
> *To: *user@hadoop.apache.org
> *ReplyTo: * user@hadoop.apache.org
> *Subject: *fundamental doubt
>
> Hi..
> I guess I am asking a lot of fundamental questions, but I thank you guys
> for taking the time to explain my doubts.
> I am able to write MapReduce jobs, but here is my doubt:
> right now I write mappers which emit a key and a value. These key-value
> pairs are then captured at the reducer end, where I process the key and
> value. Let's say I want to calculate the average...
> Key1 value1
> Key2 value2
> Key1 value3
>
> So the output is something like
> Key1 = average of value1 and value3
> Key2 = value2
>
> Right now in the reducer I have to create a dictionary with the original
> keys and a list as the value:
> data = defaultdict(list)  # Python user here
> But I thought that the mapper takes in key-value pairs and outputs
> key: (v1, v2), and the reducer takes in this key and list of values and
> returns key, new value.
>
> So why is the input of the reducer the plain output of the mapper and not
> the list of all the values for a particular key, or did I misunderstand
> something? Am I making any sense?
>
Re: fundamental doubt
Hi Jamal

It is performed at the framework level: map emits key-value pairs, and the framework collects and groups all the values corresponding to a key from all the map tasks. The reducer then takes as its input a key and a collection of values only. The reduce method signature defines it.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: jamal sasha
Date: Wed, 21 Nov 2012 14:50:51
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: fundamental doubt

Hi..
I guess I am asking a lot of fundamental questions, but I thank you guys for taking the time to explain my doubts.

I am able to write MapReduce jobs, but here is my doubt: right now I write mappers which emit a key and a value. These key-value pairs are then captured at the reducer end, where I process the key and value. Let's say I want to calculate the average...

Key1 value1
Key2 value2
Key1 value3

So the output is something like

Key1 = average of value1 and value3
Key2 = value2

Right now in the reducer I have to create a dictionary with the original keys and a list as the value:

data = defaultdict(list)  # Python user here

But I thought that the mapper takes in key-value pairs and outputs key: (v1, v2), and the reducer takes in this key and list of values and returns key, new value.

So why is the input of the reducer the plain output of the mapper and not the list of all the values for a particular key, or did I misunderstand something? Am I making any sense?
Re: fundamental doubt
Hello Jamal,

For efficient processing, all the values associated with the same key are sorted and go to the same reducer. As a result, the reducer gets a key and a list of values as its input. Your assumption seems correct to me.

Regards,
Mohammad Tariq

On Thu, Nov 22, 2012 at 1:20 AM, jamal sasha wrote:
> Hi..
> I guess I am asking a lot of fundamental questions, but I thank you guys
> for taking the time to explain my doubts.
> I am able to write MapReduce jobs, but here is my doubt:
> right now I write mappers which emit a key and a value. These key-value
> pairs are then captured at the reducer end, where I process the key and
> value. Let's say I want to calculate the average...
> Key1 value1
> Key2 value2
> Key1 value3
>
> So the output is something like
> Key1 = average of value1 and value3
> Key2 = value2
>
> Right now in the reducer I have to create a dictionary with the original
> keys and a list as the value:
> data = defaultdict(list)  # Python user here
> But I thought that the mapper takes in key-value pairs and outputs
> key: (v1, v2), and the reducer takes in this key and list of values and
> returns key, new value.
>
> So why is the input of the reducer the plain output of the mapper and not
> the list of all the values for a particular key, or did I misunderstand
> something? Am I making any sense?
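The grouping described above can be simulated in a few lines of Python (a toy model of the shuffle phase, not Hadoop itself), using the averaging use case from this thread:

```python
from collections import defaultdict

# Toy simulation of the shuffle: the framework (not your reducer code)
# groups all mapper-emitted (key, value) pairs by key.
mapper_output = [("key1", 1.0), ("key2", 2.0), ("key1", 3.0)]

grouped = defaultdict(list)
for key, value in mapper_output:
    grouped[key].append(value)      # this is what Hadoop does for you

# The reducer is then invoked once per key with (key, [values]).
def reduce_avg(key, values):
    return key, sum(values) / len(values)

results = dict(reduce_avg(k, vs) for k, vs in grouped.items())
print(results)   # {'key1': 2.0, 'key2': 2.0}
```

So in a real job there is no need to build the defaultdict yourself inside the reducer; the framework hands each reduce call the already-grouped values.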
fundamental doubt
Hi..
I guess I am asking a lot of fundamental questions, but I thank you guys for taking the time to explain my doubts.

I am able to write MapReduce jobs, but here is my doubt: right now I write mappers which emit a key and a value. These key-value pairs are then captured at the reducer end, where I process the key and value. Let's say I want to calculate the average...

Key1 value1
Key2 value2
Key1 value3

So the output is something like

Key1 = average of value1 and value3
Key2 = value2

Right now in the reducer I have to create a dictionary with the original keys and a list as the value:

data = defaultdict(list)  # Python user here

But I thought that the mapper takes in key-value pairs and outputs key: (v1, v2), and the reducer takes in this key and list of values and returns key, new value.

So why is the input of the reducer the plain output of the mapper and not the list of all the values for a particular key, or did I misunderstand something? Am I making any sense?
Re: Doubt on Input and Output Mapper - Key value pairs
Hi Rams,

A mapper accepts a single key-value pair as input and can emit 0 or more key-value pairs, depending on what you want to do in the mapper function (i.e. based on your business logic). The framework will then aggregate the list of values associated with a given key and send the key and the list of values to the reducer function.

Best,
Mahesh Balija.

On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan <ramasubramanian.naraya...@gmail.com> wrote:
> Hi,
>
> Which of the following is correct w.r.t. the mapper?
>
> (a) It accepts a single key-value pair as input and can emit any number of
> key-value pairs as output, including zero.
> (b) It accepts a single key-value pair as input and emits a single key and
> a list of corresponding values as output.
>
> regards,
> Rams
>
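A small Python sketch of behaviour (a), assuming a word-count-style mapper, shows how one input pair can yield zero, one, or many output pairs:

```python
# One input record (key=offset, value=line) -> zero or more (word, 1) pairs.
def map_fn(offset, line):
    for word in line.split():
        yield (word, 1)        # a long line emits many pairs

pairs = list(map_fn(0, "to be or not to be"))
print(pairs)                 # six pairs from one input record
print(list(map_fn(7, "")))   # [] -- an empty line emits nothing at all
```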
Re: Doubt on Input and Output Mapper - Key value pairs
The answer (a) is correct, in general. On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan wrote: > Hi, > > Which of the following is correct w.r.t mapper. > > (a) It accepts a single key-value pair as input and can emit any number of > key-value pairs as output, including zero. > (b) It accepts a single key-value pair as input and emits a single key and > list of corresponding values as output > > > regards, > Rams -- Harsh J
Doubt on Input and Output Mapper - Key value pairs
Hi,

Which of the following is correct w.r.t. the mapper?

(a) It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.
(b) It accepts a single key-value pair as input and emits a single key and a list of corresponding values as output.

regards,
Rams
Re: Amateur doubt about Terasort
Please do not mail general@ with user/dev questions. Use the user@ alias for this in future.

An identity mapper and identity reducer are what TeraSort uses ("it is not needed / Hadoop sorts by default" -> it uses the default mapper/reducer).

On Wed, Sep 26, 2012 at 10:08 PM, Nitin Khandelwal wrote:
> HI,
>
> I was trying to understand the TeraSort code, but didn't find any
> Mapper/Reducer. On googling I came to know that one is not needed (Hadoop
> sorts by default), but I am not very clear about how it works. Can anyone
> please brief me on how TeraSort works, or point me to a link that
> documents it?
>
> Thanks in advance,
> Nitin

--
Harsh J
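A toy Python illustration of why identity map/reduce suffices: the framework sorts map output by key before the reduce phase, so passing records through unchanged yields sorted output. (Real TeraSort additionally uses a total-order partitioner so that output is sorted across reducers, not just within each one; this sketch models a single sort.)

```python
# TeraSort relies on the framework's sort, not on mapper/reducer logic.
def identity_map(key, value):
    yield (key, value)

def identity_reduce(key, values):
    for v in values:
        yield (key, v)

records = [(b"delta", 1), (b"alpha", 2), (b"charlie", 3), (b"bravo", 4)]

# The framework sorts map output by key on the way to the reducers.
shuffled = sorted(p for k, v in records for p in identity_map(k, v))

out = [p for k, v in shuffled for p in identity_reduce(k, [v])]
print([k for k, _ in out])   # [b'alpha', b'bravo', b'charlie', b'delta']
```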