Hi all,
I'm a new Hadoop user, and Hadoop 2.4.1 is my first installation.
So now I'm struggling with the terms mapred, mapreduce, MRv1, MRv2, and yarn.
I tried to read the documentation, but I couldn't find a clear
answer. Sometimes it seems that the documentation assumes you already
know the whole history of the Hadoop framework... :(
I started with a standalone node of course, but I have also deployed a
cluster of 10 machines.
I started with the example from the documentation.
The cluster is installed and DFS is running, started with
start-dfs.sh
When I run
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
grep input output 'dfs[a-z.]+'
what am I using? MRv1 or MRv2?
The job executes successfully and I can get the output in the HDFS output
directory.
Then, on the same installation, I start yarn with start-yarn.sh
and run the same command again:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
grep input output 'dfs[a-z.]+'
So what am I using in this case?
I'm not sure what the difference is between mapreduce and yarn.
Probably mapreduce is running on top of yarn? How does mapreduce
interact with yarn? Is it completely transparent?
What's the difference between a mapreduce application and a yarn
application? (Forgive me if it's not correct to talk about a mapreduce
application.)
Besides that, when writing a completely new mapreduce application, which
API should be used to avoid writing deprecated/old-style Hadoop code:
mapred or mapreduce?
Thanks a lot.
Kind regards.
Seba