Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Sebastiano Di Paola
Hi there,
In order to use Python to write MapReduce jobs you need to use the Hadoop
Streaming API, so I would suggest starting by searching for that.
(Here's a link, although it's for Hadoop 1.x:
http://hadoop.apache.org/docs/r1.2.1/streaming.html ) but it's a starting
point.
With the Streaming API you can use whatever language you like to write
map/reduce jobs, provided they read their data from stdin and write their
results to stdout.
The Streaming API will do the magic for you ;-)
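Just to give you an idea, a bare-bones word-count pair of scripts could look
something like this (an untested sketch, only meant to show the stdin/stdout
contract):

#!/usr/bin/env python
# mapper.py - read lines from stdin, emit "word<TAB>1" pairs on stdout
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

#!/usr/bin/env python
# reducer.py - the streaming framework sorts the mapper output by key,
# so identical words arrive on consecutive lines; just sum the counts
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = count

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))

You would submit it with the hadoop-streaming jar that ships with your
installation, something like this (the exact jar path depends on your Hadoop
version):

bin/hadoop jar share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input input -output output \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py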
Hope it helps.
Seba



On Wed, Aug 27, 2014 at 8:13 PM, Amar Singh  wrote:

> Hi Users,
> I am new to the big data world and was in the process of reading some
> material on writing mapreduce jobs using Python.
>
> Any links or pointers in that direction will be really helpful.
>


Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-13 Thread Sebastiano Di Paola
Hi,
I'm a newbie too and I'm not using any particular distribution. I just
download the components I need / want to try for my deployment and use them.
It's a slower process, but it lets me better understand what's going on under
the hood.
Regards,
Seba


On Tue, Aug 12, 2014 at 10:12 PM, mani kandan  wrote:

> Which distribution are you people using? Cloudera vs Hortonworks vs
> BigInsights?
>


Yarn, MRv1, MRv2 lots of newbie doubts and questions

2014-08-10 Thread Sebastiano Di Paola
Hi all,
I'm a newbie Hadoop user, and I started with hadoop 2.4.1 as my first
installation.
So now I'm struggling with mapred, mapreduce, yarn, MRv1, MRv2.
I tried to read the documentation, but I couldn't find a clear
answer... sometimes it seems that the documentation assumes you already know
all the history of the Hadoop framework... :(

I started with a standalone node of course, but I have also deployed a
cluster with 10 machines.

Let's start with the example from the documentation.

Cluster installed...dfs running with
start-dfs.sh

when I run

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
grep input output 'dfs[a-z.]+'

what am I using? MRv1 or MRv2?
The job executes successfully and I can see the output in the HDFS output
directory.


Then, on the same installation, I start YARN with start-yarn.sh
and run the same command again:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar
grep input output 'dfs[a-z.]+'

So what am I using in this case?

I'm not sure what the difference is between mapreduce and yarn... probably
mapreduce is running on top of yarn? How does mapreduce interact with yarn?
Is it completely transparent?

What's the difference between a mapreduce and a yarn application? (Forgive
me if it's not correct to talk about a "mapreduce application".)

Besides that... when writing a completely new mapreduce application, which
API should be used so as not to write deprecated/old Hadoop-style code?
mapred or mapreduce?
Thanks a lot.
Kind regards.
Seba