Re: hadoop learning

2015-02-21 Thread Fabio C.
Hi Rishabh,
I didn't know anything about Hadoop a few months ago, and I started from
the very beginning. I don't suggest starting with the online documentation,
which is often fragmented, incomplete, and sometimes not even up to date.
Also, starting by using Hadoop directly is the fastest way to frustration
and will just lead you to abandon this technology.
I can suggest two books I used to start with; they have been quite
helpful for someone who didn't even know what MapReduce was. They provide
many examples and use cases (especially the first one):
- O'Reilly - Hadoop: The Definitive Guide, 3rd Edition. This is quite old
but, other than the coding part, it explains quite well what Hadoop
is, what it does, and how it works. It is mainly about old versions of
Hadoop, but I believe it's something you should know, if only because most
articles online still use the pre-YARN terminology.
- Addison-Wesley Professional - Apache Hadoop YARN: Moving beyond
MapReduce and Batch Processing with Apache Hadoop 2. This is what I
used to really understand the new Hadoop architecture and terminology.
Sometimes it gives too many details, but better more than less. It also has
a couple of chapters about installing Hadoop.

Good luck

Fabio





Re: hadoop learning

2015-02-21 Thread Bhupendra Gupta
I have been learning and trying to implement a Hadoop ecosystem for a POC
for the last month or so, and I think the best way to learn is by doing it.

Hadoop as a concept has many implementations, and I picked up the
Hortonworks sandbox for learning.
It has helped me grasp some of the concepts and gain some practical
understanding as well.

Happy learning 

Sent from my iPhone

Bhupendra Gupta

 On 21-Feb-2015, at 1:39 pm, Rishabh Agrawal ss.rishab...@gmail.com wrote:
 
 Hello,
 
 Please tell me where I can learn the concepts of Big Data and Hadoop from
 scratch. Please provide some online links.
 
 
 
 Rishabh Agrawal


Re: hadoop learning

2015-02-21 Thread Ted Yu
Rishabh:
You can start with:
http://wiki.apache.org/hadoop/HowToContribute

There are several components: common, hdfs, YARN, mapreduce, ...
Which ones are you interested in?

Cheers



Re: Hadoop Learning Environment

2014-11-05 Thread Jim Shi
Hi, Jay,
   I followed the steps you described and got the following error.
Any idea?

  vagrant up
creating provisioner directive for running tests
Bringing machine 'bigtop1' up with 'virtualbox' provider...
==> bigtop1: Box 'puppetlab-centos-64-nocm' could not be found. Attempting to
find and install...
bigtop1: Box Provider: virtualbox
bigtop1: Box Version: >= 0
==> bigtop1: Adding box 'puppetlab-centos-64-nocm' (v0) for provider: virtualbox
bigtop1: Downloading:
http://puppet-vagrant-boxes.puppetlabs.com/centos-64-x64-vbox4210-nocm.box
==> bigtop1: Successfully added box 'puppetlab-centos-64-nocm' (v0) for
'virtualbox'!
There are errors in the configuration of this machine. Please fix
the following errors and try again:

vm:
* The 'hostmanager' provisioner could not be found.

Thanks
Jim
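For what it's worth, that last error usually means the vagrant-hostmanager
plugin is not installed: the 'hostmanager' provisioner is not built into
Vagrant itself but provided by a separate plugin, which the BigTop
Vagrantfile appears to rely on. A likely fix, assuming a standard Vagrant
install:

```shell
# 'hostmanager' comes from the vagrant-hostmanager plugin,
# which ships separately from Vagrant itself.
vagrant plugin install vagrant-hostmanager

# Confirm the plugin is listed, then retry bringing the box up.
vagrant plugin list
vagrant up
```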






Re: Hadoop Learning Environment

2014-11-04 Thread jay vyas
Hi Tim. I'd suggest using Apache BigTop for this.

BigTop integrates the Hadoop ecosystem into a single upstream distribution:
it packages everything, and curates smoke tests plus Vagrant and Docker
recipes for deployment.
We also curate a blueprint Hadoop application (BigPetStore) which you can
easily build yourself and run to generate, process, and visualize data
across the big data ecosystem.

You can also easily deploy BigTop onto EC2 if you want to pay for it.




On Tue, Nov 4, 2014 at 2:28 PM, Tim Dunphy bluethu...@gmail.com wrote:

 Hey all,

 I want to set up an environment where I can teach myself Hadoop. Usually
 the way I'll handle this is to grab a machine off the Amazon free tier and
 set up whatever software I want.

 However, I realize that Hadoop is a memory-intensive, big data solution. So
 what I'm wondering is, would a t2.micro instance be sufficient for setting
 up a cluster of Hadoop nodes with the intention of learning it? To keep
 things running longer in the free tier, I would either set up however many
 nodes as I want and keep them stopped when I'm not actively using them, or
 just set up a few nodes with a few different accounts (with a different
 Gmail address for each one.. easy enough to do).

 Failing that, what are some other free/cheap solutions for setting up a
 hadoop learning environment?

 Thanks,
 Tim

 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




-- 
jay vyas


Re: Hadoop Learning Environment

2014-11-04 Thread Leonid Fedotov
Tim,
download the Sandbox from http://hortonworks.com
You will have everything needed in a small VM instance which will run on
your home desktop.


Thank you!


Sincerely,

Leonid Fedotov

Systems Architect - Professional Services

lfedo...@hortonworks.com

office: +1 855 846 7866 ext 292

mobile: +1 650 430 1673




-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Hadoop Learning Environment

2014-11-04 Thread Jim Colestock
Hello Tim,

Hortonworks and Cloudera both offer VMs (including VirtualBox images, which
is free) you can pull down to play with, if you're looking just for
something small to get you started. I'm partial to the Hortonworks one
myself.

Hope that helps.

JC






Re: Hadoop Learning Environment

2014-11-04 Thread Sandeep Khurana
Or, on your local laptop or desktop, you can set up the environment using a
VM image of Hadoop and related components. I wrote up instructions some
time back here:
https://www.linkedin.com/today/post/article/20140924133831-2560863-new-to-hadoop-and-want-to-setup-dev-environment





Re: Hadoop Learning Environment

2014-11-04 Thread oscar sumano
You can try the Pivotal VM as well.

http://pivotalhd.docs.pivotal.io/tutorial/getting-started/pivotalhd-vm.html






Re: Hadoop Learning Environment

2014-11-04 Thread daemeon reiydelle
What you want as a sandbox depends on what you are trying to learn.

If you are trying to learn to code in e.g. Pig Latin, Sqoop, or similar, all
of the suggestions (perhaps excluding BigTop, due to its setup complexity)
are great. A laptop? Perhaps, but laptops are really kind of infuriatingly
slow (because of the hardware: you pay a price for a 30-45 watt average
heat budget). A laptop is an OK place to start if it is, e.g., an i5 or i7
with lots of memory, but you will likely graduate pretty quickly to wanting
a small-ish desktop for your sandbox.

A simple, single-node Hadoop instance will let you learn many things. The
next level of complexity comes when you attempt to deal with data whose
processing needs to be split up, so you can learn how data is split across
map tasks and the splits are combined by reduce jobs, etc. For that, you
could get a Windows desktop box, or e.g. RedHat/CentOS, and use
virtualization. Something like a 4-core i5 with 32 GB of memory, running 3
(or for some things 4) VMs. You could load e.g. Hortonworks into each of
the VMs and practice setting up a 3/4-way cluster. Throw in 2-3 1 TB drives
off of eBay and you can have a lot of learning.
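The split/map/reduce flow described above can be sketched in plain Python
before ever touching a cluster. This is a toy illustration, not Hadoop code;
the function names and the three-way split are made up for the example:

```python
from collections import defaultdict

def map_phase(chunk):
    # Emit (word, 1) pairs for one input split, like a mapper would.
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    # Group values by key, like the shuffle/sort step between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts for each word, like a reducer would.
    return {word: sum(counts) for word, counts in grouped.items()}

# Pretend an input file was split into three blocks on HDFS.
splits = ["the race is not", "to the swift nor", "the battle to the strong"]
pairs = [pair for split in splits for pair in map_phase(split)]
counts = reduce_phase(shuffle(pairs))
print(counts["the"])  # 'the' appears 4 times across the splits
```

Once the idea is clear at this scale, the multi-VM cluster above is where
you learn what changes when the splits live on different machines.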


...
“The race is not to the swift,
nor the battle to the strong,
but to those who can see it coming and jump aside.” - Hunter Thompson
Daemeon





Re: Hadoop Learning Environment

2014-11-04 Thread jay vyas
Hi daemeon: Actually, for most folks who would want to actually use a
Hadoop cluster, I would think setting up BigTop is super easy! If you
have issues with it, ping me and I can help you get started.
Also, we have Docker containers, so you don't even *need* a VM to run a 4-
or 5-node Hadoop cluster.

install vagrant
install VirtualBox
git clone https://github.com/apache/bigtop
cd bigtop/bigtop-deploy/vm/vagrant-puppet
vagrant up
Then vagrant destroy when you're done.

To me this is easier than manually downloading an appliance, picking memory,
starting the VirtualBox GUI, loading the appliance, etc. It's also easy to
turn the simple single-node BigTop VM into a multinode one,
just by modifying the Vagrantfile.
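The multinode tweak mentioned above is typically a one-variable change. The
sketch below shows the general shape of such a Vagrantfile; the variable
name, hostnames, and structure are illustrative rather than copied from the
BigTop tree, so check your checkout for the actual ones:

```ruby
# Illustrative sketch of a multi-node Vagrantfile, in the style of
# bigtop-deploy/vm/vagrant-puppet. Names here are assumptions.
num_instances = 3   # was 1: one line turns the single-node VM into a cluster

Vagrant.configure("2") do |config|
  config.vm.box = "puppetlab-centos-64-nocm"
  (1..num_instances).each do |i|
    # Define one VM per node; each gets its own hostname.
    config.vm.define "bigtop#{i}" do |node|
      node.vm.hostname = "bigtop#{i}.vagrant"
    end
  end
end
```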








-- 
jay vyas


Re: Hadoop Learning Environment

2014-11-04 Thread Gavin Yue
Try docker!

http://ferry.opencore.io/en/latest/examples/hadoop.html


