Re: Hadoop Learning Environment

2014-11-04 Thread Jim Colestock
Hello Tim, 

Hortonworks and Cloudera both offer VMs (including ones for VirtualBox, which 
is free) that you can pull down to play with, if you're just looking for 
something small to get you started. I'm partial to the Hortonworks one myself. 

Hope that helps. 

JC



> On Nov 4, 2014, at 2:28 PM, Tim Dunphy  wrote:
> 
> Hey all,
> 
>  I want to set up an environment where I can teach myself Hadoop. Usually the 
> way I'll handle this is to grab a machine off the Amazon free tier and set up 
> whatever software I want. 
> 
> However, I realize that Hadoop is a memory-intensive, big data solution. So 
> what I'm wondering is, would a t2.micro instance be sufficient for setting up 
> a cluster of Hadoop nodes with the intention of learning it? To keep things 
> running longer in the free tier I would either set up as many nodes as I 
> want and keep them stopped when I'm not actively using them, or just set up a 
> few nodes with a few different accounts (with a different Gmail address for 
> each one; easy enough to do).
> 
> Failing that, what are some other free/cheap solutions for setting up a 
> Hadoop learning environment?
> 
> Thanks,
> Tim
> 
> -- 
> GPG me!!
> 
> gpg --keyserver pool.sks-keyservers.net  
> --recv-keys F186197B
> 



reduce job hung in pending state: "No room for reduce task"

2013-08-30 Thread Jim Colestock
Hello All, 

We're running into the following 2 bugs again: 
https://issues.apache.org/jira/browse/HADOOP-5241
https://issues.apache.org/jira/browse/MAPREDUCE-2324

Both of them are listed as closed/fixed. (I was actually the one who got 
Cloudera to submit MAPREDUCE-2324.) Does anyone know whether anyone else is 
seeing these in later releases? We're running the following on various 
versions of CentOS with Java 1.6:

hadoop-2.0.0+1357-1.cdh4.3.0.p0.21.el5

hadoop-0.20-mapreduce-jobtracker-2.0.0+1357-1.cdh4.3.0.p0.21.el5
hadoop-0.20-mapreduce-2.0.0+1357-1.cdh4.3.0.p0.21.el5
hadoop-0.20-mapreduce-tasktracker-2.0.0+1357-1.cdh4.3.0.p0.21.el5

hadoop-hdfs-namenode-2.0.0+1357-1.cdh4.3.0.p0.21.el5
hadoop-hdfs-secondarynamenode-2.0.0+1357-1.cdh4.3.0.p0.21.el5
hadoop-hdfs-2.0.0+1357-1.cdh4.3.0.p0.21.el5
hadoop-hdfs-datanode-2.0.0+1357-1.cdh4.3.0.p0.21.el5

As a quick summary: a reduce job gets hung in pending while trying to find 
room on a task tracker. It keeps retrying over and over and never fails, so 
you end up with a whole bunch of these in the logs: 

2013-08-27 00:48:01,412 WARN org.apache.hadoop.mapred.JobInProgress: No room 
for reduce task. Node tracker_104.sm.tld:127.0.0.1/127.0.0.1:43723 has 
250176954368 bytes free; but we expect reduce input to take 283580756533
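
To put numbers on that warning, here is a minimal Python sketch that pulls the 
two byte counts out of the log line and computes the gap. The regular 
expression and the parse_no_room helper are hypothetical, written only for 
this one message format; they are not part of Hadoop. The idea that the task 
is refused whenever free space is below the expected input is inferred from 
the wording of the message, not checked against the source. 

```python
import re

# The WARN line quoted above, rejoined onto a single line.
LOG_LINE = (
    "2013-08-27 00:48:01,412 WARN org.apache.hadoop.mapred.JobInProgress: "
    "No room for reduce task. Node tracker_104.sm.tld:127.0.0.1/127.0.0.1:43723 "
    "has 250176954368 bytes free; but we expect reduce input to take 283580756533"
)

# Hypothetical pattern for this message format only (an assumption, not a Hadoop API).
PATTERN = re.compile(
    r"No room for reduce task\. Node (?P<node>\S+) has (?P<free>\d+) bytes free; "
    r"but we expect reduce input to take (?P<expected>\d+)"
)


def parse_no_room(line):
    """Return node, free bytes, expected bytes and shortfall, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    free = int(match.group("free"))
    expected = int(match.group("expected"))
    return {
        "node": match.group("node"),
        "free": free,
        "expected": expected,
        # Positive shortfall means the node reports less space than the estimate.
        "shortfall": expected - free,
    }


info = parse_no_room(LOG_LINE)
print(info["node"], "is short by", info["shortfall"], "bytes")
```

For the line above the shortfall works out to 33403802165 bytes, roughly 
33.4 GB, so this node cannot pass the check until space is freed or the 
estimate shrinks. 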

Thanks in advance for any help on the issue.. 

JC