Re: Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-16 Thread Sharad Agarwal


> The last map task is forever in the pending queue - is this an issue with my
> setup/config, or do others have the problem?
Do you mean the leftover maps are not scheduled at all? What do you see in the
jobtracker logs?


Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Thanks Jason! This helps a lot. I'm planning to talk to my network admin
tomorrow. I'm hoping he'll be able to fix this problem.
Mithila

On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop wrote:

> Assuming you are on a linux box, on both machines
> verify that the servers are listening on the ports you expect via
> netstat -a -n -t -p
> -a show sockets accepting connections
> -n do not translate ip addresses to host names
> -t only list tcp sockets
> -p list the pid/process name
>
> on the machine 192.168.0.18
> you should have sockets bound to 0.0.0.0:54310 with a process of java, and
> the pid should be the pid of your namenode process.
>
> On the remote machine you should be able to *telnet 192.168.0.18 54310* and
> have it connect
> *Connected to 192.168.0.18.
> Escape character is '^]'.
> *
>
> If the netstat shows the socket accepting and the telnet does not connect,
> then something is blocking the TCP packets between the machines. one or
> both
> machines has a firewall, an intervening router has a firewall, or there is
> some routing problem
> the command /sbin/iptables -L will normally list the firewall rules, if any
> for a linux machine.
>
>
> You should be able to use telnet to verify that you can connect from the
> remote machine.
>
> On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra 
> wrote:
>
> > Thanks! I ll see what I can find out.
> >
> > On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop  > >wrote:
> >
> > > The firewall was run at system startup, I think there was a
> > > /etc/sysconfig/iptables file present which triggered the firewall.
> > > I don't currently have access to any centos 5 machines so I can't
> easily
> > > check.
> > >
> > >
> > >
> > > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop  > > >wrote:
> > >
> > > > The kickstart script was something that the operations staff was
> using
> > to
> > > > initialize new machines, I never actually saw the script, just
> figured
> > > out
> > > > that there was a firewall in place.
> > > >
> > > >
> > > >
> > > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra  > > >wrote:
> > > >
> > > >> Jason: the kickstart script - was it something you wrote or is it
> run
> > > when
> > > >> the system turns on?
> > > >> Mithila
> > > >>
> > > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra  >
> > > >> wrote:
> > > >>
> > > >> > Thanks Jason! Will check that out.
> > > >> > Mithila
> > > >> >
> > > >> >
> > > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <
> > jason.had...@gmail.com
> > > >> >wrote:
> > > >> >
> > > >> >> Double check that there is no firewall in place.
> > > >> >> At one point a bunch of new machines were kickstarted and placed
> in
> > a
> > > >> >> cluster and they all failed with something similar.
> > > >> >> It turned out the kickstart script turned enabled the firewall
> with
> > a
> > > >> rule
> > > >> >> that blocked ports in the 50k range.
> > > >> >> It took us a while to even think to check that was not a part of
> > our
> > > >> >> normal
> > > >> >> machine configuration
> > > >> >>
> > > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <
> > mnage...@asu.edu
> > > >
> > > >> >> wrote:
> > > >> >>
> > > >> >> > Hi Aaron
> > > >> >> > I will look into that thanks!
> > > >> >> >
> > > >> >> > I spoke to the admin who overlooks the cluster. He said that
> the
> > > >> gateway
> > > >> >> > comes in to the picture only when one of the nodes communicates
> > > with
> > > >> a
> > > >> >> node
> > > >> >> > outside of the cluster. But in my case the communication is
> > carried
> > > >> out
> > > >> >> > between the nodes which all belong to the same cluster.
> > > >> >> >
> > > >> >> > Mithila
> > > >> >> >
> > > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <
> > aa...@cloudera.com
> > > >
> > > >> >> wrote:
> > > >> >> >
> > > >> >> > > Hi,
> > > >> >> > >
> > > >> >> > > I wrote a blog post a while back about connecting nodes via a
> > > >> gateway.
> > > >> >> > See
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >>
> > >
> >
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> > > >> >> > >
> > > >> >> > > This assumes that the client is outside the gateway and all
> > > >> >> > > datanodes/namenode are inside, but the same principles apply.
> > > >> You'll
> > > >> >> just
> > > >> >> > > need to set up ssh tunnels from every datanode to the
> namenode.
> > > >> >> > >
> > > >> >> > > - Aaron
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
> > > >> >> rphul...@yahoo-inc.com
> > > >> >> > >wrote:
> > > >> >> > >
> > > >> >> > >> Looks like your NameNode is down .
> > > >> >> > >> Verify if hadoop process are running (   jps should show you
> > all
> > > >> java
> > > >> >> > >> running process).
> > > >> >> > >> If your hadoop process are running try restarting your
> hadoop
> > > >> process
> > > >> >> .
> > > >> >> > >> I guess this problem is due to your fsimage not being
> correct
> > .
> > > >> >> > >> Y

Re: Question about the classpath setting for bin/hadoop jar

2009-04-16 Thread Sharad Agarwal

> I noticed that the "bin/hadoop jar" command doesn't add the jar being 
> executed to the classpath. Is this deliberate and what is the reasoning? The 
> result is that resources in the jar are not accessible from the system class 
> loader. Rather they are only available from the thread context class loader 
> and the class loader of the main class.
In the map and reduce tasks' JVM, job libraries are added to the system 
classloader. For other processes, however, only the framework code is on the 
system classpath. If you are seeing this as a problem in your client-side code, 
you can use Configuration#getClassByName(String name) instead of Class.forName() 
to load your job-related classes.
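For example, a minimal client-side sketch of the difference (the property and
class names below are made up for illustration; the point is only that
getClassByName resolves through the Configuration's class loader rather than
the system one):

import org.apache.hadoop.conf.Configuration;

public class LoadJobClass {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical property naming a class that is packed in the job jar,
    // not in the framework's own jars.
    String className = conf.get("my.job.mapper.class", "com.example.MyMapper");

    // Class.forName() goes through the caller's class loader and can fail on
    // the client side, because the job jar is not on the system classpath.
    // Configuration#getClassByName() uses the Configuration's class loader
    // (by default the thread context class loader), so job classes are found.
    Class<?> clazz = conf.getClassByName(className);
    System.out.println("Loaded " + clazz.getName());
  }
}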


Problem with using different username

2009-04-16 Thread Puri, Aseem
Hi

I am running a Hadoop cluster on Windows. I have 4 datanodes. Three
datanodes have the same username, so they always start, but one datanode has a
different username. When I run the command $ bin/start-all.sh, the master tries
to find $bin/hadoop-daemon.sh using the master's username instead of the
username under which the file actually lives. Where should I make a change so
that the master finds the file under the different username?

 

Thanks & Regards

Aseem Puri

Project Trainee

Honeywell Technology Solutions Lab

Bangalore

 



If I chain two MapReduce jobs, can I avoid saving the intermediate output?

2009-04-16 Thread 王红宝
As in the title.


Thank You!
imcaptor


Re: Map-Reduce Slow Down

2009-04-16 Thread jason hadoop
Assuming you are on a linux box, on both machines
verify that the servers are listening on the ports you expect via
netstat -a -n -t -p
-a show sockets accepting connections
-n do not translate ip addresses to host names
-t only list tcp sockets
-p list the pid/process name

on the machine 192.168.0.18
you should have sockets bound to 0.0.0.0:54310 with a process of java, and
the pid should be the pid of your namenode process.

On the remote machine you should be able to *telnet 192.168.0.18 54310* and
have it connect
*Connected to 192.168.0.18.
Escape character is '^]'.
*

If the netstat shows the socket accepting and the telnet does not connect,
then something is blocking the TCP packets between the machines: one or both
machines has a firewall, an intervening router has a firewall, or there is
some routing problem. The command /sbin/iptables -L will normally list the
firewall rules, if any, on a Linux machine.


You should be able to use telnet to verify that you can connect from the
remote machine.
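If telnet is not available on the remote machine, the same connectivity check
can be done with a few lines of Java; a minimal sketch using the namenode host
and port mentioned above (adjust as needed):

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
  public static void main(String[] args) throws Exception {
    // Namenode host/port from this thread; pass others on the command line.
    String host = args.length > 0 ? args[0] : "192.168.0.18";
    int port = args.length > 1 ? Integer.parseInt(args[1]) : 54310;

    Socket s = new Socket();
    try {
      // Short timeout so a firewalled or unroutable port fails quickly
      // instead of hanging.
      s.connect(new InetSocketAddress(host, port), 5000);
      System.out.println("Connected to " + host + ":" + port);
    } catch (Exception e) {
      System.out.println("Could not connect to " + host + ":" + port + ": " + e);
    } finally {
      s.close();
    }
  }
}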

On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra  wrote:

> Thanks! I ll see what I can find out.
>
> On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop  >wrote:
>
> > The firewall was run at system startup, I think there was a
> > /etc/sysconfig/iptables file present which triggered the firewall.
> > I don't currently have access to any centos 5 machines so I can't easily
> > check.
> >
> >
> >
> > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop  > >wrote:
> >
> > > The kickstart script was something that the operations staff was using
> to
> > > initialize new machines, I never actually saw the script, just figured
> > out
> > > that there was a firewall in place.
> > >
> > >
> > >
> > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra  > >wrote:
> > >
> > >> Jason: the kickstart script - was it something you wrote or is it run
> > when
> > >> the system turns on?
> > >> Mithila
> > >>
> > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra 
> > >> wrote:
> > >>
> > >> > Thanks Jason! Will check that out.
> > >> > Mithila
> > >> >
> > >> >
> > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <
> jason.had...@gmail.com
> > >> >wrote:
> > >> >
> > >> >> Double check that there is no firewall in place.
> > >> >> At one point a bunch of new machines were kickstarted and placed in
> a
> > >> >> cluster and they all failed with something similar.
> > >> >> It turned out the kickstart script turned enabled the firewall with
> a
> > >> rule
> > >> >> that blocked ports in the 50k range.
> > >> >> It took us a while to even think to check that was not a part of
> our
> > >> >> normal
> > >> >> machine configuration
> > >> >>
> > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <
> mnage...@asu.edu
> > >
> > >> >> wrote:
> > >> >>
> > >> >> > Hi Aaron
> > >> >> > I will look into that thanks!
> > >> >> >
> > >> >> > I spoke to the admin who overlooks the cluster. He said that the
> > >> gateway
> > >> >> > comes in to the picture only when one of the nodes communicates
> > with
> > >> a
> > >> >> node
> > >> >> > outside of the cluster. But in my case the communication is
> carried
> > >> out
> > >> >> > between the nodes which all belong to the same cluster.
> > >> >> >
> > >> >> > Mithila
> > >> >> >
> > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <
> aa...@cloudera.com
> > >
> > >> >> wrote:
> > >> >> >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > I wrote a blog post a while back about connecting nodes via a
> > >> gateway.
> > >> >> > See
> > >> >> > >
> > >> >> >
> > >> >>
> > >>
> >
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> > >> >> > >
> > >> >> > > This assumes that the client is outside the gateway and all
> > >> >> > > datanodes/namenode are inside, but the same principles apply.
> > >> You'll
> > >> >> just
> > >> >> > > need to set up ssh tunnels from every datanode to the namenode.
> > >> >> > >
> > >> >> > > - Aaron
> > >> >> > >
> > >> >> > >
> > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
> > >> >> rphul...@yahoo-inc.com
> > >> >> > >wrote:
> > >> >> > >
> > >> >> > >> Looks like your NameNode is down .
> > >> >> > >> Verify if hadoop process are running (   jps should show you
> all
> > >> java
> > >> >> > >> running process).
> > >> >> > >> If your hadoop process are running try restarting your hadoop
> > >> process
> > >> >> .
> > >> >> > >> I guess this problem is due to your fsimage not being correct
> .
> > >> >> > >> You might have to format your namenode.
> > >> >> > >> Hope this helps.
> > >> >> > >>
> > >> >> > >> Thanks,
> > >> >> > >> --
> > >> >> > >> Ravi
> > >> >> > >>
> > >> >> > >>
> > >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" 
> > wrote:
> > >> >> > >>
> > >> >> > >> The log file runs into thousands of line with the same message
> > >> being
> > >> >> > >> displayed every time.
> > >> >> > >>
> > >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <
> > >> mnage...@asu.edu>
> > >> >> > >

Re: More Replication on dfs

2009-04-16 Thread Aaron Kimball
That setting will instruct future file writes to replicate two-fold. It has
no bearing on existing files; replication is set on a per-file basis, so
existing files already have their replication factors recorded in the DFS individually.

Use the command: bin/hadoop fs -setrep [-R] repl_factor filename...

to change the replication factor for files already in HDFS
- Aaron
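For completeness, the same change can be made programmatically through the
FileSystem API; a minimal sketch (the path is taken from the fsck output quoted
below, purely as an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplication {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Example file; replace with whatever file's replication should change.
    Path file = new Path("/user/HadoopAdmin/input/file01.txt");

    // Asks the namenode to bring this existing file's replication to 2,
    // which is what -setrep does for files already in HDFS.
    boolean ok = fs.setReplication(file, (short) 2);
    System.out.println("setReplication returned " + ok);
  }
}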

On Wed, Apr 15, 2009 at 10:04 PM, Puri, Aseem wrote:

> Hi
>My problem is not that my data is under replicated. I have 3
> data nodes. In my hadoop-site.xml I also set the configuration as:
>
> <property>
>   <name>dfs.replication</name>
>   <value>2</value>
> </property>
>
> But after this also data is replicated on 3 nodes instead of two nodes.
>
> Now, please tell what can be the problem?
>
> Thanks & Regards
> Aseem Puri
>
> -Original Message-
> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
> Sent: Wednesday, April 15, 2009 2:58 AM
> To: core-user@hadoop.apache.org
> Subject: Re: More Replication on dfs
>
> Aseem,
>
> Regd over-replication, it is mostly app related issue as Alex mentioned.
>
> But if you are concerned about under-replicated blocks in fsck output :
>
> These blocks should not stay under-replicated if you have enough nodes
> and enough space on them (check NameNode webui).
>
> Try grep-ing for one of the blocks in NameNode log (and datnode logs as
> well, since you have just 3 nodes).
>
> Raghu.
>
> Puri, Aseem wrote:
> > Alex,
> >
> > Ouput of $ bin/hadoop fsck / command after running HBase data insert
> > command in a table is:
> >
> > .
> > .
> > .
> > .
> > .
> > /hbase/test/903188508/tags/info/4897652949308499876:  Under replicated
> > blk_-5193
> > 695109439554521_3133. Target Replicas is 3 but found 1 replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/data:  Under
> > replicated
> > blk_-1213602857020415242_3132. Target Replicas is 3 but found 1
> > replica(s).
> > .
> > /hbase/test/903188508/tags/mapfiles/4897652949308499876/index:  Under
> > replicated
> >  blk_3934493034551838567_3132. Target Replicas is 3 but found 1
> > replica(s).
> > .
> > /user/HadoopAdmin/hbase table.doc:  Under replicated
> > blk_4339521803948458144_103
> > 1. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/bin.doc:  Under replicated
> > blk_-3661765932004150973_1030
> > . Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file01.txt:  Under replicated
> > blk_2744169131466786624_10
> > 01. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/file02.txt:  Under replicated
> > blk_2021956984317789924_10
> > 02. Target Replicas is 3 but found 2 replica(s).
> > .
> > /user/HadoopAdmin/input/test.txt:  Under replicated
> > blk_-3062256167060082648_100
> > 4. Target Replicas is 3 but found 2 replica(s).
> > ...
> > /user/HadoopAdmin/output/part-0:  Under replicated
> > blk_8908973033976428484_1
> > 010. Target Replicas is 3 but found 2 replica(s).
> > Status: HEALTHY
> >  Total size:48510226 B
> >  Total dirs:492
> >  Total files:   439 (Files currently being written: 2)
> >  Total blocks (validated):  401 (avg. block size 120973 B) (Total
> > open file
> > blocks (not validated): 2)
> >  Minimally replicated blocks:   401 (100.0 %)
> >  Over-replicated blocks:0 (0.0 %)
> >  Under-replicated blocks:   399 (99.50124 %)
> >  Mis-replicated blocks: 0 (0.0 %)
> >  Default replication factor:2
> >  Average block replication: 1.3117207
> >  Corrupt blocks:0
> >  Missing replicas:  675 (128.327 %)
> >  Number of data-nodes:  2
> >  Number of racks:   1
> >
> >
> > The filesystem under path '/' is HEALTHY
> > Please tell what is wrong.
> >
> > Aseem
> >
> > -Original Message-
> > From: Alex Loddengaard [mailto:a...@cloudera.com]
> > Sent: Friday, April 10, 2009 11:04 PM
> > To: core-user@hadoop.apache.org
> > Subject: Re: More Replication on dfs
> >
> > Aseem,
> >
> > How are you verifying that blocks are not being replicated?  Have you
> > ran
> > fsck?  *bin/hadoop fsck /*
> >
> > I'd be surprised if replication really wasn't happening.  Can you run
> > fsck
> > and pay attention to "Under-replicated blocks" and "Mis-replicated
> > blocks?"
> > In fact, can you just copy-paste the output of fsck?
> >
> > Alex
> >
> > On Thu, Apr 9, 2009 at 11:23 PM, Puri, Aseem
> > wrote:
> >
> >> Hi
> >>I also tried the command $ bin/hadoop balancer. But still the
> >> same problem.
> >>
> >> Aseem
> >>
> >> -Original Message-
> >> From: Puri, Aseem [mailto:aseem.p...@honeywell.com]
> >> Sent: Friday, April 10, 2009 11:18 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: RE: More Replication on dfs
> >>
> >> Hi Alex,
> >>
> >>Thanks for sharing your knowledge. Till now I have three
> >> machines and I have to check the behavior of Hadoop so I want
> >> replication factor should be 2. I started my Hadoop server with
> >> replication factor 3. After that 

Re: Error reading task output

2009-04-16 Thread Aaron Kimball
Cam,

This isn't Hadoop-specific; it's how Linux treats its network configuration.
If you look at /etc/host.conf, you'll probably see a line that says "order
hosts, bind" -- this is telling Linux's DNS resolution library to first read
your /etc/hosts file, then check an external DNS server.

You could probably disable local hostfile checking, but that means that
every time a program on your system queries the authoritative hostname for
"localhost", it'll go out to the network. You'll probably see a big
performance hit. The better solution, I think, is to get your nodes'
/etc/hosts files squared away. You only need to do so once :)


-- Aaron
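A quick way to check which canonical hostname a node will report for itself
(the kind of thing that bites you when a slave registers as
"localhost.localdomain", as in the message quoted below); a minimal Java sketch:

import java.net.InetAddress;

public class WhoAmI {
  public static void main(String[] args) throws Exception {
    InetAddress local = InetAddress.getLocalHost();
    // If /etc/hosts maps this machine's name onto the 127.0.0.1
    // "localhost.localdomain" line, the canonical name printed here will be
    // localhost.localdomain rather than the real, routable hostname.
    System.out.println("hostname:  " + local.getHostName());
    System.out.println("canonical: " + local.getCanonicalHostName());
    System.out.println("address:   " + local.getHostAddress());
  }
}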


On Thu, Apr 16, 2009 at 11:31 AM, Cam Macdonell  wrote:

> Cam Macdonell wrote:
>
>>
>> Hi,
>>
>> I'm getting the following warning when running the simple wordcount and
>> grep examples.
>>
>> 09/04/15 16:54:16 INFO mapred.JobClient: Task Id :
>> attempt_200904151649_0001_m_19_0, Status : FAILED
>> Too many fetch-failures
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
>> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stdout
>>
>> 09/04/15 16:54:16 WARN mapred.JobClient: Error reading task
>> outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stderr
>>
>>
>> The only advice I could find from other posts with similar errors is to
>> setup /etc/hosts with all slaves and the host IPs.  I did this, but I still
>> get the warning above.  The output seems to come out alright however (I
>> guess that's why it is a warning).
>>
>> I tried running a wget on the http:// address in the warning message and
>> I get the following back
>>
>> 2009-04-15 16:53:46 ERROR 400: Argument taskid is required.
>>
>> So perhaps the wrong task ID is being passed to the http request.  Any
>> ideas on what can get rid of these warnings?
>>
>> Thanks,
>> Cam
>>
>
> Well, for future googlers, I'll answer my own post.  Watch out for the
> hostname at the end of "localhost" lines on slaves.  One of my slaves was
> registering itself as "localhost.localdomain" with the jobtracker.
>
> Is there a way that Hadoop could be made to not be so dependent on
> /etc/hosts, but on more dynamic hostname resolution?
>
> Cam
>


Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Thanks! I'll see what I can find out.

On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop wrote:

> The firewall was run at system startup, I think there was a
> /etc/sysconfig/iptables file present which triggered the firewall.
> I don't currently have access to any centos 5 machines so I can't easily
> check.
>
>
>
> On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop  >wrote:
>
> > The kickstart script was something that the operations staff was using to
> > initialize new machines, I never actually saw the script, just figured
> out
> > that there was a firewall in place.
> >
> >
> >
> > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra  >wrote:
> >
> >> Jason: the kickstart script - was it something you wrote or is it run
> when
> >> the system turns on?
> >> Mithila
> >>
> >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra 
> >> wrote:
> >>
> >> > Thanks Jason! Will check that out.
> >> > Mithila
> >> >
> >> >
> >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop  >> >wrote:
> >> >
> >> >> Double check that there is no firewall in place.
> >> >> At one point a bunch of new machines were kickstarted and placed in a
> >> >> cluster and they all failed with something similar.
> >> >> It turned out the kickstart script turned enabled the firewall with a
> >> rule
> >> >> that blocked ports in the 50k range.
> >> >> It took us a while to even think to check that was not a part of our
> >> >> normal
> >> >> machine configuration
> >> >>
> >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra  >
> >> >> wrote:
> >> >>
> >> >> > Hi Aaron
> >> >> > I will look into that thanks!
> >> >> >
> >> >> > I spoke to the admin who overlooks the cluster. He said that the
> >> gateway
> >> >> > comes in to the picture only when one of the nodes communicates
> with
> >> a
> >> >> node
> >> >> > outside of the cluster. But in my case the communication is carried
> >> out
> >> >> > between the nodes which all belong to the same cluster.
> >> >> >
> >> >> > Mithila
> >> >> >
> >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball  >
> >> >> wrote:
> >> >> >
> >> >> > > Hi,
> >> >> > >
> >> >> > > I wrote a blog post a while back about connecting nodes via a
> >> gateway.
> >> >> > See
> >> >> > >
> >> >> >
> >> >>
> >>
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> >> >> > >
> >> >> > > This assumes that the client is outside the gateway and all
> >> >> > > datanodes/namenode are inside, but the same principles apply.
> >> You'll
> >> >> just
> >> >> > > need to set up ssh tunnels from every datanode to the namenode.
> >> >> > >
> >> >> > > - Aaron
> >> >> > >
> >> >> > >
> >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
> >> >> rphul...@yahoo-inc.com
> >> >> > >wrote:
> >> >> > >
> >> >> > >> Looks like your NameNode is down .
> >> >> > >> Verify if hadoop process are running (   jps should show you all
> >> java
> >> >> > >> running process).
> >> >> > >> If your hadoop process are running try restarting your hadoop
> >> process
> >> >> .
> >> >> > >> I guess this problem is due to your fsimage not being correct .
> >> >> > >> You might have to format your namenode.
> >> >> > >> Hope this helps.
> >> >> > >>
> >> >> > >> Thanks,
> >> >> > >> --
> >> >> > >> Ravi
> >> >> > >>
> >> >> > >>
> >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" 
> wrote:
> >> >> > >>
> >> >> > >> The log file runs into thousands of line with the same message
> >> being
> >> >> > >> displayed every time.
> >> >> > >>
> >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <
> >> mnage...@asu.edu>
> >> >> > >> wrote:
> >> >> > >>
> >> >> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14
> has
> >> >> the
> >> >> > >> > following in it:
> >> >> > >> >
> >> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
> >> >> > >> STARTUP_MSG:
> >> >> > >> > /
> >> >> > >> > STARTUP_MSG: Starting DataNode
> >> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
> >> >> > >> > STARTUP_MSG:   args = []
> >> >> > >> > STARTUP_MSG:   version = 0.18.3
> >> >> > >> > STARTUP_MSG:   build =
> >> >> > >> >
> >> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
> >> >> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> >> >> > >> > /
> >> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client:
> >> Retrying
> >> >> > >> connect
> >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0
> time(s).
> >> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client:
> >> Retrying
> >> >> > >> connect
> >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1
> time(s).
> >> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client:
> >> Retrying
> >> >> > >> connect
> >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2
> time(s).
> >> >> > >> > 2009-04-14 10:08:15,9

Re: Map-Reduce Slow Down

2009-04-16 Thread jason hadoop
The firewall was run at system startup; I think there was an
/etc/sysconfig/iptables file present which triggered it.
I don't currently have access to any CentOS 5 machines, so I can't easily
check.



On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop wrote:

> The kickstart script was something that the operations staff was using to
> initialize new machines, I never actually saw the script, just figured out
> that there was a firewall in place.
>
>
>
> On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra wrote:
>
>> Jason: the kickstart script - was it something you wrote or is it run when
>> the system turns on?
>> Mithila
>>
>> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra 
>> wrote:
>>
>> > Thanks Jason! Will check that out.
>> > Mithila
>> >
>> >
>> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop > >wrote:
>> >
>> >> Double check that there is no firewall in place.
>> >> At one point a bunch of new machines were kickstarted and placed in a
>> >> cluster and they all failed with something similar.
>> >> It turned out the kickstart script turned enabled the firewall with a
>> rule
>> >> that blocked ports in the 50k range.
>> >> It took us a while to even think to check that was not a part of our
>> >> normal
>> >> machine configuration
>> >>
>> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
>> >> wrote:
>> >>
>> >> > Hi Aaron
>> >> > I will look into that thanks!
>> >> >
>> >> > I spoke to the admin who overlooks the cluster. He said that the
>> gateway
>> >> > comes in to the picture only when one of the nodes communicates with
>> a
>> >> node
>> >> > outside of the cluster. But in my case the communication is carried
>> out
>> >> > between the nodes which all belong to the same cluster.
>> >> >
>> >> > Mithila
>> >> >
>> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
>> >> wrote:
>> >> >
>> >> > > Hi,
>> >> > >
>> >> > > I wrote a blog post a while back about connecting nodes via a
>> gateway.
>> >> > See
>> >> > >
>> >> >
>> >>
>> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>> >> > >
>> >> > > This assumes that the client is outside the gateway and all
>> >> > > datanodes/namenode are inside, but the same principles apply.
>> You'll
>> >> just
>> >> > > need to set up ssh tunnels from every datanode to the namenode.
>> >> > >
>> >> > > - Aaron
>> >> > >
>> >> > >
>> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
>> >> rphul...@yahoo-inc.com
>> >> > >wrote:
>> >> > >
>> >> > >> Looks like your NameNode is down .
>> >> > >> Verify if hadoop process are running (   jps should show you all
>> java
>> >> > >> running process).
>> >> > >> If your hadoop process are running try restarting your hadoop
>> process
>> >> .
>> >> > >> I guess this problem is due to your fsimage not being correct .
>> >> > >> You might have to format your namenode.
>> >> > >> Hope this helps.
>> >> > >>
>> >> > >> Thanks,
>> >> > >> --
>> >> > >> Ravi
>> >> > >>
>> >> > >>
>> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra"  wrote:
>> >> > >>
>> >> > >> The log file runs into thousands of line with the same message
>> being
>> >> > >> displayed every time.
>> >> > >>
>> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <
>> mnage...@asu.edu>
>> >> > >> wrote:
>> >> > >>
>> >> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has
>> >> the
>> >> > >> > following in it:
>> >> > >> >
>> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
>> >> > >> STARTUP_MSG:
>> >> > >> > /
>> >> > >> > STARTUP_MSG: Starting DataNode
>> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
>> >> > >> > STARTUP_MSG:   args = []
>> >> > >> > STARTUP_MSG:   version = 0.18.3
>> >> > >> > STARTUP_MSG:   build =
>> >> > >> >
>> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
>> >> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> >> > >> > /
>> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client:
>> Retrying
>> >> > >> connect
>> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client:
>> Retrying
>> >> > >> connect
>> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client:
>> Retrying
>> >> > >> connect
>> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> >> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client:
>> Retrying
>> >> > >> connect
>> >> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> >> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client:
>> Retrying
>> >> > >> connect
>> >> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> >> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Clien

Re: Map-Reduce Slow Down

2009-04-16 Thread jason hadoop
The kickstart script was something that the operations staff was using to
initialize new machines; I never actually saw the script, just figured out
that there was a firewall in place.


On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra  wrote:

> Jason: the kickstart script - was it something you wrote or is it run when
> the system turns on?
> Mithila
>
> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra 
> wrote:
>
> > Thanks Jason! Will check that out.
> > Mithila
> >
> >
> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop  >wrote:
> >
> >> Double check that there is no firewall in place.
> >> At one point a bunch of new machines were kickstarted and placed in a
> >> cluster and they all failed with something similar.
> >> It turned out the kickstart script turned enabled the firewall with a
> rule
> >> that blocked ports in the 50k range.
> >> It took us a while to even think to check that was not a part of our
> >> normal
> >> machine configuration
> >>
> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
> >> wrote:
> >>
> >> > Hi Aaron
> >> > I will look into that thanks!
> >> >
> >> > I spoke to the admin who overlooks the cluster. He said that the
> gateway
> >> > comes in to the picture only when one of the nodes communicates with a
> >> node
> >> > outside of the cluster. But in my case the communication is carried
> out
> >> > between the nodes which all belong to the same cluster.
> >> >
> >> > Mithila
> >> >
> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
> >> wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I wrote a blog post a while back about connecting nodes via a
> gateway.
> >> > See
> >> > >
> >> >
> >>
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> >> > >
> >> > > This assumes that the client is outside the gateway and all
> >> > > datanodes/namenode are inside, but the same principles apply. You'll
> >> just
> >> > > need to set up ssh tunnels from every datanode to the namenode.
> >> > >
> >> > > - Aaron
> >> > >
> >> > >
> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
> >> rphul...@yahoo-inc.com
> >> > >wrote:
> >> > >
> >> > >> Looks like your NameNode is down .
> >> > >> Verify if hadoop process are running (   jps should show you all
> java
> >> > >> running process).
> >> > >> If your hadoop process are running try restarting your hadoop
> process
> >> .
> >> > >> I guess this problem is due to your fsimage not being correct .
> >> > >> You might have to format your namenode.
> >> > >> Hope this helps.
> >> > >>
> >> > >> Thanks,
> >> > >> --
> >> > >> Ravi
> >> > >>
> >> > >>
> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra"  wrote:
> >> > >>
> >> > >> The log file runs into thousands of line with the same message
> being
> >> > >> displayed every time.
> >> > >>
> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <
> mnage...@asu.edu>
> >> > >> wrote:
> >> > >>
> >> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has
> >> the
> >> > >> > following in it:
> >> > >> >
> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
> >> > >> STARTUP_MSG:
> >> > >> > /
> >> > >> > STARTUP_MSG: Starting DataNode
> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
> >> > >> > STARTUP_MSG:   args = []
> >> > >> > STARTUP_MSG:   version = 0.18.3
> >> > >> > STARTUP_MSG:   build =
> >> > >> >
> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
> >> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> >> > >> > /
> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> >> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> >> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> >> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> >> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> >> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client:
> Retrying
> >> > >> connect
> >> > >> > to server: node1

Re: hadoop0.18.3 64-bit AMI

2009-04-16 Thread Lalit Kapoor
Parul,
   Do a search for Cloudera; they have a 64-bit AMI available. Also take a
look at http://www.cloudera.com/hadoop-ec2: you can start up a Hadoop
cluster quickly and be on your way (good for proofs of concept and one-time
jobs, as they state).

Sincerely,
 Lalit Kapoor

On Thu, Apr 16, 2009 at 5:56 PM, Parul Kudtarkar <
parul_kudtar...@hms.harvard.edu> wrote:

>
> Could any one suggest a working 64-bit Hadoop AMI which is publicly
> available? hadoop0.18.3 64-bit AMI is not publicly available. It would be
> nice if some one could release it.
> Thanks,
> Parul V. Kudtarkar
>
>
> --
> View this message in context:
> http://www.nabble.com/hadoop0.18.3-64-bit-AMI-tp23087309p23087309.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Sometimes no map tasks are run - X are complete and N-X are pending, none running

2009-04-16 Thread Saptarshi Guha
Hello,
I'm using 0.19.2-dev-core (checked out from cvs and built). With 51 maps, I
have a case where 50 tasks have completed and 1 is pending, with about 1400
records left for this one to process. The completed map tasks have written
out 18GB to HDFS.
The last map task is forever in the pending queue - is this an issue with my
setup/config, or do others have the problem?

Thank you

Saptarshi Guha


Re: Hadoop basic question

2009-04-16 Thread Jeff Hammerbacher
Also see
http://files.meetup.com/1228907/Hadoop%20Namenode%20High%20Availability.pptx
.

On Thu, Apr 16, 2009 at 4:58 PM, Jim Twensky  wrote:

> http://wiki.apache.org/hadoop/FAQ#7
>
> On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo  wrote:
>
> > Will anyone guide me how to avoid the  the single point failure of master
> > node.
> > This is what I know. If the master node is donw by some reason, the
> hadoop
> > system is down and there is no way to have failover system for master
> node.
> > Please correct me if I am not understanding correctly.
> >
> > Jae
> >
>


Re: Hadoop basic question

2009-04-16 Thread Jim Twensky
http://wiki.apache.org/hadoop/FAQ#7

On Thu, Apr 16, 2009 at 6:52 PM, Jae Joo  wrote:

> Will anyone guide me how to avoid the  the single point failure of master
> node.
> This is what I know. If the master node is donw by some reason, the hadoop
> system is down and there is no way to have failover system for master node.
> Please correct me if I am not understanding correctly.
>
> Jae
>


Hadoop basic question

2009-04-16 Thread Jae Joo
Will anyone guide me on how to avoid the single point of failure at the master
node?
This is what I know: if the master node is down for some reason, the Hadoop
system is down, and there is no way to have a failover system for the master node.
Please correct me if I am not understanding this correctly.

Jae


Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings,

Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
with me in the Seattle area? I can donate some facilities, etc. -- I
also always have topics to speak about :)

Cheers,
Bradford


hadoop0.18.3 64-bit AMI

2009-04-16 Thread Parul Kudtarkar

Could anyone suggest a working 64-bit Hadoop AMI that is publicly
available? The hadoop0.18.3 64-bit AMI is not publicly available. It would be
nice if someone could release it.
Thanks, 
Parul V. Kudtarkar 


-- 
View this message in context: 
http://www.nabble.com/hadoop0.18.3-64-bit-AMI-tp23087309p23087309.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Question about the classpath setting for bin/hadoop jar

2009-04-16 Thread Cole, Richard
Hi,

I noticed that the "bin/hadoop jar" command doesn't add the jar being executed 
to the classpath. Is this deliberate and what is the reasoning? The result is 
that resources in the jar are not accessible from the system class loader. 
Rather they are only available from the thread context class loader and the 
class loader of the main class.

Regards,

Richard.
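A small sketch of the workaround implied by the observation above: load the
resource through the thread context class loader (or the main class's loader)
rather than the system class loader. The resource name is made up for
illustration:

import java.io.InputStream;

public class ResourceLoad {
  public static void main(String[] args) throws Exception {
    // Illustrative resource assumed to be packed inside the job jar.
    String resource = "my-config.properties";

    // Not found when the job jar is not on the system classpath, which is
    // the "bin/hadoop jar" behaviour described above.
    InputStream viaSystem =
        ClassLoader.getSystemClassLoader().getResourceAsStream(resource);
    System.out.println("system class loader:  " + (viaSystem != null));

    // Found, because the jar is visible to the thread context class loader
    // and to the main class's own loader.
    InputStream viaContext =
        Thread.currentThread().getContextClassLoader().getResourceAsStream(resource);
    System.out.println("context class loader: " + (viaContext != null));

    if (viaSystem != null) viaSystem.close();
    if (viaContext != null) viaContext.close();
  }
}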



Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Jason: the kickstart script - was it something you wrote or is it run when
the system turns on?
Mithila

On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra  wrote:

> Thanks Jason! Will check that out.
> Mithila
>
>
> On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop wrote:
>
>> Double check that there is no firewall in place.
>> At one point a bunch of new machines were kickstarted and placed in a
>> cluster and they all failed with something similar.
>> It turned out the kickstart script turned enabled the firewall with a rule
>> that blocked ports in the 50k range.
>> It took us a while to even think to check that was not a part of our
>> normal
>> machine configuration
>>
>> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
>> wrote:
>>
>> > Hi Aaron
>> > I will look into that thanks!
>> >
>> > I spoke to the admin who overlooks the cluster. He said that the gateway
>> > comes in to the picture only when one of the nodes communicates with a
>> node
>> > outside of the cluster. But in my case the communication is carried out
>> > between the nodes which all belong to the same cluster.
>> >
>> > Mithila
>> >
>> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > I wrote a blog post a while back about connecting nodes via a gateway.
>> > See
>> > >
>> >
>> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>> > >
>> > > This assumes that the client is outside the gateway and all
>> > > datanodes/namenode are inside, but the same principles apply. You'll
>> just
>> > > need to set up ssh tunnels from every datanode to the namenode.
>> > >
>> > > - Aaron
>> > >
>> > >
>> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <
>> rphul...@yahoo-inc.com
>> > >wrote:
>> > >
>> > >> Looks like your NameNode is down .
>> > >> Verify if hadoop process are running (   jps should show you all java
>> > >> running process).
>> > >> If your hadoop process are running try restarting your hadoop process
>> .
>> > >> I guess this problem is due to your fsimage not being correct .
>> > >> You might have to format your namenode.
>> > >> Hope this helps.
>> > >>
>> > >> Thanks,
>> > >> --
>> > >> Ravi
>> > >>
>> > >>
>> > >> On 4/15/09 10:15 AM, "Mithila Nagendra"  wrote:
>> > >>
>> > >> The log file runs into thousands of line with the same message being
>> > >> displayed every time.
>> > >>
>> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra 
>> > >> wrote:
>> > >>
>> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has
>> the
>> > >> > following in it:
>> > >> >
>> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
>> > >> STARTUP_MSG:
>> > >> > /
>> > >> > STARTUP_MSG: Starting DataNode
>> > >> > STARTUP_MSG:   host = node19/127.0.0.1
>> > >> > STARTUP_MSG:   args = []
>> > >> > STARTUP_MSG:   version = 0.18.3
>> > >> > STARTUP_MSG:   build =
>> > >> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
>> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> > >> > /
>> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
>> > >> connect
>> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
>> > >> node18/
>> > >> > 192.168.0.18:54310 not available yet, Z...
>> > >> > 2

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
Thanks Todd and Chuck - sorry, my terminology was wrong... exactly
what I was looking for.

I am letting MySQL chug through the zoom levels now to get some
final numbers on the tiles and the cost of S3 PUTs.  Looks like zoom level
8 is feasible for our current data volume, but not a long-term option
if the input data explodes in volume.

Cheers,

Tim
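(A minimal sketch of the setting Chuck refers to below, turning off speculative
execution so a task is never duplicated; this assumes the 0.18/0.19-era JobConf
API, and the helper method is only illustrative:)

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculation {
  public static JobConf disableSpeculation(JobConf conf) {
    // Stop Hadoop from launching duplicate ("speculative") attempts of the
    // same task, so each tile/PNG is pushed to S3 by only one attempt.
    conf.setMapSpeculativeExecution(false);
    conf.setReduceSpeculativeExecution(false);
    // Equivalent config properties: mapred.map.tasks.speculative.execution
    // and mapred.reduce.tasks.speculative.execution.
    return conf;
  }
}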



On Thu, Apr 16, 2009 at 9:05 PM, Chuck Lam  wrote:
> ar.. i totally missed the point you had said about "compete reducers". it
> didn't occur to me that you were talking about hadoop's speculative
> execution. todd's solution to turn off speculative execution is correct.
>
> i'll respond to the rest of your email later today.
>
>
>
> On Thu, Apr 16, 2009 at 5:23 AM, tim robertson 
> wrote:
>>
>> Thanks Chuck,
>>
>> > I'm shooting for finishing the case studies by the end of May, but it'll
>> > be
>> > nice to have a draft done by mid-May so we can edit it to have a
>> > consistent
>> > style with the other case studies.
>>
>> I will do what I can!
>>
>> > I read your blog and found a couple posts on spatial joining. It wasn't
>> > clear to me from reading the posts whether the work was just
>> > experimental or
>> > if it led to some application. If it led to an application, then we may
>> > incorporate that into the case study too.
>>
>> It led to http://widgets.gbif.org/test/PACountry.html#/area/2571 which
>> shows a statistical summary for our data (latitude longitude)
>> cross-referenced with the polygons on the protected areas of the
>> world.  In truth though, we processed it in PostGIS and Hadoop and
>> found that the PostGIS approach, while way slower was fine for now and
>> we developed the scripts for that quicker.  So you can say it was
>> experimental... I do have ambitions to do a basic geospatial join
>> (points in polygons) for PIG, Cloudbase or Hive2.0 but alas have not
>> found time.  Also - the blog is always a late Sunday night effort so
>> really is not written well.
>>
>> > BTW, where in the US are you traveling to? I'm in Silicon Valley, so
>> > maybe
>> > we can meet up if you'll happen to be in the area and can squeeze a
>> > little
>> > time out.
>>
>> Would have loved to... but in Boston and DC this time.  In a few weeks
>> will be in Chicago, but for some reason I have never make it over your
>> neck of the woods.
>>
>> > I don't know what data you need to produce a single PNG file, so I don't
>> > know whether having map output TileId-ZoomLevel-SpeciesId as key is the
>> > right factoring. To me it looks like each PNG represents one tile at one
>> > zoom level but includes multiple species.
>>
>> We do individual species and higher levels of taxa (up to all data).
>> This is all data, grouped to 1x1 degree cells (think 100x100 km) with
>> counts.  Currently preprocessed with mysql, but another hadoop
>> candidate as we grow.
>>
>> http://maps.gbif.org/mapserver/draw.pl?dtype=box&imgonly=1&path=http%3A%2F%2Fdata.gbif.org%2Fmaplayer%2Ftaxon%2F13140803&extent=-180.0+-90.0+180.0+90.0&mode=browse&refresh=Refresh&layer=countryborders
>>
>> > In any case, under Hadoop/MapReduce, all key/value pairs outputted by
>> > the
>> > mappers are grouped by key before being sent to the reducer, so it's
>> > guaranteed that the same key will not go to multiple reducers.
>>
>> That is good to know.  I knew Map tasks would get run on multiple
>> machines if it detects a machine is idle, but wasn't sure if Hadoop
>> would put reducers on machines to compete against each other and kill
>> the one that did not finish first.
>>
>> > You may also want to think more about the actual volume and cost of all
>> > this. You initially said that you will have "billions of PNGs produced
>> > each
>> > at 1-3KB" but then later said the data size is only a few 100GB due to
>> > sparsity. Either you're not really creating billions of PNGs, or a lot
>> > of
>> > them are actually less than 1KB. Kevin brought up a good point that S3
>> > charges $0.01 for every 1000 files ("objects") created, so generating 1
>> > billion files will already set you back $10K plus storage cost (and
>> > transfer
>> > cost if you're not using EC2).
>>
>> Right - my bad... Having not processed this all I am not 100% sure yet
>> what the size will be and to what zoom level I will preprocess to.
>> The challenge is our data is growing continuously, so billions of PNGs
>> was looking into the coming months.  Sorry for the contradiction.
>>
>> You have clearly spotted that I am doing this as a project on the side
>> (evenings really) and not devoting enough time to this!!!  By day I am
>> mysql and postgis still but I am hitting limits and looking to our
>> scalability.
>> I kind of overlooked the PUT cost on S3 thinking stupidly that EC2->S3 was
>> free.
>>
>> I actually have the stuff processed for species only using mysql
>> (http://eol-map.gbif.org/EOLSpeciesMap.html?taxon_id=13839800) but not
>> the higher groupings of species (familys of species etc).  It could be
>> that I end up only pro

Re: getting DiskErrorException during map

2009-04-16 Thread Jim Twensky
Yes, here is how it looks:


<property>
  <name>hadoop.tmp.dir</name>
  <value>/scratch/local/jim/hadoop-${user.name}</value>
</property>

so I don't know why it still writes to /tmp. As a temporary workaround, I
created a symbolic link from /tmp/hadoop-jim to /scratch/...
and it works fine now, but if you think this might be considered a bug,
I can report it.

Thanks,
Jim
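One way to see which values actually win after hadoop-default.xml and
hadoop-site.xml are loaded is to print them from a JobConf; a minimal sketch
that just reports whatever the config files on this node's classpath resolve to:

import org.apache.hadoop.mapred.JobConf;

public class ShowDirs {
  public static void main(String[] args) {
    // JobConf loads hadoop-default.xml and then hadoop-site.xml from the
    // classpath, so this prints the effective values on this node.
    JobConf conf = new JobConf(ShowDirs.class);
    System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
    System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
    System.out.println("mapred.child.tmp = " + conf.get("mapred.child.tmp"));
  }
}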


On Thu, Apr 16, 2009 at 12:44 PM, Alex Loddengaard wrote:

> Have you set hadoop.tmp.dir away from /tmp as well?  If hadoop.tmp.dir is
> set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
> writing to /tmp.
>
> Hope this helps!
>
> Alex
>
> On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky 
> wrote:
>
> > Alex,
> >
> > Yes, I bounced the Hadoop daemons after I changed the configuration
> files.
> >
> > I also tried setting  $HADOOP_CONF_DIR to the directory where my
> > hadop-site.xml file resides but it didn't work.
> > However, I'm sure that HADOOP_CONF_DIR is not the issue because other
> > properties that I changed in hadoop-site.xml
> > seem to be properly set. Also, here is a section from my hadoop-site.xml
> > file:
> >
> >
> >hadoop.tmp.dir
> > /scratch/local/jim/hadoop-${user.name}
> > 
> >
> >mapred.local.dir
> > /scratch/local/jim/hadoop-${user.name
> }/mapred/local
> >
> >
> > I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
> > tracker since I know
> > directories that do not exist are ignored.
> >
> > When I manually ssh to the task trackers, I can see the directory
> > /scratch/local/jim/hadoop-jim/dfs
> > is automatically created so is it seems like  hadoop.tmp.dir is set
> > properly. However, hadoop still creates
> > /tmp/hadoop-jim/mapred/local and uses that directory for the local
> storage.
> >
> > I'm starting to suspect that mapred.local.dir is overwritten to a default
> > value of /tmp/hadoop-${user.name}
> > somewhere inside the binaries.
> >
> > -jim
> >
> > On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard 
> > wrote:
> >
> > > First, did you bounce the Hadoop daemons after you changed the
> > > configuration
> > > files?  I think you'll have to do this.
> > >
> > > Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.
>  Try
> > > setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.
> >  For
> > > whatever reason your hadoop-site.xml (and the hadoop-default.xml you
> > tried
> > > to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> > > this.
> > >
> > > Good luck!
> > >
> > > Alex
> > >
> > > On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky 
> > > wrote:
> > >
> > > > Thank you Alex, you are right. There are quotas on the systems that
> I'm
> > > > working. However, I tried to change mapred.local.dir as follows:
> > > >
> > > > --inside hadoop-site.xml:
> > > >
> > > >
> > > >mapred.child.tmp
> > > >/scratch/local/jim
> > > >
> > > >
> > > >hadoop.tmp.dir
> > > >/scratch/local/jim
> > > >
> > > >
> > > >mapred.local.dir
> > > >/scratch/local/jim
> > > >
> > > >
> > > >  and observed that the intermediate map outputs are still being
> written
> > > > under /tmp/hadoop-jim/mapred/local
> > > >
> > > > I'm confused at this point since I also tried setting these values
> > > directly
> > > > inside the hadoop-default.xml and that didn't work either. Is there
> any
> > > > other property that I'm supposed to change? I tried searching for
> > "/tmp"
> > > in
> > > > the hadoop-default.xml file but couldn't find anything else.
> > > >
> > > > Thanks,
> > > > Jim
> > > >
> > > >
> > > > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard 
> > > > wrote:
> > > >
> > > > > The getLocalPathForWrite function that throws this Exception
> assumes
> > > that
> > > > > you have space on the disks that mapred.local.dir is configured on.
> > >  Can
> > > > > you
> > > > > verify with `df` that those disks have space available?  You might
> > also
> > > > try
> > > > > moving mapred.local.dir off of /tmp if it's configured to use /tmp
> > > right
> > > > > now; I believe some systems have quotas on /tmp.
> > > > >
> > > > > Hope this helps.
> > > > >
> > > > > Alex
> > > > >
> > > > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky  >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with
> 9
> > > > > nodes,
> > > > > > 8
> > > > > > of them being task trackers. I'm getting the following error and
> my
> > > > jobs
> > > > > > keep failing when map processes start hitting 30%:
> > > > > >
> > > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> > find
> > > > any
> > > > > > valid local directory for
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_00_1/output/file.out
> > > > > >at
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAlloc

Re: Error reading task output

2009-04-16 Thread Cam Macdonell

Cam Macdonell wrote:


Hi,

I'm getting the following warning when running the simple wordcount and 
grep examples.


09/04/15 16:54:16 INFO mapred.JobClient: Task Id : 
attempt_200904151649_0001_m_19_0, Status : FAILED

Too many fetch-failures
09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stdout 

09/04/15 16:54:16 WARN mapred.JobClient: Error reading task 
outputhttp://localhost.localdomain:50060/tasklog?plaintext=true&taskid=attempt_200904151649_0001_m_19_0&filter=stderr 



The only advice I could find from other posts with similar errors is to 
setup /etc/hosts with all slaves and the host IPs.  I did this, but I 
still get the warning above.  The output seems to come out alright 
however (I guess that's why it is a warning).


I tried running a wget on the http:// address in the warning message and 
I get the following back


2009-04-15 16:53:46 ERROR 400: Argument taskid is required.

So perhaps the wrong task ID is being passed to the http request.  Any 
ideas on what can get rid of these warnings?


Thanks,
Cam


Well, for future googlers, I'll answer my own post.  Watch out for the 
hostname at the end of "localhost" lines on slaves.  One of my slaves 
was registering itself as "localhost.localdomain" with the jobtracker.


Is there a way that Hadoop could be made less dependent on 
/etc/hosts and able to use more dynamic hostname resolution?


Cam


Re: getting DiskErrorException during map

2009-04-16 Thread Alex Loddengaard
Have you set hadoop.tmp.dir away from /tmp as well?  If hadoop.tmp.dir is
set somewhere in /scratch vs. /tmp, then I'm not sure why Hadoop would be
writing to /tmp.

Hope this helps!

Alex

On Wed, Apr 15, 2009 at 2:37 PM, Jim Twensky  wrote:

> Alex,
>
> Yes, I bounced the Hadoop daemons after I changed the configuration files.
>
> I also tried setting  $HADOOP_CONF_DIR to the directory where my
> hadop-site.xml file resides but it didn't work.
> However, I'm sure that HADOOP_CONF_DIR is not the issue because other
> properties that I changed in hadoop-site.xml
> seem to be properly set. Also, here is a section from my hadoop-site.xml
> file:
>
> <property>
>   <name>hadoop.tmp.dir</name>
>   <value>/scratch/local/jim/hadoop-${user.name}</value>
> </property>
> <property>
>   <name>mapred.local.dir</name>
>   <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
> </property>
>
> I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
> tracker since I know
> directories that do not exist are ignored.
>
> When I manually ssh to the task trackers, I can see the directory
> /scratch/local/jim/hadoop-jim/dfs
> is automatically created so is it seems like  hadoop.tmp.dir is set
> properly. However, hadoop still creates
> /tmp/hadoop-jim/mapred/local and uses that directory for the local storage.
>
> I'm starting to suspect that mapred.local.dir is overwritten to a default
> value of /tmp/hadoop-${user.name}
> somewhere inside the binaries.
>
> -jim
>
> On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard 
> wrote:
>
> > First, did you bounce the Hadoop daemons after you changed the
> > configuration
> > files?  I think you'll have to do this.
> >
> > Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.  Try
> > setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.
>  For
> > whatever reason your hadoop-site.xml (and the hadoop-default.xml you
> tried
> > to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> > this.
> >
> > Good luck!
> >
> > Alex
> >
> > On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky 
> > wrote:
> >
> > > Thank you Alex, you are right. There are quotas on the systems that I'm
> > > working. However, I tried to change mapred.local.dir as follows:
> > >
> > > --inside hadoop-site.xml:
> > >
> > > <property>
> > >   <name>mapred.child.tmp</name>
> > >   <value>/scratch/local/jim</value>
> > > </property>
> > > <property>
> > >   <name>hadoop.tmp.dir</name>
> > >   <value>/scratch/local/jim</value>
> > > </property>
> > > <property>
> > >   <name>mapred.local.dir</name>
> > >   <value>/scratch/local/jim</value>
> > > </property>
> > >
> > >  and observed that the intermediate map outputs are still being written
> > > under /tmp/hadoop-jim/mapred/local
> > >
> > > I'm confused at this point since I also tried setting these values
> > directly
> > > inside the hadoop-default.xml and that didn't work either. Is there any
> > > other property that I'm supposed to change? I tried searching for
> "/tmp"
> > in
> > > the hadoop-default.xml file but couldn't find anything else.
> > >
> > > Thanks,
> > > Jim
> > >
> > >
> > > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard 
> > > wrote:
> > >
> > > > The getLocalPathForWrite function that throws this Exception assumes
> > that
> > > > you have space on the disks that mapred.local.dir is configured on.
> >  Can
> > > > you
> > > > verify with `df` that those disks have space available?  You might
> also
> > > try
> > > > moving mapred.local.dir off of /tmp if it's configured to use /tmp
> > right
> > > > now; I believe some systems have quotas on /tmp.
> > > >
> > > > Hope this helps.
> > > >
> > > > Alex
> > > >
> > > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky 
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > > > nodes,
> > > > > 8
> > > > > of them being task trackers. I'm getting the following error and my
> > > jobs
> > > > > keep failing when map processes start hitting 30%:
> > > > >
> > > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> find
> > > any
> > > > > valid local directory for
> > > > >
> > > > >
> > > >
> > >
> >
> taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_00_1/output/file.out
> > > > >at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > > >at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > > >at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > > >at
> > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > > >at
> > > > >
> > >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > > >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > > >at org.apache.hadoop.mapre

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread Todd Lipcon
On Thu, Apr 16, 2009 at 1:27 AM, tim robertson wrote:

>
> What is not 100% clear to me is when to push to S3:
> In the Map I will output the TileId-ZoomLevel-SpeciesId as the key,
> along with the count, and in the Reduce I group the counts into larger
> tiles, and create the PNG.  I could write to Sequencefile here... but
> I suspect I could just push to the s3 bucket here also - as long as
> the task tracker does not send the same Keys to multiple reduce tasks
> - my Hadoop naivity showing here (I wrote an in memory threaded
> MapReduceLite which does not compete reducers, but not got into the
> Hadoop code quite so much yet).
>
>
Hi Tim,

If I understand what you mean by "compete reducers", then you're referring
to the feature called "speculative execution", in which Hadoop schedules
multiple TaskTrackers to perform the same task. When one of the
multiply-scheduled tasks finishes, the other one is killed. As you seem to
already understand, this might cause issues if your tasks have
non-idempotent side effects on the outside world.

The configuration variable you need to look at is
mapred.reduce.tasks.speculative.execution. If this is set to false, only one
reduce task will be run on each key. If it is true, it's possible that some
reduce tasks will be scheduled twice to try to reduce variance in job
completion times due to slow machines.

There's an equivalent configuration variable
mapred.map.tasks.speculative.execution that controls this behavior for your
map tasks.
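
As a concrete sketch (untested, old-style JobConf API), you can flip these per
job before submitting it; the property names are exactly the ones above:

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculation {
  public static void main(String[] args) {
    JobConf conf = new JobConf(NoSpeculation.class);
    // Disable speculative (duplicate) reduce attempts, so a given reduce
    // partition is not run twice in parallel (failed tasks can still be retried).
    conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
    // Same idea for the map side, if the maps also have side effects.
    conf.setBoolean("mapred.map.tasks.speculative.execution", false);
    // ... then configure mapper/reducer, input/output and call JobClient.runJob(conf).
  }
}

Setting the same properties in hadoop-site.xml works too if you want the
behaviour cluster-wide rather than per job.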

Hope that helps,
-Todd


RE: Complex workflows in Hadoop

2009-04-16 Thread Brian MacKay

Have you looked at ChainMapper and ChainReducer?  They may not be entirely
what you require, but with some modifications they might work for you.

"Using the ChainMapper and the ChainReducer classes is possible to
compose Map/Reduce jobs that look like [MAP+ / REDUCE MAP*]. And
immediate benefit of this pattern is a dramatic reduction in disk IO."

http://hadoop.apache.org/core/docs/r0.19.1/api/org/apache/hadoop/mapred/lib/ChainMapper.html
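
For what it's worth, here is a rough, self-contained sketch along the lines of
the Javadoc example (MapA, MapB and ReduceX are just toy placeholders, not
classes from any real job):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;

public class ChainSketch {

  // First map: tag each line with a dummy key (placeholder logic).
  public static class MapA extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable key, Text value,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(new Text("k"), value);
    }
  }

  // Second map runs in the same task on MapA's output, with no extra disk IO.
  public static class MapB extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text key, Text value,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      out.collect(key, new Text(value.toString().toUpperCase()));
    }
  }

  // Reduce: count the values for each key (placeholder logic).
  public static class ReduceX extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterator<Text> values,
        OutputCollector<Text, Text> out, Reporter r) throws IOException {
      int n = 0;
      while (values.hasNext()) { values.next(); n++; }
      out.collect(key, new Text(Integer.toString(n)));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ChainSketch.class);
    conf.setJobName("chain-sketch");
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // [MAP+ / REDUCE]: two chained mappers feeding one reducer, all in one job.
    ChainMapper.addMapper(conf, MapA.class, LongWritable.class, Text.class,
        Text.class, Text.class, true, new JobConf(false));
    ChainMapper.addMapper(conf, MapB.class, Text.class, Text.class,
        Text.class, Text.class, true, new JobConf(false));
    ChainReducer.setReducer(conf, ReduceX.class, Text.class, Text.class,
        Text.class, Text.class, true, new JobConf(false));

    JobClient.runJob(conf);
  }
}

ChainReducer.addMapper can append further mappers after the reduce if you need
the [REDUCE MAP*] part of the pattern as well.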


-Original Message-
From: Shevek [mailto:had...@anarres.org] 
Sent: Thursday, April 16, 2009 11:23 AM
To: core-user@hadoop.apache.org
Subject: Re: Complex workflows in Hadoop

On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote:
> Hey,
> 
> I am trying complex queries on hadoop and in which i require more than
one
> job to run to get final result..results of job one captures few joins
of the
> query and I want to pass those results as input to 2nd job and again
do
> processing so that I can get final results.queries are such that I
cant do
> all types of joins and filterin in job1 and so I require two jobs.
> 
> right now I write results of job 1 to hdfs and read dem for job2..but
thats
> take unecessary IO time.So was looking for something that I can store
my
> results of job1 in memory and use them as input for job 2.

Hi,

I am a programming language and compiler designer. We have a workflow
engine which is capable of taking a description of a complex workflow
and analysing it as a multi-stage map-reduce system to generate an
optimal resource allocation. I'm hunting around for people who have
problems like this, since I'm considering whether to port the whole
thing to hadoop as a high-level language.

Do you, or any other users have descriptions of workflows more complex
than "one map, maybe one reduce" which you would like to be able to
express easily?

S.






Re: Complex workflows in Hadoop

2009-04-16 Thread Vadim Zaliva
Cascading is great.

If you are looking for a more pragmatic approach that lets you build a
workflow from existing Hadoop tasks and Pig scripts without writing
additional Java code, you may want to take a look at HAMAKE:

http://code.google.com/p/hamake/

Vadim


Re: Complex workflows in Hadoop

2009-04-16 Thread jason hadoop
Chaining described in chapter 8 of my book provides this to a limited
degree.

Cascading, http://www.cascading.org/, also supports complex flows. I do not
know how cascading works under the covers.

On Thu, Apr 16, 2009 at 8:23 AM, Shevek  wrote:

> On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote:
> > Hey,
> >
> > I am trying complex queries on hadoop and in which i require more than
> one
> > job to run to get final result..results of job one captures few joins of
> the
> > query and I want to pass those results as input to 2nd job and again do
> > processing so that I can get final results.queries are such that I cant
> do
> > all types of joins and filterin in job1 and so I require two jobs.
> >
> > right now I write results of job 1 to hdfs and read dem for job2..but
> thats
> > take unecessary IO time.So was looking for something that I can store my
> > results of job1 in memory and use them as input for job 2.
>
> Hi,
>
> I am a programming language and compiler designer. We have a workflow
> engine which is capable of taking a description of a complex workflow
> and analysing it as a multi-stage map-reduce system to generate an
> optimal resource allocation. I'm hunting around for people who have
> problems like this, since I'm considering whether to port the whole
> thing to hadoop as a high-level language.
>
> Do you, or any other users have descriptions of workflows more complex
> than "one map, maybe one reduce" which you would like to be able to
> express easily?
>
> S.
>
>


-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: Complex workflows in Hadoop

2009-04-16 Thread Shevek
On Tue, 2009-04-14 at 07:59 -0500, Pankil Doshi wrote:
> Hey,
> 
> I am trying complex queries on hadoop and in which i require more than one
> job to run to get final result..results of job one captures few joins of the
> query and I want to pass those results as input to 2nd job and again do
> processing so that I can get final results.queries are such that I cant do
> all types of joins and filterin in job1 and so I require two jobs.
> 
> right now I write results of job 1 to hdfs and read dem for job2..but thats
> take unecessary IO time.So was looking for something that I can store my
> results of job1 in memory and use them as input for job 2.

Hi,

I am a programming language and compiler designer. We have a workflow
engine which is capable of taking a description of a complex workflow
and analysing it as a multi-stage map-reduce system to generate an
optimal resource allocation. I'm hunting around for people who have
problems like this, since I'm considering whether to port the whole
thing to hadoop as a high-level language.

Do you, or any other users have descriptions of workflows more complex
than "one map, maybe one reduce" which you would like to be able to
express easily?

S.



Re: No space left on device Exception

2009-04-16 Thread Pankil Doshi
Hey

What's your input size?

From the info you gave, it seems you have used 4.2 GB, so if that is roughly
your input size, your intermediate results are probably smaller than your
input, but that also depends on your map function. Make sure you check the
size of the intermediate results.

Pankil

On Thu, Apr 16, 2009 at 3:25 AM, Rakhi Khatwani wrote:

> Thanks,
>  I will check tht
>
> Regards,
> Raakhi
>
> On Thu, Apr 16, 2009 at 1:42 PM, Miles Osborne  wrote:
>
> > it may be that intermediate results are filling your disks and when
> > the jobs crash, this all gets deleted.  so it would look like you have
> > spare space when in reality you don't.
> >
> > i would check on the file system as your jobs run and see if indeed
> > they are filling-up.
> >
> > Miles
> >
> > 2009/4/16 Rakhi Khatwani :
> > > Hi,
> > >following is the output on the df command
> > > [r...@domu-12-31-39-00-e5-d2 conf]# df -h
> > > FilesystemSize  Used Avail Use% Mounted on
> > > /dev/sda1 9.9G  4.2G  5.2G  45% /
> > > /dev/sdb  414G  924M  392G   1% /mnt
> > >
> > > from the o/p it seems that i have quite an amount of memory available.
> > but i
> > > still get the exception :(
> > >
> > > Thanks
> > > Raakhi
> > >
> > > On Thu, Apr 16, 2009 at 1:18 PM, Desai, Milind B  > >wrote:
> > >
> > >> From the exception it appears that there is no space left on machine.
> > You
> > >> can check using 'df'
> > >>
> > >> Thanks
> > >> Milind
> > >>
> > >> -Original Message-
> > >> From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
> > >> Sent: Thursday, April 16, 2009 1:15 PM
> > >> To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
> > >> Subject: No space left on device Exception
> > >>
> > >> Hi,
> > >> I am running a map-reduce program on 6-Node ec2 cluster. and after
> a
> > >> couple of hours all my tasks gets hanged.
> > >>
> > >> so i started digging into the logs
> > >>
> > >> there were no logs for regionserver
> > >> no logs for tasktracker.
> > >> However for jobtracker i get the following:
> > >>
> > >> 2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
> > >> handler 9 on 50002, call
> > >> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false,
> > >> true,
> > >> 10745) from 10.254.27.79:44222: error: java.io.IOException:
> > >> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on
> > device
> > >> java.io.IOException: org.apache.hadoop.fs.FSError:
> java.io.IOException:
> > No
> > >> space left on device
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> > >>   at
> > >> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> > >>   at
> > java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> > >>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
> > >>   at
> > >> org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
> > >>   at
> > org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> > >>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
> > >>   at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
> > >>   at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
> > >>   at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
> > >>   at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
> > >>   at java.io.BufferedWriter.close(BufferedWriter.java:248)
> > >>   at java.io.PrintWriter.close(PrintWriter.java:295)
> > >>   at
> > >>
> > >>
> >
> org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
> > >>   at
> > >>
> >
> org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
> > >>   at org.apache.hadoop.mapred.JobInProgress.comp
> > >>
> > >>
> > >>
> > >> following are the disk information on dfs UI
> > >> domU-12-31-39-00-0C-A1<
> > >>
> >
> http://domu-12-31-39-00-0c-a1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> > >> >0In
> > >> Service413.380.8321.19391.360.2
> > >> 94.672353 domU-12-31-39-00-16-F1<
> > >>
> >
> http://domu-12-31-39-00-16-f1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> > >> >1In
> > >> Service413.380.4621.24391.670.11
> > >> 94.752399 domU-12-31-39-00-45-71<
> > >>
> >
> http://domu-12-31-39-00-45-71.compute-1.internal:50075/browseDirect

Hadoop Presentation at Ankara / Turkey

2009-04-16 Thread Enis Soztutar

Hi all,

I will be giving a presentation on Hadoop at "1. Ulusal Yüksek Başarım 
ve Grid Konferansı" tomorrow(Apr 17, 13:10). The conference location is 
at KKM ODTU/Ankara/Turkey. Presentation will be in Turkish. All the 
Hadoop users and wanna-be users in the area are welcome to attend.


More info can be found at : http://basarim09.ceng.metu.edu.tr/

Cheers,
Enis Söztutar



Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
> However, do the math on the costs for S3. We were doing something similar,
> and found that we were spending a fortune on our put requests at $0.01 per
> 1000, and next to nothing on storage. I've since moved to a more complicated
> model where I pack many small items in each object and store an index in
> simpledb. You'll need to partition your SimpleDBs if you do this.

Thanks a lot to Kevin for this - I stupidly overlooked the S3 PUT cost,
thinking EC2->S3 transfer was free, without realising there is
still a PUT request charge...
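
(Rough numbers: at $0.01 per 1,000 PUT requests, a billion tiles works out to
about 1,000,000,000 / 1,000 x $0.01 = $10,000 in PUTs alone, before any
storage or transfer charges.)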

I will reconsider and look at copying your approach and compare it
with a few rendering EC2 instances running off mysql or so.

Thanks again.

Tim


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Brian Bockelman

Hey Tom,

Yup, that's one of the things I've been looking at - however, it  
doesn't appear to be the likely culprit as to why data access is  
fairly random.  The time the operation took does not seem to be a  
factor of the number of bytes read, at least in the smaller range.


Brian

On Apr 16, 2009, at 5:17 AM, Tom White wrote:


Not sure if will affect your findings, but when you read from a
FSDataInputStream you should see how many bytes were actually read by
inspecting the return value and re-read if it was fewer than you want.
See Hadoop's IOUtils readFully() method.

Tom

On Mon, Apr 13, 2009 at 4:22 PM, Brian Bockelman  
 wrote:


Hey Todd,

Been playing more this morning after thinking about it for the  
night -- I
think the culprit is not the network, but actually the cache.   
Here's the
output of your script adjusted to do the same calls as I was doing  
(you had left out the random I/O part).

[br...@red tmp]$ java hdfs_tester
Mean value for reads of size 0: 0.0447
Mean value for reads of size 16384: 10.4872
Mean value for reads of size 32768: 10.82925
Mean value for reads of size 49152: 6.2417
Mean value for reads of size 65536: 7.0511003
Mean value for reads of size 81920: 9.411599
Mean value for reads of size 98304: 9.378799
Mean value for reads of size 114688: 8.99065
Mean value for reads of size 131072: 5.1378503
Mean value for reads of size 147456: 6.1324
Mean value for reads of size 163840: 17.1187
Mean value for reads of size 180224: 6.5492
Mean value for reads of size 196608: 8.45695
Mean value for reads of size 212992: 7.4292
Mean value for reads of size 229376: 10.7843
Mean value for reads of size 245760: 9.29095
Mean value for reads of size 262144: 6.57865

Copy of the script below.

So, without the FUSE layer, we don't see much (if any) patterns  
here.  The
overhead of randomly skipping through the file is higher than the  
overhead

of reading out the data.

Upon further inspection, the biggest factor affecting the FUSE  
layer is
actually the Linux VFS caching -- if you notice, the bandwidth in  
the given
graph for larger read sizes is *higher* than 1Gbps, which is the  
limit of
the network on that particular node.  If I go in the opposite  
direction -
starting with the largest reads first, then going down to the  
smallest
reads, the graph entirely smooths out for the small values -  
everything is read from the filesystem cache in the client RAM.  Graph attached.

So, on the upside, mounting through FUSE gives us the opportunity  
to speed
up reads for very complex, non-sequential patterns - for free,  
thanks to the
hardworking Linux kernel.  On the downside, it's incredibly  
difficult to
come up with simple cases to demonstrate performance for an  
application --
the cache performance and size depends on how much activity there's  
on the
client, the previous file system activity that the application did,  
and the
amount of concurrent activity on the server.  I can give you  
results for
performance, but it's not going to be the performance you see in  
real life.

 (Gee, if only file systems were easy...)

Ok, sorry for the list noise -- it seems I'm going to have to think  
more about this problem before I can come up with something coherent.

Brian





import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.net.URI;
import java.util.Random;

public class hdfs_tester {
 public static void main(String[] args) throws Exception {
  URI uri = new URI("hdfs://hadoop-name:9000/");
  FileSystem fs = FileSystem.get(uri, new Configuration());
  Path path = new
Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/ 
Nebraska/LoadTest07_Nebraska_33");

  FSDataInputStream dis = fs.open(path);
  Random rand = new Random();
  FileStatus status = fs.getFileStatus(path);
  long file_len = status.getLen();
  int iters = 20;
  for (int size=0; size < 1024*1024; size += 4*4096) {
long csum = 0;
for (int i = 0; i < iters; i++) {
  int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
  byte buf[] = new byte[size];
  if (pos < 0)
pos = 0;
  long st = System.nanoTime();
  dis.read(pos, buf, 0, size);
  long et = System.nanoTime();
  csum += et-st;
  //System.out.println(String.valueOf(size) + "\t" +  
String.valueOf(pos)

+ "\t" + String.valueOf(et - st));
}
float csum2 = csum; csum2 /= iters;
System.out.println("Mean value for reads of size " + size + ":  
" +

(csum2/1000/1000));
  }
  fs.close();
 }
}


On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:

On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon   
wrote:



Hey Brian,

This is really interesting stuff. I'm curious - have you tried  
these same

experiments using the Java API? I'm wondering whether this is
FUSE-specific
or inherent to all HDFS reads. I'll try to reprodu

Migration

2009-04-16 Thread Rakhi Khatwani
Hi,
 In case we migrate from Hadoop 0.19.0 and HBase 0.19.0 to Hadoop 0.20.0
and HBase 0.20.0 respectively, how would it affect the existing data in
Hadoop DFS and the HBase tables? Can we migrate the data using distcp only?

Regards
Raakhi


Re: Interesting Hadoop/FUSE-DFS access patterns

2009-04-16 Thread Tom White
Not sure if it will affect your findings, but when you read from an
FSDataInputStream you should see how many bytes were actually read by
inspecting the return value, and re-read if it was fewer than you want.
See Hadoop's IOUtils readFully() method.
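
For the positional reads in the test program quoted below, that check would
look roughly like this (untested sketch; dis, pos, buf and size are the
variables from that program):

// A single positional read() may legitimately return fewer bytes than
// requested, so keep reading until the buffer is full or EOF is hit.
int off = 0;
while (off < size) {
  int n = dis.read(pos + off, buf, off, size - off);
  if (n < 0) {
    throw new java.io.EOFException("EOF after " + off + " bytes");
  }
  off += n;
}
// Or, equivalently for this case: dis.readFully(pos, buf, 0, size);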

Tom

On Mon, Apr 13, 2009 at 4:22 PM, Brian Bockelman  wrote:
>
> Hey Todd,
>
> Been playing more this morning after thinking about it for the night -- I
> think the culprit is not the network, but actually the cache.  Here's the
> output of your script adjusted to do the same calls as I was doing (you had
> left out the random I/O part).
>
> [br...@red tmp]$ java hdfs_tester
> Mean value for reads of size 0: 0.0447
> Mean value for reads of size 16384: 10.4872
> Mean value for reads of size 32768: 10.82925
> Mean value for reads of size 49152: 6.2417
> Mean value for reads of size 65536: 7.0511003
> Mean value for reads of size 81920: 9.411599
> Mean value for reads of size 98304: 9.378799
> Mean value for reads of size 114688: 8.99065
> Mean value for reads of size 131072: 5.1378503
> Mean value for reads of size 147456: 6.1324
> Mean value for reads of size 163840: 17.1187
> Mean value for reads of size 180224: 6.5492
> Mean value for reads of size 196608: 8.45695
> Mean value for reads of size 212992: 7.4292
> Mean value for reads of size 229376: 10.7843
> Mean value for reads of size 245760: 9.29095
> Mean value for reads of size 262144: 6.57865
>
> Copy of the script below.
>
> So, without the FUSE layer, we don't see much (if any) patterns here.  The
> overhead of randomly skipping through the file is higher than the overhead
> of reading out the data.
>
> Upon further inspection, the biggest factor affecting the FUSE layer is
> actually the Linux VFS caching -- if you notice, the bandwidth in the given
> graph for larger read sizes is *higher* than 1Gbps, which is the limit of
> the network on that particular node.  If I go in the opposite direction -
> starting with the largest reads first, then going down to the smallest
> reads, the graph entirely smooths out for the small values - everything is
> read from the filesystem cache in the client RAM.  Graph attached.
>
> So, on the upside, mounting through FUSE gives us the opportunity to speed
> up reads for very complex, non-sequential patterns - for free, thanks to the
> hardworking Linux kernel.  On the downside, it's incredibly difficult to
> come up with simple cases to demonstrate performance for an application --
> the cache performance and size depends on how much activity there's on the
> client, the previous file system activity that the application did, and the
> amount of concurrent activity on the server.  I can give you results for
> performance, but it's not going to be the performance you see in real life.
>  (Gee, if only file systems were easy...)
>
> Ok, sorry for the list noise -- it seems I'm going to have to think more
> about this problem before I can come up with something coherent.
>
> Brian
>
>
>
>
>
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.conf.Configuration;
> import java.io.IOException;
> import java.net.URI;
> import java.util.Random;
>
> public class hdfs_tester {
>  public static void main(String[] args) throws Exception {
>   URI uri = new URI("hdfs://hadoop-name:9000/");
>   FileSystem fs = FileSystem.get(uri, new Configuration());
>   Path path = new
> Path("/user/uscms01/pnfs/unl.edu/data4/cms/store/phedex_monarctest/Nebraska/LoadTest07_Nebraska_33");
>   FSDataInputStream dis = fs.open(path);
>   Random rand = new Random();
>   FileStatus status = fs.getFileStatus(path);
>   long file_len = status.getLen();
>   int iters = 20;
>   for (int size=0; size < 1024*1024; size += 4*4096) {
>     long csum = 0;
>     for (int i = 0; i < iters; i++) {
>       int pos = rand.nextInt((int)((file_len-size-1)/8))*8;
>       byte buf[] = new byte[size];
>       if (pos < 0)
>         pos = 0;
>       long st = System.nanoTime();
>       dis.read(pos, buf, 0, size);
>       long et = System.nanoTime();
>       csum += et-st;
>       //System.out.println(String.valueOf(size) + "\t" + String.valueOf(pos)
> + "\t" + String.valueOf(et - st));
>     }
>     float csum2 = csum; csum2 /= iters;
>     System.out.println("Mean value for reads of size " + size + ": " +
> (csum2/1000/1000));
>   }
>   fs.close();
>  }
> }
>
>
> On Apr 13, 2009, at 3:14 AM, Todd Lipcon wrote:
>
>> On Mon, Apr 13, 2009 at 1:07 AM, Todd Lipcon  wrote:
>>
>>> Hey Brian,
>>>
>>> This is really interesting stuff. I'm curious - have you tried these same
>>> experiments using the Java API? I'm wondering whether this is
>>> FUSE-specific
>>> or inherent to all HDFS reads. I'll try to reproduce this over here as
>>> well.
>>>
>>> This smells sort of nagle-related to me... if you get a chance, you may
>>> want to edit DFSClient.java and change TCP_WINDOW_SIZE to 256 * 10

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
Hi Chuck,

Thank you very much for this opportunity.   I also think it is a nice
case study; it goes beyond the typical wordcount example by generating
something that people can actually see and play with immediately
afterwards (e.g. maps).  It is also showcasing nicely the community
effort to collectively bring together information on the worlds
biodiversity - the GBIF network really is a nice example of a free and
open access community who are collectively addressing interoperability
globally.  Can you please tell me what kind of time frame you would
need the case study in?

I have just got my Java PNG generation code down to 130msec on the
Mac, so I am pretty much ready to start running on EC2 and do the
volume tile generation, so will blog the whole thing on
http://biodivertido.blogspot.com at some point soon.  I have to travel
to the US on Saturday for a week so this will delay it somewhat.

What is not 100% clear to me is when to push to S3:
In the Map I will output the TileId-ZoomLevel-SpeciesId as the key,
along with the count, and in the Reduce I group the counts into larger
tiles, and create the PNG.  I could write to Sequencefile here... but
I suspect I could just push to the s3 bucket here also - as long as
the task tracker does not send the same Keys to multiple reduce tasks
- my Hadoop naivety showing here (I wrote an in-memory threaded
MapReduceLite which does not compete reducers, but I have not got into the
Hadoop code quite so much yet).


Cheers,

Tim



On Thu, Apr 16, 2009 at 1:49 AM, Chuck Lam  wrote:
> Hi Tim,
>
> I'm really interested in your application at gbif.org. I'm in the middle of
> writing Hadoop in Action ( http://www.manning.com/lam/ ) and think this may
> make for an interesting hadoop case study, since you're taking advantage of
> a lot of different pieces (EC2, S3, cloudfront, SequenceFiles,
> PHP/streaming). Would you be interested in discussing making a 4-5 page case
> study out of this?
>
> As to your question, I don't know if it's been properly answered, but I
> don't know why you think that "multiple tasks are running on the same
> section of the sequence file." Maybe you can elaborate further and I'll see
> if I can offer any thoughts.
>
>
>
>
> On Tue, Apr 14, 2009 at 7:10 AM, tim robertson 
> wrote:
>>
>> Sorry Brian, can I just ask please...
>>
>> I have the PNGs in the Sequence file for my sample set.  If I use a
>> second MR job and push to S3 in the map, surely I run into the
>> scenario where multiple tasks are running on the same section of the
>> sequence file and thus pushing the same data to S3.  Am I missing
>> something obvious (e.g. can I disable this behavior)?
>>
>> Cheers
>>
>> Tim
>>
>>
>> On Tue, Apr 14, 2009 at 2:44 PM, tim robertson
>>  wrote:
>> > Thanks Brian,
>> >
>> > This is pretty much what I was looking for.
>> >
>> > Your calculations are correct but based on the assumption that at all
>> > zoom levels we will need all tiles generated.  Given the sparsity of
>> > data, it actually results in only a few 100GBs.  I'll run a second MR
>> > job with the map pushing to S3 then to make use of parallel loading.
>> >
>> > Cheers,
>> >
>> > Tim
>> >
>> >
>> > On Tue, Apr 14, 2009 at 2:37 PM, Brian Bockelman 
>> > wrote:
>> >> Hey Tim,
>> >>
>> >> Why don't you put the PNGs in a SequenceFile in the output of your
>> >> reduce
>> >> task?  You could then have a post-processing step that unpacks the PNG
>> >> and
>> >> places it onto S3.  (If my numbers are correct, you're looking at
>> >> around 3TB
>> >> of data; is this right?  With that much, you might want another
>> >> separate Map
>> >> task to unpack all the files in parallel ... really depends on the
>> >> throughput you get to Amazon)
>> >>
>> >> Brian
>> >>
>> >> On Apr 14, 2009, at 4:35 AM, tim robertson wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> I am currently processing a lot of raw CSV data and producing a
>> >>> summary text file which I load into mysql.  On top of this I have a
>> >>> PHP application to generate tiles for google mapping (sample tile:
>> >>> http://eol-map.gbif.org/php/map/getEolTile.php?tile=0_0_0_13839800).
>> >>> Here is a (dev server) example of the final map client:
>> >>> http://eol-map.gbif.org/EOLSpeciesMap.html?taxon_id=13839800 - the
>> >>> dynamic grids as you zoom are all pre-calculated.
>> >>>
>> >>> I am considering (for better throughput as maps generate huge request
>> >>> volumes) pregenerating all my tiles (PNG) and storing them in S3 with
>> >>> cloudfront.  There will be billions of PNGs produced each at 1-3KB
>> >>> each.
>> >>>
>> >>> Could someone please recommend the best place to generate the PNGs and
>> >>> when to push them to S3 in a MR system?
>> >>> If I did the PNG generation and upload to S3 in the reduce the same
>> >>> task on multiple machines will compete with each other right?  Should
>> >>> I generate the PNGs to a local directory and then on Task success push
>> >>> the lot up?  I am assuming billions of 1-3KB files on HDFS is not a
>

Re: No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Thanks,
  I will check tht

Regards,
Raakhi

On Thu, Apr 16, 2009 at 1:42 PM, Miles Osborne  wrote:

> it may be that intermediate results are filling your disks and when
> the jobs crash, this all gets deleted.  so it would look like you have
> spare space when in reality you don't.
>
> i would check on the file system as your jobs run and see if indeed
> they are filling-up.
>
> Miles
>
> 2009/4/16 Rakhi Khatwani :
> > Hi,
> >following is the output on the df command
> > [r...@domu-12-31-39-00-e5-d2 conf]# df -h
> > FilesystemSize  Used Avail Use% Mounted on
> > /dev/sda1 9.9G  4.2G  5.2G  45% /
> > /dev/sdb  414G  924M  392G   1% /mnt
> >
> > from the o/p it seems that i have quite an amount of memory available.
> but i
> > still get the exception :(
> >
> > Thanks
> > Raakhi
> >
> > On Thu, Apr 16, 2009 at 1:18 PM, Desai, Milind B  >wrote:
> >
> >> From the exception it appears that there is no space left on machine.
> You
> >> can check using 'df'
> >>
> >> Thanks
> >> Milind
> >>
> >> -Original Message-
> >> From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
> >> Sent: Thursday, April 16, 2009 1:15 PM
> >> To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
> >> Subject: No space left on device Exception
> >>
> >> Hi,
> >> I am running a map-reduce program on 6-Node ec2 cluster. and after a
> >> couple of hours all my tasks gets hanged.
> >>
> >> so i started digging into the logs
> >>
> >> there were no logs for regionserver
> >> no logs for tasktracker.
> >> However for jobtracker i get the following:
> >>
> >> 2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
> >> handler 9 on 50002, call
> >> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false,
> >> true,
> >> 10745) from 10.254.27.79:44222: error: java.io.IOException:
> >> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on
> device
> >> java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException:
> No
> >> space left on device
> >>   at
> >>
> >>
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> >>   at
> >> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> >>   at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> >>   at
> >>
> >>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> >>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >>   at
> >>
> >>
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
> >>   at
> >>
> >>
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
> >>   at
> >> org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
> >>   at
> org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> >>   at
> >>
> >>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> >>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >>   at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
> >>   at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
> >>   at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
> >>   at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
> >>   at java.io.BufferedWriter.close(BufferedWriter.java:248)
> >>   at java.io.PrintWriter.close(PrintWriter.java:295)
> >>   at
> >>
> >>
> org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
> >>   at
> >>
> org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
> >>   at org.apache.hadoop.mapred.JobInProgress.comp
> >>
> >>
> >>
> >> following are the disk information on dfs UI
> >> domU-12-31-39-00-0C-A1<
> >>
> http://domu-12-31-39-00-0c-a1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >> >0In
> >> Service413.380.8321.19391.360.2
> >> 94.672353 domU-12-31-39-00-16-F1<
> >>
> http://domu-12-31-39-00-16-f1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >> >1In
> >> Service413.380.4621.24391.670.11
> >> 94.752399 domU-12-31-39-00-45-71<
> >>
> http://domu-12-31-39-00-45-71.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >> >1In
> >> Service413.380.6421.34391.40.16
> >> 94.682303 domU-12-31-39-00-E5-D2<
> >>
> http://domu-12-31-39-00-e5-d2.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >> >0In
> >> Service413.380.6621.53391.180.16
> >> 94.632319 domU-12-31-39-01-64-12<
> >>
> http://domu-12-31-39-01-64-12.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >> >2In
> >> Service413.380.6421.24391.490.16
> >> 94.712264 domU-12-31-39-01-78-D1<
> >>
> http://domu-12-31-39-01-78-d1.compute-1.internal:50075/browseDirectory.jsp?namenodeI

Re: Generating many small PNGs to Amazon S3 with MapReduce

2009-04-16 Thread tim robertson
Thanks Kevin,

"... well, you're doing it wrong." This is what I'm afraid of :o)

I know that for the maps, for example, multiple TaskTrackers can run on the
same part of the input file, but I'm not so sure about the reduce.  In the
reduce, will the same keys be run on multiple machines in competition?




On Thu, Apr 16, 2009 at 2:21 AM, Kevin Peterson  wrote:
> On Tue, Apr 14, 2009 at 2:35 AM, tim robertson 
> wrote:
>
>>
>> I am considering (for better throughput as maps generate huge request
>> volumes) pregenerating all my tiles (PNG) and storing them in S3 with
>> cloudfront.  There will be billions of PNGs produced each at 1-3KB
>> each.
>>
>
> Storing billions of PNGs each at 1-3kb each into S3 will be perfectly fine,
> there is no need to generate them and then push them at once, if you are
> storing them each in their own S3 object (which they must be, if you intend
> to fetch them using cloudfront). Each S3 object is unique, and can be
> written fully in parallel. If you are writing to the same S3 object twice,
> ... well, you're doing it wrong.
>
> However, do the math on the costs for S3. We were doing something similar,
> and found that we were spending a fortune on our put requests at $0.01 per
> 1000, and next to nothing on storage. I've since moved to a more complicated
> model where I pack many small items in each object and store an index in
> simpledb. You'll need to partition your SimpleDBs if you do this.
>


Re: No space left on device Exception

2009-04-16 Thread Miles Osborne
It may be that intermediate results are filling your disks, and when
the jobs crash this all gets deleted.  So it would look like you have
spare space when in reality you don't.

I would check on the file systems as your jobs run and see if they are
indeed filling up.

Miles

2009/4/16 Rakhi Khatwani :
> Hi,
>    following is the output on the df command
> [r...@domu-12-31-39-00-e5-d2 conf]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda1             9.9G  4.2G  5.2G  45% /
> /dev/sdb              414G  924M  392G   1% /mnt
>
> from the o/p it seems that i have quite an amount of memory available. but i
> still get the exception :(
>
> Thanks
> Raakhi
>
> On Thu, Apr 16, 2009 at 1:18 PM, Desai, Milind B wrote:
>
>> From the exception it appears that there is no space left on machine. You
>> can check using 'df'
>>
>> Thanks
>> Milind
>>
>> -Original Message-
>> From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
>> Sent: Thursday, April 16, 2009 1:15 PM
>> To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
>> Subject: No space left on device Exception
>>
>> Hi,
>>     I am running a map-reduce program on 6-Node ec2 cluster. and after a
>> couple of hours all my tasks gets hanged.
>>
>> so i started digging into the logs
>>
>> there were no logs for regionserver
>> no logs for tasktracker.
>> However for jobtracker i get the following:
>>
>> 2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 9 on 50002, call
>> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false,
>> true,
>> 10745) from 10.254.27.79:44222: error: java.io.IOException:
>> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>> java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No
>> space left on device
>>       at
>>
>> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>>       at
>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>       at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>       at
>>
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at
>>
>> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>>       at
>>
>> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>>       at
>> org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>       at
>>
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>>       at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
>>       at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
>>       at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
>>       at java.io.BufferedWriter.close(BufferedWriter.java:248)
>>       at java.io.PrintWriter.close(PrintWriter.java:295)
>>       at
>>
>> org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
>>       at
>> org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
>>       at org.apache.hadoop.mapred.JobInProgress.comp
>>
>>
>>
>> following are the disk information on dfs UI
>> domU-12-31-39-00-0C-A1<
>> http://domu-12-31-39-00-0c-a1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >0In
>> Service413.380.8321.19391.360.2
>> 94.672353 domU-12-31-39-00-16-F1<
>> http://domu-12-31-39-00-16-f1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >1In
>> Service413.380.4621.24391.670.11
>> 94.752399 domU-12-31-39-00-45-71<
>> http://domu-12-31-39-00-45-71.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >1In
>> Service413.380.6421.34391.40.16
>> 94.682303 domU-12-31-39-00-E5-D2<
>> http://domu-12-31-39-00-e5-d2.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >0In
>> Service413.380.6621.53391.180.16
>> 94.632319 domU-12-31-39-01-64-12<
>> http://domu-12-31-39-01-64-12.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >2In
>> Service413.380.6421.24391.490.16
>> 94.712264 domU-12-31-39-01-78-D1<
>> http://domu-12-31-39-01-78-d1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
>> >0In
>> Service413.380.4921.24391.650.12
>> 94.741952
>>
>> I m using hadoop 0.19.0 and hbase 0.19.0
>>
>> n googling the error i came arcoss the JIRA issue
>> http://issues.apache.org/jira/browse/HADOOP-4163
>>
>> which says tht its been fixed in this version. :(
>>
>> Has anyone else come up with this exception?
>>
>> how do we check the maximum cap

Re: Map-Reduce Slow Down

2009-04-16 Thread Mithila Nagendra
Thanks Jason! Will check that out.
Mithila

On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop wrote:

> Double check that there is no firewall in place.
> At one point a bunch of new machines were kickstarted and placed in a
> cluster and they all failed with something similar.
> It turned out the kickstart script turned enabled the firewall with a rule
> that blocked ports in the 50k range.
> It took us a while to even think to check that was not a part of our normal
> machine configuration
>
> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra 
> wrote:
>
> > Hi Aaron
> > I will look into that thanks!
> >
> > I spoke to the admin who overlooks the cluster. He said that the gateway
> > comes in to the picture only when one of the nodes communicates with a
> node
> > outside of the cluster. But in my case the communication is carried out
> > between the nodes which all belong to the same cluster.
> >
> > Mithila
> >
> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball 
> wrote:
> >
> > > Hi,
> > >
> > > I wrote a blog post a while back about connecting nodes via a gateway.
> > See
> > >
> >
> http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> > >
> > > This assumes that the client is outside the gateway and all
> > > datanodes/namenode are inside, but the same principles apply. You'll
> just
> > > need to set up ssh tunnels from every datanode to the namenode.
> > >
> > > - Aaron
> > >
> > >
> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari  > >wrote:
> > >
> > >> Looks like your NameNode is down .
> > >> Verify if hadoop process are running (   jps should show you all java
> > >> running process).
> > >> If your hadoop process are running try restarting your hadoop process
> .
> > >> I guess this problem is due to your fsimage not being correct .
> > >> You might have to format your namenode.
> > >> Hope this helps.
> > >>
> > >> Thanks,
> > >> --
> > >> Ravi
> > >>
> > >>
> > >> On 4/15/09 10:15 AM, "Mithila Nagendra"  wrote:
> > >>
> > >> The log file runs into thousands of line with the same message being
> > >> displayed every time.
> > >>
> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra 
> > >> wrote:
> > >>
> > >> > The log file : hadoop-mithila-datanode-node19.log.2009-04-14 has the
> > >> > following in it:
> > >> >
> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode:
> > >> STARTUP_MSG:
> > >> > /
> > >> > STARTUP_MSG: Starting DataNode
> > >> > STARTUP_MSG:   host = node19/127.0.0.1
> > >> > STARTUP_MSG:   args = []
> > >> > STARTUP_MSG:   version = 0.18.3
> > >> > STARTUP_MSG:   build =
> > >> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r
> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> > >> > /
> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at
> > >> node18/
> > >> > 192.168.0.18:54310 not available yet, Z...
> > >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying
> > >> connect
> > >> > to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >

Re: No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Hi,
following is the output of the df command:
[r...@domu-12-31-39-00-e5-d2 conf]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.9G  4.2G  5.2G  45% /
/dev/sdb              414G  924M  392G   1% /mnt

From the output it seems that I have quite a lot of disk space available, but I
still get the exception :(

Thanks
Raakhi

On Thu, Apr 16, 2009 at 1:18 PM, Desai, Milind B wrote:

> From the exception it appears that there is no space left on machine. You
> can check using 'df'
>
> Thanks
> Milind
>
> -Original Message-
> From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
> Sent: Thursday, April 16, 2009 1:15 PM
> To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
> Subject: No space left on device Exception
>
> Hi,
> I am running a map-reduce program on 6-Node ec2 cluster. and after a
> couple of hours all my tasks gets hanged.
>
> so i started digging into the logs
>
> there were no logs for regionserver
> no logs for tasktracker.
> However for jobtracker i get the following:
>
> 2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 9 on 50002, call
> heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false,
> true,
> 10745) from 10.254.27.79:44222: error: java.io.IOException:
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No
> space left on device
>   at
>
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
>   at
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>   at
>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at
>
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
>   at
>
> org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
>   at
> org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
>   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>   at
>
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
>   at java.io.DataOutputStream.write(DataOutputStream.java:90)
>   at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
>   at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
>   at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
>   at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
>   at java.io.BufferedWriter.close(BufferedWriter.java:248)
>   at java.io.PrintWriter.close(PrintWriter.java:295)
>   at
>
> org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
>   at
> org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
>   at org.apache.hadoop.mapred.JobInProgress.comp
>
>
>
> following are the disk information on dfs UI
> domU-12-31-39-00-0C-A1<
> http://domu-12-31-39-00-0c-a1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >0In
> Service413.380.8321.19391.360.2
> 94.672353 domU-12-31-39-00-16-F1<
> http://domu-12-31-39-00-16-f1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >1In
> Service413.380.4621.24391.670.11
> 94.752399 domU-12-31-39-00-45-71<
> http://domu-12-31-39-00-45-71.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >1In
> Service413.380.6421.34391.40.16
> 94.682303 domU-12-31-39-00-E5-D2<
> http://domu-12-31-39-00-e5-d2.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >0In
> Service413.380.6621.53391.180.16
> 94.632319 domU-12-31-39-01-64-12<
> http://domu-12-31-39-01-64-12.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >2In
> Service413.380.6421.24391.490.16
> 94.712264 domU-12-31-39-01-78-D1<
> http://domu-12-31-39-01-78-d1.compute-1.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F
> >0In
> Service413.380.4921.24391.650.12
> 94.741952
>
> I m using hadoop 0.19.0 and hbase 0.19.0
>
> n googling the error i came arcoss the JIRA issue
> http://issues.apache.org/jira/browse/HADOOP-4163
>
> which says tht its been fixed in this version. :(
>
> Has anyone else come up with this exception?
>
> how do we check the maximum capacity for usable dfs and non usable dfs.
> Thanks
> Raakhi,
>


RE: No space left on device Exception

2009-04-16 Thread Desai, Milind B
From the exception it appears that there is no space left on the machine. You can
check using 'df'.

Thanks
Milind 

-Original Message-
From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com] 
Sent: Thursday, April 16, 2009 1:15 PM
To: hbase-u...@hadoop.apache.org; core-user@hadoop.apache.org
Subject: No space left on device Exception

Hi,
 I am running a map-reduce program on 6-Node ec2 cluster. and after a
couple of hours all my tasks gets hanged.

so i started digging into the logs

there were no logs for regionserver
no logs for tasktracker.
However for jobtracker i get the following:

2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 50002, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false, true,
10745) from 10.254.27.79:44222: error: java.io.IOException:
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No
space left on device
   at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
   at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
   at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
   at
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
   at
org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
   at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
   at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
   at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
   at java.io.BufferedWriter.close(BufferedWriter.java:248)
   at java.io.PrintWriter.close(PrintWriter.java:295)
   at
org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
   at
org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
   at org.apache.hadoop.mapred.JobInProgress.comp



Following is the datanode information from the DFS web UI:

Node                    Last Contact  State       Capacity (GB)  DFS Used (GB)  Non-DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
domU-12-31-39-00-0C-A1  0             In Service  413.38         0.83           21.19              391.36          0.2       94.67          2353
domU-12-31-39-00-16-F1  1             In Service  413.38         0.46           21.24              391.67          0.11      94.75          2399
domU-12-31-39-00-45-71  1             In Service  413.38         0.64           21.34              391.4           0.16      94.68          2303
domU-12-31-39-00-E5-D2  0             In Service  413.38         0.66           21.53              391.18          0.16      94.63          2319
domU-12-31-39-01-64-12  2             In Service  413.38         0.64           21.24              391.49          0.16      94.71          2264
domU-12-31-39-01-78-D1  0             In Service  413.38         0.49           21.24              391.65          0.12      94.74          1952

I m using hadoop 0.19.0 and hbase 0.19.0

n googling the error i came arcoss the JIRA issue
http://issues.apache.org/jira/browse/HADOOP-4163

which says tht its been fixed in this version. :(

Has anyone else come up with this exception?

how do we check the maximum capacity for usable dfs and non usable dfs.
Thanks
Raakhi,


No space left on device Exception

2009-04-16 Thread Rakhi Khatwani
Hi,
 I am running a map-reduce program on a 6-node EC2 cluster, and after a
couple of hours all my tasks hang.

So I started digging into the logs:

there were no logs for the regionserver,
no logs for the tasktracker.
However, for the jobtracker I get the following:

2009-04-16 03:00:29,691 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 50002, call
heartbeat(org.apache.hadoop.mapred.tasktrackersta...@2eed7d11, false, true,
10745) from 10.254.27.79:44222: error: java.io.IOException:
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
java.io.IOException: org.apache.hadoop.fs.FSError: java.io.IOException: No
space left on device
   at
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
   at
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
   at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
   at
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
   at
org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:100)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   at
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
   at sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:297)
   at sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
   at java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
   at java.io.BufferedWriter.close(BufferedWriter.java:248)
   at java.io.PrintWriter.close(PrintWriter.java:295)
   at
org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1024)
   at
org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:1906)
   at org.apache.hadoop.mapred.JobInProgress.comp



Following is the datanode information from the DFS web UI:

Node                    Last Contact  State       Capacity (GB)  DFS Used (GB)  Non-DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
domU-12-31-39-00-0C-A1  0             In Service  413.38         0.83           21.19              391.36          0.2       94.67          2353
domU-12-31-39-00-16-F1  1             In Service  413.38         0.46           21.24              391.67          0.11      94.75          2399
domU-12-31-39-00-45-71  1             In Service  413.38         0.64           21.34              391.4           0.16      94.68          2303
domU-12-31-39-00-E5-D2  0             In Service  413.38         0.66           21.53              391.18          0.16      94.63          2319
domU-12-31-39-01-64-12  2             In Service  413.38         0.64           21.24              391.49          0.16      94.71          2264
domU-12-31-39-01-78-D1  0             In Service  413.38         0.49           21.24              391.65          0.12      94.74          1952

I'm using Hadoop 0.19.0 and HBase 0.19.0.

On googling the error I came across the JIRA issue
http://issues.apache.org/jira/browse/HADOOP-4163

which says that it has been fixed in this version. :(

Has anyone else come across this exception?

How do we check the maximum capacity for usable DFS and non-usable DFS?
Thanks
Raakhi


Re: Encoding problem.

2009-04-16 Thread Edward J. Yoon
My typo - I am using TextInputFormat (UTF-8).

On Thu, Apr 16, 2009 at 4:18 PM, Edward J. Yoon  wrote:
> Hi,
>
> I wanted to read the data in EUC-KR format using UTF-8, so I set a up
> a JVM parameter -Dfile.encoding=EUC-KR in the HADOOP_OPTS. But, it did
> not work. Is there any other method than coding my own input format?
>
> --
> Best Regards, Edward J. Yoon
> edwardy...@apache.org
> http://blog.udanax.org
>



-- 
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org


Encoding problem.

2009-04-16 Thread Edward J. Yoon
Hi,

I wanted to read data that is in EUC-KR format rather than UTF-8, so I set up
a JVM parameter -Dfile.encoding=EUC-KR in HADOOP_OPTS. But it did
not work. Is there any other method than coding my own input format?

-- 
Best Regards, Edward J. Yoon
edwardy...@apache.org
http://blog.udanax.org