Re: Inconsistent times in Hadoop web interface

2010-12-27 Thread yipeng
Ahh... I don't manage my cluster but you were spot on. Now I know who to
follow up with.

Thanks!!

Yipeng

On Mon, Dec 27, 2010 at 3:55 PM, Harsh J qwertyman...@gmail.com wrote:

 Hey,

 On Mon, Dec 27, 2010 at 10:44 AM, yipeng yip...@gmail.com wrote:
  Hi guys,
 
  I am seeing inconsistent times in the web interface. The job finish
  time shown below is 47 secs, but the Map and Reduce took significantly
  longer. I don't think I did anything that could have caused this. Any ideas
  what might have?
 

 Are your cluster nodes' clocks synced right (via ntpd, etc.)?

 --
 Harsh J
 www.harshj.com
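
To illustrate the clock question above, here is a minimal sketch of how skew between two machines can shrink a displayed duration. It assumes the start and finish timestamps are stamped by different nodes' clocks, which is an assumption about the setup and not something confirmed in this thread. The function name and all of the numbers are made up for illustration.

```python
from datetime import datetime, timedelta


def apparent_duration(true_start, true_duration, start_node_skew, finish_node_skew):
    """Duration a UI would show if start and finish come from different clocks.

    Each skew is how far that node's clock is ahead of (positive) or
    behind (negative) the true time.
    """
    recorded_start = true_start + start_node_skew
    recorded_finish = true_start + true_duration + finish_node_skew
    return recorded_finish - recorded_start


# Hypothetical values: a job that really ran for 5 minutes, with the node
# that stamps the finish time running 4 min 13 s behind the other node.
true_start = datetime(2010, 12, 27, 10, 0, 0)
shown = apparent_duration(
    true_start,
    true_duration=timedelta(minutes=5),
    start_node_skew=timedelta(0),
    finish_node_skew=timedelta(minutes=-4, seconds=-13),
)
print(shown)  # 0:00:47
```

With synced clocks both skews are zero and the shown duration equals the true one.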



Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread Sudhir Vallamkondu
We recently crossed this bridge and here are some insights. We did an
extensive study comparing costs and benchmarking local vs EMR for our
current needs and future trends.

- The scalability you get with EMR is unmatched, although you need to look at
your requirements and decide whether this is something you need.

- When using EMR it's cheaper to use reserved instances than nodes on the fly.
You can always add more nodes when required. I suggest looking at your
current computing needs, reserving instances for a year or two, using
these to run EMR, and adding nodes at peak times. In your cost estimation you
will need to factor in the data transfer time/costs unless you are dealing
with public datasets on S3.

- EMR fared similarly to a local cluster on CPU benchmarks (we used MRBench to
benchmark map/reduce); however, IO benchmarks were slow on EMR (we used the
DFSIO benchmark). For IO-intensive jobs you will need to add more nodes to
compensate for this.

- Compared to a local cluster, you will need to factor in the time it takes
for the EMR cluster to set up when starting a job. This includes things like
data transfer time, cluster replication time, etc.

- The EMR API is very flexible; however, you will need to build a custom
interface on top of it to suit your job management and monitoring needs.

- EMR bootstrap actions can satisfy most of your native lib needs so no
drawbacks there.
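
The reserved-versus-on-the-fly point above can be sketched as a break-even calculation. This is only a rough model, assuming a reserved node costs a one-time fee plus a lower hourly rate. The function names are hypothetical and the prices are placeholders, not real AWS rates and not figures from our study. Data transfer costs are left out and would be an extra term on both sides.

```python
def on_demand_cost(hourly_rate, hours):
    """Total cost of renting a node by the hour with no commitment."""
    return hourly_rate * hours


def reserved_cost(upfront_fee, hourly_rate, hours):
    """Total cost of a reserved node: one-time fee plus a lower hourly rate."""
    return upfront_fee + hourly_rate * hours


def break_even_hours(on_demand_rate, upfront_fee, reserved_rate):
    """Hours of use beyond which the reserved node becomes cheaper."""
    return upfront_fee / (on_demand_rate - reserved_rate)


# Placeholder prices for illustration only, NOT real AWS rates.
ON_DEMAND_RATE = 0.50   # dollars per node-hour
RESERVED_RATE = 0.25    # dollars per node-hour
UPFRONT_FEE = 1000.0    # dollars per node per year

HOURS_PER_YEAR = 8760
print(break_even_hours(ON_DEMAND_RATE, UPFRONT_FEE, RESERVED_RATE))  # 4000.0
print(on_demand_cost(ON_DEMAND_RATE, HOURS_PER_YEAR))                # 4380.0
print(reserved_cost(UPFRONT_FEE, RESERVED_RATE, HOURS_PER_YEAR))     # 3190.0
```

Under these made-up prices a node that is busy more than 4000 hours a year is cheaper reserved, and peak-only nodes are cheaper on the fly.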


-- Sudhir


On 12/26/10 5:26 AM, common-user-digest-h...@hadoop.apache.org
common-user-digest-h...@hadoop.apache.org wrote:

 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Date: Fri, 24 Dec 2010 04:41:46 -0800 (PST)
 To: common-user@hadoop.apache.org
 Subject: Re: Hadoop/Elastic MR on AWS
 
 Hello Amandeep,
 
 
 
 - Original Message 
 From: Amandeep Khurana ama...@gmail.com
 To: common-user@hadoop.apache.org
 Sent: Fri, December 10, 2010 1:14:45 AM
 Subject: Re: Hadoop/Elastic MR on AWS
 
 Mark,
 
 Using EMR makes it very easy to start a cluster and add/reduce capacity as
 and when required. There are certain optimizations that make EMR an
 attractive choice as compared to building your own cluster out. Using EMR
 
 
 Could you please point out what optimizations you are referring to?
 
 Thanks,
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
 Hadoop ecosystem search :: http://search-hadoop.com/
 
 also ensures you are using a production quality, stable system backed by the
 EMR engineers. You can always use bootstrap actions to put your own tweaked
 version of Hadoop in there if you want to do that.
 
 Also, you don't have to tear down your cluster after every job. You can set
 the alive option when you start your cluster and it will stay there even
 after your Hadoop job completes.
 
 If you face any issues with EMR, send me a mail offline and I'll be happy to
 help.
 
 -Amandeep
 
 
 On Thu, Dec 9, 2010 at 9:47 PM, Mark static.void@gmail.com wrote:
 
 Does anyone have any thoughts/experiences on running Hadoop in AWS? What
 are some pros/cons?
 
 Are there any good AMIs out there for this?
 
 Thanks for any advice.
 
 


iCrossing Privileged and Confidential Information
This email message is for the sole use of the intended recipient(s) and may 
contain confidential and privileged information of iCrossing. Any unauthorized 
review, use, disclosure or distribution is prohibited. If you are not the 
intended recipient, please contact the sender by reply email and destroy all 
copies of the original message.




Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread Dave Viner
Hi Sudhir,

Can you publish your findings around pricing, and how you calculated the
various aspects?

This is great information.

Thanks
Dave Viner






Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread James Seigel
Thank you for sharing.

Sent from my mobile. Please excuse the typos.




UI doesn't work

2010-12-27 Thread maha
Hi,

  I get Error 404 when I try to use the Hadoop UI to monitor my job execution. I'm
using Hadoop-0.20.2 and the following are parts of my configuration files.

in core-site.xml:
<name>fs.default.name</name>
<value>hdfs://speed.cs.ucsb.edu:9000</value>

in mapred-site.xml:
<name>mapred.job.tracker</name>
<value>speed.cs.ucsb.edu:9001</value>


When I try to open http://speed.cs.ucsb.edu:50070/ I get the 404 error.


Any ideas?

  Thank you,
 Maha



Re: UI doesn't work

2010-12-27 Thread James Seigel
Two quick questions first.

Is the job tracker running on that machine?
Is there a firewall in the way?

James

Sent from my mobile. Please excuse the typos.
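
Both questions above can be checked from the machine where the browser runs. The sketch below is one possible way to do it, not an official Hadoop tool, and the function name probe is hypothetical. It tells apart "a web server answered with some HTTP status" from "nothing reachable on that port". A 404 falls in the first group, which would suggest something is listening and a firewall is less likely to be the cause.

```python
import http.client


def probe(host, port, path="/", timeout=5.0):
    """Return ("http", status) if a server answers, else ("unreachable", reason)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return ("http", conn.getresponse().status)
    except (OSError, http.client.HTTPException) as err:
        # Connection refused, timeout, DNS failure, or a non-HTTP reply.
        return ("unreachable", str(err))
    finally:
        conn.close()
```

For example, probe("speed.cs.ucsb.edu", 50070) checks the NameNode web UI, which is where port 50070 points by default in 0.20.x, and probe("speed.cs.ucsb.edu", 50030) checks the JobTracker web UI, which is the page that shows job execution.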


Hadoop RPC call response post processing

2010-12-27 Thread Stefan Groschupf
Hi All, 
I've been browsing the RPC code for quite a while now, trying to find any entry
point / interceptor slot that allows me to handle an RPC call response Writable
after it was sent over the wire.
Does anybody have an idea how to break into the RPC code from outside? All the
interesting methods are private. :(

Background:
Heavy use of the RPC allocates huge numbers of Writable objects. We saw in
multiple systems that the garbage collector can get so busy that the JVM almost
freezes for seconds. Things like ZooKeeper sessions time out in those cases.
My idea is to create an object pool for Writables. Borrowing an object from the
pool is simple since this happens in our custom code, but we don't know when the
returned Writable has been sent over the wire and can be returned to the pool.
A dirty hack would be to override the write(out) method in the Writable,
assuming that is the last thing done with the Writable, but it turns out that
this method is called in other cases too, e.g. to measure throughput.

Any ideas?

Thanks, 
Stefan
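
The pool idea above can be sketched in a language-neutral way. The snippet below is written in Python for brevity and does not touch Hadoop's Java RPC code. ObjectPool, send_response and the on_sent callback are all hypothetical names, and on_sent stands in for exactly the hook that the message says is missing. The sketch is not thread-safe.

```python
class ObjectPool:
    """Minimal pool: reuse released objects instead of allocating new ones."""

    def __init__(self, factory, reset):
        self._factory = factory   # creates a fresh object when the pool is empty
        self._reset = reset       # clears an object's state before reuse
        self._free = []
        self.allocations = 0      # counts how many objects were really created

    def borrow(self):
        if self._free:
            return self._free.pop()
        self.allocations += 1
        return self._factory()

    def release(self, obj):
        self._reset(obj)
        self._free.append(obj)


def send_response(response, wire, on_sent):
    """Hypothetical transport: copy the bytes out, then notify the caller."""
    wire.append(bytes(response))
    on_sent(response)   # stand-in for the missing "after send" hook


pool = ObjectPool(factory=bytearray, reset=lambda buf: buf.clear())
wire = []
for payload in (b"first", b"second", b"third"):
    buf = pool.borrow()
    buf.extend(payload)
    send_response(buf, wire, on_sent=pool.release)

print(pool.allocations)  # 1
print(wire)              # [b'first', b'second', b'third']
```

Three responses go out but only one buffer is ever allocated, because each buffer is released as soon as the transport reports that it was sent.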

Re: UI doesn't work

2010-12-27 Thread Harsh J
I remember facing such an issue with the JT (50030) once. None of the
JSP pages would load, except the index. It was some odd issue with the
webapps not getting loaded right during startup. I don't quite remember
how it got solved.

Did you run any ant operation on your release copy of Hadoop prior to
starting it, by the way?


-- 
Harsh J
www.harshj.com


Re: Hadoop RPC call response post processing

2010-12-27 Thread Todd Lipcon
Hi Stefan,

Sounds interesting.

Maybe you're looking for o.a.h.ipc.Server$Responder?

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: UI doesn't work

2010-12-27 Thread Adarsh Sharma

Check the NameNode and JobTracker logs and post them here.

Best Regards

Adarsh


Re: Hadoop RPC call response post processing

2010-12-27 Thread Ted Dunning
I would be very surprised if allocation itself is the problem as opposed to
good old-fashioned excess copying.

It is very hard to write an allocator faster than the Java generational GC,
especially if you are talking about objects that are ephemeral.

Have you looked at the tenuring distribution?
