RE: aws

2010-02-08 Thread Sanjay Sharma
Hi Prasen,
Amazon Elastic MapReduce is a customized environment/platform for executing
MapReduce, similar to deploying/running a WAR file by dropping it into a J2EE
container.
So you create a MapReduce jar file, upload it onto Elastic MapReduce, set up
additional parameters, and execute the job.
No need to bring up a Hadoop-specific AMI instance directly as with EC2!

Elastic MapReduce currently provides a Hadoop 0.18.3 environment, while on EC2
you are free to use any version of Hadoop.
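
For readers who want to script the submission, here is a rough sketch, assuming
the AWS SDK for Java as the client library; the credentials, bucket names, jar
path, and instance settings below are all placeholders:

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;

public class SubmitEmrJob {
  public static void main(String[] args) {
    // Placeholder credentials; use your real AWS access/secret keys.
    AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(
        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

    // One step = one MapReduce jar, analogous to dropping a WAR into a container.
    StepConfig step = new StepConfig()
        .withName("my-mapreduce-step")
        .withHadoopJarStep(new HadoopJarStepConfig()
            .withJar("s3://my-bucket/my-job.jar")               // hypothetical jar
            .withArgs("s3://my-bucket/input", "s3://my-bucket/output"));

    RunJobFlowRequest request = new RunJobFlowRequest()
        .withName("example-job-flow")
        .withLogUri("s3://my-bucket/logs/")
        .withSteps(step)
        .withInstances(new JobFlowInstancesConfig()
            .withInstanceCount(3)
            .withMasterInstanceType("m1.small")
            .withSlaveInstanceType("m1.small"));

    RunJobFlowResult result = emr.runJobFlow(request);
    System.out.println("Started job flow: " + result.getJobFlowId());
  }
}

The service brings up the cluster, runs the step, and tears it down; there is no
AMI to manage yourself.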

--Sanjay Sharma

-Original Message-
From: prasen@gmail.com [mailto:prasen@gmail.com] On Behalf Of prasenjit 
mukherjee
Sent: Monday, February 08, 2010 11:29 AM
To: common-user@hadoop.apache.org
Subject: Re: aws

Not sure I understand. How is it different from using plain EC2 with
Hadoop-specific AMIs?

On Wed, Feb 3, 2010 at 11:17 PM, Sirota, Peter sir...@amazon.com wrote:

 Elastic MapReduce uses Hadoop 0.18.3 with several patches that improve S3N
 performance/reliability.



 -Original Message-
 From: Kay Kay [mailto:kaykay.uni...@gmail.com]
 Sent: Wednesday, February 03, 2010 9:43 AM
 To: common-user@hadoop.apache.org
 Subject: Re: aws

 Peter,
   Out of curiosity - what versions of Hadoop DFS and M-R are
 being used behind the scenes?


 On 2/2/10 11:26 PM, Sirota, Peter wrote:
  Hi Brian,
 
  AWS has the Elastic MapReduce service where you can run Hadoop starting at
  10 cents per hour.  Check it out at
  http://aws.amazon.com/elasticmapreduce
 
  Disclaimer: I work at AWS
 
 
  Sent from my phone
 
  On Feb 2, 2010, at 11:09 PM, Brian Wolf brw...@gmail.com wrote:
 
 
  Hi,
 
  Can anybody tell me if AWS/Amazon has any kind of Hadoop sandbox
  to play in for free?
 
  Thanks
 
  Brian
 
 
 





Re: What framework Hadoop uses for daemonizing?

2010-02-08 Thread Thomas Koch
Hi,

I'm working on a Hadoop package for Debian, which also includes init scripts
using the daemon program (Debian package daemon) from
http://www.libslack.org/daemon

You can have a look at the init script(s) at 

http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/common-init.sh
http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/hadoop-wrapper.sh

The scripts still need some polishing, though. The common-init.sh is sourced by
different init stubs which provide only the variable NAME; see, for example:

http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/hadoop-jobtrackerd.init

Best regards,

Thomas Koch, http://www.koch.ro


Re: aws

2010-02-08 Thread 松柳
The main difference is the operation interface and the Hadoop version. On EC2
you can use any version of Hadoop for which you can find a suitable AMI;
Elastic MapReduce only supports 0.18.3.

Song

2010/2/8 prasenjit mukherjee pmukher...@quattrowireless.com

 Not sure I understand. How is it different from using plain EC2 with
 Hadoop-specific AMIs?

 On Wed, Feb 3, 2010 at 11:17 PM, Sirota, Peter sir...@amazon.com wrote:

  Elastic MapReduce uses Hadoop 0.18.3 with several patches that improve S3N
  performance/reliability.
 
 
 


Re: has anyone ported hadoop.lib.aggregate?

2010-02-08 Thread Meng Mao
I'm not familiar with the current roadmap for 0.20, but is there any plan to
backport the new mapreduce.lib.aggregate library into 0.20.x?

I suppose our team could attempt to use the patch ourselves, but we'd be
much more comfortable going with a standard release if at all possible.
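
For context, here is a minimal sketch of the kind of old-API driver we are
talking about (the class name is hypothetical; the argument convention is the
one documented on ValueAggregatorJob in 0.20.x):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.aggregate.ValueAggregatorJob;

// Hypothetical driver showing the old-API usage in question.
public class AggregateDriver {
  public static void main(String[] args) throws Exception {
    // args follow ValueAggregatorJob's convention, e.g.
    // <inputDirs> <outputDir> <numReducers> [aggregator spec ...]
    JobConf conf = ValueAggregatorJob.createValueAggregatorJob(args);
    JobClient.runJob(conf);  // old mapred.* submission path, hence the deprecation warnings
  }
}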

On Sun, Feb 7, 2010 at 10:41 PM, Amareshwari Sri Ramadasu 
amar...@yahoo-inc.com wrote:

 org.apache.hadoop.mapred.lib.aggregate has been ported to the new API in branch
 0.21.
 See http://issues.apache.org/jira/browse/MAPREDUCE-358

 Thanks
 Amareshwari

 On 2/7/10 5:34 AM, Meng Mao meng...@gmail.com wrote:

 From what I can tell, while the ValueAggregator stuff should be usable, the
 ValueAggregatorJob and ValueAggregatorJobBase classes still use the old
 Mapper and Reducer signatures, and basically aren't compatible with the new
 mapreduce.* API. Is that correct?

 Has anyone out there done a port? We've been dragging our feet very hard
 about getting away from the deprecated API for our classes that take
 advantage of the aggregate lib. It would be a huge boost if there were
 anything we could borrow to port over.




maximum number of jobs

2010-02-08 Thread Vasilis Liaskovitis
Hi,

I am trying to submit many independent jobs in parallel (same user).
This works for up to 16 jobs, but after that I only get 16 jobs in
parallel no matter how many I try to submit. I am using the fair scheduler
with the following config:

<allocations>
  <pool name="user2">
    <minMaps>12</minMaps>
    <minReduces>12</minReduces>
    <maxRunningJobs>100</maxRunningJobs>
    <weight>4</weight>
  </pool>
  <userMaxJobsDefault>100</userMaxJobsDefault>
</allocations>

Judging by this config, I would expect my job limit to be 100, not 16
jobs. I am using hadoop-0.20.1. Am I missing some other config option?
Any suggestions are welcome,

- Vasilis


Hadoop on a Virtualized O/S vs. the Real O/S

2010-02-08 Thread Stephen Watt
Hi Folks

I need to be able to certify that Hadoop works on various operating
systems. I do this by running it through a series of tests. As I'm sure
you can empathize, obtaining all the machines for each test run can
sometimes be tricky. It would be easier for me if I could spin up several
instances of a virtual image of the desired O/S, but to do this, I need to
know what risks I'm running with that approach.

Is there any reason why Hadoop might work differently on a virtualized O/S as
opposed to running on an actual O/S? Since just about everything is done
through the JVM and SSH, I don't foresee any issues, and I don't believe
we're doing anything weird with device drivers or have any kernel module
dependencies.

Kind regards
Steve Watt

Re: Job Tracker questions

2010-02-08 Thread Mark N
Did you check the JobClient source code?
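
For the "pure Java API" question below, here is a minimal sketch (assuming the
0.20.x org.apache.hadoop.mapred API; the class name is hypothetical) that
connects to the JobTracker, lists the running jobs, and dumps their counters:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

// Hypothetical standalone poller; it only needs the cluster config
// (mapred.job.tracker etc.) on its classpath, not the job's own jar.
public class CounterPoller {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());

    // Jobs that are still in PREP or RUNNING state.
    for (JobStatus status : client.jobsToComplete()) {
      RunningJob job = client.getJob(status.getJobID());
      if (job == null) continue;
      Counters counters = job.getCounters();
      for (Counters.Group group : counters) {
        for (Counters.Counter counter : group) {
          System.out.printf("%s\t%s\t%d%n",
              group.getName(), counter.getName(), counter.getCounter());
        }
      }
    }
  }
}

The printout is only for illustration; the same loop could hand the values to a
web service instead of stdout.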


On Thu, Feb 4, 2010 at 5:21 PM, Jeff Zhang zjf...@gmail.com wrote:

 I looked at the source code; it seems the JobTracker web UI also uses the
 proxy of the JobTracker to get the counter information rather than the XML file.


 On Thu, Feb 4, 2010 at 7:29 PM, Mark N nipen.m...@gmail.com wrote:

  Yes, we can create a web service in Java which would be called by .NET to
  display these counters.

  But since the Java code to read these counters needs to use Hadoop APIs
  (JobClient), I am not sure we can create a web service to read the counters.

  The question is: how does the default Hadoop task tracker display counter
  information in its JSP pages? Does it read from the XML files?
 
  thanks,
 
  On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote:
 
   I think you can create a web service using Java, and then use the web
   service from .NET to display the result.
  
  
   On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote:
  
 Do you mean you want to connect to the JobTracker using .NET? If so, I'm
 afraid I have no idea how to do this. The RPC of Hadoop is language-dependent.
   
   
   
   
On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote:
   
 Could you please elaborate on this (a hint to get started, as I am very new
 to Hadoop)?
 So far I have successfully read all the default and custom counters.
   
 Currently we have a .NET client.
   
thanks in advance.
   
   
On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com
 wrote:
   
  Well, you can create a proxy of the JobTracker on the client side, and then
  you can use the JobTracker API to get information about jobs. The proxy
  takes responsibility for communication with the master node. Reading the
  source code of JobClient will help you.


 On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com
  wrote:

   Yes, currently I am using JobClient to read these counters.

   But we are not able to use *web services* because the jar which is used to
   read the counters from a running Hadoop job is itself a Hadoop program.

   If we had a pure Java API which could run without the hadoop command, then
   we could return the counter variables via a web service and show them in
   the UI.

   Any help or technique to show these counters in the UI would be
   appreciated (not necessarily using a web service).

   I am using web services because I have a .NET VB client.

   thanks
 
 
 
  On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com
   wrote:
 
    I think you can use JobClient to get the counters in your web service.
    If you look at the shell script bin/hadoop, you will find that this
    shell script actually uses JobClient to get the counters.
  
  
  
   On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com
wrote:
  
 We have a Hadoop job running and have used custom counters to track a few
 counters (like the number of successfully processed documents matching
 certain conditions).

 Since we need to get these counters even while the Hadoop job is running,
 we wrote another Java program to read them.

 The *counter reader program* will do the following:

 1) List all the running jobs.
 2) Get the running job using the job name.
 3) Get all the counters for the individual running jobs.
 4) Set these counters in variables.

 We could successfully read these counters, but since we need to show them
 in a custom UI, how can we do that? We looked into various options for
 reading these counters to show in the UI, as follows:

 1. Dump these counters to a database; however, this may be overhead.
 2. Write a web service which the UI invokes to show the counters (however,
 since we need to run the *counter reader program* with the hadoop command,
 it might not be feasible to write a web service?).

 So the question is: can we read the counters using simple Java APIs? Does
 anyone have an idea how the default JobTracker JSP works? We wanted to
 build something similar to this.

 thanks
   
   
   
--
Nipen Mark
   
  
  
  
   --
   Best Regards
  
   Jeff Zhang
  
 
 
 
  --
  Nipen Mark
 



 --
 Best Regards

 Jeff Zhang

   
   
   
--
Nipen Mark