RE: aws
Hi Prasen,

Amazon Elastic MapReduce is a customized environment/platform for executing MapReduce, similar to deploying/running a WAR file by dropping it into a J2EE container. So you create your MapReduce jar file, upload it to Elastic MapReduce, set up additional parameters, and execute the job. No need to bring up a Hadoop-specific AMI instance directly as with EC2! Elastic MapReduce currently provides a Hadoop 0.18.3 environment, while you are free to use any version of Hadoop on EC2.

--Sanjay Sharma

-Original Message-
From: prasen@gmail.com [mailto:prasen@gmail.com] On Behalf Of prasenjit mukherjee
Sent: Monday, February 08, 2010 11:29 AM
To: common-user@hadoop.apache.org
Subject: Re: aws

Not sure I understand. How is it different from using plain ec2 with hadoop-specific AMIs?

On Wed, Feb 3, 2010 at 11:17 PM, Sirota, Peter sir...@amazon.com wrote:

Elastic MapReduce uses Hadoop .18.3 with several patches that improve S3N performance/reliability.

-Original Message-
From: Kay Kay [mailto:kaykay.uni...@gmail.com]
Sent: Wednesday, February 03, 2010 9:43 AM
To: common-user@hadoop.apache.org
Subject: Re: aws

Peter, out of curiosity - what version of Hadoop DFS and M-R is being used behind the scenes?

On 2/2/10 11:26 PM, Sirota, Peter wrote:

Hi Brian, AWS has the Elastic MapReduce service, where you can run Hadoop starting at 10 cents per hour. Check it out at http://aws.amazon.com/elasticmapreduce

Disclaimer: I work at AWS

Sent from my phone

On Feb 2, 2010, at 11:09 PM, Brian Wolf brw...@gmail.com wrote:

Hi, can anybody tell me if aws/amazon has any kind of hadoop sandbox to play in for free? Thanks, Brian
Re: What framework Hadoop uses for daemonizing?
Hi, I'm working on a hadoop package for Debian, which also includes init scripts using the daemon program (Debian package daemon) from http://www.libslack.org/daemon

You can have a look at the init script(s) at:
http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/common-init.sh
http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/hadoop-wrapper.sh

The scripts still need some polishing though. The common-init.sh is sourced by different init stubs which provide only the variable NAME, see for example:
http://git.debian.org/?p=pkg-java/hadoop.git;a=blob;f=debian/hadoop-jobtrackerd.init

Best regards,
Thomas Koch, http://www.koch.ro
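For readers curious what a daemonizing wrapper like the one above boils down to, here is a minimal start/stop sketch in Python, purely for illustration: it is not the Debian scripts or the daemon(1) program, and the function names are made up. It launches the command detached in its own session, records the pid in a pidfile, and stops it via the recorded pid.

```python
import os
import signal
import subprocess

def start(cmd, pidfile):
    """Launch cmd detached in its own session and record its pid."""
    proc = subprocess.Popen(cmd, start_new_session=True)
    with open(pidfile, "w") as f:
        f.write(str(proc.pid))
    return proc.pid

def stop(pidfile):
    """Terminate the process recorded in pidfile and clean up."""
    with open(pidfile) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # process already exited; still remove the stale pidfile
    os.remove(pidfile)
```

A real init script additionally redirects stdio, changes to a safe working directory, and handles restarts and respawn limits; daemon(1) takes care of those details for the Debian package.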
Re: aws
The main difference is the operation interface and the hadoop version. On EC2 you can use any version of hadoop, as long as you can find suitable AMIs; Elastic MapReduce only supports 0.18.3.

Song

2010/2/8 prasenjit mukherjee pmukher...@quattrowireless.com:

Not sure I understand. How is it different from using plain ec2 with hadoop-specific AMIs?

On Wed, Feb 3, 2010 at 11:17 PM, Sirota, Peter sir...@amazon.com wrote:

Elastic MapReduce uses Hadoop .18.3 with several patches that improve S3N performance/reliability.
Re: has anyone ported hadoop.lib.aggregate?
I'm not familiar with the current roadmap for 0.20, but is there any plan to backport the new mapreduce.lib.aggregate library into 0.20.x? I suppose our team could attempt to use the patch ourselves, but we'd be much more comfortable going with a standard release if at all possible.

On Sun, Feb 7, 2010 at 10:41 PM, Amareshwari Sri Ramadasu amar...@yahoo-inc.com wrote:

org.apache.hadoop.mapred.lib.aggregate has been ported to the new api in branch 0.21. See http://issues.apache.org/jira/browse/MAPREDUCE-358

Thanks
Amareshwari

On 2/7/10 5:34 AM, Meng Mao meng...@gmail.com wrote:

From what I can tell, while the ValueAggregator stuff should be usable, the ValueAggregatorJob and ValueAggregatorJobBase classes still use the old Mapper and Reducer signatures, and basically aren't compatible with the new mapreduce.* API. Is that correct? Has anyone out there done a port? We've been dragging our feet very hard about getting away from use of the deprecated API for our classes that take advantage of the aggregate lib. It would be a huge boost if there was any stuff we could borrow to port over.
maximum number of jobs
Hi, I am trying to submit many independent jobs in parallel (same user). This works for up to 16 jobs, but after that I only get 16 jobs in parallel no matter how many I try to submit. I am using the fair scheduler with the following config:

  <allocations>
    <pool name="user2">
      <minMaps>12</minMaps>
      <minReduces>12</minReduces>
      <maxRunningJobs>100</maxRunningJobs>
      <weight>4</weight>
    </pool>
    <userMaxJobsDefault>100</userMaxJobsDefault>
  </allocations>

Judging by this config, I would expect my job limit to be 100, not 16 jobs. I am using hadoop-0.20.1. Am I missing some other config option? Any suggestions are welcome,

- Vasilis
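Given the allocations file described above, one way to sanity-check what limits the scheduler should be reading is to parse the file the same way and print them back. The sketch below is illustrative Python, not Hadoop code (the `pool_limits` helper is hypothetical), and it can't reveal limits that come from jobtracker settings outside this file:

```python
import xml.etree.ElementTree as ET

# The allocations file from the question, inlined for the example.
ALLOCATIONS = """\
<allocations>
  <pool name="user2">
    <minMaps>12</minMaps>
    <minReduces>12</minReduces>
    <maxRunningJobs>100</maxRunningJobs>
    <weight>4</weight>
  </pool>
  <userMaxJobsDefault>100</userMaxJobsDefault>
</allocations>
"""

def pool_limits(xml_text, pool_name):
    """Collect the numeric limits declared for one pool, plus the global default."""
    root = ET.fromstring(xml_text)
    limits = {}
    for pool in root.findall("pool"):
        if pool.get("name") == pool_name:
            for child in pool:
                limits[child.tag] = int(child.text)
    default = root.findtext("userMaxJobsDefault")
    if default is not None:
        limits["userMaxJobsDefault"] = int(default)
    return limits
```

If the parsed limits all read 100 as expected, the 16-job ceiling is likely coming from somewhere other than this file, which narrows the search.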
Hadoop on a Virtualized O/S vs. the Real O/S
Hi Folks,

I need to be able to certify that Hadoop works on various operating systems. I do this by running it through a series of tests. As I'm sure you can empathize, obtaining all the machines for each test run can sometimes be tricky. It would be easier for me if I could spin up several instances of a virtual image of the desired O/S, but to do this, I need to know if there are any risks in that approach.

Is there any reason why Hadoop might work differently on a virtual O/S as opposed to running on an actual O/S? Since just about everything is done through the JVM and SSH, I don't foresee any issues, and I don't believe we're doing anything weird with device drivers or have any kernel module dependencies.

Kind regards,
Steve Watt
Re: Job Tracker questions
Did you check the JobClient source code?

On Thu, Feb 4, 2010 at 5:21 PM, Jeff Zhang zjf...@gmail.com wrote:

I looked at the source code; it seems the job tracker web UI also uses the proxy of the JobTracker to get the counter information, rather than the xml file.

On Thu, Feb 4, 2010 at 7:29 PM, Mark N nipen.m...@gmail.com wrote:

Yes, we can create a webservice in Java which would be called by .net to display these counters. But since the Java code to read these counters needs to use hadoop APIs (job client), I am not sure we can create a webservice to read the counters. The question is: how does the default hadoop task tracker display counter information in its JSP pages? Does it read from the XML files? thanks,

On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote:

I think you can create a web service using Java, and then in .net use the web service to display the result.

On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote:

Do you mean you want to connect to the JobTracker using .Net? If so, I'm afraid I have no idea how to do this. The rpc of hadoop is language dependent.

On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote:

Could you please elaborate on this (a hint to get started, as I am very new to hadoop)? So far I could successfully read all the default and custom counters. Currently we are having a .net client. thanks in advance.

On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote:

Well, you can create a proxy of the JobTracker on the client side, and then you can use the API of the JobTracker to get the information of jobs. The proxy takes responsibility for communication with the Master Node. Reading the source code of JobClient can help you.

On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote:

Yes, currently am using JobClient to read these counters.
But we are not able to use webservices, because the jar which is used to read the counters from the running hadoop job is itself a Hadoop program. If we had a pure Java API which could run without the hadoop command, then we could return the counter variables to the webservice and show them in the UI. Any help or technique to show these counters in the UI would be appreciated (not necessarily using a web service). I am using webservices because I am having a .net VB client. thanks

On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote:

I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that this shell actually uses the JobClient to get the counters.

On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote:

We have a hadoop job running and have used custom counters to track a few counters (like the number of successfully processed documents matching certain conditions). Since we need to get these counters even while the Hadoop job is running, we wrote another Java program to read them.

The counter reader program will do the following:
1) List all the running jobs.
2) Get the running job using the job name.
3) Get all the counters for individual running jobs.
4) Set these counters in variables.

We could successfully read these counters, but since we need to show them in a custom UI, how can we do that? We looked into various options:
1. Dump these counters to a database; however, this may be overhead.
2. Write a web service, and have the UI invoke functions from this service (however, since we need to run the counter reader program with the Hadoop command, it might not be feasible to write a web service?).

So the question is: can we achieve reading the counters using simple Java APIs? Does anyone have an idea how the default jobtracker JSP works?
We wanted to build something similar to this. thanks -- Nipen Mark
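The thread converges on JobClient as the way to read counters; the remaining gap is handing them to a non-Java (.net) UI. One lightweight option, between the "dump to a database" and "full web service" alternatives discussed above, is to have the counter reader periodically flatten its snapshot into JSON that any client can poll from a file or plain HTTP endpoint. Here is a sketch of the flattening step, in Python purely for illustration: the function, group names, and counter names are all made up, and this is not a Hadoop API.

```python
import json

def counters_to_json(job_name, counters):
    """Flatten a {group: {counter: value}} snapshot into JSON for a UI poll."""
    return json.dumps({
        "job": job_name,
        "counters": [
            {"group": group, "name": name, "value": value}
            for group, members in sorted(counters.items())
            for name, value in sorted(members.items())
        ],
    })

# Example snapshot, shaped like what a JobClient-based reader might collect
# (group and counter names here are invented for the example):
snapshot = {
    "Custom": {"DOCS_MATCHED": 1234, "DOCS_SKIPPED": 56},
    "Map-Reduce Framework": {"Map input records": 1000000},
}
```

Because the output is plain JSON, the .net VB client never needs to speak Hadoop's Java RPC; it only parses text, which sidesteps the language-dependence problem raised earlier in the thread.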