Re: Starting a Hadoop job outside the cluster
My job submit code is here:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/

There is something to run Tool classes:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/ToolRunnerComponentImpl.java?revision=8590&view=markup

and something to integrate job submission with some pre-run sanity checks, and to optionally wait for the work to finish:
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590&view=markup

This works remotely for short-lived jobs; if you submit something that may run over a weekend, you don't normally want to block waiting for it.
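For what it's worth, that blocking-versus-fire-and-forget distinction maps directly onto the stock 0.20 Job API. A minimal sketch of the idea only, not the SmartFrog code linked above; the class name and the shortLived flag are made up for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        // Assumes the cluster's *-site.xml files are on the classpath.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "possibly long-running job");
        job.setJarByClass(SubmitSketch.class);
        // ... mapper/reducer/input/output setup elided ...

        boolean shortLived = false; // hypothetical: the caller decides whether to block
        if (shortLived) {
            // Block, printing progress until the job finishes.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } else {
            // Fire and forget: returns once the JobTracker accepts the job;
            // poll job.isComplete() later if you need the outcome.
            job.submit();
        }
    }
}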
Re: Starting a Hadoop job outside the cluster
I have tried what you suggest (well, sort of); a good example would help a lot. My reducer is set to, among other things, emit the local OS and user.dir. When I try running from my Windows box these appear on HDFS, but they show the Windows OS and user.dir, leading me to believe that the reducer is still running on my Windows machine. I will check the values, but a working example would be very useful.

On Sun, May 29, 2011 at 6:19 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote: [...]

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Re: Starting a Hadoop job outside the cluster
Steve,

What do you mean when you say it shows the Windows OS and user.dir? There will be a few properties in the job.xml that may carry client machine information, but these shouldn't be a hindrance. Unless a TaskTracker was started on the Windows box (no daemons ought to be started on the client machine), no task may run on it.

On Tue, May 31, 2011 at 9:15 PM, Steve Lewis lordjoe2...@gmail.com wrote: [...]

--
Harsh J
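One way to verify Harsh's point from inside a task is to emit the task-side hostname rather than a System property. A sketch only, essentially the corrected form of the InetAddress lines that appear commented out in Steve's reducer below; it drops into any reduce() body:

// Inside reduce(): write out the hostname of the machine the task runs on.
java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
context.write(new Text("task.host"), new Text(addr.getHostName()));

If the emitted hostname is a cluster node rather than the Windows box, the reducer is running remotely regardless of what the job.xml properties say.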
Re: Starting a Hadoop job outside the cluster
My Reducer code says this:

public static class Reduce extends Reducer<Text, Text, Text, Text> {
    private boolean m_DateSent;

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        if (!m_DateSent) {
            Text dkey = new Text("CreationDate");
            Text dValue = new Text();
            writeKeyValue(context, dkey, dValue, "CreationDate", new Date().toString());
            writeKeyValue(context, dkey, dValue, "user.dir", System.getProperty("user.dir"));
            writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
            writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));
            // dkey.set("ip");
            // java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
            // dValue.set(addr.toString());
            // context.write(dkey, dValue);
            m_DateSent = true;
        }
        Iterator<Text> itr = values.iterator();
        // Add interesting code here
        while (itr.hasNext()) {
            Text vCheck = itr.next();
            context.write(key, vCheck);
        }
    }
}

If os.name says Linux I am running on the cluster; if it says Windows I am running locally.

I run this main hoping to run on the cluster with the NameNode and JobTracker at glados:

public static void main(String[] args) throws Exception {
    String outFile = "./out";
    Configuration conf = new Configuration();
    // cause output to go to the cluster
    conf.set("fs.default.name", "hdfs://glados:9000/");
    conf.set("mapreduce.jobtracker.address", "glados:9000/");
    conf.set("mapred.jar", "NShot.jar");
    conf.set("fs.defaultFS", "hdfs://glados:9000/");
    Job job = new Job(conf, "Generated data");
    conf = job.getConfiguration();
    job.setJarByClass(NShotInputFormat.class);
    // ... other setup code ...
    boolean ans = job.waitForCompletion(true);
    int ret = ans ? 0 : 1;
}

On Tue, May 31, 2011 at 9:35 AM, Harsh J ha...@cloudera.com wrote: [...]

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Re: Starting a Hadoop job outside the cluster
Simply remove that trailing slash (forgot to catch it earlier, sorry) and you should be set (or at least more set than before, surely).

On Tue, May 31, 2011 at 10:51 PM, Steve Lewis lordjoe2...@gmail.com wrote:

0.20.2 - we have been avoiding 0.21 because it is not terribly stable and made some MAJOR changes to critical classes. When I say:

Configuration conf = new Configuration();
// cause output to go to the cluster
conf.set("fs.default.name", "hdfs://glados:9000/");
// conf.set("mapreduce.jobtracker.address", "glados:9000/");
conf.set("mapred.job.tracker", "glados:9000/");
conf.set("mapred.jar", "NShot.jar");
// conf.set("fs.defaultFS", "hdfs://glados:9000/");
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
// if (otherArgs.length != 2) {
//     System.err.println("Usage: wordcount <in> <out>");
//     System.exit(2);
// }
Job job = new Job(conf, "Generated data");

I get:

Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: glados:9000
    at org.apache.hadoop.fs.Path.initialize(Path.java:140)
    at org.apache.hadoop.fs.Path.<init>(Path.java:126)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:150)
    at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
    at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:423)
    at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
    at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
    at org.systemsbiology.hadoopgenerated.NShotTest.main(NShotTest.java:188)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: glados:9000
    at java.net.URI.checkPath(URI.java:1787)
    at java.net.URI.<init>(URI.java:735)
    at org.apache.hadoop.fs.Path.initialize(Path.java:137)

I promise to publish a working example if this ever works.

On Tue, May 31, 2011 at 10:02 AM, Harsh J ha...@cloudera.com wrote:

Steve,

On Tue, May 31, 2011 at 10:27 PM, Steve Lewis lordjoe2...@gmail.com wrote: [...]

Correct, so it should be Linux since these are System properties, and if you're getting Windows it's probably running locally on your client box itself!

conf.set("mapreduce.jobtracker.address", "glados:9000/");

This here might be your problem. That form of property would only work with 0.21.x, while on 0.20.x if you do not set it as mapred.job.tracker then the local job runner takes over by default, thereby making this odd thing happen (that's my guess). What version of Hadoop are you using?

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

--
Harsh J
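For reference, Harsh's fix amounts to the client-side configuration below. This is a sketch, not code from the thread, and it keeps the thread's use of port 9000 for the JobTracker; if 9000 is actually the NameNode's RPC port, the JobTracker is conventionally on a different one (often 9001), so check the cluster's mapred-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FixedSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The filesystem value is a full URI, so a scheme and trailing slash are fine here.
        conf.set("fs.default.name", "hdfs://glados:9000/");
        // The jobtracker value is a bare host:port pair; the trailing slash in
        // "glados:9000/" is what triggered the URISyntaxException above.
        conf.set("mapred.job.tracker", "glados:9000");
        // Ship the job jar so the cluster can load the user classes.
        conf.set("mapred.jar", "NShot.jar");
        Job job = new Job(conf, "Generated data");
        // ... job setup as in the earlier posts, then job.waitForCompletion(true) ...
    }
}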
Re: Starting a Hadoop job outside the cluster
Would it not also be possible for a Windows machine to submit the job directly from a Java process? That way you don't need Cygwin or a full local copy of the installation (correct me if I'm wrong). The steps would then just be:

1) Create a basic Java project and add the minimum required libraries (Hadoop/logging)
2) Set the essential properties (at least the jobtracker and the filesystem)
3) Implement the Tool
4) Run the process (from either the IDE or a stand-alone jar)

Steps 1-3 could technically be implemented on another machine, if you choose to compile a stand-alone jar (see the sketch after this message).

Ferdy.

On 05/29/2011 04:50 AM, Harsh J wrote: [...]
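A minimal sketch of Ferdy's four steps against the 0.20 API. Everything here is illustrative: the class name is made up, and the glados host and ports are assumptions carried over from elsewhere in the thread, so check your own core-site.xml and mapred-site.xml for the real values:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Step 1: a plain Java project with the Hadoop and logging jars on the classpath.
public class RemoteSubmitTool extends Configured implements Tool {

    // Step 3: implement the Tool interface.
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Step 2: the two essential properties - filesystem and jobtracker.
        // The filesystem is a URI; the jobtracker is a bare host:port, no slash.
        conf.set("fs.default.name", "hdfs://glados:9000/");
        conf.set("mapred.job.tracker", "glados:9001");
        Job job = new Job(conf, "remote submit example");
        job.setJarByClass(RemoteSubmitTool.class);
        // ... mapper/reducer/input/output setup ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Step 4: run from the IDE or as a stand-alone jar.
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RemoteSubmitTool(), args));
    }
}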
Starting a Hadoop job outside the cluster
When I want to launch a Hadoop job I use SCP to execute a command on the NameNode machine. I am wondering if there is a way to launch a Hadoop job from a machine that is not on the cluster. How to do this on a Windows box or a Mac would be of special interest.

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
Re: Starting a Hadoop job outside the cluster
Keep a local Hadoop installation with a mirror-copy config, and use hadoop jar <jar> to submit as usual (since the config points to the right areas, the jobs go there). For Windows you'd need Cygwin installed, however.

On Sun, May 29, 2011 at 12:56 AM, Steve Lewis lordjoe2...@gmail.com wrote: [...]

--
Harsh J
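For concreteness, the "mirror-copy config" approach means the client-side Configuration picks up the cluster addresses from the copied *-site.xml files instead of from conf.set(...) calls in code. A sketch, with the class name made up, for checking that the mirrored config is actually being seen:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConfigCheck {
    public static void main(String[] args) throws Exception {
        // With HADOOP_CONF_DIR (or the classpath) pointing at a mirror copy of the
        // cluster's conf/ directory, new Configuration() loads core-site.xml and
        // mapred-site.xml automatically - no hardcoded addresses needed.
        Configuration conf = new Configuration();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
        // If these print the cluster's values, "hadoop jar yourjob.jar ..." run from
        // this machine submits to the cluster rather than the local job runner.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default FS: " + fs.getUri());
    }
}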