Re: Starting a Hadoop job outside the cluster

2011-06-06 Thread Steve Loughran

My Job submit code is



http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/

something to run tool classes
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/ToolRunnerComponentImpl.java?revision=8590view=markup


something to integrate job submission with some pre-run sanity checks, 
and to optionally wait for the work to finish


http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590view=markup

It works remotely for short-lived jobs; if you submit something that may
run over a weekend, you don't normally want to block waiting for it.
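For reference, a minimal sketch of the non-blocking pattern (untested; it assumes
the Configuration already points at the remote cluster via a mirror of its
*-site.xml files on the classpath, and the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AsyncSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Assumes core-site.xml / mapred-site.xml copied from the cluster
        // are on the classpath, so this conf points at the remote cluster.
        Configuration conf = new Configuration();
        Job job = new Job(conf, "weekend-long job");
        job.setJarByClass(AsyncSubmitSketch.class);
        // ... input/output formats, mapper/reducer, paths ...

        // waitForCompletion(true) blocks until the job finishes;
        // submit() returns immediately, which suits long-running jobs.
        job.submit();
        System.out.println("Submitted; tracking URL: " + job.getTrackingURL());
    }
}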




Re: Starting a Hadoop job outside the cluster

2011-05-31 Thread Steve Lewis
I have tried what you suggest (well, sort of); a good example would help a lot.
My reducer is set to, among other things, emit the local OS name and user.dir.
When I try running from my Windows box these values appear on HDFS, but they
show the Windows OS and user.dir, leading me to believe that the reducer is
still running on my Windows machine. I will check the values, but a working
example would be very useful.


-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Re: Starting a Hadoop job outside the cluster

2011-05-31 Thread Harsh J
Steve,

What do you mean when you say it shows the Windows OS and user.dir?
There will be a few properties in the job.xml that may carry client-machine
information, but these shouldn't be a hindrance.

Unless a TaskTracker was started on the Windows box (no daemons ought
to be started on the client machine), no task may run on it.
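If in doubt, one way to verify where the reduce is really executing is to emit
the task host's name instead of (or alongside) user.dir; a rough, untested
sketch (the class name and the "task.host" key are just for illustration):

import java.io.IOException;
import java.net.InetAddress;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Diagnostic reducer: emits the hostname of the machine each reduce task
// runs on, which makes "cluster vs. local runner" obvious in the output.
public class WhereAmIReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String host = InetAddress.getLocalHost().getHostName();
        context.write(new Text("task.host"), new Text(host));
        for (Text v : values) {
            context.write(key, v);   // pass the data through unchanged
        }
    }
}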



-- 
Harsh J


Re: Starting a Hadoop job outside the cluster

2011-05-31 Thread Steve Lewis
My Reducer code says this:
public static class Reduce extends Reducer<Text, Text, Text, Text> {
    private boolean m_DateSent;

    /**
     * This method is called once for each key. Most applications will define
     * their reduce class by overriding this method. The default implementation
     * is an identity function.
     */
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        if (!m_DateSent) {
            Text dkey = new Text("CreationDate");
            Text dValue = new Text();
            writeKeyValue(context, dkey, dValue, "CreationDate", new Date().toString());
            writeKeyValue(context, dkey, dValue, "user.dir", System.getProperty("user.dir"));
            writeKeyValue(context, dkey, dValue, "os.arch", System.getProperty("os.arch"));
            writeKeyValue(context, dkey, dValue, "os.name", System.getProperty("os.name"));

            // dkey.set("ip");
            // java.net.InetAddress addr = java.net.InetAddress.getLocalHost();
            // dValue.set(System.getProperty(addr.toString()));
            // context.write(dkey, dValue);

            m_DateSent = true;
        }
        Iterator<Text> itr = values.iterator();
        // Add interesting code here
        while (itr.hasNext()) {
            Text vCheck = itr.next();
            context.write(key, vCheck);
        }
    }
}

If os.arch is linux, I am running on the cluster; if windows, I am running locally.

I run this main() hoping to run on the cluster, with the NameNode and JobTracker at glados:

   public static void main(String[] args) throws Exception {
       String outFile = "./out";
       Configuration conf = new Configuration();

       // cause output to go to the cluster
       conf.set("fs.default.name", "hdfs://glados:9000/");
       conf.set("mapreduce.jobtracker.address", "glados:9000/");
       conf.set("mapred.jar", "NShot.jar");

       conf.set("fs.defaultFS", "hdfs://glados:9000/");

       Job job = new Job(conf, "Generated data");
       conf = job.getConfiguration();
       job.setJarByClass(NShotInputFormat.class);

       // ... Other setup code ...

       boolean ans = job.waitForCompletion(true);
       int ret = ans ? 0 : 1;
   }






-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Re: Starting a Hadoop job outside the cluster

2011-05-31 Thread Harsh J
Simply remove that trailing slash (forgot to catch it earlier, sorry)
and you should be set (or at least more set than before surely.)
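Something like this is what the client-side settings would look like on 0.20.x
with the trailing slash gone (a sketch only; whether the JobTracker on glados
really listens on 9001 is a guess on my part, so check mapred-site.xml on the
cluster):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RemoteSubmitConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://glados:9000/");   // NameNode
        conf.set("mapred.job.tracker", "glados:9001");        // 0.20.x name, no trailing slash
        conf.set("mapred.jar", "NShot.jar");                   // jar to ship to the cluster
        Job job = new Job(conf, "Generated data");
        // ... rest of the job setup as in the earlier mail ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}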

On Tue, May 31, 2011 at 10:51 PM, Steve Lewis lordjoe2...@gmail.com wrote:
 0.20.2 - we have been avoiding 0.21 because it is not terribly stable and
 made some MAJOR changes to
 critical classes

 When I say

          Configuration conf = new Configuration();
          // cause output to go to the cluster
          conf.set("fs.default.name", "hdfs://glados:9000/");
     //     conf.set("mapreduce.jobtracker.address", "glados:9000/");
          conf.set("mapred.job.tracker", "glados:9000/");
          conf.set("mapred.jar", "NShot.jar");
          //  conf.set("fs.defaultFS", "hdfs://glados:9000/");
          String[] otherArgs = new GenericOptionsParser(conf,
  args).getRemainingArgs();
  //        if (otherArgs.length != 2) {
  //            System.err.println("Usage: wordcount <in> <out>");
  //            System.exit(2);
  //        }
          Job job = new Job(conf, "Generated data");
 I get
 Exception in thread "main" java.lang.IllegalArgumentException:
 java.net.URISyntaxException: Relative path in absolute URI: glados:9000
 at org.apache.hadoop.fs.Path.initialize(Path.java:140)
 at org.apache.hadoop.fs.Path.<init>(Path.java:126)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:150)
 at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:123)
 at org.apache.hadoop.mapred.JobTracker.getAddress(JobTracker.java:1807)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
 at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
 at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
 at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
 at org.systemsbiology.hadoopgenerated.NShotTest.main(NShotTest.java:188)
 Caused by: java.net.URISyntaxException: Relative path in absolute URI:
 glados:9000
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.<init>(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)

 I promise to publish a working example if this ever works
 On Tue, May 31, 2011 at 10:02 AM, Harsh J ha...@cloudera.com wrote:

 Steve,

 On Tue, May 31, 2011 at 10:27 PM, Steve Lewis lordjoe2...@gmail.com
 wrote:
  My Reducer code says this:
  dkey, dValue, "os.arch", System.getProperty("os.arch"));
                  writeKeyValue(context,
  dkey, dValue, "os.name", System.getProperty("os.name"));
  if os.arch is linux I am running on the cluster -
  if windows I am running locally

 Correct, so it should be Linux since these are System properties, and
 if you're getting Windows it's probably running locally on your client
 box itself!

          conf.set("mapreduce.jobtracker.address", "glados:9000/");

 This here might be your problem. That form of property would only work
 with 0.21.x, while on 0.20.x if you do not set it as
 mapred.job.tracker then the local job runner takes over by default,
 thereby making this odd thing happen (that's my guess).

 What version of Hadoop are you using?

 --
 Harsh J



 --
 Steven M. Lewis PhD
 4221 105th Ave NE
 Kirkland, WA 98033
 206-384-1340 (cell)
 Skype lordjoe_com






-- 
Harsh J


Re: Starting a Hadoop job outside the cluster

2011-05-29 Thread Ferdy Galema
Would it not also be possible for a Windows machine to submit the job 
directly from a Java process? This way you don't need Cygwin or a full 
local copy of the installation (correct me if I'm wrong). The steps 
would then just be:
1) Create a basic Java project, add the minimum required libraries 
(Hadoop/logging)
2) Set the essential properties (at least the jobtracker and the 
filesystem)

3) Implement the Tool
4) Run the process (from either the IDE or a stand-alone jar); a minimal 
sketch follows below

Steps 1-3 could technically be implemented on another machine, if you 
choose to compile a stand-alone jar.
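Something along these lines (untested; 0.20-era property names, and the
host/port values are placeholders rather than a known-good setup):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Step 3: implementing Tool means -fs / -jt / -D options passed on the
// command line are picked up by GenericOptionsParser automatically.
public class RemoteDriverSketch extends Configured implements Tool {

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Step 2: essential properties (placeholder host/port values).
        conf.set("fs.default.name", "hdfs://namenode-host:9000/");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        Job job = new Job(conf, "submitted from outside the cluster");
        job.setJarByClass(RemoteDriverSketch.class);
        // ... set mapper, reducer, and key/value classes here ...
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    // Step 4: run from the IDE or package as a stand-alone jar.
    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new RemoteDriverSketch(), args));
    }
}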


Ferdy.







Starting a Hadoop job outside the cluster

2011-05-28 Thread Steve Lewis
When I want to launch a Hadoop job I use SCP to execute a command on the
NameNode machine. I am wondering if there is a way to launch a Hadoop job
from a machine that is not on the cluster. How to do this on a Windows box
or a Mac would be of special interest.

-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


Re: Starting a Hadoop job outside the cluster

2011-05-28 Thread Harsh J
Keep a local Hadoop installation with a mirror copy of the cluster config,
and use hadoop jar <jar> to submit as usual (since the config points to the
right places, the jobs go there).

For Windows you'd need Cygwin installed, however.





-- 
Harsh J