Re: Submitting and running hadoop jobs Programmatically
Madhu, Ditch the '*' in the classpath element that has the configuration directory. The directory ought to be on the classpath, not the files, AFAIK. Try it and let us know if it then picks up the proper config (right now, it's using local mode).

On Wed, Jul 27, 2011 at 10:25 AM, madhu phatak phatak@gmail.com wrote: Hi, I am submitting the job as follows:

java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1

I get the log in the CLI as below:

11/07/27 10:22:54 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/07/27 10:22:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/07/27 10:22:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/07/27 10:22:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/07/27 10:22:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-hadoop/mapred/staging/hadoop-1331241340/.staging/job_local_0001

It doesn't create any job in Hadoop.

On Tue, Jul 26, 2011 at 5:11 PM, Devaraj K devara...@huawei.com wrote: Madhu, can you check the client logs, whether any error/exception is coming while submitting the job? Devaraj K

-----Original Message----- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, July 26, 2011 5:01 PM To: common-user@hadoop.apache.org Subject: Re: Submitting and running hadoop jobs Programmatically

Yes. Internally, it calls the regular submit APIs.

On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak phatak@gmail.com wrote: I am using JobControl.add() to add a job, running the JobControl in a separate thread, and using JobControl.allFinished() to see whether all jobs have completed.
Is this the same as Job.submit()?

On Tue, Jul 26, 2011 at 4:08 PM, Harsh J ha...@cloudera.com wrote: Madhu, do you get a specific error message / stack trace? Could you also paste your JT logs?

On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak phatak@gmail.com wrote: Hi, I am using the same APIs, but I am not able to run the jobs by just adding the configuration files and jars. It never creates a job in Hadoop; it just shows cleaning up the staging area and fails.

On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K devara...@huawei.com wrote: Hi Madhu, you can submit the jobs using the Job APIs programmatically from any system. The job submission code can be written this way:

// Create a new Job
Job job = new Job(new Configuration());
job.setJarByClass(MyJob.class);
// Specify various job-specific parameters
job.setJobName("myjob");
job.setInputPath(new Path("in"));
job.setOutputPath(new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);
// Submit the job
job.submit();

For submitting this, you need to add the Hadoop jar files and configuration files to the classpath of the application from which you want to submit the job. You can refer to these docs for more info on the Job API: http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html Devaraj K

-----Original Message----- From: madhu phatak [mailto:phatak@gmail.com] Sent: Tuesday, July 26, 2011 3:29 PM To: common-user@hadoop.apache.org Subject: Submitting and running hadoop jobs Programmatically

Hi, I am working on an open source project, Nectar (https://github.com/zinnia-phatak-dev/Nectar), where I am trying to create Hadoop jobs depending upon user input. I was using the Java Process API to run the bin/hadoop shell script to submit the jobs, but that seems like a poor approach because the process creation model is not consistent across operating systems. Is there a better way to submit the jobs than invoking the shell script?
I am using hadoop-0.21.0 and I am running my program as the same user under which Hadoop is installed. Some older threads said that if I add the configuration files to the classpath it will work fine, but I am not able to run it that way. So, has anyone tried this before? If so, could you please give detailed instructions on how to achieve it? Thanks in advance for your help. Regards, Madhukara Phatak -- Harsh J
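Harsh's fix at the top of this thread — put the conf *directory* itself on the classpath, not `conf/*` — can be sketched as follows. This is illustrative only: the paths are taken from Madhu's command line and need not exist on your machine.

```shell
# Configuration loads core-site.xml etc. as classpath *resources*, so the
# directory containing them must be on the classpath. The jar wildcards
# (lib/*) are fine: the Java 6+ launcher expands a trailing '*' to jars
# only, which is exactly why it never picks up the XML files in conf/*.
HADOOP_CONF_DIR=/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf
CP="Nectar-analytics-0.0.1-SNAPSHOT.jar:$HADOOP_CONF_DIR"
CP="$CP:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/*"
echo "$CP"
# java -cp "$CP" com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1
```

With the bare directory on the classpath, the client picks up fs.default.name and mapred.job.tracker from the site files instead of falling back to local mode (the `job_local_0001` staging path in the log above is the tell-tale sign of local mode).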
questions regarding data storage and inputformat
Hi Folks, I have a bunch of binary files which I've stored in a SequenceFile. The name of the file is the key, the data is the value, and I've stored them sorted by key. (I'm not tied to using a SequenceFile for this.) The current test data is only 50MB, but the real data will be 500MB - 1GB. My M/R job requires that its input be several of these records in the sequence file, which is determined by the key. The sorting mentioned above keeps these all packed together.

1. Any reason not to use a sequence file for this? Perhaps a MapFile? Since I've sorted it, I don't need random access, but I do need to be aware of the keys, as I need to be sure that I get all of the relevant keys sent to a given mapper.

2. Looks like I want a custom InputFormat for this, extending SequenceFileInputFormat. Do you agree? I'll gladly take some opinions on this, as I ultimately want to split based on what's in the file, which might be a little unorthodox.

3. Another idea might be to create separate seq files for chunks of records and make them non-splittable, ensuring that each goes to a single mapper. Assuming I can get away with this, see any pros/cons with that approach?

Thanks, Tom -- === Skybox is hiring. http://www.skyboximaging.com/careers/jobs
Re: Submitting and running hadoop jobs Programmatically
On 27/07/11 05:55, madhu phatak wrote: Hi I am submitting the job as follows java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1

My code to submit jobs (via a declarative configuration) is up online: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590&view=markup

It's LGPL, but ask nicely and I'll change the header to Apache. That code doesn't set up the classpath by pushing out more JARs (I'm planning to push out .groovy scripts instead), but it can also poll for job completion, take a timeout (useful in small test runs), and do other things. I currently mainly use it for testing.
Re: Submitting and running hadoop jobs Programmatically
Thank you. Will have a look at it.

On Wed, Jul 27, 2011 at 3:28 PM, Steve Loughran ste...@apache.org wrote: On 27/07/11 05:55, madhu phatak wrote: Hi I am submitting the job as follows java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1

My code to submit jobs (via a declarative configuration) is up online: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/SubmitterImpl.java?revision=8590&view=markup

It's LGPL, but ask nicely and I'll change the header to Apache. That code doesn't set up the classpath by pushing out more JARs (I'm planning to push out .groovy scripts instead), but it can also poll for job completion, take a timeout (useful in small test runs), and do other things. I currently mainly use it for testing.
Re: error of loading logging class
It's the problem of multiple versions of the same jar.

On Thu, Jul 21, 2011 at 5:15 PM, Steve Loughran ste...@apache.org wrote: On 20/07/11 07:16, Juwei Shi wrote: Hi, We faced a problem loading a logging class when starting the name node. It seems that Hadoop can not find commons-logging-*.jar. We have tried other commons-logging-1.0.4.jar and commons-logging-api-1.0.4.jar. It does not work! The following are error logs from the starting console:

I'd drop the -api file as it isn't needed, and as you say, avoid duplicate versions. Make sure that log4j is at the same point in the class hierarchy too (e.g. in hadoop/lib). To debug commons-logging, tell it to log to stderr; it's useful in emergencies:

-Dorg.apache.commons.logging.diagnostics.dest=STDERR
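A quick way to spot the duplicate-version problem described above is to list the jars in a lib directory with their version suffixes stripped and report any artifact that appears more than once. A sketch, assuming jars follow the usual name-version.jar convention (as in hadoop/lib); the mkdir/touch lines only build a toy directory for the demonstration:

```shell
# Toy lib/ dir with a duplicated commons-logging, for demonstration.
mkdir -p lib
touch lib/commons-logging-1.0.4.jar lib/commons-logging-1.1.1.jar lib/log4j-1.2.15.jar
# Strip the directory and the -<version>.jar suffix, then report any
# artifact name occurring more than once.
ls lib/*.jar | sed -e 's#.*/##' -e 's/-[0-9][0-9a-zA-Z.]*\.jar$//' | sort | uniq -d
# → commons-logging
```

Run against the real hadoop/lib (and anything else on the daemon's classpath), any line this prints is a candidate for the "multiple versions of the same jar" conflict.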
Re: questions regarding data storage and inputformat
1. Any reason not to use a sequence file for this? Perhaps a MapFile? Since I've sorted it, I don't need random access, but I do need to be aware of the keys, as I need to be sure that I get all of the relevant keys sent to a given mapper.

MapFile *may* be better here (see my answer for 2 below).

2. Looks like I want a custom InputFormat for this, extending SequenceFileInputFormat. Do you agree? I'll gladly take some opinions on this, as I ultimately want to split based on what's in the file, which might be a little unorthodox.

If you need to split based on where certain keys are in the file, then a SequenceFile isn't a great solution. It would require that your InputFormat scan through all of the data just to find split points. Assuming you know what keys to split on ahead of time, you could use MapFiles and find the exact split point more quickly.

3. Another idea might be to create separate seq files for chunks of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, see any pros/cons with that approach?

Separate sequence files would require the least amount of custom code. -Joey -- Joseph Echeverria Cloudera, Inc. 443.305.9434
Re: Multiple Output Formats
Roger, Or you can take a look at Hadoop's MultipleOutputs class. Thanks. Alejandro

On Tue, Jul 26, 2011 at 11:30 PM, Luca Pireddu pire...@crs4.it wrote: On July 26, 2011 06:11:33 PM Roger Chen wrote: Hi all, I am attempting to implement MultipleOutputFormat to write data to multiple files depending on the output keys and values. Can somebody provide a working example of how to implement this in Hadoop 0.20.2? Thanks!

Hello, I have a working sample here: http://biodoop-seal.bzr.sourceforge.net/bzr/biodoop-seal/trunk/annotate/head%3A/src/it/crs4/seal/demux/DemuxOutputFormat.java It extends FileOutputFormat. -- Luca Pireddu CRS4 - Distributed Computing Group Loc. Pixina Manna Edificio 1 Pula 09010 (CA), Italy Tel: +39 0709250452
Re: Cygwin not working with Hadoop and Eclipse Plugin
See (inline at ***) Cheers, A Df From: Harsh J ha...@cloudera.com To: common-user@hadoop.apache.org; A Df abbey_dragonfor...@yahoo.com Sent: Tuesday, 26 July 2011, 21:29 Subject: Re: Cygwin not working with Hadoop and Eclipse Plugin A Df, On Wed, Jul 27, 2011 at 1:42 AM, A Df abbey_dragonfor...@yahoo.com wrote: Harsh: See (inline at the **) I hope it's easy to follow; as for the other responses, I was not sure how to respond to get everything into one. Sorry for top posting! Np! I don't strongly enforce a style of reply so long as it is visible, and readable :) Eric, where would I put the line below? Please explain in newbie terms, thanks: PATH=$PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin *** I have added it to the PATH variable as $PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin, I hope this is correct. You'd set this in your Windows environment. A good guide (googled link): http://geekswithblogs.net/renso/archive/2009/10/21/how-to-set-the-windows-path-in-windows-7.aspx The last version I'd heard had a no-complaints, fully-working eclipse plugin along with it was Hadoop 0.20.2 (although stable is 203, I've seen lots of issues pop up with the eclipse plugin from members on the ML, but someone else can comment better on whether it's fixed for 204 or is a non-issue). I've used this one personally on Windows myself and things work. I think there was just one issue one could encounter somehow and I'd covered it in a blog post some time ago, here: http://www.harshj.com/2010/07/18/making-the-eclipse-plugin-work-for-hadoop/ ** I tried to use the patch but my cygwin gives the error: bash: patch: command not found Feared you may face it. You need to install the patch program from Cygwin's package manager/installer. I believe the package name is (iirc): patchutils *** I installed patchutils but now when I reach the ant command it gives the error: bash: ant: command not found. I searched for an ant plugin but I don't see any.
How do I get to run the line ant -Declipse.home=$ECLIPSE_HOME binary? Beyond that, the tutorial at v-lad.org is the one I'd recommend following. It has worked well for me over time. ** yes, the screenshots and instructions are easy to follow, just that I seem to always have a problem with the plugin or cygwin. What specific error do you get when you load the plugin or start the daemons via the cygwin shell, etc.? It's easier for folks to answer if they see an error message or a stacktrace. I wanted to test on Windows first to get a feel for Hadoop since I am new to it and also because I am a newbie Unix/Linux user. I have been trying to follow the tutorials shown at the link above but each time I run into errors with the plugin, or the import not being recognized, or JAVA_HOME not being set. Can I please get some help? Thanks. I'd say use Linux when/where possible. A VM is a good choice as well, as James pointed out above, if your hardware can handle it. ** Harsh and James, I tried the vmware from the Yahoo tutorial but I had problems with the plugin too. You can set up a raw linux VM and install stuff atop. I've had better success with the VMs Cloudera offers: https://ccp.cloudera.com/display/SUPPORT/Cloudera's+Hadoop+Demo+VM (They come ready with the whole stack). But basically it all boils down to using a Linux VM, wherever you source it from. -- Harsh J
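On the PATH question in this thread: for a Cygwin shell, Eric's line can also go into ~/.bashrc so it is applied to every new shell. A minimal sketch (paths assume a default C:\cygwin install, as in the thread):

```shell
# Append Cygwin's bin directories so tools like patch and ant resolve
# from any shell. In Cygwin, Windows drive C: appears as /cygdrive/c.
export PATH="$PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin"
echo "$PATH"
```

Note that ant itself is not an Eclipse plugin but a standalone build tool (Apache Ant); installing it and putting its bin directory on PATH the same way is what makes `ant -Declipse.home=$ECLIPSE_HOME binary` runnable from the shell.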
cygwin not connecting to Hadoop server
Hi All: I have Hadoop 0.20.2 and I am using cygwin on Windows 7. I modified the files as shown below for the Hadoop configuration.

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
</configuration>

Then I have the PATH variable with $PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin. I added JAVA_HOME to the file cygwin\home\Williams\hadoop-0.20.2\conf\hadoop-env.sh. My Java home is now at C:\Java\jdk1.6.0_26, so there is no space in the path. I also turned off my firewall. However, I get the error from the command line:

CODE
Williams@TWilliams-LTPC ~ $ pwd
/home/Williams
Williams@TWilliams-LTPC ~ $ cd hadoop-0.20.2
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/start-all.sh
starting namenode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-namenode-TWilliams-LTPC.out
localhost: starting datanode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-datanode-TWilliams-LTPC.out
localhost: starting secondarynamenode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-secondarynamenode-TWilliams-LTPC.out
starting jobtracker, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-jobtracker-TWilliams-LTPC.out
localhost: starting tasktracker, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-tasktracker-TWilliams-LTPC.out
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/hadoop fs -put conf input
11/07/27 17:11:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s).
11/07/27 17:11:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 1 time(s).
11/07/27 17:11:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 2 time(s).
11/07/27 17:11:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 3 time(s).
11/07/27 17:11:36 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 4 time(s).
11/07/27 17:11:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 5 time(s).
11/07/27 17:11:40 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 6 time(s).
11/07/27 17:11:43 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 7 time(s).
11/07/27 17:11:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 8 time(s).
11/07/27 17:11:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 9 time(s).
Bad connection to FS. command aborted.
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/hadoop fs -put conf input
11/07/27 17:17:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s).
11/07/27 17:17:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 1 time(s).
11/07/27 17:17:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 2 time(s).
11/07/27 17:17:35 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 3 time(s).
11/07/27 17:17:37 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 4 time(s).
11/07/27 17:17:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 5 time(s).
11/07/27 17:17:41 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 6 time(s).
11/07/27 17:17:44 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 7 time(s).
11/07/27 17:17:46 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 8 time(s).
11/07/27 17:17:48 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 9 time(s).
Bad connection to FS. command aborted.
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ ping 127.0.0.1:9100
Ping request could not find host 127.0.0.1:9100. Please check the name and try again.
/CODE

I am not sure why the address seems to have localhost/127.0.0.1, which seems to be repeating itself. The conf files are fine. I also know that when Hadoop is running there is a web interface to check, but do the default ones work from cygwin, which are:
* NameNode - http://localhost:50070/
* JobTracker - http://localhost:50030/
I wanted to give cygwin a try once more before just switching to a Cloudera Hadoop VMware image. I was hoping that it would not have so many problems just to get it working on Windows! Thanks again. Cheers, A Df
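A note on diagnosing the retries above: "Retrying connect to server: localhost/127.0.0.1:9100" almost always means nothing is listening on that port, i.e. the NameNode died or never started (and `ping` cannot take a port number, which is why `ping 127.0.0.1:9100` fails). A bash-only sketch for checking the port before digging into the config (bash's /dev/tcp pseudo-device opens a TCP connection):

```shell
# Returns "open" if something accepts a TCP connection on host:port,
# "closed" otherwise (connection errors are silenced).
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}
# If this prints "closed", the NameNode isn't running - check its .log
# file under the logs/ directory (not just the .out file).
check_port localhost 9100
```

The usual culprit in this scenario is an unformatted or mis-formatted NameNode, which is what the reply below asks about.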
Re: cygwin not connecting to Hadoop server
Hi A Df, Did you format the NameNode first? Can you check the NN logs whether the NN is started or not? Regards, Uma

** This email and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained here in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this email in error, please notify the sender by phone or email immediately and delete it! *

- Original Message - From: A Df abbey_dragonfor...@yahoo.com Date: Wednesday, July 27, 2011 9:55 pm Subject: cygwin not connecting to Hadoop server To: common-user@hadoop.apache.org common-user@hadoop.apache.org

Hi All: I have Hadoop 0.20.2 and I am using cygwin on Windows 7. I modified the files as shown below for the Hadoop configuration.

conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>
</configuration>

Then I have the PATH variable with $PATH:/cygdrive/c/cygwin/bin:/cygdrive/c/cygwin/usr/bin. I added JAVA_HOME to the file cygwin\home\Williams\hadoop-0.20.2\conf\hadoop-env.sh. My Java home is now at C:\Java\jdk1.6.0_26, so there is no space in the path. I also turned off my firewall. However, I get the error from the command line:

CODE
Williams@TWilliams-LTPC ~ $ pwd
/home/Williams
Williams@TWilliams-LTPC ~ $ cd hadoop-0.20.2
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/start-all.sh
starting namenode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-namenode-TWilliams-LTPC.out
localhost: starting datanode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-datanode-TWilliams-LTPC.out
localhost: starting secondarynamenode, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-secondarynamenode-TWilliams-LTPC.out
starting jobtracker, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-jobtracker-TWilliams-LTPC.out
localhost: starting tasktracker, logging to /home/Williams/hadoop-0.20.2/bin/../logs/hadoop-Williams-tasktracker-TWilliams-LTPC.out
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/hadoop fs -put conf input
11/07/27 17:11:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s).
11/07/27 17:11:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 1 time(s).
11/07/27 17:11:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 2 time(s).
11/07/27 17:11:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 3 time(s).
11/07/27 17:11:36 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 4 time(s).
11/07/27 17:11:38 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 5 time(s).
11/07/27 17:11:40 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 6 time(s).
11/07/27 17:11:43 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 7 time(s).
11/07/27 17:11:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 8 time(s).
11/07/27 17:11:47 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 9 time(s).
Bad connection to FS. command aborted.
Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/hadoop fs -put conf input
11/07/27 17:17:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s).
11/07/27 17:17:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 1 time(s).
11/07/27 17:17:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 2 time(s).
11/07/27 17:17:35 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 3 time(s).
11/07/27 17:17:37 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 4 time(s).
11/07/27 17:17:39 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 5 time(s).
11/07/27 17:17:41 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 6 time(s).
11/07/27 17:17:44 INFO
cannot get configuration settings from API
Good afternoon, while writing a MapReduce job, I need to get the value of some configuration settings. For instance, I need to get the value of dfs.write.packet.size inside the reducer, so I write, using the context of the reducer:

Configuration the_conf = context.getConfiguration();
int data_packet_size = the_conf.getInt("dfs.write.packet.size", 0);

However, this does not return 64KB (which is the default value); it gives 0 instead. Could you please help me by telling me how I can get and set the value of these configuration parameters? Thank you in advance, Sofia
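One likely explanation, offered as an educated guess rather than a confirmed answer from this thread: Configuration.getInt() returns the supplied fallback (here 0) whenever the key is simply absent from the Configuration object, and dfs.write.packet.size is an HDFS default that is not necessarily loaded into the Configuration a MapReduce task sees. Declaring the value explicitly in the job's site configuration makes it visible to getInt(); a hypothetical hdfs-site.xml fragment:

```xml
<!-- Make the packet size an explicit, queryable setting.
     65536 bytes = the 64KB default mentioned above. -->
<property>
  <name>dfs.write.packet.size</name>
  <value>65536</value>
</property>
```

Alternatively, the driver can call conf.setInt("dfs.write.packet.size", 65536) on the job's Configuration before submission, which propagates the value to mappers and reducers.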
Re: cygwin not connecting to Hadoop server
See inline at **. More questions and many thanks. :D From: Uma Maheswara Rao G 72686 mahesw...@huawei.com To: common-user@hadoop.apache.org; A Df abbey_dragonfor...@yahoo.com Cc: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: Wednesday, 27 July 2011, 17:31 Subject: Re: cygwin not connecting to Hadoop server Hi A Df, Did you format the NameNode first? ** I had formatted it already, but then I reinstalled Java and upgraded the plugins in cygwin, so I reformatted it again. :D Yes, it worked!! I am not sure of all the steps that got it to finally work, but I will have to document it to prevent this headache in the future. Although I typed ssh localhost too, so the question is, do I need to type ssh localhost each time I need to run hadoop? Also, since I need to work with Eclipse, maybe you can have a look at my post about the plugin because I can't get the patch to work. The subject is Re: Cygwin not working with Hadoop and Eclipse Plugin. I plan to read up on how to write programs for Hadoop. I am using the tutorial at Yahoo, but if you know of any really good ones about coding with Hadoop, or just about understanding Hadoop, then please let me know. Can you check the NN logs whether NN is started or not? ** I checked, and the previous runs had some logs missing, but now the last one has all 5 logs and I got two conf files in xml. I also copied out the other output files, which I plan to examine. Where do I specify the output extension format that I want for my output file? I was hoping for a txt file; it shows the output in a file with no extension, even though I can read it in Notepad++. I also got to view the web interface at: NameNode - http://localhost:50070/ JobTracker - http://localhost:50030/ ** See below for the working version, finally!!
Thanks CMD Williams@TWilliams-LTPC ~/hadoop-0.20.2 $ bin/hadoop jar hadoop-0.20.2-examples.jar grep input 11/07/27 17:42:20 INFO mapred.FileInputFormat: Total in 11/07/27 17:42:20 INFO mapred.JobClient: Running job: j 11/07/27 17:42:21 INFO mapred.JobClient: map 0% reduce 11/07/27 17:42:33 INFO mapred.JobClient: map 15% reduc 11/07/27 17:42:36 INFO mapred.JobClient: map 23% reduc 11/07/27 17:42:39 INFO mapred.JobClient: map 38% reduc 11/07/27 17:42:42 INFO mapred.JobClient: map 38% reduc 11/07/27 17:42:45 INFO mapred.JobClient: map 53% reduc 11/07/27 17:42:48 INFO mapred.JobClient: map 69% reduc 11/07/27 17:42:51 INFO mapred.JobClient: map 76% reduc 11/07/27 17:42:54 INFO mapred.JobClient: map 92% reduc 11/07/27 17:42:57 INFO mapred.JobClient: map 100% redu 11/07/27 17:43:06 INFO mapred.JobClient: map 100% redu 11/07/27 17:43:09 INFO mapred.JobClient: Job complete: 11/07/27 17:43:09 INFO mapred.JobClient: Counters: 18 11/07/27 17:43:09 INFO mapred.JobClient: Job Counters 11/07/27 17:43:09 INFO mapred.JobClient: Launched r 11/07/27 17:43:09 INFO mapred.JobClient: Launched m 11/07/27 17:43:09 INFO mapred.JobClient: Data-local 11/07/27 17:43:09 INFO mapred.JobClient: FileSystemCo 11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES 11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES 11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES 11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES 11/07/27 17:43:09 INFO mapred.JobClient: Map-Reduce F 11/07/27 17:43:09 INFO mapred.JobClient: Reduce inp 11/07/27 17:43:09 INFO mapred.JobClient: Combine ou 11/07/27 17:43:09 INFO mapred.JobClient: Map input 11/07/27 17:43:09 INFO mapred.JobClient: Reduce shu 11/07/27 17:43:09 INFO mapred.JobClient: Reduce out 11/07/27 17:43:09 INFO mapred.JobClient: Spilled Re 11/07/27 17:43:09 INFO mapred.JobClient: Map output 11/07/27 17:43:09 INFO mapred.JobClient: Map input 11/07/27 17:43:09 INFO mapred.JobClient: Combine in 11/07/27 17:43:09 INFO mapred.JobClient: Map output 11/07/27 17:43:09 INFO 
mapred.JobClient: Reduce inp 11/07/27 17:43:09 WARN mapred.JobClient: Use GenericOpt e arguments. Applications should implement Tool for the 11/07/27 17:43:09 INFO mapred.FileInputFormat: Total in 11/07/27 17:43:09 INFO mapred.JobClient: Running job: j 11/07/27 17:43:10 INFO mapred.JobClient: map 0% reduce 11/07/27 17:43:22 INFO mapred.JobClient: map 100% redu 11/07/27 17:43:31 INFO mapred.JobClient: map 100% redu 11/07/27 17:43:36 INFO mapred.JobClient: map 100% redu 11/07/27 17:43:38 INFO mapred.JobClient: Job complete: 11/07/27 17:43:39 INFO mapred.JobClient: Counters: 18 11/07/27 17:43:39 INFO mapred.JobClient: Job Counters 11/07/27 17:43:39 INFO mapred.JobClient: Launched r 11/07/27 17:43:39 INFO mapred.JobClient: Launched m 11/07/27 17:43:39 INFO mapred.JobClient: Data-local 11/07/27 17:43:39 INFO mapred.JobClient: FileSystemCo 11/07/27 17:43:39 INFO mapred.JobClient: FILE_BYTES 11/07/27 17:43:39 INFO mapred.JobClient: HDFS_BYTES 11/07/27 17:43:39 INFO mapred.JobClient: FILE_BYTES 11/07/27 17:43:39 INFO
Re: questions regarding data storage and inputformat
3. Another idea might be to create separate seq files for chunks of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, see any pros/cons with that approach? Separate sequence files would require the least amount of custom code. Thanks for the response, Joey. So, if I were to do the above, I would still need a custom record reader to put all the keys and values together, right? Thanks, Tom -- === Skybox is hiring. http://www.skyboximaging.com/careers/jobs
OSX starting hadoop error
All, when starting hadoop on OSX I am getting this error. Is there a fix for it?

java[22373:1c03] Unable to load realm info from SCDynamicStore
RE: Hadoop-streaming using binary executable c program
Hi Bobby, I just want to ask you if there is a way of using a reducer, or something like concatenation, to glue my outputs from the mapper and output them as a single file and segment of the predicted RNA 2D structure? FYI: I have used -reducer NONE before:

HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose

and a sample of my output, using the mapper on two different slave nodes, looks like this:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC and [...(((...))).]. (-13.46)
GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU .(((.((......).. (-11.00)

and I want to concatenate and output them as a single predicted RNA sequence structure:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU [...(((...))).]..(((.((......).. 

Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu

From: dtyehd...@miners.utep.edu To: common-user@hadoop.apache.org Subject: RE: Hadoop-streaming using binary executable c program Date: Tue, 26 Jul 2011 16:23:10 + Good afternoon Bobby, Thanks so much, now it's working excellently, and the speed is also reasonable. Once again, thank you. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu

From: ev...@yahoo-inc.com To: common-user@hadoop.apache.org Date: Mon, 25 Jul 2011 14:47:34 -0700 Subject: Re: Hadoop-streaming using binary executable c program This is likely to be slow and it is not ideal. The ideal would be to modify pknotsRG to be able to read from stdin, but that may not be possible.
The shell script would probably look something like the following:

#!/bin/sh
rm -f temp.txt
while read line
do
  echo "$line" >> temp.txt
done
exec pknotsRG temp.txt

Place it in a file, say hadoopPknotsRG. Then you probably want to run:

chmod +x hadoopPknotsRG

After that you want to test it with:

hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | ./hadoopPknotsRG

If that works, then you can try it with Hadoop streaming:

HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose

--Bobby

On 7/25/11 3:37 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Good afternoon Bobby, Thanks, you gave me great help in finding out what the problem was. After I ran the command line you suggested, I found out that there was a segmentation error. The binary executable program pknotsRG only reads a file with a sequence in it. This means there should be a shell script, as you have said, that will take the data coming from stdin and write it to a temporary file. Any idea how to do this job in a shell script? The thing is, I am from a biology background and don't have much experience in CS. Looking forward to hearing from you. Thanks so much. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu From: ev...@yahoo-inc.com To: common-user@hadoop.apache.org Date: Fri, 22 Jul 2011 12:39:08 -0700 Subject: Re: Hadoop-streaming using binary executable c program I would suggest that you do the following to help you debug.

hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | /data/yehdego/hadoop-0.20.2/pknotsRG-1.3/src/pknotsRG -

This is simulating what hadoop streaming is doing.
Here we are taking the first two lines of the input file and feeding them to the stdin of pknotsRG. The first step is to make sure that you can get your program to run correctly with something like this. You may need to change the command line of pknotsRG to get it to read the data it is processing from stdin instead of from a file. Alternatively, you may need to write a shell script that takes the data coming from stdin, writes it to a file, and then calls pknotsRG on that file.
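Daniel's question at the top of this thread, gluing the per-segment mapper outputs into one predicted structure, is essentially the job of a single concatenating reducer. Below is a Hadoop-free Java sketch of just that merge step; the SegmentConcatenator class and its (sequence, structure) pair layout are illustrative assumptions, not part of the streaming API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for what a single concatenating reducer would do: collect each
// mapper's (sequence, structure) segment in order and glue the pieces
// into one sequence line and one matching dot-bracket structure line.
public class SegmentConcatenator {
    public static String[] concatenate(List<String[]> segments) {
        StringBuilder sequence = new StringBuilder();
        StringBuilder structure = new StringBuilder();
        for (String[] segment : segments) {
            sequence.append(segment[0]);   // predicted RNA sequence piece
            structure.append(segment[1]);  // matching structure piece
        }
        return new String[] { sequence.toString(), structure.toString() };
    }

    public static void main(String[] args) {
        List<String[]> segments = new ArrayList<>();
        segments.add(new String[] { "AUACCC", "[...((" });
        segments.add(new String[] { "GGGACA", ".(((.(" });
        String[] merged = concatenate(segments);
        System.out.println(merged[0]);
        System.out.println(merged[1]);
    }
}
```

In streaming terms, replacing -reducer NONE with a single reduce task whose reducer concatenates its input lines would have the same effect; the exact reducer command to use is left open here.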
Re: questions regarding data storage and inputformat
You could either use a custom RecordReader, or you could override the run() method on your Mapper class to do the merging before calling the map() method.

-Joey

On Wed, Jul 27, 2011 at 11:09 AM, Tom Melendez t...@supertom.com wrote:

3. Another idea might be to create separate seq files for each chunk of records and make them non-splittable, ensuring that they go to a single mapper. Assuming I can get away with this, do you see any pros/cons with that approach?

Separate sequence files would require the least amount of custom code.

Thanks for the response, Joey. So, if I were to do the above, I would still need a custom record reader to put all the keys and values together, right?

Thanks, Tom

-- === Skybox is hiring. http://www.skyboximaging.com/careers/jobs

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
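Joey's second suggestion, overriding run() to merge records before map() sees them, can be sketched without the Hadoop classes. In real code you would subclass org.apache.hadoop.mapreduce.Mapper and loop over context.nextKeyValue(); the MergingMapper below is only a self-contained stand-in, with an assumed fixed group size:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified, Hadoop-free sketch of overriding the mapper's run loop to
// buffer consecutive records and merge them before invoking map().
// MergingMapper and its method shapes are illustrative, not the Hadoop API.
public class MergingMapper {
    private final int groupSize;
    public final List<String> output = new ArrayList<>();

    public MergingMapper(int groupSize) { this.groupSize = groupSize; }

    // Stands in for Mapper.run(Context): pull records, merge, then map.
    public void run(Iterator<String> records) {
        List<String> buffer = new ArrayList<>();
        while (records.hasNext()) {
            buffer.add(records.next());
            if (buffer.size() == groupSize) {
                map(String.join("\n", buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            map(String.join("\n", buffer)); // flush the trailing partial group
        }
    }

    // Stands in for Mapper.map(): here it just records the merged value.
    protected void map(String mergedValue) { output.add(mergedValue); }
}
```

The same buffering could live inside a custom RecordReader instead, which keeps the Mapper untouched at the cost of more plumbing.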
RE: Build Hadoop 0.20.2 from source
Hi Vighnesh, Also, Cloudera has a decent screencast that walks you through building in Eclipse:

http://www.cloudera.com/blog/2009/04/configuring-eclipse-for-hadoop-development-a-screencast/
http://wiki.apache.org/hadoop/EclipseEnvironment

-Eric

-Original Message- From: Uma Maheswara Rao G 72686 [mailto:mahesw...@huawei.com] Sent: Wednesday, July 27, 2011 12:47 AM To: common-user@hadoop.apache.org Subject: Re: Build Hadoop 0.20.2 from source

Hi Vighnesh,

Step 1) Download the code base from the Apache svn repository.
Step 2) In the root folder you will find the build.xml file. In that folder just execute a) ant and b) ant eclipse; this will generate the Eclipse project settings files.

After this you can import the project directly into Eclipse.

Regards, Uma

- Original Message - From: Vighnesh Avadhani vighnesh.avadh...@gmail.com Date: Wednesday, July 27, 2011 11:08 am Subject: Build Hadoop 0.20.2 from source To: common-user@hadoop.apache.org

Hi, I want to build Hadoop 0.20.2 from source using the Eclipse IDE. Can anyone help me with this?

Regards, Vighnesh
Replication and failure
Just trying to understand what happens if there are 3 nodes with replication set to 3 and one node fails. Do the writes fail too? If there is a link I can look at, that would be great. I tried searching but didn't find any definitive answer. Thanks, Mohit
File System Counters.
Hello, I don't know if this question has been answered already. I am trying to understand the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various components that contribute to this counter? For example, when I see FILE_BYTES_READ for a specific task (Map or Reduce), is it purely due to the spill during the sort phase? If an HDFS read happens on a non-local node, does the counter increase on the node where the data block resides? What happens when the data is local? Does the counter increase for both HDFS_BYTES_READ and FILE_BYTES_READ? From the values I am seeing, this looks to be the case, but I am not sure. I am not very fluent in Java, so I don't fully understand the source. :-( Raj
Re: Submitting and running hadoop jobs Programmatically
Thank you Harsh. I am able to run the jobs by ditching the '*'.

On Wed, Jul 27, 2011 at 11:41 AM, Harsh J ha...@cloudera.com wrote:

Madhu, Ditch the '*' in the classpath element that has the configuration directory. The directory ought to be on the classpath, not the files, AFAIK. Try it and let us know if it then picks up the proper config (right now, it's using the local mode).

On Wed, Jul 27, 2011 at 10:25 AM, madhu phatak phatak@gmail.com wrote:

Hi, I am submitting the job as follows:

java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob input/book.csv kkk11fffrrw 1

I get the log in the CLI as below:

11/07/27 10:22:54 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
11/07/27 10:22:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
11/07/27 10:22:54 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
11/07/27 10:22:54 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/07/27 10:22:54 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/tmp/hadoop-hadoop/mapred/staging/hadoop-1331241340/.staging/job_local_0001

It doesn't create any job in Hadoop.

On Tue, Jul 26, 2011 at 5:11 PM, Devaraj K devara...@huawei.com wrote:

Madhu, Can you check the client logs to see whether any error/exception occurs while submitting the job?

Devaraj K

-Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Tuesday, July 26, 2011 5:01 PM To: common-user@hadoop.apache.org Subject: Re: Submitting and running hadoop jobs Programmatically

Yes. Internally, it calls the regular submit APIs.
On Tue, Jul 26, 2011 at 4:32 PM, madhu phatak phatak@gmail.com wrote:

I am using JobControl.add() to add a job, running the job control in a separate thread, and using JobControl.allFinished() to see whether all jobs have completed. Does this work the same as Job.submit()?

On Tue, Jul 26, 2011 at 4:08 PM, Harsh J ha...@cloudera.com wrote:

Madhu, Do you get a specific error message / stack trace? Could you also paste your JT logs?

On Tue, Jul 26, 2011 at 4:05 PM, madhu phatak phatak@gmail.com wrote:

Hi, I am using the same APIs, but I am not able to run the jobs by just adding the configuration files and jars. It never creates a job in Hadoop; it just shows "cleaning up staging area" and fails.

On Tue, Jul 26, 2011 at 3:46 PM, Devaraj K devara...@huawei.com wrote:

Hi Madhu, You can submit jobs programmatically using the Job APIs from any system. The job submission code can be written this way:

// Create a new Job
Job job = new Job(new Configuration());
job.setJarByClass(MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");
job.setInputPath(new Path("in"));
job.setOutputPath(new Path("out"));
job.setMapperClass(MyJob.MyMapper.class);
job.setReducerClass(MyJob.MyReducer.class);

// Submit the job
job.submit();

For submitting this, you need to add the Hadoop jar files and configuration files to the classpath of the application from which you want to submit the job. You can refer to these docs for more info on the Job APIs:

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/Job.html

Devaraj K

-Original Message- From: madhu phatak [mailto:phatak@gmail.com] Sent: Tuesday, July 26, 2011 3:29 PM To: common-user@hadoop.apache.org Subject: Submitting and running hadoop jobs Programmatically

Hi, I am working on an open source project, Nectar (https://github.com/zinnia-phatak-dev/Nectar), where I am trying to create Hadoop jobs depending upon the user input. I was using the Java Process API to run the bin/hadoop shell script to submit the jobs.
But that doesn't seem like a good way, because the process creation model is not consistent across different operating systems. Is there a better way to submit the jobs than invoking the shell script? I am using the hadoop-0.21.0 version, and I am running my program as the same user under which hadoop is installed. Some older threads said that if I add the configuration files to the classpath it will work fine, but I am not able to run it that way. So has anyone tried this before? If so, can you please give detailed instructions on how to achieve it? Thanks in advance for your help. Regards,
Hadoop Question
Hi All, How can I determine if a file is being written to (by any thread) in HDFS? I have a continuous process on the master node which is tracking a particular folder in HDFS for files to process. On the slave nodes, I am creating files in the same folder using the following code:

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import java.io.OutputStream;

OutputStream oStream = fileSystem.create(path);
IOUtils.write("Some String", oStream);
IOUtils.closeQuietly(oStream);

At the master node, I am getting the earliest modified file in the folder. At times, when I try reading the file, I get nothing in it, most likely because the slave is still finishing writing to the file. Is there any way to somehow tell the master that the slave is still writing to the file, and to check the file some time later for the actual content?

Thanks,
--
Nitin Khandelwal
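One common answer to this kind of visibility problem (an assumption on my part, not something confirmed in this thread) is to have the writer create the file under a temporary name and rename it into place only after closing the stream, so the watching process never picks up a half-written file. With HDFS the rename would go through FileSystem.rename(); the sketch below demonstrates the pattern on the local filesystem with java.nio, and the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Write-then-rename convention: the writer produces "name.tmp", and only
// after the data is fully written does it rename to the final "name" that
// the watcher looks for. Readers therefore only ever see complete files.
public class WriteThenRename {
    public static Path writeComplete(Path dir, String name, String data) throws IOException {
        Path tmp = dir.resolve(name + ".tmp");  // invisible to the watcher
        Path fin = dir.resolve(name);
        Files.write(tmp, data.getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, fin, StandardCopyOption.ATOMIC_MOVE); // publish atomically
        return fin;
    }
}
```

The master-side watcher then simply ignores names ending in ".tmp" (or, with the HDFS variant, never sees them because they live in a staging directory until renamed).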