Polymorphic behavior of Maps in One Job?

2009-04-06 Thread Sid123

If I have two input format classes set up, both producing different key and value
pairs, is it possible to configure multiple map and reduce classes
in one job based on the different key/value pairs? If I overload the map()
method, does the framework call the overloads polymorphically based on the varying
parameters (the key and the value), or do we need separate classes?
For adding multiple mappers I am thinking of using:
MultipleInputs.addInputPath(JobConf conf, Path path, Class<? extends
InputFormat> inputFormatClass, Class<? extends Mapper> mapperClass)
to add the mappers and my input formats.
And use the MultipleOutputs class to configure the output from the mappers.
If this is right, where do I add the multiple implementations for the
reducers in the JobConf??
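
(For reference, a minimal sketch of the MultipleInputs approach described above, using the old mapred API; all class names, paths and the choice of input formats are illustrative. The Mapper interface exposes a single map() signature, so the framework will not resolve overloads by key/value type at runtime; instead, each input path is registered with its own Mapper class and all of them feed the one reducer.)

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.MultipleInputs;

public class TwoFormatJob {

  // Mapper for the first input format (TextInputFormat: offset/line pairs).
  public static class MapperA extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // parse format A here and emit a common key/value shape
      out.collect(new Text("formatA"), line);
    }
  }

  // Mapper for the second input format (KeyValueTextInputFormat: Text/Text pairs).
  public static class MapperB extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {
    public void map(Text key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      out.collect(key, value);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TwoFormatJob.class);
    // One mapper class per input path/format; both feed the single reducer.
    MultipleInputs.addInputPath(conf, new Path(args[0]),
        TextInputFormat.class, MapperA.class);
    MultipleInputs.addInputPath(conf, new Path(args[1]),
        KeyValueTextInputFormat.class, MapperB.class);
    conf.setReducerClass(IdentityReducer.class); // replace with your own reducer
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path(args[2]));
    JobClient.runJob(conf);
  }
}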




Re: best practice: mapred.local vs dfs drives

2009-04-06 Thread Craig Macdonald

Thanks for the heads-up.

C

Owen O'Malley wrote:

We always share the drives.

-- Owen

On Apr 5, 2009, at 0:52, zsongbo zson...@gmail.com wrote:

I usually set mapred.local.dir to share the disk space with DFS, since some
MapReduce jobs need a lot of temp space.
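
(For concreteness, a hadoop-site.xml sketch of the shared-drive layout being discussed; the paths are purely illustrative, assuming two data drives per node:)

<property>
  <name>dfs.data.dir</name>
  <value>/data/disk1/dfs/data,/data/disk2/dfs/data</value>
</property>
<property>
  <name>mapred.local.dir</name>
  <!-- same physical disks as DFS, so map output and HDFS blocks share space -->
  <value>/data/disk1/mapred/local,/data/disk2/mapred/local</value>
</property>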



On Fri, Apr 3, 2009 at 8:36 PM, Craig Macdonald 
cra...@dcs.gla.ac.uk wrote:



Hello all,

Following recent hardware discussions, I thought I'd ask a related
question. Our cluster nodes have 3 drives: 1x 160GB system/scratch 
and 2x

500GB DFS drives.

The 160GB system drive is partitioned such that 100GB is for job
mapred.local space. However, we find that for our application, free
mapred.local space for map output is the limiting parameter on the number of
reducers we can have (our application prefers fewer reducers).

How do people normally arrange DFS vs. mapred.local space? Do you (a) share
the DFS drives with the TaskTracker temporary files, or do you (b) keep
them on separate partitions or drives?

We originally went with (b) because it prevented a run-away job from eating
all the DFS space on the machine; however, I'm beginning to realise the
disadvantages.

Any comments?

Thanks

Craig






Re: RPM spec file for 0.19.1

2009-04-06 Thread Ian Soboroff
Simon Lewis si...@lewis.li writes:

 On 3 Apr 2009, at 15:11, Ian Soboroff wrote:
 Steve Loughran ste...@apache.org writes:

 I think from your perspective it makes sense as it stops anyone getting
 itchy fingers and doing their own RPMs.

 Um, what's wrong with that?

 I would certainly like the ability to build RPMs from a source
 checkout, anyone thought of putting a standard spec file in with the
 source somewhere?

Another vote for a .spec file to be included in the standard
distribution as a contrib.

If it's ok with Cloudera (since my spec file just came from them), I
will edit my JIRA to offer that proposal.  If it's Cloudera's spec
that's included, we should also include the init.d script templates
(which are already Apache licensed).

Ian



Hadoop Reduce Job errors, job gets killed.

2009-04-06 Thread Usman Waheed

Hi,

My Hadoop Map/Reduce job is giving the following error message right about 
when it is 95% complete with the reduce step on one node. The process 
gets killed. The error messages from the logs are noted below.

*java.io.IOException: Filesystem closed*, any ideas please?

2009-04-06 10:41:07,202 INFO org.apache.hadoop.streaming.PipeMapRed: 
Records R/W=10263370/642860
2009-04-06 10:41:17,203 INFO org.apache.hadoop.streaming.PipeMapRed: 
Records R/W=10263370/1033247
2009-04-06 10:41:27,437 INFO org.apache.hadoop.streaming.PipeMapRed: 
Records R/W=10263370/1844222
2009-04-06 10:41:37,438 INFO org.apache.hadoop.streaming.PipeMapRed: 
Records R/W=10263370/2884839
2009-04-06 10:41:44,350 WARN org.apache.hadoop.streaming.PipeMapRed: 
java.io.IOException: Filesystem closed

   at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:166)
   at org.apache.hadoop.dfs.DFSClient.access$500(DFSClient.java:58)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2104)
   at 
org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:141)
   at 
org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:124)

   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:41)

   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:72)
   at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:87)

   at org.apache.hadoop.mapred.ReduceTask$2.collect(ReduceTask.java:315)
   at 
org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:346)


2009-04-06 10:41:44,478 INFO org.apache.hadoop.streaming.PipeMapRed: 
MRErrorThread done
2009-04-06 10:41:44,478 INFO org.apache.hadoop.streaming.PipeMapRed: 
PipeMapRed.waitOutputThreads(): subprocess failed with code 141 in 
org.apache.hadoop.streaming.PipeMapRed
2009-04-06 10:41:44,480 INFO org.apache.hadoop.streaming.PipeMapRed: 
mapRedFinished
2009-04-06 10:41:44,480 INFO org.apache.hadoop.streaming.PipeMapRed: 
PipeMapRed.waitOutputThreads(): subprocess failed with code 141 in 
org.apache.hadoop.streaming.PipeMapRed
2009-04-06 10:41:44,481 WARN org.apache.hadoop.mapred.TaskTracker: Error 
running child

java.io.IOException: Filesystem closed
   at org.apache.hadoop.dfs.DFSClient.checkOpen(DFSClient.java:166)
   at org.apache.hadoop.dfs.DFSClient.access$500(DFSClient.java:58)
   at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.flush(DFSClient.java:2176)

   at java.io.FilterOutputStream.flush(FilterOutputStream.java:123)
   at java.io.DataOutputStream.flush(DataOutputStream.java:106)
   at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:66)
   at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:99)

   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:340)
   at 
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)


Re: Using HDFS to serve www requests

2009-04-06 Thread Brian Bockelman
Indeed, it would be a very nice interface to have (if anyone has some  
free time)!


I know a few Caltech people who'd like to see how their WAN  
transfer product (http://monalisa.cern.ch/FDT/) would work with HDFS;  
if there was a HDFS NIO interface, playing around with HDFS and FDT  
would be fairly trivial.


Brian

On Apr 3, 2009, at 5:16 AM, Steve Loughran wrote:


Snehal Nagmote wrote:
Can you please explain exactly what adding an NIO bridge means and how it can
be done? What would the advantages be in this case?


NIO: Java non-blocking IO. It's a standard API to talk to different  
filesystems; support has been discussed in JIRA. If the DFS APIs  
were accessible under an NIO front end, then applications written  
for the NIO APIs would work with the supported filesystems, with no  
need to code specifically for Hadoop's not-yet-stable APIs.



Steve Loughran wrote:

Edward Capriolo wrote:

It is a little more natural to connect to HDFS from Apache Tomcat.
This will allow you to skip the FUSE mounts and just use the HDFS API.

I have modified this code to run inside Tomcat.
http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample

I will not testify to how well this setup will perform under internet
traffic, but it does work.

If someone adds an NIO bridge to hadoop filesystems then it would  
be easier; leaving you only with the performance issues.
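
(For anyone wanting to try the HDFS-API route mentioned above, here is a minimal servlet sketch along the lines of the linked ReadWriteExample; the servlet class, request parameter name and NameNode URI are all hypothetical, and error handling is omitted:)

import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsFileServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:9000"); // assumed NameNode URI
    FileSystem fs = FileSystem.get(conf);
    // Stream the requested HDFS file straight to the HTTP response.
    InputStream in = fs.open(new Path(req.getParameter("file")));
    IOUtils.copyBytes(in, resp.getOutputStream(), conf, true); // closes both streams
  }
}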







Re: Amazon Elastic MapReduce

2009-04-06 Thread Patrick A.

Are intermediate results stored in S3 as well?

Also, any plans to support HTable?



Chris K Wensel-2 wrote:
 
 
 FYI
 
 Amazon's new Hadoop offering:
 http://aws.amazon.com/elasticmapreduce/
 
 And Cascading 1.0 supports it:
 http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html
 
 cheers,
 ckw
 
 --
 Chris K Wensel
 ch...@wensel.net
 http://www.cascading.org/
 http://www.scaleunlimited.com/
 
 
 




Re: Amazon Elastic MapReduce

2009-04-06 Thread Peter Skomoroch
Intermediate results can be stored in HDFS on the EC2 machines, or in S3
using s3n... performance is better if you store them in HDFS:

 -input, s3n://elasticmapreduce/samples/similarity/lastfm/input/,
 -output, hdfs:///home/hadoop/output2/,
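
(Spelled out as a full streaming invocation, roughly what the step arguments above expand to; the streaming jar path and the mapper/reducer scripts here are only placeholders:)

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
  -input s3n://elasticmapreduce/samples/similarity/lastfm/input/ \
  -output hdfs:///home/hadoop/output2/ \
  -mapper my_mapper.py \
  -reducer my_reducer.py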



On Mon, Apr 6, 2009 at 11:27 AM, Patrick A. patrickange...@gmail.com wrote:


 Are intermediate results stored in S3 as well?

 Also, any plans to support HTable?



 Chris K Wensel-2 wrote:
 
 
  FYI
 
  Amazons new Hadoop offering:
  http://aws.amazon.com/elasticmapreduce/
 
  And Cascading 1.0 supports it:
  http://www.cascading.org/2009/04/amazon-elastic-mapreduce.html
 
  cheers,
  ckw
 
  --
  Chris K Wensel
  ch...@wensel.net
  http://www.cascading.org/
  http://www.scaleunlimited.com/
 
 
 





-- 
Peter N. Skomoroch
617.285.8348
http://www.datawrangling.com
http://delicious.com/pskomoroch
http://twitter.com/peteskomoroch


Reduce task attempt retry strategy

2009-04-06 Thread Stefan Will
Hi,

I had a flaky machine the other day that was still accepting jobs and
sending heartbeats, but caused all reduce task attempts to fail. This in
turn caused the whole job to fail because the same reduce task was retried 3
times on that particular machine.

Perhaps I'm confusing this with the block placement strategy in hdfs, but I
always thought that the framework would retry jobs on a different machine if
retries on the original machine keep failing. E.g. I would have expected to
retry once or twice on the same machine, but then switch to a different one
to minimize the likelihood of getting stuck on a bad machine.

What is the expected behavior in 0.19.1 (which I'm running)? Any plans for
improving on this in the future ?

Thanks,
Stefan


problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Bill Au
I am using a cluster of mixed hardware, 32-bit and 64-bit machines, to run
Hadoop 0.18.3.  I can't use the distribution tar ball since I need to apply
a couple of patches.  So I build my own Hadoop binaries after applying the
patches that I need.  I build two copies, one for 32-bit machines and one
for 64-bit machines.  I am having problems starting the TaskTrackers that are
not the same hardware type as the JobTracker.  I get the Incompatible
buildVersion error because the compile time is part of the buildVersion:

JobTracker's: 0.18.3 from  by httpd on Mon Apr  6 07:35:15 PDT 2009
TaskTracker's: 0.18.3 from  by httpd on Mon Apr  6 07:34:56 PDT 2009

Any advice on how I can get around this problem?  Is there a way to build a
single version of Hadoop that will run on both 32-bit and 64-bit machines?
I notice that there are some native libraries under
$HADOOP_HOME/lib/native/Linux-amd64-64 and
$HADOOP_HOME/lib/native/Linux-i386-32.  Do I need to compile my own version
of those libraries since I am applying patches to the distribution?
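
(On the native-library point: if you do end up rebuilding them, the stock build can regenerate the native code when the compile.native flag is set, run once on a 32-bit box and once on a 64-bit box. The target below is a placeholder for whatever target you already use to build your patched tarball:)

ant -Dcompile.native=true <your-usual-build-target>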

I hope I don't have to hack the code to take the compile time out of
buildVersion.

Thanks in advance for your help.

Bill


Job tracker not responding during streaming job

2009-04-06 Thread David Kellogg
I am running Hadoop streaming. After around 42 jobs on an 18-node  
cluster, the jobtracker stops responding. This happens on normally-working
code. Here are the symptoms.


1. A job is running, but it pauses with reduce stuck at XX%
2. hadoop job -list hangs or takes a very long time to return
3. In the Ganglia metrics on the Jobtracker node:
 a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20  
seconds) before failure
 b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before  
failure

 c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure


The ticker looks like this.

09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
09/04/06 03:13:44 INFO streaming.StreamJob:  map 25%  reduce 7%
After the 03:13:44 line, it hangs for more than 15 minutes.

In the jobtracker log, I see this.

2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error  
Recovery for block blk_-8143535428142072268_95993 failed  because  
recovery from primary datanode 10.1.0.156:50010 failed 4 times. Will  
retry...


After restarting both dfs and mapreduce on all nodes, the problem  
goes away, and the formerly non-working job proceeds without failure.


Does anyone else see this problem?

David Kellogg


Re: problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Brian Bockelman

Hey Bill,

I might be giving you bad advice (I've only verified this for HDFS  
components on the 0.19.x branch, not the JT/TT or the 0.18.x branch),  
but...


In my understanding, Hadoop only compares the base SVN revision  
number, not the build strings.  Make sure that both have the SVN rev.   
Our build machines don't have svn installed, so we actually have to  
generate the correct SVN rev, then submit it to the build clusters.   
We certainly don't have our build time strings match up.


Hope this helps!

Brian

On Apr 6, 2009, at 4:46 PM, Bill Au wrote:

I am using a cluster of mixed hardware, 32-bit and 64-bit machines,  
to run
Hadoop 0.18.3.  I can't use the distribution tar ball since I need  
to apply
a couple of patches.  So I build my own Hadoop binaries after  
applying the
patches that I need.  I build two copies, one for 32-bit machines  
and one
for 64-bit machines.  I am having problems starting the TaskTrackers
that are

not the same hardware type as the JobTracker.  I get the Incompatible
buildVersion error because the compile time is part of the  
buildVersion:


JobTracker's: 0.18.3 from  by httpd on Mon Apr  6 07:35:15 PDT 2009
TaskTracker's: 0.18.3 from  by httpd on Mon Apr  6 07:34:56 PDT 2009

Any advice on how I can get around this problem?  Is there a way to  
build a
single version of Hadoop that will run on both 32-bit and 64-bit  
machines.

I notice that there are some native libraries under
$HADOOP_HOME/lib/native/Linux-amd64-64 and
$HADOOP_HOME/lib/native/Linux-i386-32.  Do I need to compile my own  
version

of those libraries since I am applying patches to the distribution?

I hope I don't have to hack the code to take the compile time out of
buildVersion.

Thanks in advance for your help.

Bill




Re: problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Todd Lipcon
On Mon, Apr 6, 2009 at 4:01 PM, Brian Bockelman bbock...@cse.unl.edu wrote:

 Hey Bill,

 I might be giving you bad advice (I've only verified this for HDFS
 components on the 0.19.x branch, not the JT/TT or the 0.18.x branch), but...

 In my understanding, Hadoop only compares the base SVN revision number, not
 the build strings.  Make sure that both have the SVN rev.  Our build
 machines don't have svn installed, so we actually have to generate the
 correct SVN rev, then submit it to the build clusters.  We certainly don't
 have our build time strings match up.


Nope - the JT/TT definitely do verify the entirety of the build string (at
least in the case when not built from SVN). I saw the same behavior as Bill
is reporting last week on 0.18.3.

-Todd


Re: problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Brian Bockelman
Ah yes, there you go ... so much for extrapolating on a Monday :).   
Sorry Bill!


Brian

On Apr 6, 2009, at 6:03 PM, Todd Lipcon wrote:

On Mon, Apr 6, 2009 at 4:01 PM, Brian Bockelman  
bbock...@cse.unl.eduwrote:



Hey Bill,

I might be giving you bad advice (I've only verified this for HDFS
components on the 0.19.x branch, not the JT/TT or the 0.18.x  
branch), but...


In my understanding, Hadoop only compares the base SVN revision  
number, not

the build strings.  Make sure that both have the SVN rev.  Our build
machines don't have svn installed, so we actually have to generate  
the
correct SVN rev, then submit it to the build clusters.  We  
certainly don't

have our build time strings match up.



Nope - the JT/TT definitely do verify the entirety of the build  
string (at
least in the case when not built from SVN). I saw the same behavior  
as Bill

is reporting last week on 0.18.3.

-Todd




Re: problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Owen O'Malley

This was discussed over on:

https://issues.apache.org/jira/browse/HADOOP-5203

Doug uploaded a patch, but no one seems to be working on it.

-- Owen


hadoop 0.18.3 writing not flushing to hadoop server?

2009-04-06 Thread javateck javateck
I have a strange issue: when I write to Hadoop, I find that the content
is not transferred to Hadoop even after a long time. Is there any way to
force-flush the local temp files to Hadoop after writing? And when
I shut down the VM, it does get flushed.
thanks,


Modeling WordCount in a different way

2009-04-06 Thread Aayush Garg
Hi,

I want to make experiments with wordcount example in a different way.

Suppose we have very large data. Instead of splitting all the data at once,
we want to feed some splits into the map-reduce job at a time. I want to model
the Hadoop job like this.

Suppose a batch of input splits arrives at the beginning for every map, and
the reduce gives the (word, frequency) pairs for this batch of input splits.
Now after this another batch of input splits arrives, and the results from the
subsequent reduce are aggregated with the previous results (if a word had
frequency 2 in the previous processing and occurs 1 time in this processing,
its frequency is now maintained as 3). If in the next map-reduce it comes 4
times, its frequency is maintained as 7...

And this process goes on like this.
Now, how would I model input splits like this, and how can these continuous
map-reduces be kept running? In what way should I keep the results of the
map-reduces so that I can aggregate them with the output of the next
map-reduce?

Thanks,
Aayush
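
(One way to model this, as a sketch rather than the only approach: run the normal word count over each new batch, then run a second aggregation pass whose inputs are that batch's counts plus the previous aggregate, with a reducer that just sums. All paths and class names below are illustrative; old mapred API.)

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.LongSumReducer;

public class MergeCounts {
  // Previous outputs look like "word<TAB>count" (TextOutputFormat default),
  // so KeyValueTextInputFormat hands us (word, count) as Text pairs.
  public static class ParseMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, LongWritable> {
    public void map(Text word, Text count,
                    OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      out.collect(word, new LongWritable(Long.parseLong(count.toString())));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(MergeCounts.class);
    conf.setInputFormat(KeyValueTextInputFormat.class);
    conf.setMapperClass(ParseMapper.class);
    conf.setReducerClass(LongSumReducer.class);   // sums previous + new counts
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    // Both the previous aggregate and the latest batch's counts feed the job.
    FileInputFormat.addInputPath(conf, new Path("/counts/aggregate-prev"));
    FileInputFormat.addInputPath(conf, new Path("/counts/batch-0002"));
    FileOutputFormat.setOutputPath(conf, new Path("/counts/aggregate-new"));
    JobClient.runJob(conf);
  }
}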


Re: hadoop 0.18.3 writing not flushing to hadoop server?

2009-04-06 Thread jason hadoop
The data is flushed when the file is closed, or when the amount written is an
even multiple of the block size specified for the file, which by default is
64 MB.

There is no other way to flush the data to HDFS at present.

There is an attempt at this in 0.19.0 but it caused data corruption issues
and was backed out for 0.19.1. Hopefully a working version will appear soon.
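
(A minimal sketch of the close-to-make-visible behaviour described above; the path is illustrative:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteAndClose {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
    out.writeBytes("some content\n");
    // Nothing is guaranteed to reach HDFS until the stream is closed
    // (or a full block, 64 MB by default, has been written), so close promptly.
    out.close();
  }
}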

On Mon, Apr 6, 2009 at 5:05 PM, javateck javateck javat...@gmail.com wrote:

 I have a strange issue that when I write to hadoop, I find that the content
 is not transferred to hadoop even after a long time, is there any way to
 force flush the local temp files to hadoop after writing to hadoop? And
 when
 I shutdown the VM, it's getting flushed.
 thanks,




-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422


Re: problem running on a cluster of mixed hardware due to Incompatible buildVersion of JobTracker and TaskTracker

2009-04-06 Thread Bill Au
Owen, thanks for pointing out that Jira.
Bill

On Mon, Apr 6, 2009 at 7:20 PM, Owen O'Malley omal...@apache.org wrote:

 This was discussed over on:

 https://issues.apache.org/jira/browse/HADOOP-5203

 Doug uploaded a patch, but no one seems to be working on it.

 -- Owen



Re: Reduce task attempt retry strategy

2009-04-06 Thread Amar Kamat

Stefan Will wrote:

Hi,

I had a flaky machine the other day that was still accepting jobs and
sending heartbeats, but caused all reduce task attempts to fail. This in
turn caused the whole job to fail because the same reduce task was retried 3
times on that particular machine.
  
What is your cluster size? If a task fails on a machine, then it is
re-tried on some other machine (based on the number of good machines left in
the cluster). After a certain number of failures, the machine will be
blacklisted (again based on the number of machines left in the cluster). Three
different reducers might be scheduled on that machine, but that should
not lead to job failure. Can you explain in detail what exactly
happened? Find out where the attempts got scheduled from the
jobtracker's log.

Amar
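
(For reference, a sketch of the per-job settings involved in this behaviour, using the old mapred API; the values are only illustrative:)

import org.apache.hadoop.mapred.JobConf;

public class RetrySettings {
  public static void configure(JobConf conf) {
    // How many times one reduce task may be attempted before the job fails.
    conf.setMaxReduceAttempts(4);            // mapred.reduce.max.attempts
    // How many of this job's tasks may fail on a single TaskTracker before
    // that tracker is blacklisted for the rest of the job.
    conf.setMaxTaskFailuresPerTracker(2);    // mapred.max.tracker.failures
  }
}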

Perhaps I'm confusing this with the block placement strategy in hdfs, but I
always thought that the framework would retry jobs on a different machine if
retries on the original machine keep failing. E.g. I would have expected to
retry once or twice on the same machine, but then switch to a different one
to minimize the likelihood of getting stuck on a bad machine.

What is the expected behavior in 0.19.1 (which I'm running)? Any plans for
improving on this in the future ?

Thanks,
Stefan

  




Re: Job tracker not responding during streaming job

2009-04-06 Thread Amar Kamat

David Kellogg wrote:
I am running Hadoop streaming. After around 42 jobs on an 18-node 
cluster, the jobtracker stops responding. This happens on 
normally-working code. Here are the symptoms.


1. A job is running, but it pauses with reduce stuck at XX%
2. hadoop job -list hangs or takes a very long time to return
3. In the Ganglia metrics on the Jobtracker node:
 a. jvm.metrics__JobTracker__gcTimeMillis rises above 20 k (20 
seconds) before failure
 b. jvm.metrics__JobTracker__memHeapUsedM rises above 600 before 
failure

 c. jvm.metrics__JobTracker__gcCount rises above 1 k before failure


The ticker looks like this.

09/04/06 03:06:28 INFO streaming.StreamJob:  map 24%  reduce 7%
09/04/06 03:13:44 INFO streaming.StreamJob:  map 25%  reduce 7%
After the 03:13:44 line, it hangs for more than 15 minutes.

In the jobtracker log, I see this.

2009-04-04 04:19:13,563 WARN org.apache.hadoop.hdfs.DFSClient: Error 
Recovery for block blk_-8143535428142072268_95993 failed  because 
recovery from primary datanode 10.1.0.156:50010 failed 4 times. Will 
retry...


After restarting both dfs and mapreduce on all nodes, the problem goes 
away, and the formerly non-working job proceeds without failure.

David,
What version are you using?
This can happen because of:
1) The number of tasks in the jobtracker's memory might exceed its limits. What 
is the total number of tasks in the jobtracker's memory? What is the 
jobtracker's heap size? Try increasing the heap size, and also try 
setting the mapred.jobtracker.completeuserjobs.maximum parameter to some 
low value (see the sketch below).
2) Sometimes a slow/bad datanode causes the jobtracker to get stuck. 
As you have mentioned, this might be the cause. Can you let us know the 
output of 'kill -3' on the jobtracker process?
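
(Sketch of where those two knobs live; the values are only illustrative.)

In hadoop-site.xml:

<property>
  <name>mapred.jobtracker.completeuserjobs.maximum</name>
  <value>10</value>
</property>

In conf/hadoop-env.sh on the jobtracker node:

export HADOOP_HEAPSIZE=2048   # daemon heap in MB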


Does anyone else see this problem?

David Kellogg




connecting two clusters

2009-04-06 Thread Mithila Nagendra
Hey all
I'm trying to connect two separate Hadoop clusters. Is it possible to do so?
I need data to be shuttled back and forth between the two clusters. Any
suggestions?

Thank you!
Mithila Nagendra
Arizona State University


Re: connecting two clusters

2009-04-06 Thread Philip Zeyliger
DistCp is the standard way to copy data between clusters.  What it does is
run a mapreduce job to copy data between a source cluster and a destination
cluster.  See http://hadoop.apache.org/core/docs/r0.19.1/distcp.html

On Mon, Apr 6, 2009 at 9:49 PM, Mithila Nagendra mnage...@asu.edu wrote:

 Hey all
 I'm trying to connect two separate Hadoop clusters. Is it possible to do
 so?
 I need data to be shuttled back and forth between the two clusters. Any
 suggestions?

 Thank you!
 Mithila Nagendra
 Arizona State University



Re: connecting two clusters

2009-04-06 Thread Owen O'Malley


On Apr 6, 2009, at 9:49 PM, Mithila Nagendra wrote:


Hey all
I'm trying to connect two separate Hadoop clusters. Is it possible  
to do so?
I need data to be shuttled back and forth between the two clusters.  
Any

suggestions?


You should use hadoop distcp. It is a map/reduce program that copies  
data, typically from one cluster to another. If you have the hftp  
interface enabled, you can use that to copy between hdfs clusters that  
are different versions.


hadoop distcp hftp://namenode1:1234/foo/bar hdfs://foo/bar

-- Owen


Re: connecting two clusters

2009-04-06 Thread Mithila Nagendra
Thanks! I was looking at the link sent by Philip. The copy is done with the
following command:
hadoop distcp hdfs://nn1:8020/foo/bar \
hdfs://nn2:8020/bar/foo

I was wondering if nn1 and nn2 are the names of the clusters or the names of
the masters on each cluster.

I wanted map/reduce tasks running on each of the two clusters to communicate
with each other. I don't know if Hadoop provides for synchronization between
two map/reduce tasks. The tasks run simultaneously, and they need to access a
common file - something like a map/reduce task at a higher level utilizing
the data produced by the map/reduce at the lower level.

Mithila

On Tue, Apr 7, 2009 at 7:57 AM, Owen O'Malley omal...@apache.org wrote:


 On Apr 6, 2009, at 9:49 PM, Mithila Nagendra wrote:

  Hey all
 I'm trying to connect two separate Hadoop clusters. Is it possible to do
 so?
 I need data to be shuttled back and forth between the two clusters. Any
 suggestions?


 You should use hadoop distcp. It is a map/reduce program that copies data,
 typically from one cluster to another. If you have the hftp interface
 enabled, you can use that to copy between hdfs clusters that are different
 versions.

 hadoop distcp hftp://namenode1:1234/foo/bar hdfs://foo/bar

 -- Owen



Re: Reduce task attempt retry strategy

2009-04-06 Thread Billy Pearson

I have seen the same thing happening on the 0.19 branch.

When a task fails on the reduce end, it always retries on the same node until 
it kills the job for too many failed tries on one reduce task.


I am running a cluster of 7 nodes.

Billy


Stefan Will stefan.w...@gmx.net wrote in message 
news:c5ff7f91.18c09%stefan.w...@gmx.net...

Hi,

I had a flaky machine the other day that was still accepting jobs and
sending heartbeats, but caused all reduce task attempts to fail. This in
turn caused the whole job to fail because the same reduce task was retried 3
times on that particular machine.

Perhaps I'm confusing this with the block placement strategy in hdfs, but I
always thought that the framework would retry jobs on a different machine if
retries on the original machine keep failing. E.g. I would have expected to
retry once or twice on the same machine, but then switch to a different one
to minimize the likelihood of getting stuck on a bad machine.

What is the expected behavior in 0.19.1 (which I'm running)? Any plans for
improving on this in the future ?

Thanks,
Stefan