Problems running TestDFSIO to a non-default directory

2008-11-25 Thread Joel Welling
, Konstantin Shvachko wrote: Sure. bin/hadoop -Dtest.build.data=/bessemer/welling/hadoop_test/benchmarks/TestDFSIO/ org.apache.hadoop.fs.TestDFSIO -write -nrFiles 2*N -fileSize 360 --Konst Joel Welling wrote: With my setup, I need to change the file directory from /benchmarks/TestDFSIO

How to run sort900?

2008-11-19 Thread Joel Welling
Hi folks; Is there a standard procedure for running the sort900 test? In particular, the timings show time for a verifier, but I can't find where that's implemented. Thanks, -Joel

Re: reading input for a map function from 2 different files?

2008-11-12 Thread Joel Welling
Amar, isn't there a problem with your method in that it gets a small result by subtracting very large numbers? Given a million inputs, won't A and B be so much larger than the standard deviation that there aren't enough bits left in the floating point number to represent it? I just thought I
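
The cancellation concern above can be illustrated with a small sketch. It assumes the method under discussion is the usual one-pass "sum and sum of squares" formula (the snippet does not show it), and compares it with Welford's running update, which is a swapped-in alternative and not something proposed in the thread. The function names and the test data are hypothetical.

```python
import math

def naive_std(xs):
    # Assumed form of the method being questioned: sqrt(E[x^2] - E[x]^2).
    # Both terms are huge when the mean is large, so their difference
    # keeps almost no significant bits.
    n = len(xs)
    s = sum(xs)
    s2 = sum(x * x for x in xs)
    var = s2 / n - (s / n) ** 2
    return math.sqrt(max(var, 0.0))  # clamp: rounding can make var negative

def welford_std(xs):
    # Welford's update works with deviations from the running mean,
    # so it never subtracts two very large numbers.
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return math.sqrt(m2 / n)

# Hypothetical data: large mean (about 1e9), true standard deviation 0.5.
data = [1e9 + (i % 2) for i in range(100_000)]
print("naive:  ", naive_std(data))
print("welford:", welford_std(data))
```

With data like this the naive formula cannot resolve a variance of 0.25, because doubles near 1e18 are spaced 128 apart, while the running update stays close to 0.5.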

Is there a unique ID associated with each task?

2008-10-30 Thread Joel Welling
Hi folks; I'm writing a Hadoop Pipes application, and I need to generate a bunch of integers that are unique across all map tasks. If each map task has a unique integer ID, I can make sure my integers are unique by including that integer ID. I have this theory that each map task has a unique
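
A minimal sketch of the idea described above, assuming each map task already knows a unique integer ID in the range 0..num_tasks-1. How that ID is obtained from Pipes is not shown in the snippet, so task_id and num_tasks are simply passed in here; the function name is hypothetical.

```python
def unique_ids(task_id, num_tasks, count):
    """Yield `count` integers that no other task in 0..num_tasks-1 can produce."""
    for local_counter in range(count):
        # Interleave: task 0 gets 0, N, 2N, ...; task 1 gets 1, N+1, 2N+1, ...
        yield local_counter * num_tasks + task_id

# Example: 4 tasks, 5 integers each, no collisions across tasks.
all_ids = [i for t in range(4) for i in unique_ids(t, 4, 5)]
print(sorted(all_ids))
```

Interleaving avoids having to know in advance how many integers each task will need; the only fixed quantity is the number of tasks.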

Any examples using Hadoop Pipes with binary SequenceFiles?

2008-10-29 Thread Joel Welling
Hi folks; I'm interested in reading binary data, running it through some C++ code, and writing the result as binary data. It looks like SequenceFiles and Pipes are the way to do it, but I can't find any examples or docs beyond the API specification. Can someone point me to an example where

Re: Problems increasing number of tasks per node- really a task management problem!

2008-09-23 Thread Joel Welling
a little surprised that they are killed via their pids rather than by sending them a kill signal via the same mechanism whereby they learn of new work. -Joel On Tue, 2008-09-23 at 14:29 -0700, Arun C Murthy wrote: On Sep 23, 2008, at 2:21 PM, Joel Welling wrote: Stopping

gridmix on a small cluster?

2008-09-17 Thread Joel Welling
Hi folks; I'd like to try the gridmix benchmark on my small cluster (3 nodes at 8 cores each, Lustre with IB interconnect). The documentation for gridmix suggests that it will take 4 hours on a 500 node cluster, which suggests it would take me something like a week to run. Is there a way to

Ordering of records in output files?

2008-09-10 Thread Joel Welling
Hi folks; I have a simple Streaming job where the mapper produces output records beginning with a 16-character ASCII string and passes them to IdentityReducer. When I run it, I get the same number of output files as I have mapred.reduce.tasks. Each one contains some of the strings, and within
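
The snippet is cut off, so the exact question is not visible. As a rough illustration of why each output file holds only some of the strings, here is a sketch of hash partitioning followed by a per-reducer sort. It assumes the default behaviour of partitioning by key hash; zlib.crc32 is only a stand-in for the hash Hadoop actually uses, and the function names are hypothetical.

```python
import zlib

def partition(key, num_reducers):
    # Stand-in for a hash partitioner: same key always goes to the same reducer.
    return zlib.crc32(key.encode("ascii")) % num_reducers

def simulate_reduce_outputs(keys, num_reducers):
    # One list per reduce task, i.e. one per output file.
    outputs = [[] for _ in range(num_reducers)]
    for key in keys:
        outputs[partition(key, num_reducers)].append(key)
    # Keys are sorted within each reducer, but not across reducers.
    return [sorted(part) for part in outputs]

keys = ["%016d" % (i * 7919 % 1000) for i in range(20)]
for n, part in enumerate(simulate_reduce_outputs(keys, 3)):
    print("part-%05d" % n, part)
```

Under these assumptions each file is sorted internally, but concatenating the files does not give a globally sorted result.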

Re: Hadoop over Lustre?

2008-08-29 Thread Joel Welling
Joel Welling wrote: So far no success, Konstantin- the hadoop job seems to start up, but fails immediately leaving no logs. What is the appropriate setting for mapred.job.tracker? The generic value references hdfs, but it also has a port number- I'm not sure what that means. My

Re: Hadoop over Lustre?

2008-08-29 Thread Joel Welling
That seems to have done the trick! I am now running Hadoop 0.18 straight out of Lustre, without an intervening HDFS. The unusual things about my hadoop-site.xml are: <property> <name>fs.default.name</name> <value>file:///bessemer/welling</value> </property> <property> <name>mapred.system.dir</name>

Re: Hadoop over Lustre?

2008-08-23 Thread Joel Welling
they be shared? -Joel On Fri, 2008-08-22 at 15:48 +0100, Steve Loughran wrote: Joel Welling wrote: Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful

Re: Hadoop over Lustre?

2008-08-22 Thread Joel Welling
-08-22 at 15:48 +0100, Steve Loughran wrote: Joel Welling wrote: Thanks, Steve and Arun. I'll definitely try to write something based on the KFS interface. I think that for our applications putting the mapper on the right rack is not going to be that useful. A lot of our calculations

Hadoop over Lustre?

2008-08-21 Thread Joel Welling
the distributed filesystem? I've seen discussion threads about Hadoop with NFS which said something like 'just specify a local filesystem and everything will be fine', but I don't know how to do that. I'm using Hadoop 0.17.2. Thanks, I hope; -Joel Welling

Problem with installation of 0.17.0: things start but tests fail

2008-08-13 Thread Joel Welling
for the different datanodes to keep their files in separate directories? Thanks, I hope, -Joel Welling