I'm able to run both Canopy and LDA on CDH3 after the first parts of build-reuters.sh (through seq2sparse) have completed. The k-means job fails consistently in the RandomSeedGenerator. I'm investigating what may be different about its file handling compared to the other jobs.

On 10/14/10 9:36 PM, Jeff Eastman wrote:
Well, in this case k-means fails the same way even after I've verified the input file, so it's a hard failure. And the job runs just fine stand-alone on the box, and in both modes on my Mac, so it's got to be something about the Cloudera deployment. Sure would be nice to have 0.4 run that example on CDH3.

On 10/14/10 8:53 PM, Ted Dunning wrote:
There is often a small delay before files appear in HDFS after they are
created.  This has buggered many a work-flow.
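If that delay is the culprit, one defensive workaround is to poll for the file before reading it. The sketch below is illustrative, not code from the thread: the class name, timeout, and polling interval are made up, and local java.io.File stands in for the HDFS FileSystem API.

```java
import java.io.File;

// Hedged sketch: poll until a path exists and is non-empty, as a
// workaround for a delay between file creation and visibility.
// Timeout and interval values here are illustrative only.
public class WaitForFile {
    public static boolean waitForFile(File f, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (f.exists() && f.length() > 0) {
                return true; // file is present and non-empty
            }
            Thread.sleep(pollMs); // wait before checking again
        }
        // one final check after the deadline passes
        return f.exists() && f.length() > 0;
    }
}
```

Against HDFS the same loop would call FileSystem.exists() and check FileStatus.getLen() instead.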

On Thu, Oct 14, 2010 at 8:40 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:

  On 10/14/10 7:47 PM, Jeff Eastman wrote:

The recent commit to the POM fixed my build problem on my clean RedHat
box. Currently, build-reuters.sh is failing to run the k-means step on
Hadoop on that box, and it looks like the same problem we've been
seeing with others running Cloudera CDH3: Hadoop is running under a
different user, and the local file references don't resolve correctly
when the job is run under mine. I haven't yet figured out the best way
to fix this, or why the other build-reuters job steps don't have this
problem (they all use ./examples... file paths too).

It looks like RandomSeedGenerator.buildRandom() is somehow seeing an
empty input directory when it really has an 11.6 MB part file in it. The
EOFException occurs when executing: SequenceFile.Reader reader = new
SequenceFile.Reader(fs, fileStatus.getPath(), conf); on line 84. There
are hdfs and mapred PIDs associated with the Hadoop daemons, but why
would that matter? The files in HDFS are all under /users/dev/examples...
and my jobs are running as dev, so I don't get why this is happening.
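One thing worth ruling out is the reader hitting a Hadoop side file (_SUCCESS, _logs, .crc) or a zero-length entry in the directory listing, either of which can surface as an EOFException. The sketch below is not Mahout's actual code; the class name is invented, and local java.io.File stands in for the HDFS FileStatus listing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: filter a job output directory down to the part files
// that are safe to hand to a SequenceFile.Reader, skipping Hadoop
// side files (names starting with "_" or ".") and zero-length files.
public class PartFileFilter {
    public static List<File> readablePartFiles(File inputDir) {
        List<File> parts = new ArrayList<>();
        File[] entries = inputDir.listFiles();
        if (entries == null) {
            return parts; // not a directory, or listing failed
        }
        for (File f : entries) {
            String name = f.getName();
            if (name.startsWith("_") || name.startsWith(".")) {
                continue; // side files, e.g. _SUCCESS, .part-00000.crc
            }
            if (f.isFile() && f.length() > 0) {
                parts.add(f); // non-empty data file, safe to read
            }
        }
        return parts;
    }
}
```

If the 11.6 MB part file survives a filter like this but the reader still fails, that points back at the FileSystem/user mismatch rather than the directory contents.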


