Ran into a puzzling - and worrisome - issue late last night.

I was running a Hadoop streaming job which reads its input from 2 different buckets in Amazon S3 (using s3n://). When the job completed, I realized that the number of "map input records" was incorrect. (Several thousand fewer than it should have been.) So I re-ran the job, and again got an incorrect (and different!) map input record count. I eventually wound up running the job 4 different times (on 2 different Hadoop clusters at EC2) and got 4 different input record counts.

I eventually tried distcp'ing the files from S3 down to the local HDFS and re-ran the job against HDFS, and then it worked fine. But the fact that there were evidently silent I/O failures which I can't explain troubles me.

This issue appears to be intermittent, as I just re-ran the same job today twice in a row and got the correct answer both times.

There's definitely nothing on my end that could explain this. Each time, I ran the exact same code against the exact same data. (Data which hasn't changed in several weeks.)

It appears that, under certain conditions, reading from S3 using s3n (i.e., NativeS3FileSystem) can sometimes result in a silent, premature EOF. I googled around, though, and didn't find anything that could explain this.

Anyone have any ideas about what might be going on here, and/or how to work around it?

I wouldn't care so much if a Hadoop task (or even an entire job) failed outright due to premature EOFs when reading from S3. But silent failures like this, which result in incorrect output, are an unacceptable situation.
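For now, the best mitigation I can think of is to make the truncation loud instead of silent. Below is a minimal sketch (assuming a Python streaming mapper; the counter group and name are placeholders I made up) that uses Hadoop streaming's reporter:counter: stderr convention to report the number of records the mapper actually saw:

#!/usr/bin/env python
# Sketch of a streaming mapper that reports its own input record count as a
# custom Hadoop counter. The group/counter names below are placeholders.
import sys

def main():
    count = 0
    for line in sys.stdin:
        count += 1
        # ... real mapper logic would go here; this sketch just passes
        # the line through unchanged ...
        sys.stdout.write(line)
    # Hadoop streaming increments a user-defined counter when a line of this
    # exact form is written to stderr:
    #   reporter:counter:<group>,<counter>,<amount>
    sys.stderr.write("reporter:counter:SanityCheck,MapperObservedRecords,%d\n" % count)

if __name__ == "__main__":
    main()

This is essentially the same number "Map input records" already gives me, but having it emitted explicitly makes it easy for a wrapper script to compare the total against the known record count for the input and re-run the job on a mismatch, rather than trusting its output. It doesn't prevent the bad reads, of course.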

Thanks,

DR
