Ian - Thanks for the detailed analysis. It was these issues that led
me to create a temporary file in NativeS3FileSystem in the first
place. I think we can get NativeS3FileSystem to report progress
though; see https://issues.apache.org/jira/browse/HADOOP-5814.
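
A rough sketch of the idea (the variable names here are mine, not
from the patch): during close(), ping the Progressable on every chunk
written so the framework doesn't kill the task for inactivity:

    byte[] buf = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buf)) != -1) {
      out.write(buf, 0, bytesRead);   // stream the buffered file to S3
      progress.progress();            // report liveness per chunk
    }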

Ken - I can't see why you would be getting that error. Does it work
with hadoop fs, but not hadoop distcp?

Cheers,
Tom

On Sat, May 9, 2009 at 6:48 AM, Nowland, Ian <nowl...@amazon.com> wrote:
> Hi Tom,
>
> Not creating a temp file is the ideal, as it saves you from having to
> "waste" the local hard disk by writing an output file just before
> uploading it to Amazon S3. There are a few problems, though:
>
> 1) Amazon S3 PUTs need the file length up front. You could use a chunked
> POST, but then you have the disadvantage of having to Base64 encode all your
> data, increasing bandwidth usage, and you still run into the next two
> problems.
>
> 2) You would still want MD5 checking. In Amazon S3, both PUT and POST
> require the MD5 to be supplied before the contents. To work around this
> you would have to upload the object without an MD5, then check its
> metadata to make sure the MD5 is correct, and delete it if it is not
> (see the rough sketch after this list). This is all possible, but would be
> difficult to make bulletproof, whereas in the current version, if the MD5
> is different the PUT fails atomically and you can easily just retry.
>
> 3) Finally, you would have to be careful with reducers that emit output
> only rarely. If there is too big a gap between data being uploaded through
> the socket, S3 may decide the connection has timed out and close it,
> meaning your task has to rerun (perhaps only to hit the same problem
> again).
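>
> A rough sketch of the workaround in (2), with hypothetical helpers
> (hexMd5, putObject, getObjectEtag, deleteObject) standing in for
> whatever S3 client ends up being used:
>
>     String localMd5 = hexMd5(localFile);
>     putObject(bucket, key, localFile);         // upload without an MD5 header
>     String etag = getObjectEtag(bucket, key);  // for a simple PUT the ETag
>                                                // is the hex MD5
>     if (!localMd5.equalsIgnoreCase(etag)) {
>       deleteObject(bucket, key);               // roll back the bad upload
>       // ...then retry from the start
>     }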
>
> All of this means that the current solution may be best for now as far as
> general upload goes. The best I think we can do is fix the fact that the
> task does not report progress in close(). The best way I can see to do this
> is to introduce a new interface, say ExtendedClosable, which defines a
> close(Progressable p) method. Then have the various clients of FileSystem
> output streams (e.g. Distcp, TextOutputFormat) test whether their
> DataOutputStream supports the interface, and if so call this in preference
> to the default. In the case of NativeS3FileSystem, this method spins up a
> thread to keep the Progressable updated as the upload progresses, as
> sketched below.
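>
> Something like this (a sketch only; the interface and names are a
> proposal, not existing code):
>
>     public interface ExtendedClosable {
>       void close(Progressable p) throws IOException;
>     }
>
>     // In NativeS3FileSystem's output stream:
>     public void close(final Progressable p) throws IOException {
>       Thread pinger = new Thread(new Runnable() {
>         public void run() {
>           while (!Thread.currentThread().isInterrupted()) {
>             p.progress();                  // keep the task alive
>             try { Thread.sleep(10000); }   // ping every 10 seconds
>             catch (InterruptedException e) { return; }
>           }
>         }
>       });
>       pinger.start();
>       try {
>         close();                           // the actual (slow) S3 upload
>       } finally {
>         pinger.interrupt();                // stop pinging when done
>       }
>     }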
>
> As an additional optimization to Distcp, where the source file already
> exists we could have some extended interface, say ExtendedWriteFileSystem,
> with a create() method that takes the MD5 and the file size, then test for
> this interface in the Distcp mapper and call the extended method (rough
> sketch below). The trade-off here is that the checksum HDFS stores is not
> the MD5 needed by S3, so two (perhaps distributed) reads would be needed;
> the choice is between these two distributed reads versus one distributed
> read plus a local write and a local read.
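>
> Roughly (again just a proposal, not existing code; computeMd5 is a
> hypothetical helper):
>
>     public interface ExtendedWriteFileSystem {
>       // The caller supplies the MD5 and length up front, so no local
>       // buffering is needed before the PUT.
>       FSDataOutputStream create(Path f, byte[] md5, long length)
>           throws IOException;
>     }
>
>     // In the Distcp mapper, something like:
>     if (dstFs instanceof ExtendedWriteFileSystem) {
>       byte[] md5 = computeMd5(srcFs, srcPath);   // first (distributed) read
>       out = ((ExtendedWriteFileSystem) dstFs)
>           .create(dstPath, md5, srcLen);         // copy is the second read
>     } else {
>       out = dstFs.create(dstPath);
>     }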
>
> What do you think?
>
> Cheers,
> Ian Nowland
> Amazon.com
>
> -----Original Message-----
> From: Tom White [mailto:t...@cloudera.com]
> Sent: Friday, May 08, 2009 1:36 AM
> To: core-user@hadoop.apache.org
> Subject: Re: HDFS to S3 copy problems
>
> Perhaps we should revisit the implementation of NativeS3FileSystem so
> that it doesn't always buffer the file on the client. We could have an
> option to make it write directly to S3. Thoughts?
>
> Regarding the problem with HADOOP-3733, you can work around it by
> setting fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey in your
> hadoop-site.xml.
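>
> For example (with placeholder values):
>
>     <property>
>       <name>fs.s3.awsAccessKeyId</name>
>       <value>YOUR_ACCESS_KEY_ID</value>
>     </property>
>     <property>
>       <name>fs.s3.awsSecretAccessKey</name>
>       <value>YOUR_SECRET_ACCESS_KEY</value>
>     </property>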
>
> Cheers,
> Tom
>
> On Fri, May 8, 2009 at 1:17 AM, Andrew Hitchcock <adpow...@gmail.com> wrote:
>> Hi Ken,
>>
>> S3N doesn't work that well with large files. When uploading a file to
>> S3, S3N saves it to local disk during write() and then uploads to S3
>> during the close(). Close can take a long time for large files and it
>> doesn't report progress, so the call can time out.
>>
>> As a workaround, I'd recommend either increasing the timeout or
>> uploading the files by hand. Since you only have a few large files,
>> you might want to copy the files to local disk and then use something
>> like s3cmd to upload them to S3.
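>>
>> To raise the timeout (the default is 10 minutes, which matches the
>> "failed to report status for 601 seconds" errors), you can bump
>> mapred.task.timeout in your job configuration; for example, to
>> 30 minutes:
>>
>>     <property>
>>       <name>mapred.task.timeout</name>
>>       <value>1800000</value>  <!-- milliseconds -->
>>     </property>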
>>
>> Regards,
>> Andrew
>>
>> On Thu, May 7, 2009 at 4:42 PM, Ken Krugler <kkrugler_li...@transpac.com> 
>> wrote:
>>> Hi all,
>>>
>>> I have a few large files (4 that are 1.8GB+) I'm trying to copy from HDFS to
>>> S3. My micro EC2 cluster is running Hadoop 0.19.1, and has one master/two
>>> slaves.
>>>
>>> I first tried using the hadoop fs -cp command, as in:
>>>
>>> hadoop fs -cp output/<dir>/ s3n://<bucket>/<dir>/
>>>
>>> This seemed to be working, as I could watch the network traffic spike, and
>>> temp files were being created in S3 (as seen with CyberDuck).
>>>
>>> But then it seemed to hang. Nothing happened for 30 minutes, so I killed the
>>> command.
>>>
>>> Then I tried using the hadoop distcp command, as in:
>>>
>>> hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3://<access key>:<secret
>>> key>@<bucket>/<dir2>/
>>>
>>> This failed, because my secret key has a '/' in it
>>> (http://issues.apache.org/jira/browse/HADOOP-3733)
>>>
>>> Then I tried using hadoop distcp with the s3n URI syntax:
>>>
>>> hadoop distcp hdfs://<host>:50001/<path>/<dir>/ s3n://<bucket>/<dir2>/
>>>
>>> Similar to my first attempt, it seemed to work. Lots of network activity,
>>> temp files being created, and in the terminal I got:
>>>
>>> 09/05/07 18:36:11 INFO mapred.JobClient: Running job: job_200905071339_0004
>>> 09/05/07 18:36:12 INFO mapred.JobClient:  map 0% reduce 0%
>>> 09/05/07 18:36:30 INFO mapred.JobClient:  map 9% reduce 0%
>>> 09/05/07 18:36:35 INFO mapred.JobClient:  map 14% reduce 0%
>>> 09/05/07 18:36:38 INFO mapred.JobClient:  map 20% reduce 0%
>>>
>>> But again it hung. No network traffic, and eventually it dumped out:
>>>
>>> 09/05/07 18:52:34 INFO mapred.JobClient: Task Id :
>>> attempt_200905071339_0004_m_000001_0, Status : FAILED
>>> Task attempt_200905071339_0004_m_000001_0 failed to report status for 601
>>> seconds. Killing!
>>> 09/05/07 18:53:02 INFO mapred.JobClient: Task Id :
>>> attempt_200905071339_0004_m_000004_0, Status : FAILED
>>> Task attempt_200905071339_0004_m_000004_0 failed to report status for 602
>>> seconds. Killing!
>>> 09/05/07 18:53:06 INFO mapred.JobClient: Task Id :
>>> attempt_200905071339_0004_m_000002_0, Status : FAILED
>>> Task attempt_200905071339_0004_m_000002_0 failed to report status for 602
>>> seconds. Killing!
>>> 09/05/07 18:53:09 INFO mapred.JobClient: Task Id :
>>> attempt_200905071339_0004_m_000003_0, Status : FAILED
>>> Task attempt_200905071339_0004_m_000003_0 failed to report status for 601
>>> seconds. Killing!
>>>
>>> In the task GUI, I can see the same tasks failing, and being restarted. But
>>> the restarted tasks seem to be just hanging w/o doing anything.
>>>
>>> Eventually one of the tasks made a bit more progress, but then it finally
>>> died with:
>>>
>>> Copy failed: java.io.IOException: Job failed!
>>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>>        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:647)
>>>        at org.apache.hadoop.tools.DistCp.run(DistCp.java:844)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>        at org.apache.hadoop.tools.DistCp.main(DistCp.java:871)
>>>
>>> So - any thoughts on what's going wrong?
>>>
>>> Thanks,
>>>
>>> -- Ken
>>> --
>>> Ken Krugler
>>> +1 530-210-6378
>>>
>>
>
