[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-16 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869635#comment-15869635
 ] 

Seth Fitzsimmons commented on HADOOP-14071:
---

For sure.

> S3a: Failed to reset the request input stream
> -
>
> Key: HADOOP-14071
> URL: https://issues.apache.org/jira/browse/HADOOP-14071
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Seth Fitzsimmons
>
> When using the patch from HADOOP-14028, I fairly consistently get {{Failed to 
> reset the request input stream}} exceptions. They're more likely to occur the 
> larger the file that's being written (70GB in the extreme case, but it needs 
> to be one file).
> {code}
> 2017-02-10 04:21:43 WARN S3ABlockOutputStream:692 - Transfer failure of block 
> FileBlock{index=416, 
> destFile=/tmp/hadoop-root/s3a/s3ablock-0416-4228067786955989475.tmp, 
> state=Upload, dataSize=11591473, limit=104857600}
> 2017-02-10 04:21:43 WARN S3AInstrumentation:777 - Closing output stream 
> statistics while data is still marked as pending upload in 
> OutputStreamStatistics{blocksSubmitted=416, blocksInQueue=0, blocksActive=0, 
> blockUploadsCompleted=416, blockUploadsFailed=3, 
> bytesPendingUpload=209747761, bytesUploaded=43317747712, blocksAllocated=416, 
> blocksReleased=416, blocksActivelyAllocated=0, 
> exceptionsInMultipartFinalize=0, transferDuration=1389936 ms, 
> queueDuration=519 ms, averageQueueTime=1 ms, totalUploadDuration=1390455 ms, 
> effectiveBandwidth=3.1153649497466657E7 bytes/s}
> at org.apache.hadoop.fs.s3a.S3AUtils.extractException(S3AUtils.java:200)
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:128)
> Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: 
> Multi-part upload with id 
> 'Xx.ezqT5hWrY1W92GrcodCip88i8rkJiOcom2nuUAqHtb6aQX__26FYh5uYWKlRNX5vY5ktdmQWlOovsbR8CLmxUVmwFkISXxDRHeor8iH9nPhI3OkNbWJJBLrvB3xLUuLX0zvGZWo7bUrAKB6IGxA--'
>  to 2017/planet-170206.orc on 2017/planet-170206.orc: 
> com.amazonaws.ResetException: Failed to reset the request input stream; If 
> the request involves an input stream, the maximum stream buffer size can be 
> configured via request.getRequestClientOptions().setReadLimit(int): Failed to 
> reset the request input stream; If the request involves an input stream, the 
> maximum stream buffer size can be configured via 
> request.getRequestClientOptions().setReadLimit(int)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.waitForAllPartUploads(S3ABlockOutputStream.java:539)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload.access$100(S3ABlockOutputStream.java:456)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:351)
> at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:101)
> at org.apache.orc.impl.PhysicalFsWriter.close(PhysicalFsWriter.java:221)
> at org.apache.orc.impl.WriterImpl.close(WriterImpl.java:2827)
> at net.mojodna.osm2orc.standalone.OsmPbf2Orc.convert(OsmPbf2Orc.java:296)
> at net.mojodna.osm2orc.Osm2Orc.main(Osm2Orc.java:47)
> Caused by: com.amazonaws.ResetException: Failed to reset the request input 
> stream; If the request involves an input stream, the maximum stream buffer 
> size can be configured via request.getRequestClientOptions().setReadLimit(int)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1221)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1042)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:948)
> at 
> org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:635)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:618)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:661)
> at 
> com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:573)
> at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:445)
> at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3041)
> at 
> com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3026)
> at org.apache.hadoop.fs.s3a.S3AFileSystem.uploadPart(S3AFileSystem.java:1114)
> at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream$MultiPartUpload$1.call(S3A

[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866256#comment-15866256
 ] 

Steve Loughran commented on HADOOP-14071:
-

Can we move discussion to HADOOP-14028, and I'll resolve this as a dupe? That keeps 
the discussion in one place.


[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-14 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865735#comment-15865735
 ] 

Steve Loughran commented on HADOOP-14071:
-

OK, I'm redoing the HADOOP-14028 patch with the File reference being passed down to 
AWS. Due to some technical issues ("laptop is toast"), my dev time is somewhat 
crippled this week, so I'm not going to give a schedule for when it will be 
available. Hopefully in the next day or two; I just haven't got Hadoop building 
locally right now.
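
For reference, here is roughly what passing the file reference down could look 
like. {{UploadPartRequest.withFile()}} and {{withFileOffset()}} are real AWS SDK v1 
APIs; the surrounding method and parameter names are illustrative only, not the 
actual patch.

{code:java}
import java.io.File;
import java.io.InputStream;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Sketch: build the part request against the source File when one exists,
// so the SDK can seek back to the part's start on a connection failure
// instead of relying on mark/reset over a buffered stream.
UploadPartRequest newUploadPartRequest(String bucket, String key,
    String uploadId, int partNumber, long partSize,
    InputStream stream, File sourceFile) {
  UploadPartRequest request = new UploadPartRequest()
      .withBucketName(bucket)
      .withKey(key)
      .withUploadId(uploadId)
      .withPartNumber(partNumber)
      .withPartSize(partSize);
  if (sourceFile != null) {
    request.withFile(sourceFile).withFileOffset(0);
  } else {
    request.withInputStream(stream);
  }
  return request;
}
{code}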


[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-10 Thread Seth Fitzsimmons (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861684#comment-15861684
 ] 

Seth Fitzsimmons commented on HADOOP-14071:
---

This is short-haul (EC2 in us-east-1 to S3 in us-standard).


[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861412#comment-15861412
 ] 

Steve Loughran commented on HADOOP-14071:
-

That AWS SDK issue does look relevant.

For the specific case where the data source is a file, we could have the MPU 
request use that file directly, rather than opening it ourselves. That will need 
some changes in the code, as the BlockOutputStream currently assumes the source is 
always some input stream... it'll have to support the option of a File and, if one 
is supplied, prefer that as the upload source.
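
As an illustrative sketch only (class and method names here are hypothetical), the 
block abstraction could hand back either form:

{code:java}
import java.io.File;
import java.io.InputStream;

// Sketch: disk-backed blocks expose their File, memory-backed blocks keep
// exposing a stream; the upload path prefers the File when present.
class BlockUploadData {
  private final File file;          // non-null for disk-backed blocks
  private final InputStream stream; // non-null for memory-backed blocks

  BlockUploadData(File file) {
    this.file = file;
    this.stream = null;
  }

  BlockUploadData(InputStream stream) {
    this.file = null;
    this.stream = stream;
  }

  boolean hasFile() {
    return file != null;
  }

  File getFile() {
    return file;
  }

  InputStream getUploadStream() {
    return stream;
  }
}
{code}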


[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-10 Thread Thomas Demoor (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861404#comment-15861404
 ] 

Thomas Demoor commented on HADOOP-14071:


For the ByteArrayInputStream & ByteBufferInputStream cases, I don't think we 
currently set {{request.getRequestClientOptions().setReadLimit}}. My understanding 
is that, based on the above, we should. Is that correct?
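
For illustration, that would look something like the following on a part upload. 
{{setReadLimit(int)}} is the real SDK v1 API; the method and variable names are 
placeholders, and sizing the limit as the part size plus one follows the SDK 
guidance that the read limit must exceed the bytes actually read.

{code:java}
import java.io.InputStream;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Sketch: raise the read limit so mark/reset can wind back over the whole
// part if the connection fails mid-upload.
static UploadPartRequest newMemoryPartRequest(String bucket, String key,
    String uploadId, int partNumber, int dataSize, InputStream blockStream) {
  UploadPartRequest request = new UploadPartRequest()
      .withBucketName(bucket)
      .withKey(key)
      .withUploadId(uploadId)
      .withPartNumber(partNumber)
      .withPartSize(dataSize)
      .withInputStream(blockStream);
  request.getRequestClientOptions().setReadLimit(dataSize + 1);
  return request;
}
{code}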



[jira] [Commented] (HADOOP-14071) S3a: Failed to reset the request input stream

2017-02-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861215#comment-15861215
 ] 

Steve Loughran commented on HADOOP-14071:
-


What's happened is that the HTTP connection failed (is this long haul?), the AWS 
SDK tried to reset the stream position (using mark/reset), and the buffer couldn't 
go back that far.

We saw this before; I thought I'd eliminated it by not buffering the input stream 
and instead sending the file input stream up direct. I'll review that code.

It may be we need to address this differently, simply by recognising the specific 
exception and retrying the block upload. That is: we implement the retry logic 
ourselves.
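
If we go that route, it's basically this shape (a sketch only: 
{{com.amazonaws.ResetException}} is the real exception type from the trace above; 
the callable and method names are illustrative):

{code:java}
import java.io.IOException;
import java.util.concurrent.Callable;
import com.amazonaws.ResetException;
import com.amazonaws.services.s3.model.PartETag;

// Sketch: treat ResetException as retriable and resend the whole part.
// uploadOnce must rebuild the request and reopen the block data from
// scratch, so each attempt sends the part from byte 0.
static PartETag uploadPartWithRetry(Callable<PartETag> uploadOnce,
    int maxAttempts) throws IOException {
  for (int attempt = 1; ; attempt++) {
    try {
      return uploadOnce.call();
    } catch (ResetException e) {
      if (attempt >= maxAttempts) {
        throw new IOException(
            "Part upload failed after " + attempt + " attempts", e);
      }
      // Fall through and retry with freshly opened block data.
    } catch (Exception e) {
      throw new IOException(e);
    }
  }
}
{code}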
