Steve Loughran created HADOOP-19516:
---------------------------------------
Summary: S3A: SDK reads content twice during PUT to S3 Express
store.
Key: HADOOP-19516
URL: https://issues.apache.org/jira/browse/HADOOP-19516
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.4.1, 3.4.2
Environment: client in UK talking to S3 Express bucket in us-west-2
Reporter: Steve Loughran
During PUT calls, even of 0-byte objects, our UploadContentProviders code is
reporting recreation of the input stream of an UploadContentProvider, as seen
in our INFO-level logging of this happening
{code}
bin/hadoop fs -touchz $v3/4
2025-03-26 13:38:53,377 [main] INFO impl.UploadContentProviders
(UploadContentProviders.java:newStream(289)) - Stream recreated:
FileWithOffsetContentProvider{file=/tmp/hadoop-stevel/s3a/s3ablock-0001-659277820991634509.tmp,
offset=0} BaseContentProvider{size=0, initiated at 2025-03-26T13:38:53.355,
streamCreationCount=2, currentStream=null}
{code}
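For reference, a minimal sketch of how such a counting provider can be written
against the SDK v2 ContentStreamProvider interface; class and field names here
are illustrative placeholders, not the actual S3A UploadContentProviders
internals:
{code:java}
// Illustrative only: a counting ContentStreamProvider in the spirit of the
// S3A class, built on the AWS SDK v2 interface.
import java.io.ByteArrayInputStream;
import java.io.InputStream;

import software.amazon.awssdk.http.ContentStreamProvider;

public class CountingContentProvider implements ContentStreamProvider {

  private final byte[] data;
  private int streamCreationCount;

  public CountingContentProvider(byte[] data) {
    this.data = data;
  }

  @Override
  public InputStream newStream() {
    streamCreationCount++;
    if (streamCreationCount > 1) {
      // S3A logs the equivalent condition at INFO in UploadContentProviders.
      System.out.printf("Stream recreated: streamCreationCount=%d%n",
          streamCreationCount);
    }
    return new ByteArrayInputStream(data);
  }

  public int getStreamCreationCount() {
    return streamCreationCount;
  }
}
{code}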
The logging code was added in HADOOP-19221, "S3A: Unable to recover from
failure of multipart block upload attempt "Status Code: 400; Error Code:
RequestTimeout"". It logs at INFO because stream recreation was considered
both rare and serious enough that we should log it, based on our hypothesis
that it was triggered by a transient failure of the S3 service front end and
the inability of the SDK to recover from it.
It turns out that uploading even a zero-byte file to an S3 Express store
triggers this dual creation of the stream, apparently from dual signing.
This *does not* happen on multipart uploads.
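A hypothetical repro sketch against the raw SDK, using the counting provider
above; the bucket name is a placeholder for an S3 Express directory bucket,
and this bypasses S3A entirely, so it only approximates what
{{hadoop fs -touchz}} does:
{code:java}
// Hypothetical repro: PUT a zero-byte object and count newStream() calls.
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class ZeroBytePutRepro {
  public static void main(String[] args) {
    CountingContentProvider provider = new CountingContentProvider(new byte[0]);
    try (S3Client s3 = S3Client.builder().region(Region.US_WEST_2).build()) {
      s3.putObject(
          PutObjectRequest.builder()
              .bucket("example--usw2-az1--x-s3")  // placeholder directory bucket
              .key("touchz-test")
              .build(),
          RequestBody.fromContentProvider(provider, 0,
              "application/octet-stream"));
    }
    // If the dual-signing hypothesis holds, this prints 2 for an S3 Express
    // bucket and 1 for a standard bucket.
    System.out.println("streams created: " + provider.getStreamCreationCount());
  }
}
{code}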