[ https://issues.apache.org/jira/browse/HADOOP-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215889#comment-15215889 ]
Steve Loughran commented on HADOOP-12970:
-----------------------------------------

Harsha: can you look at HADOOP-11687 to see if this will address your problems? It copies all the known headers and skips things like x-emc headers. If it also skips the connection closed header, then it will address your needs.

I now believe this header logic is testable: make the relevant metadata copying package-private static and you can write some tests against it.

Closing this as contained-within HADOOP-11687; please add tests there.

Thanks

> Intermittent signature match failures in S3AFileSystem due connection closure
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-12970
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12970
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.7.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: HADOOP-12970.patch, HADOOP-12970.patch
>
>
> S3AFileSystem's use of the {{ObjectMetadata#clone()}} method inside the
> {{copyFile}} implementation may fail when the connection used for obtaining
> the metadata is closed by the server (i.e. the response carries a
> {{Connection: close}} header). Because this header is not stripped when the
> {{ObjectMetadata}} is created, and because we clone that metadata for use in
> the next {{CopyObjectRequest}}, the copy request ends up carrying the
> {{Connection: close}} header itself.
> This causes signer-related exceptions because the client now includes the
> {{Connection}} header as part of the {{SignedHeaders}}, but the S3 server
> does not receive the same value for it ({{Connection}} headers are likely
> stripped away before the S3 server tries to match signature hashes), causing
> a failure like the one below:
> {code}
> 2016-03-29 19:59:30,120 DEBUG [s3a-transfer-shared--pool1-t35]
> org.apache.http.wire: >> "Authorization: AWS4-HMAC-SHA256
> Credential=XXX/20160329/eu-central-1/s3/aws4_request,
> SignedHeaders=accept-ranges;connection;content-length;content-type;etag;host;last-modified;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-metadata-directive;x-amz-server-side-encryption;x-amz-version-id,
> Signature=MNOPQRSTUVWXYZ[\r][\n]"
> …
> com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we
> calculated does not match the signature you provided. Check your key and
> signing method. (Service: Amazon S3; Status Code: 403; Error Code:
> SignatureDoesNotMatch; Request ID: ABC), S3 Extended Request ID: XYZ
> {code}
> This is intermittent because the S3 server does not always add a
> {{Connection: close}} directive to its response, but whenever we receive it
> AND we clone it, the above exception happens for the copy request. The
> copy request is often used in the context of FileOutputCommitter, when many
> of the MR attempt files on the {{s3a://}} destination filesystem are to be
> moved to their parent directories post-commit.
> I've also submitted a fix upstream with the AWS Java SDK to strip out the
> {{Connection}} headers when dealing with {{ObjectMetadata}}, which is pending
> acceptance and release at: https://github.com/aws/aws-sdk-java/pull/669, but
> until that release is available and can be used by us, we'll need to
> work around the clone approach by manually excluding the {{Connection}} header
> (not straightforward, because the {{metadata}} object is private with no
> mutable access). We can remove such a change in the future once a release
> with the upstream fix is available.
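
For illustration only, here is a minimal sketch of the kind of package-private static header copying the comment suggests making testable. It is not the HADOOP-11687 patch or the actual S3AFileSystem code: the class name MetadataHeaderFilter, the method copyHeaders and the contents of the skip list are all hypothetical, and a plain Map stands in for the SDK's ObjectMetadata so the sketch has no dependencies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration; not the real S3AFileSystem code. */
final class MetadataHeaderFilter {

  // Assumed skip list: headers that describe the connection rather than the
  // object, so they should not be carried into the next (signed) request.
  static final Set<String> SKIPPED_HEADERS =
      new HashSet<>(Arrays.asList("connection"));

  private MetadataHeaderFilter() {
  }

  /**
   * Copy every header from source into a new map, leaving out any header
   * whose name (compared case-insensitively) is in SKIPPED_HEADERS.
   * The source map is not modified.
   */
  static Map<String, Object> copyHeaders(Map<String, Object> source) {
    Map<String, Object> copy = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : source.entrySet()) {
      String name = entry.getKey();
      if (name != null
          && SKIPPED_HEADERS.contains(name.toLowerCase(Locale.ROOT))) {
        continue; // drop e.g. "Connection: close" so it is never signed
      }
      copy.put(name, entry.getValue());
    }
    return copy;
  }

  public static void main(String[] args) {
    Map<String, Object> headers = new LinkedHashMap<>();
    headers.put("Content-Type", "text/plain");
    headers.put("Connection", "close");
    headers.put("ETag", "abc123");
    // Prints the copied headers; the Connection entry is left out.
    System.out.println(copyHeaders(headers));
  }
}
```

Because the method is static and takes no filesystem or client state, a unit test can feed it a map containing a Connection entry and assert that the copy omits it, without needing an S3 endpoint.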