kdn36 commented on code in PR #633:
URL: https://github.com/apache/arrow-rs-object-store/pull/633#discussion_r2784091778


##########
src/aws/client.rs:
##########
@@ -708,30 +734,52 @@ impl S3Client {
         }
 
         let (parts, body) = request.send().await?.into_parts();
-        let (e_tag, checksum_sha256) = if is_copy {
+        let (e_tag, checksum_sha256, checksum_crc64nvme) = if is_copy {
             let response = body
                 .bytes()
                 .await
                 .map_err(|source| Error::CreateMultipartResponseBody { source })?;
             let response: CopyPartResult = quick_xml::de::from_reader(response.reader())
                 .map_err(|source| Error::InvalidMultipartResponse { source })?;
-            (response.e_tag, response.checksum_sha256)
+            (
+                response.e_tag,
+                response.checksum_sha256,
+                response.checksum_crc64nvme,
+            )
         } else {
             let e_tag = get_etag(&parts.headers).map_err(|source| Error::Metadata { source })?;
             let checksum_sha256 = parts
                 .headers
                 .get(SHA256_CHECKSUM)
                 .and_then(|v| v.to_str().ok())
                 .map(|v| v.to_string());
-            (e_tag, checksum_sha256)
+            let checksum_crc64nvme = parts
+                .headers
+                .get(CRC64NVME_CHECKSUM)
+                .and_then(|v| v.to_str().ok())
+                .map(|v| v.to_string());
+            (e_tag, checksum_sha256, checksum_crc64nvme)
         };
 
-        let content_id = if self.config.checksum == Some(Checksum::SHA256) {
-            let meta = PartMetadata {
-                e_tag,
-                checksum_sha256,
-            };
-            quick_xml::se::to_string(&meta).unwrap()
+        let content_id = if let Some(checksum) = self.config.checksum {

Review Comment:
   I am no expert, but as I understand it, we do not send a checksum on copy 
(write) because we do not have the bytes to calculate one. We do collect the 
response checksum(s), but retain them only if they match the configured 
checksum algorithm (when they do not, we could also raise a warning or error, 
I think). The use case for specifying a checksum algorithm on copy is to 
change the checksum metadata as stored in S3.
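
A minimal sketch of the "retain only if it matches" behaviour described above. This is not the PR's code: `ChecksumAlgo`, `PartChecksums`, and `retain_configured` are hypothetical names introduced only for illustration, and the sketch assumes that a checksum for a non-configured algorithm is silently dropped rather than reported.

```rust
// Hypothetical stand-ins for illustration; these are NOT the crate's real types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChecksumAlgo {
    Sha256,
    Crc64Nvme,
}

/// Checksums the server reported for one uploaded or copied part.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct PartChecksums {
    sha256: Option<String>,
    crc64nvme: Option<String>,
}

/// Keep only the checksum that belongs to the configured algorithm.
/// Assumption: anything else is dropped without a warning or error.
fn retain_configured(configured: Option<ChecksumAlgo>, received: PartChecksums) -> PartChecksums {
    match configured {
        Some(ChecksumAlgo::Sha256) => PartChecksums {
            sha256: received.sha256,
            crc64nvme: None,
        },
        Some(ChecksumAlgo::Crc64Nvme) => PartChecksums {
            sha256: None,
            crc64nvme: received.crc64nvme,
        },
        // No algorithm configured: nothing is retained.
        None => PartChecksums::default(),
    }
}
```

Raising a warning or error instead would mean returning a `Result` from the same match, in the arm where the configured algorithm's field is `None` in the response.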
   
   However, I could not test this properly locally, because MinIO does not 
support checksums on multipart copy and errors out instead 
(https://github.com/minio/minio/issues/17013).
   
   Testing in CI and directly on AWS works as expected.
   
   I added a test case where we modify the checksum algorithm as part of a 
copy. That works as expected. Unfortunately, I could only verify this manually, 
as I could not find an easy way to obtain the checksum metadata for 
programmatic verification (a simple `head()..meta` does not contain this 
metadata).
   
   Again, I am new to this, and will happily stand corrected.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
