dmedora opened a new issue, #34980:
URL: https://github.com/apache/airflow/issues/34980

   ### Apache Airflow version
   
   2.7.2
   
   ### What happened
   
   The GCSSynchronizeBucketsOperator eventually calls the `_prepare_sync_plan` 
function in the GCS hook (`airflow/providers/google/cloud/hooks/gcs.py`). This 
function retrieves objects in the buckets 
using the [`list_blobs` 
method](https://github.com/apache/airflow/blob/8fdf3582c2967161dd794f7efb53691d092f0ce6/airflow/providers/google/cloud/hooks/gcs.py#L1307).
 However, at present, the Cloud Storage Objects.List API does not return the 
crc32c for CMEK-protected objects. As per the [GCP public 
docs](https://cloud.google.com/storage/docs/encryption/customer-managed-keys#restrictions),
 "The CRC32C checksum and MD5 hash of objects encrypted with customer-managed 
encryption keys are not returned when listing objects with the JSON API." 
   
   As a result, if an object is CMEK-protected, the crc32c value returned by 
the listing is always None. Two CMEK-protected objects with different contents 
therefore compare as equal, which leads to incorrect synchronization ([crc32c 
comparison](https://github.com/apache/airflow/blob/8fdf3582c2967161dd794f7efb53691d092f0ce6/airflow/providers/google/cloud/hooks/gcs.py#L1330)).
   
   ### What you think should happen instead
   
   This should be handled by making an Objects.Get call to retrieve the crc32c 
for CMEK-protected objects.
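   
   A minimal sketch of that idea, not the actual hook code: it assumes the 
listed objects behave like `google.cloud.storage.Blob`, whose `reload()` method 
fetches the object's metadata with a per-object GET. The helper name 
`resolve_crc32c` and the `BlobLike` protocol are hypothetical and exist only 
for illustration.

```python
from typing import Optional, Protocol


class BlobLike(Protocol):
    """Hypothetical subset of google.cloud.storage.Blob used by this sketch."""

    crc32c: Optional[str]

    def reload(self) -> None: ...


def resolve_crc32c(blob: BlobLike) -> Optional[str]:
    """Return the blob's crc32c, fetching metadata only if the listing omitted it."""
    if blob.crc32c is None:
        # Assumption: reload() performs an Objects.Get for this object, which
        # is expected to populate crc32c even for CMEK-protected objects.
        blob.reload()
    return blob.crc32c
```

   Reloading only when the listed value is None keeps the extra API calls 
limited to objects whose checksum was actually omitted.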
   
   ### How to reproduce
   
   1. Create a GCP Cloud Key Management Service (KMS) key.
   2. Create two Cloud Storage buckets, each configured with a default CMEK key.
   3. Upload an object with the same name but different contents to each bucket.
   4. Run the GCSSynchronizeBucketsOperator with one bucket as the source, the 
other as the destination, and `allow_overwrite=True`. Because an object with 
the same name exists in both buckets, their crc32c values are compared. Both 
are None, so the objects are treated as identical and the source object does 
not overwrite the destination one.
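   
   The failure in step 4 can be illustrated with a simplified model of the 
comparison. This is a sketch, not the hook's implementation: the function 
`names_to_overwrite`, the object name, and the checksum strings are all 
hypothetical, and each bucket is modelled as a plain mapping from object name 
to the crc32c returned by the listing.

```python
from typing import Dict, List, Optional


def names_to_overwrite(
    source: Dict[str, Optional[str]],
    destination: Dict[str, Optional[str]],
) -> List[str]:
    """Return names present in both buckets whose listed checksums differ."""
    common = source.keys() & destination.keys()
    return sorted(name for name in common if source[name] != destination[name])


# CMEK-protected objects: the listing omits the checksum, so both sides are None
# and the differing contents go undetected.
plan_cmek = names_to_overwrite({"data.csv": None}, {"data.csv": None})

# Same scenario with checksums present (made-up values): the difference is seen.
plan_plain = names_to_overwrite({"data.csv": "yZRlqg=="}, {"data.csv": "n03x6A=="})

print(plan_cmek)
print(plan_plain)
```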
   
   ### Operating System
   
   Linux / Cloud Composer
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


