jiakai-li commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1899180251
##########
pyiceberg/io/pyarrow.py:
##########
@@ -362,6 +362,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste
"region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
}
+ # Override the default s3.region if netloc(bucket) resolves to a different region
+ try:
+ client_kwargs["region"] = resolve_s3_region(netloc)
Review Comment:
Thank you, Fokko. My understanding is that the problem occurs when the
configured `region` does not match the region of the bucket that holds the data
file, which makes the PyArrow file read fail. By overriding the region with the
bucket's resolved region (and falling back to the configured region if that
lookup fails), we make sure the region where the data file is actually stored
takes precedence. This function is cached through `fs_by_scheme`, so it is only
called for buckets that have not been resolved before, which saves calls to S3.