kevinjqliu commented on code in PR #1453:
URL: https://github.com/apache/iceberg-python/pull/1453#discussion_r1896072489
##########
pyiceberg/io/pyarrow.py:
##########
@@ -377,6 +377,12 @@ def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSyste
if force_virtual_addressing := self.properties.get(S3_FORCE_VIRTUAL_ADDRESSING):
client_kwargs["force_virtual_addressing"] = property_as_bool(self.properties, force_virtual_addressing, False)
+ # Override the default s3.region if netloc(bucket) resolves to a different region
Review Comment:
nit: What do you think of moving this closer to where `region` is set?
That would make it easier to debug in the future.
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1394,7 +1399,6 @@ def __init__(
) -> None:
self._table_metadata = table_metadata
self._io = io
- self._fs = _fs_from_file_path(table_metadata.location, io) # TODO: use different FileSystem per file
Review Comment:
:)
##########
tests/io/test_pyarrow.py:
##########
Review Comment:
I saw a way to set up multiple MinIO endpoints and pretend that they are in
different regions. This would require us to override the S3 endpoint per
"region", e.g. port 9001 is us-east-1 and port 9002 is us-east-2.
I think it's too tedious and doesn't help us much in terms of testing.
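For reference, the per-region endpoint override described above could look roughly like the sketch below. Only the port-to-region pairing comes from the comment; the `localhost` host, the `REGION_ENDPOINTS` mapping, and the `s3_kwargs_for_region` helper are made up for illustration. The `region` and `endpoint_override` keys follow the keyword names accepted by `pyarrow.fs.S3FileSystem`.

```python
from typing import Dict

# Hypothetical mapping: one local MinIO endpoint per pretend region.
# The host name is an assumption; the ports match the comment above.
REGION_ENDPOINTS: Dict[str, str] = {
    "us-east-1": "http://localhost:9001",
    "us-east-2": "http://localhost:9002",
}


def s3_kwargs_for_region(region: str) -> Dict[str, str]:
    """Build keyword arguments for an S3 filesystem pointed at the region's endpoint."""
    return {"region": region, "endpoint_override": REGION_ENDPOINTS[region]}
```

Every test that touches a second "region" would have to thread these overrides through, which is the tedium mentioned above.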
##########
tests/io/test_pyarrow.py:
##########
@@ -360,10 +360,11 @@ def test_pyarrow_s3_session_properties() -> None:
**UNIFIED_AWS_SESSION_PROPERTIES,
}
- with patch("pyarrow.fs.S3FileSystem") as mock_s3fs:
+ with patch("pyarrow.fs.S3FileSystem") as mock_s3fs, patch("pyarrow.fs.resolve_s3_region") as mock_s3_region_resolver:
Review Comment:
nit: maybe if `s3.region` is set in the config, we should just use it and not
override the region. What do you think?
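A minimal sketch of that precedence, to make the suggestion concrete. `pick_s3_region` and the injected `resolver` argument are hypothetical names, not code from the PR; in practice the resolver would presumably be `pyarrow.fs.resolve_s3_region`. The sketch also assumes that a failed lookup surfaces as `OSError`.

```python
from typing import Callable, Mapping, Optional

S3_REGION = "s3.region"  # property key named in the comment above


def pick_s3_region(
    properties: Mapping[str, str],
    bucket: Optional[str],
    resolver: Callable[[str], str],
) -> Optional[str]:
    """Hypothetical helper: prefer the configured region, else resolve it from the bucket."""
    configured = properties.get(S3_REGION)
    if configured:
        # An explicit `s3.region` wins; the resolver is never called.
        return configured
    if bucket:
        try:
            return resolver(bucket)
        except OSError:
            # Assumption: lookup failures are reported as OSError.
            return None
    return None
```

With this ordering, the region lookup only happens when the user has not configured a region at all.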
##########
tests/io/test_pyarrow.py:
##########
@@ -2074,3 +2076,34 @@ def test__to_requested_schema_timestamps_without_downcast_raises_exception(
_to_requested_schema(requested_schema, file_schema, batch, downcast_ns_timestamp_to_us=False, include_field_ids=False)
assert "Unsupported schema projection from timestamp[ns] to timestamp[us]" in str(exc_info.value)
+
+
+def test_pyarrow_file_io_fs_by_scheme_cache() -> None:
+ pyarrow_file_io = PyArrowFileIO()
+ us_east_1_region = "us-eas1-1"
Review Comment:
```suggestion
us_east_1_region = "us-east-1"
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]