danhphan commented on issue #1279:
URL:
https://github.com/apache/iceberg-python/issues/1279#issuecomment-2466676408
Thanks @kevinjqliu, I'm reading the code base.
Could you please give me an example of the unit tests you would expect for this feature, if possible? For instance, suppose we create the following `s3_fileio` with `"s3.region": "us-east-1"` in `session_properties`, and then create an `input_file` in the `warehouse` bucket, which is actually located in the `eu-central-1` region. What should the expected result be?
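```python
import uuid

import pyarrow.fs

from pyiceberg.io.pyarrow import PyArrowFileIO
from pyiceberg.typedef import Properties
from tests.conftest import UNIFIED_AWS_SESSION_PROPERTIES  # assumed: shared test-suite properties

session_properties: Properties = {
    "s3.endpoint": "http://localhost:9000",
    "s3.access-key-id": "admin",
    "s3.secret-access-key": "password",
    "s3.region": "us-east-1",
    "s3.session-token": "s3.session-token",
    **UNIFIED_AWS_SESSION_PROPERTIES,
}
s3_fileio = PyArrowFileIO(properties=session_properties)
print(s3_fileio.properties["s3.region"])  # -> us-east-1

filename = str(uuid.uuid4())
# The "warehouse" bucket actually lives in eu-central-1, not the configured region.
input_file = s3_fileio.new_input(location=f"s3://warehouse/{filename}")
print(pyarrow.fs.resolve_s3_region("warehouse"))  # -> eu-central-1

# The "foo" bucket is in the configured region.
output_file = s3_fileio.new_output(location=f"s3://foo/{filename}")
print(pyarrow.fs.resolve_s3_region("foo"))  # -> us-east-1
```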
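One way to pin down the expected results is a small test against a helper that picks the region for a bucket: ask a resolver for the bucket's actual region and fall back to the configured `s3.region` when the lookup fails. This is a minimal sketch under assumptions: the helper `region_for_bucket` and the fake resolver are hypothetical (not PyIceberg code), the resolver is injected so the test needs no network (in practice it could be `pyarrow.fs.resolve_s3_region`), and lookup failures are assumed to surface as `OSError`.

```python
from typing import Callable, Dict, Optional


def region_for_bucket(
    bucket: str,
    configured_region: Optional[str],
    resolve: Callable[[str], str],
) -> Optional[str]:
    """Hypothetical helper: prefer the bucket's real region, else the configured one."""
    try:
        return resolve(bucket)
    except OSError:
        # Assumed failure mode (no network, unknown bucket): keep the configured value.
        return configured_region


# Fake resolver mirroring the scenario above: "warehouse" lives in eu-central-1.
FAKE_BUCKET_REGIONS: Dict[str, str] = {"warehouse": "eu-central-1", "foo": "us-east-1"}


def fake_resolve(bucket: str) -> str:
    if bucket not in FAKE_BUCKET_REGIONS:
        raise OSError(f"cannot resolve region for bucket {bucket!r}")
    return FAKE_BUCKET_REGIONS[bucket]


def test_region_follows_bucket() -> None:
    configured = "us-east-1"  # the "s3.region" session property
    # Input file's bucket is in a different region than configured.
    assert region_for_bucket("warehouse", configured, fake_resolve) == "eu-central-1"
    # Output file's bucket matches the configured region.
    assert region_for_bucket("foo", configured, fake_resolve) == "us-east-1"
    # Unknown bucket: fall back to the configured region.
    assert region_for_bucket("missing", configured, fake_resolve) == "us-east-1"
```

Under pytest the `test_` function is collected automatically; the same assertions could later target whatever region the real `S3FileSystem` ends up with.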
I'm thinking that maybe, in the `_initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSystem` method from your comment above, we could set the `"region"` entry of `client_kwargs` based on `netloc` (i.e. the S3 bucket), but I'm not sure whether this is the right direction.
For example: `"region": pyarrow.fs.resolve_s3_region(netloc),`
```python
def _initialize_fs(self, scheme: str, netloc: Optional[str] = None) -> FileSystem:
    if scheme in {"s3", "s3a", "s3n"}:
        from pyarrow.fs import S3FileSystem

        client_kwargs: Dict[str, Any] = {
            "endpoint_override": self.properties.get(S3_ENDPOINT),
            "access_key": get_first_property_value(self.properties, S3_ACCESS_KEY_ID, AWS_ACCESS_KEY_ID),
            "secret_key": get_first_property_value(self.properties, S3_SECRET_ACCESS_KEY, AWS_SECRET_ACCESS_KEY),
            "session_token": get_first_property_value(self.properties, S3_SESSION_TOKEN, AWS_SESSION_TOKEN),
            # Currently read from the properties; the idea is to derive it from `netloc`
            # instead, e.g. pyarrow.fs.resolve_s3_region(netloc).
            "region": get_first_property_value(self.properties, S3_REGION, AWS_REGION),
        }
        # ... rest of the method omitted
```
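Applied to the snippet above, the idea might look like the following sketch, which builds only the `region` part of `client_kwargs`. Assumptions: the function names are hypothetical, `first_property_value` is a simplified stand-in for `get_first_property_value`, the plain keys `"s3.region"` and `"client.region"` stand in for the `S3_REGION` / `AWS_REGION` constants, and the resolver is injected so it can be tested offline.

```python
from typing import Any, Callable, Dict, Optional

# Assumed property keys standing in for PyIceberg's S3_REGION / AWS_REGION constants.
S3_REGION_KEY = "s3.region"
AWS_REGION_KEY = "client.region"


def first_property_value(properties: Dict[str, Any], *keys: str) -> Optional[Any]:
    """Simplified stand-in: return the value of the first key that is present."""
    for key in keys:
        if key in properties:
            return properties[key]
    return None


def s3_region_kwargs(
    properties: Dict[str, Any],
    netloc: Optional[str],
    resolve: Callable[[str], str],
) -> Dict[str, Any]:
    """Hypothetical: choose the S3FileSystem region for the bucket in `netloc`."""
    configured = first_property_value(properties, S3_REGION_KEY, AWS_REGION_KEY)
    region = configured
    if netloc:
        try:
            region = resolve(netloc)  # e.g. pyarrow.fs.resolve_s3_region
        except OSError:
            pass  # keep the configured region if the lookup fails
    return {"region": region}
```

In this sketch the looked-up region wins over the configured one. Whether that is desirable is an open design choice: `pyarrow.fs.resolve_s3_region` presumably asks AWS rather than a custom `s3.endpoint` such as the MinIO one above, so an explicitly configured region might need to take precedence in that setup.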
Thank you.