joshuarobinson commented on code in PR #5747:
URL: https://github.com/apache/iceberg/pull/5747#discussion_r972358818

##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -165,6 +169,24 @@ def to_input_file(self) -> "PyArrowFile":
 class PyArrowFileIO(FileIO):
+    def __init__(self, properties: Properties = EMPTY_DICT):
+        self.get_fs_and_path: Callable = lru_cache(self._get_fs_and_path)
+        super().__init__(properties=properties)
+
+    def _get_fs_and_path(self, location: str) -> Tuple[FileSystem, str]:
+        uri = urlparse(location)  # Create a ParseResult from the URI
+        if not uri.scheme:  # If no scheme, assume the path is to a local file
+            return FileSystem.from_uri(os.path.abspath(location))
+        elif uri.scheme in {"s3", "s3a", "s3n"}:
+            client_kwargs = {
+                "endpoint_override": self.properties.get("s3.endpoint"),

Review Comment:
   The encryption "scheme" (http or https) partially overlaps with the endpoint_override config; this behavior comes from the underlying AWS SDK for S3. I can specify the scheme as part of the endpoint, e.g. "https://localhost:9000". My setup is SSL-only, and I verified in my environment that if I switch "https://hostname.io" to "http://hostname.io", I can no longer connect.

   Long-term, I suspect the "scheme" is yet another option we'll want to expose through the client_kwargs. The aim (hope) of this PR is that adding something like that in the future is a one-line change.
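
   To make the "one-line change" idea concrete, here is a minimal sketch, assuming the client_kwargs are built from a small table that maps property keys to filesystem keyword names. Only "s3.endpoint" and endpoint_override come from the diff. The "s3.scheme" key, the PROPERTY_TO_KWARG table, and both helper functions are hypothetical names for illustration, not what this PR implements. pyarrow's S3FileSystem does accept a scheme keyword, but whether and how to expose it here is still an open choice.

```python
from typing import Dict, Optional

# Hypothetical table: property key -> S3FileSystem keyword argument.
# "s3.endpoint" -> "endpoint_override" mirrors the diff; "s3.scheme" is an
# invented key showing what a later one-line addition could look like.
PROPERTY_TO_KWARG: Dict[str, str] = {
    "s3.endpoint": "endpoint_override",
    "s3.scheme": "scheme",  # hypothetical future option
}


def build_client_kwargs(properties: Dict[str, str]) -> Dict[str, str]:
    """Collect only the options that are actually set in the properties."""
    return {
        kwarg: properties[key]
        for key, kwarg in PROPERTY_TO_KWARG.items()
        if key in properties
    }


def endpoint_scheme(endpoint: str) -> Optional[str]:
    """Return the scheme embedded in an endpoint string, or None if absent."""
    if "://" not in endpoint:
        return None
    return endpoint.split("://", 1)[0]


# The scheme can ride along inside the endpoint itself ...
kwargs_embedded = build_client_kwargs({"s3.endpoint": "https://localhost:9000"})

# ... or be passed separately through the hypothetical "s3.scheme" property.
kwargs_separate = build_client_kwargs(
    {"s3.endpoint": "localhost:9000", "s3.scheme": "http"}
)

# Either dict would then be unpacked into the filesystem constructor,
# e.g. S3FileSystem(**kwargs_embedded); that call is omitted so the sketch
# runs without pyarrow installed.
```

   Keeping the mapping in one table means unset properties are never forwarded as explicit None values, and a new option needs only a new table entry.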