rdblue commented on code in PR #5488:
URL: https://github.com/apache/iceberg/pull/5488#discussion_r947198542
##########
python/pyiceberg/catalog/__init__.py:
##########
@@ -53,7 +54,15 @@ class Catalog(ABC):
def __init__(self, name: str | None, **properties: str):
self.name = name
- self.properties = properties
+
+ if name is not None:
+ configuration = Config().get_catalog_config(name)
Review Comment:
I took at look at the method, it's looking really good to me. Don't worry
about `py-impl` -- I think the current approach of using the URI to decide is
better. We can iterate from there.
For `FileIO`, you may need to change the IO based on the table you're using.
A lot of people use the `HadoopFileIO` in Java, which supports many different
storage schemes. But other use cases may need to choose between S3 and GCS per
table. It's the responsibility of the catalog to pass a reasonable `FileIO`
instance in when creating the table in Java. That's commonly the catalog's
`FileIO`.
For Python, we can do things a little differently. I think it makes sense to
combine information from the table and the environment to come up with the
right `FileIO`. The table has storage and a policy for where to create new data
files (`LocationProvider` in Java). I think that boils down to a storage
scheme, like `s3`. The environment then determines what can be used to handle
that scheme. In Python, pyarrow could be used, by there's also a S3 file system
from fsspec.
In the short term, I think we should detect which file systems are available
and use one that can handle the scheme for a table, where the scheme is taken
from the table's location. Instantiating just the `PyArrowFileIO` works for now
though.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]