rdblue commented on code in PR #5488:
URL: https://github.com/apache/iceberg/pull/5488#discussion_r947198542


##########
python/pyiceberg/catalog/__init__.py:
##########
@@ -53,7 +54,15 @@ class Catalog(ABC):
 
     def __init__(self, name: str | None, **properties: str):
         self.name = name
-        self.properties = properties
+
+        if name is not None:
+            configuration = Config().get_catalog_config(name)

Review Comment:
   I took at look at the method, it's looking really good to me. Don't worry 
about `py-impl` -- I think the current approach of using the URI to decide is 
better. We can iterate from there.
   
   For `FileIO`, you may need to change the IO based on the table you're using. 
A lot of people use the `HadoopFileIO` in Java, which supports many different 
storage schemes. But other use cases may need to choose between S3 and GCS per 
table. It's the responsibility of the catalog to pass a reasonable `FileIO` 
instance in when creating the table in Java. That's commonly the catalog's 
`FileIO`.
   
   For Python, we can do things a little differently. I think it makes sense to 
combine information from the table and the environment to come up with the 
right `FileIO`. The table has storage and a policy for where to create new data 
files (`LocationProvider` in Java). I think that boils down to a storage 
scheme, like `s3`. The environment then determines what can be used to handle 
that scheme. In Python, pyarrow could be used, by there's also a S3 file system 
from fsspec.
   
   In the short term, I think we should detect which file systems are available 
and use one that can handle the scheme for a table, where the scheme is taken 
from the table's location. Instantiating just the `PyArrowFileIO` works for now 
though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to