ccancellieri opened a new issue, #2122:
URL: https://github.com/apache/iceberg-python/issues/2122
Dear all,
I'm working in a GCP environment and I'm configuring pyIceberg to work with
the BigLake API Metastore catalog.
I'm pretty satisfied with the result (it almost works!), but I have a blocking
issue that prevents me from instantiating the Catalog.
The issue is located here:
https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L181
Pydantic validates the catalog's config response and requires a mandatory
field called "defaults".
Unfortunately, the BigLake catalog does NOT return this field, so
validation fails and we cannot instantiate the catalog.
I'm now testing the catalog using the following configuration:
```
config = {
    "type": "rest",
    "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
    "warehouse": gcs_warehouse_path,
    "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",  # Crucial for GCS
    "rest-metrics-reporting-enabled": "false",  # Disable metrics reporting if not needed
    "oauth2-server-uri": "https://oauth2.googleapis.com/token",
    "token": access_token,
    "header.x-goog-user-project": biglake_project_id,
    # Optional: Set the logging level for pyiceberg if you need more debug info
    "pyiceberg.logging-level": "DEBUG",
}
```
For this reason, instead of forking, I would like to ask you to apply the
following fix if possible:
```
class ConfigResponse(IcebergBaseModel):
    defaults: Optional[Properties] = Field(default={})
    overrides: Properties = Field()
```
This allows _fetch_config() to pass response.json() to the ConfigResponse
constructor without failing
[here](https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L353):
```
def _fetch_config(self) -> None:
    params = {}
    if warehouse_location := self.properties.get(WAREHOUSE_LOCATION):
        params[WAREHOUSE_LOCATION] = warehouse_location
    with self._create_session() as session:
        response = session.get(self.url(Endpoints.get_config, prefixed=False), params=params)
    try:
        response.raise_for_status()
    except HTTPError as exc:
        self._handle_non_200_response(exc, {})
    config_response = ConfigResponse(**response.json())
    config = config_response.defaults
    config.update(self.properties)
    config.update(config_response.overrides)
    self.properties = config
```
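To illustrate the validation failure at `ConfigResponse(**response.json())`, here is a minimal, self-contained sketch. It uses plain pydantic `BaseModel` as a stand-in for `IcebergBaseModel`, and the class names `StrictConfigResponse` and `LenientConfigResponse` as well as the payload are hypothetical, chosen only for this example. It is not the actual pyiceberg code.

```python
from typing import Dict, Optional

from pydantic import BaseModel, Field, ValidationError

# Stand-in for pyiceberg's Properties type (assumption for this sketch).
Properties = Dict[str, str]


class StrictConfigResponse(BaseModel):
    # Both fields are required, as in the behaviour described in this issue.
    defaults: Properties = Field()
    overrides: Properties = Field()


class LenientConfigResponse(BaseModel):
    # "defaults" may be missing; it then falls back to an empty dict.
    defaults: Optional[Properties] = Field(default_factory=dict)
    overrides: Properties = Field()


# Hypothetical response body that omits "defaults".
payload = {"overrides": {"prefix": "example"}}

try:
    StrictConfigResponse(**payload)
    strict_ok = True
except ValidationError:
    # The required "defaults" field is missing, so validation fails.
    strict_ok = False

lenient = LenientConfigResponse(**payload)
print(strict_ok, lenient.defaults, lenient.overrides)
```

The sketch uses `default_factory=dict` rather than `default={}`; both should behave the same here, the factory form just makes it explicit that each instance gets its own dict.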
With this fix applied I have a working BigLake catalog, and all the calls
work now.
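One caveat with typing the field as `Optional`: a server could then also send an explicit `null`, in which case `config_response.defaults` would be `None` and `config.update(...)` would raise. A possible guard, sketched as a standalone helper (the name `merge_config` is hypothetical and not part of pyiceberg), keeps the same precedence as `_fetch_config()`: server defaults first, then client properties, then server overrides.

```python
from typing import Dict, Optional

# Stand-in for pyiceberg's Properties type (assumption for this sketch).
Properties = Dict[str, str]


def merge_config(
    defaults: Optional[Properties],
    client: Properties,
    overrides: Optional[Properties],
) -> Properties:
    """Merge properties so that overrides beat client values, which beat defaults."""
    # Copy so the caller's dict is not mutated; treat None as empty.
    merged: Properties = dict(defaults or {})
    merged.update(client)
    merged.update(overrides or {})
    return merged


# Usage: "defaults" is absent (None), and the override wins over the client value.
result = merge_config(None, {"uri": "client-value"}, {"uri": "server-override"})
print(result)
```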
_Another issue is that list_namespaces() and list_tables() fail in a
similar way, since BigLake does not return an empty list. We can work around
this by catching the exception and creating the first namespace and table;
after that, all the calls work fine._
I'm not sure what the Iceberg spec says about this, but I hope the suggested
fix can be applied so that we can use pyIceberg on GCP without issues as well!
Thanks all.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]