kevinjqliu commented on code in PR #2013:
URL: https://github.com/apache/iceberg-python/pull/2013#discussion_r2122646431
##########
pyiceberg/catalog/hive.py:
##########
@@ -211,11 +211,18 @@ def _construct_hive_storage_descriptor(
DEFAULT_PROPERTIES = {TableProperties.PARQUET_COMPRESSION:
TableProperties.PARQUET_COMPRESSION_DEFAULT}
-def _construct_parameters(metadata_location: str, previous_metadata_location:
Optional[str] = None) -> Dict[str, Any]:
+def _construct_parameters(
+ metadata_location: str, previous_metadata_location: Optional[str] = None,
metadata_properties: Optional[Properties] = None
+) -> Dict[str, Any]:
properties = {PROP_EXTERNAL: "TRUE", PROP_TABLE_TYPE: "ICEBERG",
PROP_METADATA_LOCATION: metadata_location}
if previous_metadata_location:
properties[PROP_PREVIOUS_METADATA_LOCATION] =
previous_metadata_location
+ if metadata_properties:
+ for key, value in metadata_properties.items():
+ if key not in properties:
Review Comment:
👍 this is fine, it helps with not re-setting `PROP_EXTERNAL`,
`PROP_TABLE_TYPE`, `PROP_METADATA_LOCATION`, and
`PROP_PREVIOUS_METADATA_LOCATION`
##########
tests/integration/test_reads.py:
##########
@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
table.transaction().set_properties(property_name=None).commit_transaction()
assert "None type is not a supported value in properties: property_name"
in str(exc_info.value)
+ if isinstance(catalog, HiveCatalog):
Review Comment:
this is a great test! could you move this into its own test function?
with just hive catalog
```
@pytest.mark.integration
@pytest.mark.parametrize("catalog",
[pytest.lazy_fixture("session_catalog_hive")])
```
##########
tests/integration/test_reads.py:
##########
@@ -111,6 +112,23 @@ def test_table_properties(catalog: Catalog) -> None:
table.transaction().set_properties(property_name=None).commit_transaction()
Review Comment:
I was wondering why the rest of these tests pass since we're not setting the
properties in the HMS. Turns out the table properties are saved in the [table
metadata](https://iceberg.apache.org/spec/#table-metadata-fields) using its
`properties` field.
This is not what the table metadata's properties field should be used for,
```
properties A string to string map of table properties. This is used to
control settings that affect reading and writing and is not intended to be used
for arbitrary metadata. For example, commit.retry.num-retries is used to
control the number of commit retries.
```
This is a side affect of
https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/table/__init__.py#L1131-L1134
and
https://github.com/apache/iceberg-python/blob/a67c5592f3243d255519581fedfcc5d93274b9c8/pyiceberg/catalog/hive.py#L338-L344
We should fix this behavior and read/write properties using the HMS's table
parameters. We can fix this separately from the current issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]