kevinjqliu commented on code in PR #1304:
URL: https://github.com/apache/iceberg-python/pull/1304#discussion_r1855257218
##########
pyiceberg/table/metadata.py:
##########
@@ -517,12 +517,15 @@ def new_table_metadata(
location: str,
properties: Properties = EMPTY_DICT,
table_uuid: Optional[uuid.UUID] = None,
+ assign_fresh_ids: bool = True,
) -> TableMetadata:
from pyiceberg.table import TableProperties
- fresh_schema = assign_fresh_schema_ids(schema)
- fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
- fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ if assign_fresh_ids:
+ fresh_schema = assign_fresh_schema_ids(schema)
+ partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
+ sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ schema = fresh_schema
Review Comment:
this is where `assign_fresh_ids` is ultimately used
##########
pyiceberg/table/metadata.py:
##########
@@ -517,12 +517,15 @@ def new_table_metadata(
location: str,
properties: Properties = EMPTY_DICT,
table_uuid: Optional[uuid.UUID] = None,
+ assign_fresh_ids: bool = True,
) -> TableMetadata:
from pyiceberg.table import TableProperties
- fresh_schema = assign_fresh_schema_ids(schema)
- fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
- fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ if assign_fresh_ids:
+ fresh_schema = assign_fresh_schema_ids(schema)
+ partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
+ sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ schema = fresh_schema
Review Comment:
and this function is called by `_create_staged_table` and `create_table`
##########
pyiceberg/table/metadata.py:
##########
@@ -517,12 +517,15 @@ def new_table_metadata(
location: str,
properties: Properties = EMPTY_DICT,
table_uuid: Optional[uuid.UUID] = None,
+ assign_fresh_ids: bool = True,
) -> TableMetadata:
from pyiceberg.table import TableProperties
- fresh_schema = assign_fresh_schema_ids(schema)
- fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
- fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ if assign_fresh_ids:
+ fresh_schema = assign_fresh_schema_ids(schema)
+ partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
+ sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ schema = fresh_schema
Review Comment:
My proposal is to not include `assign_fresh_ids` as a flag in functions
other than `new_table_metadata`.
So when `_create_staged_table` and `create_table` is given
* a `pa.Schema`, convert to `Schema` and set `assign_fresh_ids` to True in
`new_table_metadata`
* a `Schema`. Assume the user created `Schema` with the correct IDs
(possibly verify some correctness characteristics such as uniqueness). And use
the schema as is
If a user wants to reassign IDs for `Schema`, this can be done outside the
`create_table` functions and we can even provide a helper function to do so.
I feel like this way can help break apart the responsibilities of schema id
assignment from the `create_table` methods.
##########
pyiceberg/table/metadata.py:
##########
@@ -517,12 +517,15 @@ def new_table_metadata(
location: str,
properties: Properties = EMPTY_DICT,
table_uuid: Optional[uuid.UUID] = None,
+ assign_fresh_ids: bool = True,
) -> TableMetadata:
from pyiceberg.table import TableProperties
- fresh_schema = assign_fresh_schema_ids(schema)
- fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
- fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ if assign_fresh_ids:
+ fresh_schema = assign_fresh_schema_ids(schema)
+ partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
+ sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ schema = fresh_schema
Review Comment:
both functions take `schema: Union[Schema, "pa.Schema"],` as input.
* If `pa.Schema` is given, we want to convert and assign id (this is
currently done by [setting the `assign_fresh_ids` flag to
True](https://github.com/apache/iceberg-python/pull/1304/files#diff-276259d15e1e921df3b4783ec29c6390c828afa76bcf707f0f4923ff7f2d4194R862-R864))
* If `Schema` is given, currently the default is to assign the schema ids,
`assign_fresh_ids: bool = True`.
##########
pyiceberg/table/metadata.py:
##########
@@ -517,12 +517,15 @@ def new_table_metadata(
location: str,
properties: Properties = EMPTY_DICT,
table_uuid: Optional[uuid.UUID] = None,
+ assign_fresh_ids: bool = True,
) -> TableMetadata:
from pyiceberg.table import TableProperties
- fresh_schema = assign_fresh_schema_ids(schema)
- fresh_partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
- fresh_sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ if assign_fresh_ids:
+ fresh_schema = assign_fresh_schema_ids(schema)
+ partition_spec = assign_fresh_partition_spec_ids(partition_spec,
schema, fresh_schema)
+ sort_order = assign_fresh_sort_order_ids(sort_order, schema,
fresh_schema)
+ schema = fresh_schema
Review Comment:
LMK if this makes sense or if im missing something!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]