HonahX commented on PR #1246:
URL: https://github.com/apache/iceberg-python/pull/1246#issuecomment-2439772946
@stevie9868 @kevinjqliu Thanks for the great PR and discussions! I agree
that there is some issue with the current Transaction mechanism: the
`commit_transaction` can be incorrectly called when we should just abandon
everything
The following pattern has been the most common practice of updating tables
in pyiceberg since the beginning
```python
with tbl.transaction() as txn:
txn.overwrite(...)
....
```
The "with" statement will ensure that the context manager---Transaction
object's `__exit__()`(`commit_transaction`[) will always be called (even there
is an exception)](https://docs.python.org/3/reference/compound_stmts.html#with)
as long as the transaction object is successfully initialized. However, we
should only call `commit_transaction` when there is no exception along the way.
A simpler example would be:
```python
pa_table_with_column = pa.Table.from_pydict(
{
"foo": ["a", None, "z"],
"bar": [19, None, 25],
},
schema=pa.schema([
pa.field("foo", pa.large_string(), nullable=True),
pa.field("bar", pa.int32(), nullable=True),
]),
)
tbl = catalog.create_table(identifier=identifier,
schema=pa_table_with_column.schema)
with pytest.raises(ValueError):
with tbl.transaction() as txn:
txn.append(pa_table_with_column)
raise ValueError
txn.append(pa_table_with_column)
assert len(tbl.scan().to_pandas()) == 0
```
Since I explicitly raise an error during the transaction, the whole
transaction should be abandoned. But this code block still insert 3 rows (first
append) to the table.
Please let me know if these make sense. Would love to hear your thoughts on
this!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]