geruh commented on code in PR #3285:
URL: https://github.com/apache/iceberg-python/pull/3285#discussion_r3142752056
##########
pyiceberg/table/delete_file_index.py:
##########
@@ -103,45 +147,75 @@ def _partition_key(spec_id: int, partition: Record | None) -> tuple[int, Record]
class DeleteFileIndex:
- """Indexes position delete files by partition and by exact data file path."""
+ """Indexes position and equality delete files by partition and by exact data file path."""
- def __init__(self) -> None:
+ def __init__(self, schema: Schema | None = None) -> None:
+ self._schema = schema
self._by_partition: dict[tuple[int, Record], PositionDeletes] = {}
self._by_path: dict[str, PositionDeletes] = {}
+ self._eq_by_partition: dict[tuple[int, Record], EqualityDeletes] = {}
Review Comment:
Do we need these additional variables? Can we not just add them together here? I
want to be careful about adding Java-shaped work.
##########
tests/table/test_delete_file_index.py:
##########
@@ -16,19 +16,29 @@
# under the License.
import pytest
+from pyiceberg.conversions import to_bytes
from pyiceberg.manifest import DataFile, DataFileContent, FileFormat, ManifestEntry, ManifestEntryStatus
+from pyiceberg.schema import Schema
from pyiceberg.table.delete_file_index import PATH_FIELD_ID, DeleteFileIndex, PositionDeletes
from pyiceberg.typedef import Record
+from pyiceberg.types import IntegerType, NestedField
-def _create_data_file(file_path: str = "s3://bucket/data.parquet", spec_id: int = 0) -> DataFile:
Review Comment:
Can we add a similar test for an unpartitioned equality delete and a position
delete at the same sequence number, to ensure that equality deletes apply at
seq < N?
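
A rough sketch of the behavior such a test would pin down, assuming the Iceberg spec rule that a position delete applies to data files with a data sequence number less than or equal to its own, while an equality delete applies only to data files with a strictly smaller one. The helper names below are hypothetical and are not part of the pyiceberg API; a real test would go through `DeleteFileIndex` with the fixtures in this file.

```python
# Hypothetical helpers illustrating the sequence-number rule only;
# they do not exist in pyiceberg.
def position_delete_applies(data_seq: int, delete_seq: int) -> bool:
    # Position deletes apply when the data file is at or before the delete.
    return data_seq <= delete_seq


def equality_delete_applies(data_seq: int, delete_seq: int) -> bool:
    # Equality deletes apply only when the data file is strictly older.
    return data_seq < delete_seq


def test_deletes_at_same_sequence_number() -> None:
    n = 5  # arbitrary sequence number shared by the data file and both deletes
    assert position_delete_applies(data_seq=n, delete_seq=n)
    assert not equality_delete_applies(data_seq=n, delete_seq=n)
    # An older data file is still covered by the equality delete.
    assert equality_delete_applies(data_seq=n - 1, delete_seq=n)
```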
##########
pyiceberg/io/pyarrow.py:
##########
@@ -1693,7 +1693,12 @@ def _task_to_record_batches(
def _read_all_delete_files(io: FileIO, tasks: Iterable[FileScanTask]) -> dict[str, list[ChunkedArray]]:
Review Comment:
Is this needed?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]