XiaoHongbo-Hope commented on code in PR #7199:
URL: https://github.com/apache/paimon/pull/7199#discussion_r2759900681
##########
paimon-python/pypaimon/read/scanner/data_evolution_split_generator.py:
##########
@@ -171,76 +171,99 @@ def _build_split_from_pack_for_data_evolution(
splits.append(split)
return splits
- def _wrap_to_sliced_splits(self, splits: List[Split], plan_start_pos: int, plan_end_pos: int) -> List[Split]:
+ def _calculate_slice_row_ranges(self, partitioned_files: defaultdict) -> List[Range]:
"""
- Wrap splits with SlicedSplit to add file-level slicing information.
+ Calculate Row ID ranges for slice-based filtering based on start_pos and end_pos.
"""
- sliced_splits = []
- file_end_pos = 0 # end row position of current file in all splits data
+ # Collect all Row ID ranges from files
+ list_ranges = []
+ for file_entries in partitioned_files.values():
+ for entry in file_entries:
+ first_row_id = entry.file.first_row_id
+ # Range is inclusive [from_, to], so use row_count - 1
+ list_ranges.append(Range(first_row_id, first_row_id + entry.file.row_count - 1))
Review Comment:
> What does a blob have to do with it?
It should be OK; I misunderstood.
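
For reference, here is a minimal sketch of the inclusive-range arithmetic in the hunk above. `Range`, `FileMeta`, and `Entry` are hypothetical stand-ins written for this illustration, not the actual pypaimon classes. The only things taken from the diff are the attribute names `first_row_id` and `row_count` and the `[from_, to]` inclusive convention.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in: a row ID range inclusive on both ends."""
    from_: int
    to: int

    def count(self) -> int:
        # Inclusive bounds, so the size is (to - from_) + 1.
        return self.to - self.from_ + 1


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical stand-in for the file metadata referenced in the diff."""
    first_row_id: int
    row_count: int


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a manifest entry wrapping a file."""
    file: FileMeta


def collect_row_ranges(partitioned_files) -> List[Range]:
    """Build one inclusive row ID range per file, as the hunk does."""
    ranges = []
    for file_entries in partitioned_files.values():
        for entry in file_entries:
            first = entry.file.first_row_id
            # The last row ID is first + row_count - 1 because `to` is inclusive.
            ranges.append(Range(first, first + entry.file.row_count - 1))
    return ranges


files = defaultdict(list)
files["p0"].append(Entry(FileMeta(first_row_id=0, row_count=100)))
files["p0"].append(Entry(FileMeta(first_row_id=100, row_count=50)))
ranges = collect_row_ranges(files)
print(ranges)
```

Under these assumptions, two adjacent files with 100 and 50 rows produce ranges that neither overlap nor leave a gap, and their sizes sum to the total row count.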