Re: [PR] [python] Data Evolution with_slice should use row id to do slice [paimon]

via GitHub Tue, 03 Feb 2026 08:22:07 -0800


XiaoHongbo-Hope commented on code in PR #7199:
URL: https://github.com/apache/paimon/pull/7199#discussion_r2759900681



##########
paimon-python/pypaimon/read/scanner/data_evolution_split_generator.py:
##########
@@ -171,76 +171,99 @@ def _build_split_from_pack_for_data_evolution(
                 splits.append(split)
         return splits
 
-    def _wrap_to_sliced_splits(self, splits: List[Split], plan_start_pos: int, plan_end_pos: int) -> List[Split]:
+    def _calculate_slice_row_ranges(self, partitioned_files: defaultdict) -> List[Range]:
         """
-        Wrap splits with SlicedSplit to add file-level slicing information.
+        Calculate Row ID ranges for slice-based filtering based on start_pos and end_pos.
         """
-        sliced_splits = []
-        file_end_pos = 0  # end row position of current file in all splits data
+        # Collect all Row ID ranges from files
+        list_ranges = []
+        for file_entries in partitioned_files.values():
+            for entry in file_entries:
+                first_row_id = entry.file.first_row_id
+                # Range is inclusive [from_, to], so use row_count - 1
+                list_ranges.append(Range(first_row_id, first_row_id + entry.file.row_count - 1))
 

Review Comment:
   > What does a blob have to do with it?
   
   It should be OK; I misunderstood.
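
For reference, the inclusive-range arithmetic in the hunk above can be sketched in isolation. This is only an illustration, not the PR's code: the `Range` dataclass below is a hypothetical stand-in for the real `Range` type, and `file_row_ranges` is a made-up helper that takes plain `(first_row_id, row_count)` pairs instead of file entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for the PR's Range: inclusive on both ends."""
    from_: int
    to: int

    def count(self) -> int:
        # Both ends are inclusive, so add 1 to the difference.
        return self.to - self.from_ + 1


def file_row_ranges(files: List[Tuple[int, int]]) -> List[Range]:
    """Map (first_row_id, row_count) pairs to inclusive Row ID ranges."""
    # The last Row ID of a file is first_row_id + row_count - 1.
    return [Range(first, first + count - 1) for first, count in files]


# Two adjacent files: 100 rows starting at Row ID 0, then 50 rows starting at 100.
ranges = file_row_ranges([(0, 100), (100, 50)])
```

Because both ends are inclusive, adjacent files produce ranges that touch without overlapping, and the per-range counts add up to the total row count.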



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
