Re: [PR] [Python] Support tree-model TsFiles in TsFileDataFrame [tsfile]

via GitHub Fri, 15 May 2026 20:44:19 -0700


JackieTien97 commented on code in PR #816:
URL: https://github.com/apache/tsfile/pull/816#discussion_r3251952124



##########
python/tsfile/dataset/reader.py:
##########
@@ -403,6 +526,60 @@ def read_series_by_row(
             return timestamp_parts[0], value_parts[0]
         return np.concatenate(timestamp_parts), np.concatenate(value_parts)
 
+    def _read_series_by_row_tree(
+        self, device_path: str, field_name: str, offset: int, limit: int
+    ) -> Tuple[np.ndarray, np.ndarray]:
+        """Tree-model row read: scan on-tree result, filter device, apply 
offset/limit."""
+        target_path_segments = device_path.split(".")
+        # +1 because cwrapper prepends the root as an extra col_i cell.
+        expected_path_len = max(
+            len(t.tag_columns) for t in self._catalog.table_entries
+        ) + 1
+        timestamps = []
+        values = []
+        skipped = 0
+        with self._reader.query_table_on_tree([field_name]) as result_set:
+            md = result_set.get_metadata()
+            num_cols = md.get_column_num()
+            col_names = [md.get_column_name(i + 1) for i in range(num_cols)]
+            try:
+                field_idx = col_names.index(field_name) + 1
+            except ValueError:
+                return np.array([], dtype=np.int64), np.array([], 
dtype=np.float64)
+            all_col_indices = [
+                idx + 1
+                for idx, name in enumerate(col_names)
+                if name.startswith("col_")
+            ]
+            # Only the trailing expected_path_len col_i cells are genuine; the
+            # leading duplicates are stale from prior queries on this reader.
+            col_indices = all_col_indices[-expected_path_len:]

Review Comment:
   The "stale col_ columns from prior queries" workaround 
(`all_col_indices[-expected_path_len:]`) appears in both 
`_read_series_by_row_tree` (line 556) and `_read_arrow_tree` (line 747). This 
is a fragile assumption about cwrapper internal state.
   
   If the cwrapper behavior changes (e.g., fixed to not leak duplicate columns, 
or leaks in a different pattern), this slicing logic will silently break. 
Consider at minimum an assertion:
   ```python
   assert len(col_indices) == expected_path_len, (
       f"Expected {expected_path_len} col_ columns, got {len(col_indices)} "
       f"(total {len(all_col_indices)})"
   )
   ```
   This would fail fast if the assumption is violated rather than silently 
returning wrong data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [Python] Support tree-model TsFiles in TsFileDataFrame [tsfile]

Reply via email to