JackieTien97 commented on code in PR #816:
URL: https://github.com/apache/tsfile/pull/816#discussion_r3251951044
##########
python/tsfile/dataset/reader.py:
##########
@@ -403,6 +526,60 @@ def read_series_by_row(
return timestamp_parts[0], value_parts[0]
return np.concatenate(timestamp_parts), np.concatenate(value_parts)
+ def _read_series_by_row_tree(
+ self, device_path: str, field_name: str, offset: int, limit: int
+ ) -> Tuple[np.ndarray, np.ndarray]:
+ """Tree-model row read: scan on-tree result, filter device, apply
offset/limit."""
+ target_path_segments = device_path.split(".")
+ # +1 because cwrapper prepends the root as an extra col_i cell.
+ expected_path_len = max(
+ len(t.tag_columns) for t in self._catalog.table_entries
+ ) + 1
+ timestamps = []
+ values = []
+ skipped = 0
+ with self._reader.query_table_on_tree([field_name]) as result_set:
Review Comment:
Both `_read_series_by_row_tree` and `_read_arrow_tree` do a full
`query_table_on_tree` scan over the entire file and then client-side filter by
`target_path_segments`. For a file with many devices, this means reading every
row to extract data for a single device.
This is acknowledged in the PR description as a cwrapper limitation
workaround, which is fine for now — but worth flagging for future readers: this
is **O(total_rows)** per device, so an aligned read across N devices in the
same file becomes **O(N × total_rows)**. The comments at the call-site document
the "why" well; a brief `# O(total_rows) — see PR #816 for cwrapper
limitations` here would also help future profilers find the hot path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]