JackieTien97 commented on code in PR #816:
URL: https://github.com/apache/tsfile/pull/816#discussion_r3251951756
##########
python/tsfile/dataset/dataframe.py:
##########
@@ -763,32 +776,46 @@ def _resolve_series_name(self, series_name: str) -> SeriesRefKey:
         device_idx = candidate_indices[0]
         series_ref = (device_idx, field_idx)
-        if series_ref not in self._index.series_ref_set:
+        if series_ref not in self._index.series_shards:
             raise KeyError(_series_lookup_hint(series_name))
         return series_ref
     def _build_series_info(self, series_ref: SeriesRefKey) -> dict:
         device_idx, field_idx = series_ref
         device_key, table_entry, _ = self._get_series_components(series_ref)
-        field_stats = self._cache.field_stats[series_ref]
+        # Aggregate per-shard timeline stats lazily on demand for this series.
+        field_stats = _build_field_stats(self._index.series_shards[series_ref])
Review Comment:
`_build_field_stats` is now called on every `_build_series_info` invocation
instead of being precomputed once in `_DerivedCache`. This means:
- `list_timeseries_metadata()` calls `_build_series_info` for every series,
each of which calls `_build_field_stats`. For N series each spanning up to K
shards, that is O(N×K) reader calls per listing.
- `__repr__` and `__getitem__(column_name)` also go through
`_build_series_info`, so they pay the same per-series cost on each call.
The old approach precomputed these stats once at load time. The PR
description frames this as removing `_DerivedCache`, but the trade-off is that
repeated access patterns (e.g., interactive exploration calling `repr()` then
`list_timeseries_metadata()` then column access) now recompute stats from
scratch each time.
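
To make the cost concrete, here is a minimal sketch that counts the shard visits for repeated listings. All names in it (`build_field_stats`, `build_series_info`, `list_metadata`, the tuple layout of a shard entry) are hypothetical stand-ins, not the actual helpers in `dataframe.py`, and the shard entries are assumed to be in-memory `(min_time, max_time, count)` summaries.

```python
from collections import Counter

# Hypothetical stand-ins for the PR's helpers; the real code is not reproduced.
calls = Counter()


def build_field_stats(shards):
    """Aggregate per-shard (min_time, max_time, count) tuples into one summary."""
    calls["shards_visited"] += len(shards)
    return {
        "min_time": min(s[0] for s in shards),
        "max_time": max(s[1] for s in shards),
        "count": sum(s[2] for s in shards),
    }


def build_series_info(series_shards, series_ref):
    # No cache: every call re-aggregates the shards for this series.
    return build_field_stats(series_shards[series_ref])


def list_metadata(series_shards):
    return {ref: build_series_info(series_shards, ref) for ref in series_shards}


N_DEVICES, N_FIELDS, K_SHARDS = 50, 5, 4
series_shards = {
    (d, f): [(k * 100, k * 100 + 99, 100) for k in range(K_SHARDS)]
    for d in range(N_DEVICES)
    for f in range(N_FIELDS)
}

first = list_metadata(series_shards)
after_first = calls["shards_visited"]   # N * K shard visits for one listing
second = list_metadata(series_shards)
after_second = calls["shards_visited"]  # the same work is repeated in full
```

With 250 series over 4 shards, one listing visits 1000 shard entries and a second listing visits another 1000, which is the repeated-recomputation pattern described above.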
For typical usage with moderate series counts this is fine. For wide schemas
(the PR's own benchmark cites 5k devices × 5 fields, i.e. 25k series), it could
add up. It is worth either noting the trade-off in a code comment or memoizing
the hot path, for example with `@functools.lru_cache`.
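
One possible shape for the memoization, as a sketch only: the class `SeriesIndex`, its `field_stats` attribute, and the shard tuple layout are assumptions for illustration, not the tsfile API. It wraps the bound method per instance rather than decorating the method, so each dataset gets its own cache and `cache_clear()` can be called if the shard set changes.

```python
import functools


class SeriesIndex:
    """Hypothetical stand-in for the dataset index; not the tsfile API."""

    def __init__(self, series_shards):
        self.series_shards = series_shards
        self.aggregations = 0  # instrumentation for this example only
        # Wrapping the bound method per instance keeps each dataset's cache
        # separate; the resulting reference cycle is reclaimed by the
        # garbage collector.
        self.field_stats = functools.lru_cache(maxsize=None)(self._aggregate)

    def _aggregate(self, series_ref):
        """Aggregate (min_time, max_time, count) over the series' shards."""
        self.aggregations += 1
        shards = self.series_shards[series_ref]
        # Return an immutable tuple, since cached values are shared by callers.
        return (
            min(s[0] for s in shards),
            max(s[1] for s in shards),
            sum(s[2] for s in shards),
        )


index = SeriesIndex({(0, 0): [(0, 99, 100), (100, 199, 100)]})
stats_first = index.field_stats((0, 0))
stats_again = index.field_stats((0, 0))  # served from the cache
hits_before_clear = index.field_stats.cache_info().hits

# If shards are added or replaced, the cache has to be invalidated explicitly.
index.field_stats.cache_clear()
stats_after_clear = index.field_stats((0, 0))
```

This works because `series_ref` is a hashable `(device_idx, field_idx)` tuple. An unbounded cache trades memory for speed, so a `maxsize` limit may be preferable for very wide schemas.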