Hi all,

I’d like to introduce a new feature in the TsFile Python API.

PR: https://github.com/apache/tsfile/pull/816

Overview

This PR extends TsFileDataFrame to support tree-model TsFile files in addition to the existing table-model, while keeping the current dataset API unchanged. In the same change set, it also refactors the internal dataset index to reduce memory overhead, especially for sparse and wide schemas.

Key changes

1. Tree-model support
- Automatically detects table vs tree model when opening a file
- Adapts tree structure into a virtual table view (device path → columns)
- Ensures only actually written (device, field) pairs are registered (no phantom series)
- Tree reads are executed via query_table_on_tree with client-side adaptation
- Prevents mixing table-model and tree-model inputs in one dataset

2. Dataset index optimization
- Replace per-series dict with compact NamedTuple representation
- Remove _DerivedCache and compute derived views lazily
- Aggregate device time bounds for O(1) query access
- Remove unused placeholder series entries in table model
- General cleanup of internal catalog structures

3. New public API
- TsFileDataFrame.model → indicates "table" or "tree"
- list_timeseries_metadata() → unified metadata view for both models

Compatibility
- No change to existing dataset APIs (__getitem__, .loc, __len__, etc.)
- No change to on-disk format or C++/Java layers
- Existing table-model workflows remain fully compatible
- Tree-model support is additive only

Memory impact
- Up to ~33% reduction in dense workloads
- Up to ~38% reduction in sparse/wide schemas due to removal of phantom series tracking

Testing
- All existing tests pass (41/41)
- New tests added for tree-model functionality, alignment, metadata, and mixed-model safety

Feedback

Feedback is welcome, especially on:
- Tree-model → table abstraction design
- Query routing strategy in tree mode
- Any edge cases in mixed schema handling

Thanks!

Best regards,
Le Yang
Lyyy [email protected]
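
P.S. For anyone who wants a concrete picture of the index ideas described above, here is a minimal standalone sketch. It is not the code in the PR and does not call the tsfile package. SeriesRef, build_index, virtual_columns and the device paths are hypothetical names chosen only for illustration. It shows three of the points under stated assumptions: one compact NamedTuple per written (device, field) pair, per-device time bounds aggregated once so a lookup is a single dict access, and a virtual table view that maps each device path to the columns actually written under it.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class SeriesRef(NamedTuple):
    """Hypothetical compact index entry for one written (device, field) pair."""
    device: str
    field: str
    min_time: int
    max_time: int


def build_index(
    written: Iterable[Tuple[str, str, int, int]],
) -> Tuple[List[SeriesRef], Dict[str, Tuple[int, int]]]:
    """Register only the pairs that were written and aggregate device time bounds."""
    series = [SeriesRef(*row) for row in written]
    bounds: Dict[str, Tuple[int, int]] = {}
    for ref in series:
        lo, hi = bounds.get(ref.device, (ref.min_time, ref.max_time))
        # Widen the device-level bounds so later lookups are one dict access.
        bounds[ref.device] = (min(lo, ref.min_time), max(hi, ref.max_time))
    return series, bounds


def virtual_columns(series: Iterable[SeriesRef]) -> Dict[str, List[str]]:
    """Map each device path to the sorted fields written under it."""
    columns: Dict[str, List[str]] = {}
    for ref in series:
        columns.setdefault(ref.device, []).append(ref.field)
    return {device: sorted(fields) for device, fields in columns.items()}


# Illustrative input: d2 never wrote "pressure", so no entry is created for it.
written = [
    ("root.plant.d1", "temperature", 0, 100),
    ("root.plant.d1", "pressure", 50, 200),
    ("root.plant.d2", "temperature", 10, 20),
]
series, bounds = build_index(written)
columns = virtual_columns(series)
```

Because the index is built only from pairs that exist in the input, a device that never wrote a given field gets no entry for it, which is the "no phantom series" property in miniature.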
