hongzhi-gao opened a new pull request, #745:
URL: https://github.com/apache/tsfile/pull/745
# PR: Query by Row (OFFSET/LIMIT) for Tree and Table Model
## Summary
Add `queryByRow(paths/table, offset, limit)` for both the tree and table models.
Results are equivalent to “full query then skip first `offset` rows and take at
most `limit` rows,” but offset/limit are pushed down so that Chunk/Page-level
skipping avoids decoding unnecessary data where possible.
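
For reviewers, the reference semantics that `queryByRow` is compared against can be sketched as below. This is only an illustration: `skip_then_take` and the in-memory `std::vector` of rows are hypothetical stand-ins, not part of the TsFile API, and the real reader streams rows rather than materializing them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not TsFile API): the "full query, then skip `offset`
// rows and take at most `limit` rows" behaviour, applied to rows already
// materialized in memory.
template <typename Row>
std::vector<Row> skip_then_take(const std::vector<Row>& full_result,
                                std::size_t offset, std::size_t limit) {
    // Clamp so that an offset at or past the end yields an empty result.
    const std::size_t begin = std::min(offset, full_result.size());
    const std::size_t count = std::min(limit, full_result.size() - begin);
    const auto first =
        full_result.begin() + static_cast<std::ptrdiff_t>(begin);
    return std::vector<Row>(first, first + static_cast<std::ptrdiff_t>(count));
}
```

A pushed-down implementation should return the same rows as this helper for every `(offset, limit)` pair, while decoding less data.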
## Changes
### Tree model
- **API**: `TsFileReader::queryByRow(path_list, offset, limit)` /
`TsFileTreeReader::queryByRow(devices, measurements, offset, limit)`.
- **Pushdown**: Single-path: `set_row_range(offset, limit)` on SSI →
Chunk/Page skipped by count. Multi-path: offset/limit applied in merge loop;
`min_time_hint` used to skip stale Chunks.
- **Tests**: Correctness (no offset/limit, offset only, limit only,
offset+limit, boundaries, multi-path merge) +
**QueryByRowFasterThanManualNext** (timing: queryByRow must be no slower than a
full query + manual next, within a 5% tolerance).
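
A rough sketch of what "Chunk/Page skipped by count" means for the single-path case, assuming each page's row count is available from metadata without decoding the page. `PageSlice` and `plan_row_range` are hypothetical names for illustration and do not reflect the actual `set_row_range` implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical type: which rows of which page must actually be decoded.
struct PageSlice {
    std::size_t page_index;   // index into the list of pages
    std::uint64_t first_row;  // first row to emit within that page
    std::uint64_t row_count;  // number of rows to emit from that page
};

// Hypothetical planner: given per-page row counts (assumed to be known from
// metadata), decide which pages can be skipped entirely.
std::vector<PageSlice> plan_row_range(
    const std::vector<std::uint64_t>& page_row_counts,
    std::uint64_t offset, std::uint64_t limit) {
    std::vector<PageSlice> plan;
    std::uint64_t remaining_offset = offset;
    std::uint64_t remaining_limit = limit;
    for (std::size_t i = 0;
         i < page_row_counts.size() && remaining_limit > 0; ++i) {
        const std::uint64_t rows = page_row_counts[i];
        if (remaining_offset >= rows) {
            // The whole page lies before the offset: skip it undecoded.
            remaining_offset -= rows;
            continue;
        }
        const std::uint64_t take =
            std::min(rows - remaining_offset, remaining_limit);
        plan.push_back({i, remaining_offset, take});
        remaining_offset = 0;
        remaining_limit -= take;
    }
    // Pages after the limit is exhausted are never visited.
    return plan;
}
```

For example, with three pages of 100 rows each, `offset = 150` and `limit = 100`, the first page is skipped, the second contributes its last 50 rows, and the third contributes its first 50 rows.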
### Table model
- **API**: `TsFileReader::queryByRow(table_name, column_names, offset,
limit)`.
- **Pushdown**:
  - Device: skip the whole device when `remaining_offset >= device_row_count`.
    In this codebase, “dense” means that within one device, every queried
    column has the same number of rows and the same timestamps.
  - SSI: when dense, `set_row_range(offset, limit)` on each column’s SSI →
    Chunk/Page skip by count.
  - TsBlock: when sparse or not fully consumed at SSI, offset/limit applied
    in the merge loop.
- **Tests**: Correctness (single/multi device, offset/limit, boundaries,
equivalence with manual skip, SSI-level pushdown) +
**QueryByRowFasterThanManualNext** (same timing check as tree).
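
The device-level step can be sketched as follows. This is a simplified illustration: `DeviceSummary`, `SkipResult` and `skip_whole_devices` are hypothetical names, and the sketch assumes that only dense devices are skipped by count, because a sparse device's merged row count is not known without running the merge.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-device summary (not TsFile API).
struct DeviceSummary {
    std::uint64_t row_count;  // rows this device contributes to the result
    bool dense;               // all queried columns share rows and timestamps
};

// Hypothetical result: where reading starts and how much offset is left.
struct SkipResult {
    std::size_t first_device;        // index of the first device to read
    std::uint64_t remaining_offset;  // offset still to apply at SSI/TsBlock level
};

// Skip leading devices while the remaining offset covers them entirely.
SkipResult skip_whole_devices(const std::vector<DeviceSummary>& devices,
                              std::uint64_t offset) {
    std::uint64_t remaining_offset = offset;
    std::size_t i = 0;
    for (; i < devices.size(); ++i) {
        const DeviceSummary& d = devices[i];
        // Assumption: a sparse device cannot be skipped by count alone.
        if (!d.dense || remaining_offset < d.row_count) {
            break;
        }
        remaining_offset -= d.row_count;
    }
    return {i, remaining_offset};
}
```

Whatever offset is left after this step is then handed to the SSI-level or merge-loop handling described above.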
## Review focus
- **Semantics**: queryByRow(offset, limit) matches “full query + skip offset
+ take limit” (existing equivalence tests).
- **Performance**: New timing tests require queryByRow to be no slower than
manual next within 5% (min of 5 runs); confirms pushdown is used in practice.
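
The shape of the timing check can be sketched as below. This is a hedged illustration of "min of 5 runs, within 5%" and not the test code in this PR; `min_elapsed_ms` and `no_slower_within` are hypothetical helpers, and the exact comparison used by the tests may differ.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: run `work` `runs` times and return the fastest
// wall-clock time in milliseconds. Taking the minimum reduces noise from
// scheduling and cache warm-up.
double min_elapsed_ms(const std::function<void()>& work, int runs) {
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        best = std::min(best, ms);
    }
    return best;
}

// Hypothetical predicate: the candidate passes if it is at most `tolerance`
// (e.g. 0.05 for 5%) slower than the baseline.
bool no_slower_within(double candidate_ms, double baseline_ms,
                      double tolerance) {
    return candidate_ms <= baseline_ms * (1.0 + tolerance);
}
```

In a test, the candidate would be the `queryByRow` call and the baseline the full query followed by manual `next` calls, each measured with `min_elapsed_ms(..., 5)`.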