ColinLeeo opened a new pull request, #823: URL: https://github.com/apache/tsfile/pull/823
## Summary

Brings together batch decode infrastructure, multi-value aligned read, parallel page decode, columnar tablet write, and SIMD micro-optimizations from the long-lived `final` branch into a single review-ready change.

This PR is a **code snapshot, not a replay** of `final`'s commit history: the upstream history was a long sequence of WIP commits that wasn't fit for review, so it is squashed into a single commit on purpose.

**Supersedes** #749, #754, #774.

## Changes by area

### Read path

- **Batch decode infrastructure.** `Decoder` base class gains `read_batch_int32/int64/float/double` and `skip_*` APIs; PLAIN, TS2DIFF, and Gorilla decoders implement them. TS2DIFF exposes block-level peeking so time filters can skip whole blocks without decoding. Gorilla adds a raw-pointer `GorillaBitReader` that bypasses ByteStream overhead in the hot loop.
- **TsBlock-level batch read.** `ChunkReader` / `AlignedChunkReader` add `*_DECODE_TV_BATCH` methods that decode time + value into a `TsBlock` in one pass and apply batch time filters before append.
- **Multi-value aligned read.** `AlignedChunkReader` supports one time chunk + N value chunks decoded in a single pass, sharing decoded timestamps and the filter mask. `SingleDeviceTsBlockReader` auto-detects same-device measurements via `VectorMeasurementColumnContext`.
- **Parallel page decode (opt-in).** When `ENABLE_THREADS` is set, a `DecodeThreadPool` + `BlockingQueue` can decompress and predecode pages in parallel. Page-plan classification (`SKIP` / `FULL_PASS` / `BOUNDARY`) lets a scatter-free `memcpy` fast path fire when every row passes and no column has nulls.

### Write path

- **Batch write into pages.** `ValuePageWriter::write_batch` / `write_string_batch` take timestamp + value + nullness arrays directly, removing the per-value append loop.
- **Columnar tablet.** `Tablet` exposes `set_timestamps`, `set_column_values`, `set_column_string_repeated`, `reset` for bulk reuse, and switches `StringColumn` to an Arrow-compatible offset + buffer layout.
- **Batched bit-pack on TS2DIFF flush.** `TS2DIFFEncoder::flush` packs all deltas with a single `pack_bits_msb` + `write_buf` instead of per-value `write_bits`, falling back to the scalar path for the rare `bit_width > 56` case.
- **Statistics.** `Int64Statistic::update_batch` adds NEON-accelerated min/max/sum.

### Encoding / SIMD

- TS2DIFF batch decode adds AVX2 helpers via SIMDe (already on develop from #755) for both i32 and i64; scalar fallback unchanged.
- PLAIN byte-swap path uses ARM NEON (`vrev64q_u8` / `vrev32q_u8`) when available, with `__builtin_bswap` as fallback.
- `cpp/CMakeLists.txt` adds `ENABLE_SIMD` and turns on `-O3 -march=native -flto` in Release.

### Allocator / ByteStream

- `ByteStream` caches `page_mask_` (= page_size − 1) so the hot path uses a bitmask instead of modulo; `wrap_from` rounds buffer sizes up to a power of two so the mask remains correct. `total_size_` widened to `uint64_t` to support files > 4 GB.
- `UncompressedCompressor` now copies its output instead of aliasing caller buffers, letting callers free the input safely.

### C wrapper / Arrow

- Trimmed unused metadata-export surface (`TsFileStatisticBase`, `TimeseriesMetadata`, `DeviceTimeseriesMetadataEntry`, tag-filter handles) out of the public C API. Internal tag filtering is unaffected.
- `arrow_c.cc` simplified: per-row offset handling for sliced variable-length arrays in place of the `InvertArrowBitmap` copy.

### Tests / benchmarks

- New `tsfile_reader_table_batch_test.cc` covers the TsBlock batch read path.
- `gorilla_codec_test.cc` adds `Int32BatchDecode` / `Int64BatchDecode` / `FloatBatchDecode` tests.
- `examples/cpp_examples/bench_read.cpp` + `.h` and `examples/read_perf_compare/` for benchmarking.
- Removed `cwrapper_metadata_test.cc` (covered the removed C metadata API) and `common/path.cc` (Path member bodies inlined into `path.h`).

## Compatibility notes

- All new C++ methods are **additions**; no existing C++ API was removed.
- **C wrapper headers** lose the metadata export / tag filter symbols listed above. Downstream callers (notably the Python wrapper) need a sanity check before merge.
- `cpp/third_party/` is intentionally left at develop's state so the recent MSVC compatibility fixes (`WITH_STATIC_CRT OFF`, `CMP0054 NEW`, `CMAKE_POLICY_VERSION_MINIMUM=3.5`, `_MSC_VER` guards) are preserved.

## Verification

- `cmake` configure + `make -j` on macOS arm64 (AppleClang, C++11) builds cleanly: `libtsfile.2.2.1.dev.dylib` and `TsFile_Test` both link, **zero errors**, only `unused-lambda-capture` warnings in pre-existing tests.

## Test plan

- [ ] Run `TsFile_Test` and confirm the existing suites still pass
- [ ] Run new batch-read / batch-decode tests
- [ ] Verify Python binding still loads and queries this `libtsfile`
- [ ] Run the included `bench_read` against develop baseline; spot-check the throughput claims from #754
- [ ] Cross-platform sanity (Linux + MSVC) once macOS review feedback is incorporated

🤖 Generated with [Claude Code](https://claude.com/claude-code)
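
### Reviewer sketch: power-of-two page mask

To make the Allocator / ByteStream item easier to review, here is a minimal sketch of why rounding the buffer size up to a power of two lets a cached mask replace the modulo. It is not the code in this PR: `round_up_pow2` and `offset_in_page` are hypothetical helper names, and only the idea of a mask equal to page_size - 1 comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= n.
// Precondition: 1 <= n <= 2^31, so the result fits in uint32_t.
inline uint32_t round_up_pow2(uint32_t n) {
    assert(n >= 1 && n <= (1u << 31));
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Hypothetical helper: offset of an absolute stream position inside its page.
// Only valid when page_mask == page_size - 1 and page_size is a power of two;
// under that condition (pos & page_mask) equals (pos % page_size).
inline uint32_t offset_in_page(uint64_t pos, uint32_t page_mask) {
    return static_cast<uint32_t>(pos & page_mask);
}
```

With a requested size of 1000 the rounded page size is 1024 and the mask is 1023. The position is taken as `uint64_t` here so that offsets beyond 4 GB still map correctly, in line with the `total_size_` widening.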

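
### Reviewer sketch: page-plan classification

The `SKIP` / `FULL_PASS` / `BOUNDARY` classification from the Read path section can be pictured roughly as below. This is an assumption-laden sketch, not the PR's implementation: it assumes the filter is a closed time range `[lo, hi]` and that each page carries minimum and maximum timestamps in its statistics. `PagePlan` and `classify_page` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum mirroring the three plan names in the description.
enum class PagePlan { SKIP, FULL_PASS, BOUNDARY };

// Hypothetical classifier. Assumes a closed filter range [lo, hi] and
// page statistics [page_min, page_max] (both inclusive).
inline PagePlan classify_page(int64_t page_min, int64_t page_max,
                              int64_t lo, int64_t hi) {
    assert(page_min <= page_max && lo <= hi);
    // No overlap: the page never needs to be decoded.
    if (page_max < lo || page_min > hi) {
        return PagePlan::SKIP;
    }
    // Page lies entirely inside the filter: every row passes.
    if (page_min >= lo && page_max <= hi) {
        return PagePlan::FULL_PASS;
    }
    // Partial overlap: rows must be filtered individually.
    return PagePlan::BOUNDARY;
}
```

Under these assumptions only a `FULL_PASS` page could qualify for the `memcpy` fast path, and per the description it must also have no nulls in any column.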