ColinLeeo opened a new pull request, #823:
URL: https://github.com/apache/tsfile/pull/823

   ## Summary
   
   Brings together batch decode infrastructure, multi-value aligned read, 
parallel page decode, columnar tablet write, and SIMD micro-optimizations from 
the long-lived `final` branch into a single review-ready change.
   
   This PR is a **code snapshot, not a replay** of `final`'s commit history. 
The upstream history was a long sequence of WIP commits that was not fit for 
review, so the work is intentionally squashed into a single commit.
   
   **Supersedes** #749, #754, #774.
   
   ## Changes by area
   
   ### Read path
   
   - **Batch decode infrastructure.** `Decoder` base class gains 
`read_batch_int32/int64/float/double` and `skip_*` APIs; PLAIN, TS2DIFF, and 
Gorilla decoders implement them. TS2DIFF exposes block-level peeking so time 
filters can skip whole blocks without decoding. Gorilla adds a raw-pointer 
`GorillaBitReader` that bypasses ByteStream overhead in the hot loop.
   - **TsBlock-level batch read.** `ChunkReader` / `AlignedChunkReader` add 
`*_DECODE_TV_BATCH` methods that decode time + value into a `TsBlock` in one 
pass and apply batch time filters before append.
   - **Multi-value aligned read.** `AlignedChunkReader` supports one time chunk 
+ N value chunks decoded in a single pass, sharing decoded timestamps and the 
filter mask. `SingleDeviceTsBlockReader` auto-detects same-device measurements 
via `VectorMeasurementColumnContext`.
   - **Parallel page decode (opt-in).** When `ENABLE_THREADS` is set, a 
`DecodeThreadPool` + `BlockingQueue` can decompress and predecode pages in 
parallel. Page-plan classification (`SKIP` / `FULL_PASS` / `BOUNDARY`) lets a 
scatter-free `memcpy` fast path fire when every row passes and no column has 
nulls.
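
To make the page-plan classification above concrete, here is a minimal sketch of how a page could be sorted into `SKIP` / `FULL_PASS` / `BOUNDARY` from its time statistics alone. The names `PagePlan`, `TimeRange`, and `classify_page` are hypothetical and chosen for illustration; the PR's actual types and signatures may differ, and the null check mentioned above is omitted here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names: PagePlan, TimeRange, and classify_page are illustrative
// only and are not taken from the PR.
enum class PagePlan { SKIP, FULL_PASS, BOUNDARY };

struct TimeRange {
    int64_t start;  // inclusive lower bound
    int64_t end;    // inclusive upper bound
};

// Classify a page from its [min, max] time statistics against a closed
// filter range, without decoding any values.
inline PagePlan classify_page(const TimeRange& page, const TimeRange& filter) {
    assert(page.start <= page.end && filter.start <= filter.end);
    if (page.end < filter.start || page.start > filter.end) {
        return PagePlan::SKIP;       // no overlap: nothing to decode
    }
    if (page.start >= filter.start && page.end <= filter.end) {
        return PagePlan::FULL_PASS;  // every row passes: a bulk copy is possible
    }
    return PagePlan::BOUNDARY;       // partial overlap: per-row filtering needed
}
```

Under this sketch only `BOUNDARY` pages need a per-row filter mask; a `FULL_PASS` page with no nulls is the case where a straight `memcpy` of decoded values would be safe.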
   
   ### Write path
   
   - **Batch write into pages.** `ValuePageWriter::write_batch` / 
`write_string_batch` take timestamp + value + nullness arrays directly, 
removing the per-value append loop.
   - **Columnar tablet.** `Tablet` exposes `set_timestamps`, 
`set_column_values`, `set_column_string_repeated`, `reset` for bulk reuse, and 
switches `StringColumn` to an Arrow-compatible offset + buffer layout.
   - **Batched bit-pack on TS2DIFF flush.** `TS2DIFFEncoder::flush` packs all 
deltas with a single `pack_bits_msb` + `write_buf` instead of per-value 
`write_bits`, falling back to the scalar path for the rare `bit_width > 56` 
case.
   - **Statistics.** `Int64Statistic::update_batch` adds NEON-accelerated 
min/max/sum.
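
As an illustration of the batched bit-pack idea, the following is a minimal scalar sketch of MSB-first packing through a 64-bit accumulator. `pack_bits_msb_sketch` is a hypothetical stand-in, not the PR's `pack_bits_msb`, whose signature and implementation may differ. The `bit_width <= 56` precondition here is an assumption of this sketch: with up to 7 leftover bits in the accumulator, 56 more bits still fit in 64, which is one plausible reason for a scalar fallback above that width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only: pack `count` values of
// `bit_width` bits each, most significant bit first, into a byte vector.
inline std::vector<uint8_t> pack_bits_msb_sketch(const uint64_t* values,
                                                 size_t count, int bit_width) {
    assert(bit_width >= 0 && bit_width <= 56);  // assumption of this sketch
    std::vector<uint8_t> out;
    out.reserve((count * static_cast<size_t>(bit_width) + 7) / 8);
    const uint64_t mask =
        (bit_width == 0) ? 0 : ((uint64_t(1) << bit_width) - 1);
    uint64_t acc = 0;  // low `acc_bits` bits hold the not-yet-emitted bits
    int acc_bits = 0;  // always < 8 between iterations
    for (size_t i = 0; i < count; ++i) {
        acc = (acc << bit_width) | (values[i] & mask);
        acc_bits += bit_width;
        while (acc_bits >= 8) {  // emit whole bytes, highest bits first
            acc_bits -= 8;
            out.push_back(static_cast<uint8_t>(acc >> acc_bits));
        }
    }
    if (acc_bits > 0) {  // zero-pad the final partial byte on the right
        out.push_back(static_cast<uint8_t>(acc << (8 - acc_bits)));
    }
    return out;
}
```

The packed buffer can then be handed to the output stream in one write, rather than issuing one bit-level write per delta.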
   
   ### Encoding / SIMD
   
   - TS2DIFF batch decode adds AVX2 helpers via SIMDe (already on develop from 
#755) for both i32 and i64; scalar fallback unchanged.
   - PLAIN byte-swap path uses ARM NEON (`vrev64q_u8` / `vrev32q_u8`) when 
available, with `__builtin_bswap` as fallback.
   - `cpp/CMakeLists.txt` adds `ENABLE_SIMD` and turns on `-O3 -march=native 
-flto` in Release.
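
For reference, the scalar fallback side of a byte-swap decode could look like the sketch below. It assumes the encoded values are stored big-endian (an assumption made here for illustration) and uses the GCC/Clang builtin `__builtin_bswap64`; the function names `be64_to_host` and `decode_be_int64_batch` are hypothetical and not taken from the PR. A NEON path would replace the loop body with vector loads and a `vrev64q_u8` shuffle.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: convert a raw big-endian 64-bit word to host order.
inline uint64_t be64_to_host(uint64_t raw) {
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return raw;                     // already in host order
#else
    return __builtin_bswap64(raw);  // GCC/Clang builtin; little-endian host
#endif
}

// Hypothetical batch decoder: read `n` big-endian int64 values from `src`
// into `dst`. memcpy avoids unaligned-access and aliasing problems.
inline void decode_be_int64_batch(const uint8_t* src, int64_t* dst, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        uint64_t raw;
        std::memcpy(&raw, src + i * sizeof(uint64_t), sizeof(uint64_t));
        dst[i] = static_cast<int64_t>(be64_to_host(raw));
    }
}
```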
   
   ### Allocator / ByteStream
   
   - `ByteStream` caches `page_mask_` (= page_size − 1) so the hot path uses a 
bitmask instead of modulo; `wrap_from` rounds buffer sizes up to a power of two 
so the mask remains correct. `total_size_` widened to `uint64_t` to support 
files > 4 GB.
   - `UncompressedCompressor` now copies its output instead of aliasing caller 
buffers, letting callers free the input safely.
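
The mask trick only works when the page size is a power of two, which is why the rounding matters. A minimal sketch, with hypothetical helper names (`round_up_pow2`, `offset_in_page`) that are not taken from `ByteStream`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two >= n (n must be in [1, 2^31]).
inline uint32_t round_up_pow2(uint32_t n) {
    assert(n > 0 && n <= (1u << 31));
    uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Hypothetical helper: offset of an absolute position within its page.
// Equivalent to `pos % page_size` only when page_size is a power of two
// and page_mask == page_size - 1.
inline uint32_t offset_in_page(uint64_t pos, uint32_t page_mask) {
    return static_cast<uint32_t>(pos & page_mask);
}
```

For example, a 1000-byte buffer would be rounded up to 1024, giving a mask of 1023; with an unrounded size of 1000, `pos & 999` would not equal `pos % 1000`.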
   
   ### C wrapper / Arrow
   
   - Trimmed unused metadata-export surface (`TsFileStatisticBase`, 
`TimeseriesMetadata`, `DeviceTimeseriesMetadataEntry`, tag-filter handles) out 
of the public C API. Internal tag filtering is unaffected.
   - `arrow_c.cc` simplified: per-row offset handling for sliced 
variable-length arrays in place of the `InvertArrowBitmap` copy.
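
As background for the sliced-array handling: in the Arrow columnar format, a sliced variable-length array keeps its original buffers and records a logical `offset`, so row `i` is found at physical index `offset + i` in the offsets buffer. The sketch below shows that lookup with a hypothetical view struct (`Utf8ArrayView`, `row_at`); it is not the code in `arrow_c.cc` and does not use the Arrow C data interface structs directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, simplified view of a (possibly sliced) UTF-8 array.
struct Utf8ArrayView {
    int64_t length;          // number of logical rows in the slice
    int64_t offset;          // index of the first logical row in the buffers
    const int32_t* offsets;  // physical offsets buffer (unsliced)
    const char* data;        // physical data buffer (unsliced)
};

// Return logical row `i`, applying the slice offset per row.
inline std::string row_at(const Utf8ArrayView& a, int64_t i) {
    assert(i >= 0 && i < a.length);
    const int64_t p = a.offset + i;  // physical row index
    const int32_t begin = a.offsets[p];
    const int32_t end = a.offsets[p + 1];
    return std::string(a.data + begin, static_cast<size_t>(end - begin));
}
```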
   
   ### Tests / benchmarks
   
   - New `tsfile_reader_table_batch_test.cc` covers the TsBlock batch read path.
   - `gorilla_codec_test.cc` adds `Int32BatchDecode` / `Int64BatchDecode` / 
`FloatBatchDecode` tests.
   - Added `examples/cpp_examples/bench_read.cpp` + `.h` and 
`examples/read_perf_compare/` for benchmarking.
   - Removed `cwrapper_metadata_test.cc` (covered the removed C metadata API) 
and `common/path.cc` (Path member bodies inlined into `path.h`).
   
   ## Compatibility notes
   
   - All new C++ methods are **additions** — no existing C++ API was removed.
   - **C wrapper headers** lose the metadata export / tag filter symbols listed 
above. Downstream callers (notably the Python wrapper) should be sanity-checked 
before merge.
   - `cpp/third_party/` is intentionally left at develop's state so the recent 
MSVC compatibility fixes (`WITH_STATIC_CRT OFF`, `CMP0054 NEW`, 
`CMAKE_POLICY_VERSION_MINIMUM=3.5`, `_MSC_VER` guards) are preserved.
   
   ## Verification
   
   - `cmake` configure + `make -j` on macOS arm64 (AppleClang, C++11) builds 
cleanly: `libtsfile.2.2.1.dev.dylib` and `TsFile_Test` both link, **zero 
errors**, only `unused-lambda-capture` warnings in pre-existing tests.
   
   ## Test plan
   
   - [ ] Run `TsFile_Test` and confirm the existing suites still pass
   - [ ] Run new batch-read / batch-decode tests
   - [ ] Verify Python binding still loads and queries this `libtsfile`
   - [ ] Run the included `bench_read` against develop baseline; spot-check the 
throughput claims from #754
   - [ ] Cross-platform sanity (Linux + MSVC) once macOS review feedback is 
incorporated
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

