Hi all,

Recently I’ve been working on improving the read and write performance of TsFile. In earlier work, we introduced the Arrow C Data Interface, which significantly improved batch reading performance from Python.

For further optimization, I’d like to propose introducing SIMDe (SIMD Everywhere) as the SIMD implementation in our project. SIMDe provides an abstraction layer that bridges SIMD instruction differences across hardware platforms (e.g., x86, ARM), so we can write portable vectorized code against a single interface while improving overall performance.

Project link: https://github.com/simd-everywhere/simde
License: MIT License (Category A, Apache-compatible)

Regarding the version choice, here is some context. The latest official release of SIMDe is about two years old, while the project itself is still actively maintained. Version 0.84-rc3 is closer to the current upstream state and includes more recent improvements and optimizations.

Based on this, I propose to vendor this version (as a fixed snapshot corresponding to a specific upstream tag/commit) and include it as a header-only dependency under the third_party directory. This gives us a more up-to-date implementation while keeping the build reproducible and fully under our control. We will preserve its LICENSE/NOTICE files and clearly document its source and version. We will also track upstream releases and evaluate upgrading to an official release when one becomes available.

If there are any concerns about introducing SIMDe or using an RC version, or if there are alternative approaches worth considering, I’d be happy to discuss.

Thanks!

Best regards,
Colin
