GitHub user GlutenPerfBot created a discussion: October 03, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days have been dominated by **release-prep fever**: 1.5.0 is being finalized, so version-bump PRs, doc polish and back-ports are everywhere. Meanwhile Velox keeps moving (daily bumps), Iceberg & Delta lake-house features are expanding, and a long tail of micro-optimizations (memcpy, vector copy, hash-table configs) shows the community is squeezing every last cycle out of the engine.

## Key Ongoing Projects

* **1.5.0 release** – @PHILO-HE drove #10829 (version bump) and #10827 (release-automation back-ports); @zhouyuan added a Spark-4.0 unit-test suite (#10725, 56k lines!).
* **Native Delta/Iceberg write** – @zhztheplayer’s 1,787-line PoC (#10801) plus companion fixes (#10822, #10830) bring an off-loaded Parquet writer for Delta 3.3 on Spark 3.5; @Zouxxyy landed dynamic-partition overwrite for Iceberg (#10823, #10760).
* **Velox daily sync** – @GlutenPerfBot keeps us current with upstream (#10832, #10837, #10826, #10814).
* **Omni-backend proposal** – @wjunLu opened #10188 to add an ARM-optimized OmniOperator backend (70% TPC-DS speed-up claimed).

## Priority Items

* **Performance regression** – #10811 by @beliefer flips the `propagateIgnoreNullKeys` default after a TPC-DS q87-90 slowdown; needs review.
* **Column-to-row memcpy** – #10824 by @zhouyuan and #10825 by @zhli1142015 both attack redundant copies; quick wins, low risk.
* **Build stability** – #10804 (Iceberg jar version clash on ARM) is closed, but similar flakes in #10756 (CH AQE test) are still open; @jinchengchenghh and @lgbo-ustc are chasing them.
* **Input-file expressions** – #10840 tracks missing Delta/Hudi support after #10831 fixed Iceberg; @JunhyungSong's PR #10831 is open and needs follow-ups.
## Notable Discussions

* #10188: “Add a new backend: Omni” – ARM-centric accelerator; community weighing maintenance & CI resourcing.
* #10813: weekly status bot summary (38 merged PRs, 29 open) – shows sustained velocity across Velox/CH/Flink.

## Emerging Trends

1. **Lake-house parity race** – Iceberg & Delta getting simultaneous native read/write features; expect similar push for Hudi.
2. **Micro-perf everywhere** – C2R memory zeroing, vector flatten avoidance, hash-table tuning: death-by-a-thousand-cuts phase.
3. **Multi-backend convergence** – same feature (input-file expr, cudf connector, config API) implemented for both Velox and CH within days.
4. **Release automation** – new GHA workflows (#10807, #10827) and CentOS-7 branch checkout (#10839) show infra maturing.

## Good First Issues

* #6814: implement `MakeYMInterval` for ClickHouse – simple date-interval UDF, great to learn CH function registry.
* #4730: add `date_from_unix_date` – follows existing date-function pattern.
* #6807: support `split_part` string function – self-contained, touches parser + UDF stub.
* #6812: expose `SparkPartitionID` – reuses existing partition metadata, no native code needed.
* #6815: add `MapZipWith` – entry-level map function; good excuse to peek into Velox/CH map kernels.

All need basic C++ or Scala, come with clear signatures, and have prior examples to copy.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10841
