GitHub user GlutenPerfBot created a discussion: October 03, 2025: Weekly Status 
Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join us on 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The past 7 days have been dominated by **release-prep fever**: 1.5.0 is being 
finalized, so version-bump PRs, doc polish and back-ports are everywhere. 
Meanwhile Velox keeps moving (daily bumps), Iceberg & Delta lake-house features 
are expanding, and a long tail of micro-optimizations (memcpy, vector copy, 
hash-table configs) shows the community is squeezing every last cycle out of the 
engine.

## Key Ongoing Projects
* **1.5.0 release** – @PHILO-HE drove #10829 (version bump) and #10827 
(release-automation back-ports); @zhouyuan added a Spark 4.0 unit-test suite 
(#10725, 56 k lines!).
* **Native Delta/Iceberg write** – @zhztheplayer’s 1,787-line PoC (#10801) plus 
companion fixes (#10822, #10830) bring an offloaded Parquet writer to Delta 3.3 
on Spark 3.5; @Zouxxyy landed dynamic-partition overwrite for Iceberg (#10823, 
#10760).
* **Velox daily sync** – @GlutenPerfBot keeps us current with upstream (#10832, 
#10837, #10826, #10814).
* **Omni-backend proposal** – @wjunLu opened #10188 to add an ARM-optimized 
OmniOperator backend (70 % TPC-DS speed-up claimed).

## Priority Items
* **Performance regression** – #10811 by @beliefer flips 
`propagateIgnoreNullKeys` default after TPC-DS q87-90 slowdown; needs review.
* **Column-to-row memcpy** – #10824 by @zhouyuan and #10825 by @zhli1142015 
both attack redundant copies; quick wins, low risk.
* **Build stability** – #10804 (Iceberg jar version clash on ARM) is closed, but 
similar flakes in #10756 (CH AQE test) are still open; @jinchengchenghh and 
@lgbo-ustc are investigating.
* **Input-file expressions** – #10840 tracks the missing Delta/Hudi support that 
remains after @JunhyungSong's Iceberg fix in PR #10831 (listed as still open); 
follow-ups are needed.

## Notable Discussions
* #10188: “Add a new backend: Omni” – ARM-centric accelerator; community 
weighing maintenance & CI resourcing.
* #10813: weekly status bot summary (38 merged PRs, 29 open) – shows sustained 
velocity across Velox/CH/Flink.

## Emerging Trends
1. **Lake-house parity race** – Iceberg & Delta getting simultaneous native 
read/write features; expect similar push for Hudi.
2. **Micro-perf everywhere** – C2R memory zeroing, vector flatten avoidance, 
hash-table tuning: death-by-a-thousand-cuts phase.
3. **Multi-backend convergence** – same feature (input-file expr, cudf 
connector, config API) implemented for both Velox and CH within days.
4. **Release automation** – new GHA workflows (#10807, #10827) and CentOS-7 
branch checkout (#10839) show infra maturing.

## Good First Issues
* #6814: implement `MakeYMInterval` for ClickHouse – simple date-interval UDF, 
great to learn CH function registry.
* #4730: add `date_from_unix_date` – follows existing date-function pattern.
* #6807: support `split_part` string function – self-contained, touches parser 
+ UDF stub.
* #6812: expose `SparkPartitionID` – reuses existing partition metadata, no 
native code needed.
* #6815: add `MapZipWith` – entry-level map function; good excuse to peek into 
Velox/CH map kernels.

All need basic C++ or Scala, come with clear signatures, and have prior 
examples to copy.
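
For newcomers picking up #4730 or #6807, it can help to pin down the expected behavior before touching native code. The sketch below is a plain-Python reference model of the two functions, based on my reading of the Spark SQL semantics (day offsets from 1970-01-01; 1-based part numbers, negative numbers counting from the end, empty string when out of range). It is not Gluten, Velox or ClickHouse code, and the edge cases are assumptions that should be checked against Spark's own tests before being relied on.

```python
from datetime import date, timedelta
from typing import Optional

EPOCH = date(1970, 1, 1)


def date_from_unix_date(days: Optional[int]) -> Optional[date]:
    """Reference model: date that lies `days` days after 1970-01-01."""
    if days is None:  # assumed: NULL in, NULL out
        return None
    return EPOCH + timedelta(days=days)


def split_part(s: Optional[str], delimiter: Optional[str],
               part_num: Optional[int]) -> Optional[str]:
    """Reference model: return the requested 1-based part of `s`."""
    if s is None or delimiter is None or part_num is None:
        return None  # assumed: NULL in, NULL out
    if part_num == 0:
        raise ValueError("part_num must not be 0")
    # Assumed: an empty delimiter leaves the string unsplit.
    parts = [s] if delimiter == "" else s.split(delimiter)
    # Negative part numbers count backward from the last part.
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    if 0 <= index < len(parts):
        return parts[index]
    return ""  # out-of-range part numbers yield an empty string
```

Under this model, `split_part("11.12.13", ".", -1)` returns `"13"` and `date_from_unix_date(1)` returns `date(1970, 1, 2)`; a native implementation could be compared against cases like these.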

GitHub link: https://github.com/apache/incubator-gluten/discussions/10841

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: [email protected]

