GitHub user GlutenPerfBot created a discussion: September 26, 2025: Weekly 
Status Update in Gluten

*This weekly update is generated by LLMs. You're welcome to join us on 
[GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth 
discussions.*

## Overall Activity Summary
The past 7 days saw 38 merged PRs and 29 open PRs across Velox, ClickHouse, 
Flink, and infrastructure areas. Major themes include Delta Lake native write 
support, Flink Nexmark benchmark expansion, daily Velox version 
synchronization, and performance optimizations for hash joins and window 
operations.

## Key Ongoing Projects
- **Delta Lake Native Write** - @zhztheplayer leads the effort to add native 
Delta Lake write support for Spark 3.5 + Delta 3.3, with #10801 and #10802 
providing the core implementation and test coverage
- **Flink Nexmark Benchmark** - @shuai-xu and @KevinyhZou continue expanding 
Nexmark query coverage, recently adding tests for q18-q21 (#10757) and decimal 
type support (#10769)
- **Daily Velox Sync** - @GlutenPerfBot maintains daily version updates 
(#10808, #10800, #10792) keeping the Velox backend current with upstream 
improvements
- **Performance Optimization** - Multiple contributors focus on hash join 
performance (#7548), window operator streaming conversion (#10734), and memory 
usage improvements (#10662)

## Priority Items
- **Release 1.5.0 Preparation** - @PHILO-HE coordinates final patches for the 
upcoming release (#10574), including documentation updates (#10793)
- **Critical Bug Fixes** - @lgbo-ustc addresses ClickHouse crashes during task 
cancellation (#10797, #10775), while @beliefer fixes a performance regression by 
changing the `propagateIgnoreNullKeys` default (#10811)
- **Build Issues** - @jinchengchenghh works on Iceberg jar compatibility issues 
(#10804) affecting ARM builds

## Notable Discussions
- #10759: Weekly status updates show strong momentum across all backends with 
38 merged PRs
- #10215: Delta Lake write support roadmap outlines remaining work for Spark 
4.0 compatibility and native statistics tracking
- #7548: Broadcast hash join performance degradation investigation reveals 
optimization opportunities

## Emerging Trends
- **Lakehouse Acceleration** - Increased focus on native read/write support 
for Delta Lake and Iceberg formats
- **Micro-performance Focus** - Detailed optimizations in hash table builds, 
lazy vector loading metrics, and memory pool management
- **Multi-backend Convergence** - Efforts to unify features across Velox and 
ClickHouse backends
- **Flink Integration Maturity** - Comprehensive Nexmark benchmark support 
demonstrates production readiness

## Good First Issues
- #6814: Implement MakeYMInterval for ClickHouse - straightforward date 
interval function implementation
- #4730: Add date_from_unix_date function - follows existing date function 
patterns in ClickHouse backend
- #6807: Support split_part string function - basic string manipulation 
suitable for learning the ClickHouse UDF framework
- #6812: Implement SparkPartitionID function - reuses existing Spark partition 
ID functionality
- #6815: Add MapZipWith expression support - entry-level map function 
implementation

These issues require basic C++ knowledge and an understanding of ClickHouse 
function registration, making them well suited to newcomers to the project.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10813
