GitHub user GlutenPerfBot created a discussion: September 26, 2025: Weekly Status Update in Gluten
*This weekly update is generated by LLMs. You're welcome to join our [GitHub](https://github.com/apache/incubator-gluten/discussions) for in-depth discussions.*

## Overall Activity Summary

The past 7 days saw 38 merged PRs and 29 open PRs across Velox, ClickHouse, Flink, and infrastructure areas. Major themes include Delta Lake native write support, Flink Nexmark benchmark expansion, daily Velox version synchronization, and performance optimizations for hash joins and window operations.

## Key Ongoing Projects

- **Delta Lake Native Write** - @zhztheplayer leads the effort to add native Delta Lake write support for Spark 3.5 + Delta 3.3, with #10801 and #10802 providing the core implementation and test coverage
- **Flink Nexmark Benchmark** - @shuai-xu and @KevinyhZou continue expanding Nexmark query coverage, recently adding tests for q18-q21 (#10757) and decimal type support (#10769)
- **Daily Velox Sync** - @GlutenPerfBot maintains daily version updates (#10808, #10800, #10792), keeping the Velox backend current with upstream improvements
- **Performance Optimization** - Multiple contributors focus on hash join performance (#7548), window operator streaming conversion (#10734), and memory usage improvements (#10662)

## Priority Items

- **Release 1.5.0 Preparation** - @PHILO-HE coordinates final patches for the upcoming release (#10574), including documentation updates (#10793)
- **Critical Bug Fixes** - @lgbo-ustc addresses ClickHouse crashes during task cancellation (#10797, #10775), while @beliefer fixes a performance regression by changing the propagateIgnoreNullKeys default (#10811)
- **Build Issues** - @jinchengchenghh works on Iceberg jar compatibility issues (#10804) affecting ARM builds

## Notable Discussions

- #10759: Weekly status updates show strong momentum across all backends with 38 merged PRs
- #10215: Delta Lake write support roadmap outlines remaining work for Spark 4.0 compatibility and native statistics tracking
- #7548: Broadcast hash join performance degradation investigation reveals optimization opportunities

## Emerging Trends

- **Lake-house Acceleration** - Increased focus on native read/write support for Delta Lake and Iceberg formats
- **Micro-performance Focus** - Detailed optimizations in hash table builds, lazy vector loading metrics, and memory pool management
- **Multi-backend Convergence** - Efforts to unify features across Velox and ClickHouse backends
- **Flink Integration Maturity** - Comprehensive Nexmark benchmark support demonstrates production readiness

## Good First Issues

- #6814: Implement MakeYMInterval for ClickHouse - straightforward date interval function implementation
- #4730: Add date_from_unix_date function - follows existing date function patterns in ClickHouse backend
- #6807: Support split_part string function - basic string manipulation suitable for learning CH UDF framework
- #6812: Implement SparkPartitionID function - reuses existing Spark partition ID functionality
- #6815: Add MapZipWith expression support - entry-level map function implementation

These issues require basic C++ knowledge and an understanding of ClickHouse function registration, making them ideal for newcomers to the project.

GitHub link: https://github.com/apache/incubator-gluten/discussions/10813
