lifulong opened a new pull request, #12127: URL: https://github.com/apache/gluten/pull/12127
## What changes are proposed in this pull request?

Gluten jobs on the Velox backend are more prone to driver memory pressure than vanilla Spark in some production workloads. Investigation points to scan operators registering too many SQL metrics (accumulators).

Each BatchScanExecTransformer / FileSourceScanExecTransformer / HiveTableScanExecTransformer previously registered 30+ executor-side metrics per scan node. Vanilla Spark is much leaner: for example, BatchScanExec only exposes numOutputRows (plus connector customMetrics), and FileSourceScanExec adds a small set of driver metrics (numFiles, metadataTime, etc.). Because every registered metric is an accumulator that the driver has to track, this gap increases driver heap usage and can contribute to driver OOM, especially on scan-heavy queries.

<img width="1004" height="352" alt="WeCom screenshot 7f05f208-9f83-472b-b638-0aa70650abfc" src="https://github.com/user-attachments/assets/fa71ac80-c593-4277-b2b1-d80affb58923" />
<img width="590" height="143" alt="WeCom screenshot 0f06b928-eff5-4ba8-a1ae-6f87aca571be" src="https://github.com/user-attachments/assets/db15f11f-617f-4486-90a0-35ae3825d50d" />

(Gluten failed in the first scan stage, while vanilla Spark finished successfully.)

This PR introduces a Velox-only minimal scan metrics set that is used by default, with an opt-in switch for full metrics collection (debugging / advanced troubleshooting):

spark.gluten.sql.scan.detailedMetrics.enabled

The ClickHouse backend is unchanged; this config does not affect CH scan metrics.
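To illustrate the idea of gating the metric set behind the switch, here is a minimal Java sketch. It is not the code in this PR: the class `ScanMetricsSketch`, the helper `metricsFor`, and the plain `Map` used as a stand-in for the Spark configuration are all hypothetical, and the default of `false` is an assumption inferred from the switch being described as opt-in. Only the config key and the metric names come from this description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the actual Gluten implementation.
final class ScanMetricsSketch {
    // Config key named in this PR description.
    public static final String DETAILED_KEY =
        "spark.gluten.sql.scan.detailedMetrics.enabled";

    // The 9 minimal executor-side metrics listed for BatchScan in this description.
    public static final List<String> MINIMAL = List.of(
        "rawInputRows", "rawInputBytes", "numOutputRows", "outputBytes",
        "scanTime", "wallNanos", "peakMemoryBytes", "ioWaitTime",
        "storageReadBytes");

    // A small sample of the metrics moved behind the switch (not the full list).
    public static final List<String> DETAILED_ONLY_SAMPLE = List.of(
        "numInputRows", "inputVectors", "skippedSplits", "processedSplits");

    // Hypothetical helper: chooses which metric names one scan node registers.
    // The conf map stands in for the Spark configuration; a missing key is
    // treated as "false" (assumed default, since the switch is opt-in).
    public static List<String> metricsFor(Map<String, String> conf) {
        boolean detailed =
            Boolean.parseBoolean(conf.getOrDefault(DETAILED_KEY, "false"));
        List<String> names = new ArrayList<>(MINIMAL);
        if (detailed) {
            names.addAll(DETAILED_ONLY_SAMPLE);
        }
        return names;
    }

    public static void main(String[] args) {
        // Default: only the minimal set is registered.
        System.out.println(metricsFor(Map.of()).size()); // prints 9
        // Opt-in: the minimal set plus the detailed sample.
        System.out.println(
            metricsFor(Map.of(DETAILED_KEY, "true")).size()); // prints 13
    }
}
```

In this sketch the minimal set is always registered and the detailed metrics are only added on top, so enabling the switch never removes a metric that a dashboard already relies on.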
Default minimal metrics (Velox):

- BatchScan (9 executor metrics): rawInputRows, rawInputBytes, numOutputRows, outputBytes, scanTime, wallNanos, peakMemoryBytes, ioWaitTime, storageReadBytes
- FileSourceScan / HiveTableScan: the above plus Spark-aligned driver metrics: numFiles, metadataTime, filesSize, numPartitions, pruningTime

Moved to full collection only (when detailed metrics are enabled):

Examples include numInputRows, inputVectors, inputBytes, outputVectors, cpuCount, numMemoryAllocations, skippedSplits, processedSplits, numDynamicFiltersAccepted, loadLazyVectorTime, skippedStrides, processedStrides, connector timing (preloadSplits, pageLoadTime, dataSourceAddSplitTime, dataSourceReadTime), storage cache details (storageReads, localReadBytes, ramReadBytes), etc.

## How was this patch tested?

WIP: testing is in progress in our production environment.

## Was this patch authored or co-authored using generative AI tooling?

Co-authored using Cursor.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
