This is an automated email from the ASF dual-hosted git repository.
yaooqinn pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gluten.git
The following commit(s) were added to refs/heads/main by this push:
new 2563a40ed5 [GLUTEN-3456][VL] Enable columnar table cache by default
and extend benchmark coverage (#12138)
2563a40ed5 is described below
commit 2563a40ed5aa78a984c5e6437c377f48f8234364
Author: Kent Yao <[email protected]>
AuthorDate: Tue May 26 19:50:32 2026 +0800
[GLUTEN-3456][VL] Enable columnar table cache by default and extend
benchmark coverage (#12138)
### What changes were proposed in this pull request?
1. Flip `spark.gluten.sql.columnar.tableCache` default from `false` to
`true`.
2. Extend `ColumnarTableCacheBenchmark` to cover:
- 3 sources: `parquet` (Velox-native columnar), `csv`, `json` (row-based
fallback per GLUTEN-3456).
- 2 schema shapes: 5-col numeric mix, and a 16-col x ~200-char
wide-string shape (the GLUTEN-3488 hazard).
- Cases: `count` / column-pruning / filter for numeric; `count` for
wide-string.
3. Regenerate `ColumnarTableCacheBenchmark-results.txt` with the new matrix.
### Why are the changes needed?
GLUTEN-3456 raised the concern that for row-based sources (csv/json) the
Velox columnar cache would lose to vanilla Spark because of the R2C/C2R
conversion tax. The extended benchmark shows the opposite is true on
Velox today:
| Case | disable (ms) | enable (ms) |
speedup |
|---------------------------------------|-------------:|------------:|--------:|
| numeric/parquet count | 17377 | 2975 |
5.84x |
| numeric/parquet column pruning | 20768 | 3778 |
5.50x |
| numeric/parquet filter | 22681 | 4242 |
5.35x |
| numeric/csv count | 40502 | 30146 |
1.34x |
| numeric/csv column pruning | 42245 | 30667 |
1.38x |
| numeric/csv filter | 43929 | 31077 |
1.41x |
| numeric/json count | 44659 | 28467 |
1.57x |
| numeric/json column pruning | 46961 | 29230 |
1.61x |
| numeric/json filter | 49106 | 29061 |
1.69x |
| wide-string/parquet count | 40888 | 11863 |
3.45x |
| wide-string/csv count | 82433 | 86708 |
0.95x |
| wide-string/json count | 70729 | 54856 |
1.29x |
11 / 12 cases improve. The only regression is `wide-string/csv count`
(-5%), where the Arrow CSV scan + R2C cost on a 16-col x ~200-char shape
slightly outweighs the cache benefit. Given how narrow that corner is,
flipping the default to `true` is the right trade-off; users hitting
that shape can still set `spark.gluten.sql.columnar.tableCache=false`.
Hardware: `Intel(R) Xeon(R) Platinum 8473C`, Linux WSL2, JDK 17. Numbers
are 3-iter Best Time from the regenerated golden file.
### How was this patch tested?
- Re-ran `ColumnarTableCacheBenchmark` in three modes (vanilla / Gluten
off / Gluten on) and regenerated the golden file.
- `./build/mvn -Pbackends-velox -Pspark-3.5 spotless:apply` clean.
Generated-by: Claude Opus 4.7
---
.../ColumnarTableCacheBenchmark-results.txt | 101 +++++++++++++++---
.../benchmark/ColumnarTableCacheBenchmark.scala | 114 +++++++++++++++------
.../org/apache/gluten/config/GlutenConfig.scala | 2 +-
3 files changed, 171 insertions(+), 46 deletions(-)
diff --git a/backends-velox/benchmark/ColumnarTableCacheBenchmark-results.txt
b/backends-velox/benchmark/ColumnarTableCacheBenchmark-results.txt
index 43e1faa7cc..d20b9ef334 100644
--- a/backends-velox/benchmark/ColumnarTableCacheBenchmark-results.txt
+++ b/backends-velox/benchmark/ColumnarTableCacheBenchmark-results.txt
@@ -1,23 +1,94 @@
-OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
-Apple M1 Pro
-table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/parquet table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-disable columnar table cache 16773 17024
401 1.2 838.7 1.0X
-enable columnar table cache 9985 10051
65 2.0 499.3 1.0X
+disable columnar table cache 17377 18043
1147 0.6 1737.7 1.0X
+enable columnar table cache 2975 3304
361 3.4 297.5 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
-Apple M1 Pro
-table cache column pruning: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/parquet table cache column pruning: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-disable columnar table cache 16429 16873
688 1.2 821.5 1.0X
-enable columnar table cache 15118 15495
456 1.3 755.9 1.0X
+disable columnar table cache 20768 20972
307 0.5 2076.8 1.0X
+enable columnar table cache 3778 3988
297 2.6 377.8 1.0X
-OpenJDK 64-Bit Server VM 1.8.0_322-b06 on Mac OS X 13.5
-Apple M1 Pro
-table cache filter: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/parquet table cache filter: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-disable columnar table cache 22895 23527
722 0.9 1144.7 1.0X
-enable columnar table cache 16673 17462
765 1.2 833.7 1.0X
+disable columnar table cache 22681 22850
248 0.4 2268.1 1.0X
+enable columnar table cache 4242 4353
98 2.4 424.2 1.0X
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/csv table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 40502 40834
301 0.2 4050.2 1.0X
+enable columnar table cache 30146 30620
471 0.3 3014.6 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/csv table cache column pruning: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 42245 42431
165 0.2 4224.5 1.0X
+enable columnar table cache 30667 31005
395 0.3 3066.7 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/csv table cache filter: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 43929 44071
127 0.2 4392.9 1.0X
+enable columnar table cache 31077 31505
376 0.3 3107.7 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/json table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 44659 44947
274 0.2 4465.9 1.0X
+enable columnar table cache 28467 28798
502 0.4 2846.7 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/json table cache column pruning: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 46961 47499
465 0.2 4696.1 1.0X
+enable columnar table cache 29230 29620
427 0.3 2923.0 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+numeric/json table cache filter: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 49106 49767
638 0.2 4910.6 1.0X
+enable columnar table cache 29061 29906
742 0.3 2906.1 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+wide-string/parquet table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 40888 41183
267 0.0 20444.2 1.0X
+enable columnar table cache 11863 16043
NaN 0.2 5931.7 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+wide-string/csv table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 82433 83101
849 0.0 41216.5 1.0X
+enable columnar table cache 86708 86768
58 0.0 43353.8 1.0X
+
+
+OpenJDK 64-Bit Server VM 17.0.18+8-Ubuntu-124.04.1 on Linux
6.6.87.2-microsoft-standard-WSL2
+Intel(R) Xeon(R) Platinum 8473C
+wide-string/json table cache count: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+disable columnar table cache 70729 74004
NaN 0.0 35364.4 1.0X
+enable columnar table cache 54856 56680
1607 0.0 27428.2 1.0X
diff --git
a/backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/ColumnarTableCacheBenchmark.scala
b/backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/ColumnarTableCacheBenchmark.scala
index 0cecae47a9..3d80a1c3fa 100644
---
a/backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/ColumnarTableCacheBenchmark.scala
+++
b/backends-velox/src/test/scala/org/apache/spark/sql/execution/benchmark/ColumnarTableCacheBenchmark.scala
@@ -19,17 +19,66 @@ package org.apache.spark.sql.execution.benchmark
import org.apache.gluten.config.GlutenConfig
import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.functions.{col, repeat}
import org.apache.spark.storage.StorageLevel
+import java.io.File
+
/**
- * Benchmark to measure performance for columnar table cache. To run this
benchmark:
+ * Benchmark to measure performance for columnar table cache across source
formats and schema
+ * shapes. To run this benchmark:
* {{{
* 1. without sbt:
* bin/spark-submit --class <this class> --jars <spark core test jar>
<sql core test jar>
* }}}
+ *
+ * Matrix:
+ * - sources : parquet (velox-native columnar), csv, json (row-based
fallback per GLUTEN-3456)
+ * - shapes : numeric (5 cols mixed), wide-string (16 x ~200 char) -- the
GLUTEN-3488 hazard
+ * - cases : count / column pruning / filter for numeric; count only for
wide-string
*/
object ColumnarTableCacheBenchmark extends SqlBasedBenchmark {
- private val numRows = 20L * 1000 * 1000
+ private val numRows = 10L * 1000 * 1000
+ private val wideStringRows = 2L * 1000 * 1000
+ private val wideStringCols = 16
+ private val wideStringLen = 200
+
+ private val sources = Seq("parquet", "csv", "json")
+
+ private def numericDf(): DataFrame = {
+ spark
+ .range(numRows)
+ .selectExpr(
+ "cast(id as int) as c0",
+ "cast(id as double) as c1",
+ "id as c2",
+ "cast(id as string) as c3",
+ "uuid() as c4")
+ }
+
+ private def wideStringDf(): DataFrame = {
+ val cols = (0 until wideStringCols).map(
+ i =>
+ repeat(col("id").cast("string"), wideStringLen / 8 + 1).as(s"c$i"))
+ spark.range(wideStringRows).select(cols: _*)
+ }
+
+ private def writeSource(df: DataFrame, source: String, dir: File): String = {
+ val path = new File(dir, source).getCanonicalPath
+ val writer = df.write.format(source)
+ val w = if (source == "csv") writer.option("header", "true") else writer
+ w.save(path)
+ path
+ }
+
+ private def readSource(source: String, path: String): DataFrame = {
+ val reader = spark.read.format(source)
+ val r = if (source == "csv") {
+ reader.option("header", "true").option("inferSchema", "true")
+ } else reader
+ r.load(path)
+ }
private def doBenchmark(name: String, cardinality: Long)(f: => Unit): Unit =
{
val benchmark = new Benchmark(name, cardinality, output = output)
@@ -50,38 +99,43 @@ object ColumnarTableCacheBenchmark extends
SqlBasedBenchmark {
override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
withTempPath {
f =>
- spark
- .range(numRows)
- .selectExpr(
- "cast(id as int) as c0",
- "cast(id as double) as c1",
- "id as c2",
- "cast(id as string) as c3",
- "uuid() as c4")
- .write
- .parquet(f.getCanonicalPath)
+ // --- numeric workload: write once per source, run 3 cases ---
+ val numericPaths =
+ sources.map(src => src -> writeSource(numericDf(), src, new File(f,
"numeric"))).toMap
- doBenchmark("table cache count", numRows) {
-
spark.read.parquet(f.getCanonicalPath).persist(StorageLevel.MEMORY_ONLY).count()
- spark.catalog.clearCache()
+ sources.foreach {
+ src =>
+ val path = numericPaths(src)
+ doBenchmark(s"numeric/$src table cache count", numRows) {
+ readSource(src, path).persist(StorageLevel.MEMORY_ONLY).count()
+ spark.catalog.clearCache()
+ }
+ doBenchmark(s"numeric/$src table cache column pruning", numRows) {
+ val cached = readSource(src,
path).persist(StorageLevel.MEMORY_ONLY)
+ cached.select("c1", "c2").noop()
+ cached.select("c0", "c3").noop()
+ spark.catalog.clearCache()
+ }
+ doBenchmark(s"numeric/$src table cache filter", numRows) {
+ val cached = readSource(src,
path).persist(StorageLevel.MEMORY_ONLY)
+ cached.where("c1 % 100 > 10").noop()
+ cached.where("c1 % 100 > 20").noop()
+ spark.catalog.clearCache()
+ }
}
- doBenchmark("table cache column pruning", numRows) {
- val cached = spark.read
- .parquet(f.getCanonicalPath)
- .persist(StorageLevel.MEMORY_ONLY)
- cached.select("c1", "c2").noop()
- cached.select("c0", "c3").noop()
- spark.catalog.clearCache()
- }
+ // --- wide-string workload: GLUTEN-3488 hazard reproducer, count only
---
+ val wsPaths = sources.map {
+ src => src -> writeSource(wideStringDf(), src, new File(f,
"widestring"))
+ }.toMap
- doBenchmark("table cache filter", numRows) {
- val cached = spark.read
- .parquet(f.getCanonicalPath)
- .persist(StorageLevel.MEMORY_ONLY)
- cached.where("c1 % 100 > 10").noop()
- cached.where("c1 % 100 > 20").noop()
- spark.catalog.clearCache()
+ sources.foreach {
+ src =>
+ val path = wsPaths(src)
+ doBenchmark(s"wide-string/$src table cache count", wideStringRows)
{
+ readSource(src, path).persist(StorageLevel.MEMORY_ONLY).count()
+ spark.catalog.clearCache()
+ }
}
}
}
diff --git
a/gluten-substrait/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
b/gluten-substrait/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
index 414bf113db..821d747228 100644
---
a/gluten-substrait/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
+++
b/gluten-substrait/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
@@ -1023,7 +1023,7 @@ object GlutenConfig extends ConfigRegistry {
buildStaticConf("spark.gluten.sql.columnar.tableCache")
.doc("Enable or disable columnar table cache.")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val COLUMNAR_TABLE_CACHE_PARTITION_STATS_ENABLED =
buildConf("spark.gluten.sql.columnar.tableCache.partitionStats.enabled")
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]