[ https://issues.apache.org/jira/browse/SPARK-32706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-32706:
--------------------------------
    Description:
How to reproduce this issue:
{code:java}
spark.read.parquet("/path/to/part-00000.parquet").selectExpr("cast(bd as decimal(18, 0)) as x").write.mode("overwrite").save("/tmp/spark/decimal")
{code}

  was:
Benchmark result:
{code:scala}
import org.apache.spark.benchmark.Benchmark

val N = 100000000L
val path = "/tmp/spark/data"
spark.range(N).selectExpr("concat('x', cast(id as string)) as str1", "cast(id * 10 as string) as str2").write.mode("overwrite").parquet(path)

val benchmark = new Benchmark(s"Benchmark cast string to decimal", valuesPerIteration = N, minNumIters = 2)
val df = spark.read.parquet(path)
benchmark.addCase("valid decimal") { _ =>
  df.selectExpr("cast(str2 as decimal(18,0))").write.format("noop").mode("overwrite").save()
}
benchmark.addCase("invalid decimal") { _ =>
  df.selectExpr("cast(str1 as decimal(18,0))").write.format("noop").mode("overwrite").save()
}
benchmark.run()
{code}
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.6
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark cast string to decimal:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
valid decimal                                      9571           9645         104         10.4          95.7       1.0X
invalid decimal                                   81150          81198          67          1.2         811.5       0.1X
{noformat}


> Poor performance when casting invalid decimal string to decimal type
> --------------------------------------------------------------------
>
>                 Key: SPARK-32706
>                 URL: https://issues.apache.org/jira/browse/SPARK-32706
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Minor
>         Attachments: part-00000.parquet
>
>
> How to reproduce this issue:
> {code:java}
> spark.read.parquet("/path/to/part-00000.parquet").selectExpr("cast(bd as decimal(18, 0)) as x").write.mode("overwrite").save("/tmp/spark/decimal")
> {code}