[ 
https://issues.apache.org/jira/browse/SPARK-32706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-32706:
--------------------------------
    Description: 
How to reproduce this issue:

{code:java}
spark.read.parquet("/path/to/part-00000.parquet").selectExpr("cast(bd as decimal(18, 0)) as x").write.mode("overwrite").save("/tmp/spark/decimal")
{code}


  was:
Benchmark result:

{code:scala}
import org.apache.spark.benchmark.Benchmark

val N = 100000000L
val path = "/tmp/spark/data"
spark.range(N).selectExpr(
  "concat('x', cast(id as string)) as str1",
  "cast(id * 10 as string) as str2"
).write.mode("overwrite").parquet(path)

val benchmark = new Benchmark(s"Benchmark cast string to decimal", valuesPerIteration = N, minNumIters = 2)

val df = spark.read.parquet(path)
benchmark.addCase("valid decimal") { _ =>
  df.selectExpr("cast(str2 as decimal(18,0))").write.format("noop").mode("overwrite").save()
}

benchmark.addCase("invalid decimal") { _ =>
  df.selectExpr("cast(str1 as decimal(18,0))").write.format("noop").mode("overwrite").save()
}
}

benchmark.run()
{code}


{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.6
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark cast string to decimal:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
valid decimal                                      9571           9645         104         10.4          95.7       1.0X
invalid decimal                                   81150          81198          67          1.2         811.5       0.1X
{noformat}
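
As a reading aid for the table above: the Rate, Per Row and Relative columns appear to follow from Best Time and N = 100000000. The sketch below recomputes them under that assumption. The object and method names are hypothetical and are not part of org.apache.spark.benchmark.Benchmark.

{code:scala}
// Hypothetical helpers for checking the derived columns; not Spark API.
object BenchmarkColumns {
  // Rows processed per iteration, as in the benchmark script.
  val N = 100000000L

  // Nanoseconds per row: best time converted from ms to ns, divided by N.
  def perRowNs(bestMs: Long): Double = bestMs * 1e6 / N

  // Millions of rows per second.
  def rateMPerSec(bestMs: Long): Double = N / (bestMs / 1000.0) / 1e6

  // Speed relative to a baseline case (1.0 means equally fast).
  def relative(baselineMs: Long, bestMs: Long): Double =
    baselineMs.toDouble / bestMs

  def main(args: Array[String]): Unit = {
    val validMs = 9571L    // Best Time(ms), "valid decimal"
    val invalidMs = 81150L // Best Time(ms), "invalid decimal"
    println(f"valid:   ${rateMPerSec(validMs)}%.1f M/s, ${perRowNs(validMs)}%.1f ns/row")
    println(f"invalid: ${rateMPerSec(invalidMs)}%.1f M/s, ${perRowNs(invalidMs)}%.1f ns/row")
    println(f"relative: ${relative(validMs, invalidMs)}%.1fX")
  }
}
{code}

With the best times reported above, the invalid case comes out roughly 8.5 times slower per row than the valid case (81150 / 9571).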




> Poor performance when casting invalid decimal string to decimal type
> --------------------------------------------------------------------
>
>                 Key: SPARK-32706
>                 URL: https://issues.apache.org/jira/browse/SPARK-32706
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Minor
>         Attachments: part-00000.parquet
>
>
> How to reproduce this issue:
> {code:java}
> spark.read.parquet("/path/to/part-00000.parquet").selectExpr("cast(bd as decimal(18, 0)) as x").write.mode("overwrite").save("/tmp/spark/decimal")
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

