razajafri commented on pull request #31284: URL: https://github.com/apache/spark/pull/31284#issuecomment-768517258
@revans2 I ran a manual test with two files of 1M records each, written with Spark 3.0.0. Each file was read back with Spark 3.0.0, Spark 3.1, and master with my fix. Each file was read 3 times, and I used `spark.time` to time the reads, which I know isn't the most rigorous method, but it still gives us a ballpark number.

File 1 contains rows of `[Decimal(18,0), Decimal(7,3), Decimal(7,7), Decimal(12,2)]`. Average read times:
- spark-3.0: 3960 ms
- spark-3.1: 4262 ms
- spark-master-with-fix: 4129 ms

File 2 contains rows of `[Decimal(12,2)]`. Average read times:
- spark-3.0: 683 ms
- spark-3.1: 668 ms
- spark-master-with-fix: 638 ms

I don't know if/how we can automate a unit test for this. Let me know what you think.
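For reference, here is a minimal sketch of the kind of manual timing harness described above. The file path and object name are hypothetical (the actual test files aren't in the repo); `spark.time` simply prints the wall-clock time of the enclosed block, and the `foreach` forces a full materialization of every row so the decimal columns are actually decoded:

```scala
import org.apache.spark.sql.SparkSession

object DecimalReadTiming {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DecimalReadTiming")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical path to a Parquet file with decimal columns,
    // written beforehand with Spark 3.0.0.
    val path = "/tmp/decimal-test.parquet"

    // Read the file 3 times, as in the manual test above.
    // spark.time prints "Time taken: N ms" for each iteration.
    (1 to 3).foreach { _ =>
      spark.time {
        // foreach forces all columns to be read and decoded,
        // unlike count(), which can be answered from Parquet metadata.
        spark.read.parquet(path).foreach(_ => ())
      }
    }

    spark.stop()
  }
}
```

Averaging the three printed times gives numbers comparable to those above, though JVM warm-up and OS page caching mean the first iteration is usually slower.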