Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501

@cloud-fan After updating on EC2, most ratios and values look more stable and reasonable now. The following two are noticeable changes, but they look like a Parquet writer improvement (rather than a regression). Columns are best/avg time in ms, rate in M rows/s, per-row time in ns, and relative speed.

**1. Read/Write ratio is reverted (`0.8` -> `1.7`)**

I'm not sure, but the Parquet writer for the `deep` case seems improved:

```scala
- 128 x 8 deep x 1000 rows (read parquet)     69 /   74   1.4    693.9   0.2X
- 128 x 8 deep x 1000 rows (write parquet)    78 /   83   1.3    777.7   0.2X
+ 128 x 8 deep x 1000 rows (read parquet)    351 /  379   0.3   3510.3   0.1X
+ 128 x 8 deep x 1000 rows (write parquet)   199 /  203   0.5   1988.3   0.2X
```

**2. Read/Write ratio is changed noticeably (`4.6` -> `8.3`)**

```scala
- 1024 x 11 deep x 100 rows (read parquet)    426 /  433   0.2    4263.7   0.0X
- 1024 x 11 deep x 100 rows (write parquet)    91 /   98   1.1     913.5   0.1X
+ 1024 x 11 deep x 100 rows (read parquet)   2063 / 2078   0.0   20629.2   0.0X
+ 1024 x 11 deep x 100 rows (write parquet)   248 /  266   0.4    2475.1   0.1X
```

Since this is the first attempt to track this and the previous result is too old, the comparison has some obvious limitations. From Spark 2.4.0, we can get a consistent comparison instead of results from `different` personal Macs.
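For reference, the quoted read/write ratios can be reproduced from the best times (first time column, in ms) in the benchmark output above. A minimal sketch, assuming the ratios were truncated (not rounded) to one decimal; the `ratio` helper is hypothetical, not part of the benchmark code:

```scala
// Hypothetical helper: read/write ratio from the best times (ms),
// truncated to one decimal place as in the comment above.
def ratio(readMs: Double, writeMs: Double): Double =
  math.floor(readMs / writeMs * 10) / 10

// Case 1 (128 x 8 deep x 1000 rows), before vs. after the EC2 run:
val deepBefore = ratio(69, 78)    // 0.8
val deepAfter  = ratio(351, 199)  // 1.7

// Case 2 (1024 x 11 deep x 100 rows):
val wideBefore = ratio(426, 91)   // 4.6
val wideAfter  = ratio(2063, 248) // 8.3
```

Either way, the write times grew much less than the read times, which is consistent with reading this as a writer improvement rather than a regression.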