Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22501
  
    @cloud-fan After updating on EC2, most ratios and values look more stable and reasonable now. The following two are noticeable changes, but they look like Parquet writer improvements (rather than regressions).
    
    **1. Read/Write ratio is inverted (`0.8` -> `1.7`)**
    I'm not sure, but the Parquet writer seems to handle the `deep` nested cases relatively better than the reader here.
    ```scala
    - 128 x 8 deep x 1000 rows (read parquet)          69 /   74          1.4        693.9       0.2X
    - 128 x 8 deep x 1000 rows (write parquet)         78 /   83          1.3        777.7       0.2X
    + 128 x 8 deep x 1000 rows (read parquet)         351 /  379          0.3       3510.3       0.1X
    + 128 x 8 deep x 1000 rows (write parquet)        199 /  203          0.5       1988.3       0.2X
    ```
    
    **2. Read/Write ratio changed noticeably (`4.6` -> `8.3`)**
    ```scala
    - 1024 x 11 deep x 100 rows (read parquet)        426 /  433          0.2       4263.7       0.0X
    - 1024 x 11 deep x 100 rows (write parquet)        91 /   98          1.1        913.5       0.1X
    + 1024 x 11 deep x 100 rows (read parquet)       2063 / 2078          0.0      20629.2       0.0X
    + 1024 x 11 deep x 100 rows (write parquet)       248 /  266          0.4       2475.1       0.1X
    ```
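    
    For context, here is a minimal, hypothetical sketch of how a `width x depth` nested-schema Parquet read/write measurement could be set up. This is not the benchmark code in this PR; the object name `NestedParquetSketch`, the `width`/`depth`/`numRows` values, the crude `time` helper, and the `/tmp` output path are all illustrative assumptions.
    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, struct}
    
    object NestedParquetSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("nested-parquet-sketch")
          .master("local[*]")
          .getOrCreate()
    
        val width = 128     // number of leaf columns, e.g. "128 x 8 deep"
        val depth = 8       // nesting depth
        val numRows = 1000L
    
        // Flat DataFrame with `width` integer columns derived from the row id.
        val flat: DataFrame = spark.range(numRows).select(
          (1 to width).map(i => (col("id") + i).as(s"c$i")): _*)
    
        // Wrap the columns into a struct `depth` times to build the nested schema.
        val nested: DataFrame = (1 to depth).foldLeft(flat) { (df, _) =>
          df.select(struct(df.columns.map(col): _*).as("nested"))
        }
    
        // Crude wall-clock timer, just for illustration.
        def time(label: String)(f: => Unit): Unit = {
          val start = System.nanoTime()
          f
          println(s"$label took ${(System.nanoTime() - start) / 1e6} ms")
        }
    
        val path = "/tmp/nested_parquet_sketch"
        time(s"$width x $depth deep x $numRows rows (write parquet)") {
          nested.write.mode("overwrite").parquet(path)
        }
        time(s"$width x $depth deep x $numRows rows (read parquet)") {
          // foreach forces a full scan of the nested data without collecting it.
          spark.read.parquet(path).foreach(_ => ())
        }
    
        spark.stop()
      }
    }
    ```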
    
    Since this is the first attempt to track this and the previous result is too old, there are some obvious limitations to the comparison. From Spark 2.4.0 onward, we can get a consistent comparison instead of results from `different` personal Macs.

