[ 
https://issues.apache.org/jira/browse/SPARK-25438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25438:
-------------------------------------

    Assignee: Dongjoon Hyun

> Fix FilterPushdownBenchmark to use the same memory assumption
> -------------------------------------------------------------
>
>                 Key: SPARK-25438
>                 URL: https://issues.apache.org/jira/browse/SPARK-25438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> This issue aims to fix three things in `FilterPushdownBenchmark`.
> 1. Use the same memory assumption.
> ORC and Parquet use the following configurations.
> *Memory buffer for writing*
> - parquet.block.size (default: 128MB)
> - orc.stripe.size (default: 64MB)
> *Compression chunk size*
> - parquet.page.size (default: 1MB)
> - orc.compress.size (default: 256KB)
> SPARK-24692 set both `parquet.block.size` and `orc.stripe.size` to 1MB, the 
> default value of `parquet.page.size`, but it did not also adjust 
> `orc.compress.size`. As a result, the current benchmark compares ORC using 
> 256KB of memory for compression against Parquet using 1MB. For a fair 
> comparison, these settings need to be consistent.
> 2. Dictionary encoding should not be enforced for all cases.
> SPARK-24206 enforced dictionary encoding for all test cases. This issue 
> restores the default ORC behavior in general and enforces dictionary 
> encoding only for `prepareStringDictTable`.
> 3. Generate the test results on AWS r3.xlarge.
> SPARK-24206 generated its results on AWS so that they could be reproduced 
> and compared easily. For the same reason, this issue updates the results on 
> the same machine type again; specifically, AWS r3.xlarge with Instance 
> Store is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
