Dongjoon Hyun reassigned SPARK-25438:
-------------------------------------

    Assignee: Dongjoon Hyun

> Fix FilterPushdownBenchmark to use the same memory assumption
> -------------------------------------------------------------
>
>                 Key: SPARK-25438
>                 URL: https://issues.apache.org/jira/browse/SPARK-25438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> This issue aims to fix three things in `FilterPushdownBenchmark`.
>
> 1. Use the same memory assumption.
>
> ORC and Parquet use the following configurations.
>
> *Memory buffer for writing*
> - parquet.block.size (default: 128MB)
> - orc.stripe.size (default: 64MB)
>
> *Compression chunk size*
> - parquet.page.size (default: 1MB)
> - orc.compress.size (default: 256KB)
>
> SPARK-24692 used 1MB, the default value of `parquet.page.size`, for both
> `parquet.block.size` and `orc.stripe.size`, but it did not adjust
> `orc.compress.size` to match. As a result, the current benchmark compares
> ORC using a 256KB compression buffer against Parquet using a 1MB one. For a
> fair comparison, these sizes must be consistent (see the first sketch below).
>
> 2. Dictionary encoding should not be enforced for all cases.
>
> SPARK-24206 enforced dictionary encoding for all test cases. This issue
> restores the default ORC behavior in general and enforces dictionary
> encoding only for `prepareStringDictTable` (see the second sketch below).
>
> 3. Generate the test result on AWS r3.xlarge.
>
> SPARK-24206 generated its results on AWS so that they can be reproduced and
> compared easily. For the same reason, this issue updates the results on the
> same machine type: an AWS r3.xlarge instance with Instance Store.
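For point 1, here is a minimal sketch of aligning the two formats' write buffers and compression chunk sizes by setting the four configuration keys named above on the Hadoop job configuration before writing. The app name, paths, and row count are illustrative, not taken from the benchmark itself.

```scala
import org.apache.spark.sql.SparkSession

object ConsistentMemoryAssumption {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("consistent-memory-assumption") // illustrative name
      .master("local[1]")
      .getOrCreate()

    val oneMB = (1024 * 1024).toString
    val hadoopConf = spark.sparkContext.hadoopConfiguration

    // Memory buffer for writing: 1MB for both formats
    // (the defaults are 128MB Parquet row groups and 64MB ORC stripes).
    hadoopConf.set("parquet.block.size", oneMB)
    hadoopConf.set("orc.stripe.size", oneMB)

    // Compression chunk size: 1MB for both formats. Without the second
    // line, ORC compresses in 256KB chunks while Parquet uses 1MB pages.
    hadoopConf.set("parquet.page.size", oneMB)
    hadoopConf.set("orc.compress.size", oneMB)

    // Both writes below now run under the same memory assumption.
    val df = spark.range(0, 1024 * 1024)
      .selectExpr("id", "cast(id AS string) AS value")
    df.write.mode("overwrite").parquet("/tmp/bench/parquet") // illustrative path
    df.write.mode("overwrite").orc("/tmp/bench/orc")         // illustrative path

    spark.stop()
  }
}
```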
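And a sketch of point 2, assuming `orc.dictionary.key.threshold` (ORC's dictionary-encoding threshold, default 0.8; a value of 1.0 always builds a dictionary) is forwarded to the ORC writer when passed as a DataFrameWriter option. `saveAsOrcTable` and `forceDictionary` are hypothetical names for illustration, not the benchmark's actual helpers.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: force dictionary encoding only where the benchmark
// needs it (the string-dictionary table) and keep ORC's default heuristic
// (threshold 0.8) everywhere else.
def saveAsOrcTable(df: DataFrame, path: String,
    forceDictionary: Boolean = false): Unit = {
  val threshold = if (forceDictionary) "1.0" else "0.8"
  df.write
    .mode("overwrite")
    .option("orc.dictionary.key.threshold", threshold)
    .orc(path)
}
```

A table prepared via `prepareStringDictTable` would then call `saveAsOrcTable(df, path, forceDictionary = true)`, while every other table keeps the ORC default.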