[SPARK-25438][SQL][TEST] Fix FilterPushdownBenchmark to use the same memory assumption

## What changes were proposed in this pull request?

This PR aims to fix three things in `FilterPushdownBenchmark`.

**1. Use the same memory assumption.**
ORC and Parquet each use the following configurations for the same two purposes.

- Memory buffer for writing
  - parquet.block.size (default: 128MB)
  - orc.stripe.size (default: 64MB)

- Compression chunk size
  - parquet.page.size (default: 1MB)
  - orc.compress.size (default: 256KB)

SPARK-24692 used 1MB, the default value of `parquet.page.size`, for
`parquet.block.size` and `orc.stripe.size`. However, it did not also set
`orc.compress.size`. As a result, the current benchmark compares ORC using a
256KB compression buffer against Parquet using 1MB. For a fair comparison, the
two formats must use consistent values.
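As a minimal sketch of the memory alignment, assuming the benchmark passes writer settings as string options, the following Scala derives every size from one shared 1MB value and checks that the two formats agree. The object `BenchmarkWriterOptions` and its methods are hypothetical names, not the code in `FilterPushdownBenchmark.scala`.

```scala
// Hypothetical helper (not the actual benchmark code): derives the ORC and
// Parquet writer options from one shared size so that both formats run
// under the same memory assumption.
object BenchmarkWriterOptions {
  // 1MB, the default value of parquet.page.size.
  val blockSize: Int = 1024 * 1024

  // ORC: writer memory buffer (stripe) and compression chunk both use `size`.
  def orcOptions(size: Int = blockSize): Map[String, String] = Map(
    "orc.stripe.size" -> size.toString,
    "orc.compress.size" -> size.toString)

  // Parquet: writer memory buffer (row group) and page size both use `size`.
  // parquet.page.size already defaults to 1MB; it is set here only to make
  // the comparison explicit.
  def parquetOptions(size: Int = blockSize): Map[String, String] = Map(
    "parquet.block.size" -> size.toString,
    "parquet.page.size" -> size.toString)

  // Pairs of settings that must match for a like-for-like comparison.
  private val pairs: Seq[(String, String)] = Seq(
    "orc.stripe.size" -> "parquet.block.size", // memory buffer for writing
    "orc.compress.size" -> "parquet.page.size") // compression chunk size

  // True when every paired setting is present and equal in both maps.
  def sameMemoryAssumption(
      orc: Map[String, String],
      parquet: Map[String, String]): Boolean =
    pairs.forall { case (orcKey, parquetKey) =>
      orc.get(orcKey).isDefined && orc.get(orcKey) == parquet.get(parquetKey)
    }
}
```

Under this assumption, the maps could be handed to Spark through `DataFrameWriter.options(...)`, e.g. `df.write.options(BenchmarkWriterOptions.orcOptions()).orc(path)`. With ORC settings like those described above, `Map("orc.stripe.size" -> "1048576", "orc.compress.size" -> "262144")`, `sameMemoryAssumption` returns false against `parquetOptions()`, which is exactly the 256KB vs. 1MB mismatch this PR removes.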

**2. Dictionary encoding should not be enforced for all cases.**
SPARK-24206 enforced dictionary encoding for all test cases. This PR restores
the default behavior in general and enforces dictionary encoding only in
`prepareStringDictTable`.
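A minimal sketch of making dictionary encoding opt-in, assuming it is forced through ORC's `orc.dictionary.key.threshold` setting (default 0.8; a value of 1.0 always keeps dictionary encoding). The object `DictionaryOptions` and the `useDictionary` flag are hypothetical; in this sketch only the table built by `prepareStringDictTable` would pass `useDictionary = true`, and every other table keeps the default.

```scala
// Hypothetical helper (not the actual benchmark code): ORC dictionary
// options that enforce dictionary encoding only when requested.
object DictionaryOptions {
  // ORC default: dictionary encoding is dropped when the ratio of distinct
  // keys to non-null values exceeds this threshold.
  val OrcDefaultThreshold: Double = 0.8

  // The ratio can never exceed 1.0, so this value always keeps dictionary
  // encoding.
  val OrcForcedThreshold: Double = 1.0

  // Default (false) restores ORC's normal behavior; true forces dictionaries.
  def orcDictionaryOptions(useDictionary: Boolean = false): Map[String, String] = {
    val threshold = if (useDictionary) OrcForcedThreshold else OrcDefaultThreshold
    Map("orc.dictionary.key.threshold" -> threshold.toString)
  }
}
```

Parquet is left untouched in this sketch because its dictionary encoding is already enabled by default (`parquet.enable.dictionary`).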

**3. Generate the test results on AWS r3.xlarge.**
SPARK-24206 generated the results on AWS so that they are easy to reproduce
and compare. For the same reason, this PR regenerates the results on the same
machine type: AWS r3.xlarge with Instance Store.

## How was this patch tested?

Manually: enable the test cases and run `FilterPushdownBenchmark` on
`AWS r3.xlarge`. A full run takes about 4 hours 15 minutes.

Closes #22427 from dongjoon-hyun/SPARK-25438.

Authored-by: Dongjoon Hyun <dongj...@apache.org>
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>
(cherry picked from commit fefaa3c30df2c56046370081cb51bfe68d26976b)
Signed-off-by: Dongjoon Hyun <dongj...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b40e5fee
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b40e5fee
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b40e5fee

Branch: refs/heads/branch-2.4
Commit: b40e5feec2660891590e21807133a508cbd004d3
Parents: ae2ca0e
Author: Dongjoon Hyun <dongj...@apache.org>
Authored: Sat Sep 15 17:48:39 2018 -0700
Committer: Dongjoon Hyun <dongj...@apache.org>
Committed: Sat Sep 15 17:48:53 2018 -0700

----------------------------------------------------------------------
 .../FilterPushdownBenchmark-results.txt         | 912 +++++++++----------
 .../benchmark/FilterPushdownBenchmark.scala     |  11 +-
 2 files changed, 428 insertions(+), 495 deletions(-)
----------------------------------------------------------------------


