This is an automated email from the ASF dual-hosted git repository.
alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git
The following commit(s) were added to refs/heads/main by this push:
new ca0b760af6 cleanup bench.sh usage message (#16416)
ca0b760af6 is described below
commit ca0b760af6137c0dbec8b07daa5f48e262420cb5
Author: Yongting You <[email protected]>
AuthorDate: Sun Jun 15 18:04:44 2025 +0800
cleanup bench.sh usage message (#16416)
---
benchmarks/bench.sh | 46 ++++++++++++++++++++++++++++++----------------
1 file changed, 30 insertions(+), 16 deletions(-)
diff --git a/benchmarks/bench.sh b/benchmarks/bench.sh
index f1780f8844..9ad12d1f63 100755
--- a/benchmarks/bench.sh
+++ b/benchmarks/bench.sh
@@ -55,42 +55,49 @@ $0 compare <branch1> <branch2>
$0 compare_detail <branch1> <branch2>
$0 venv
-**********
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Examples:
-**********
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# Create the datasets for all benchmarks in $DATA_DIR
./bench.sh data
# Run the 'tpch' benchmark on the datafusion checkout in /source/datafusion
DATAFUSION_DIR=/source/datafusion ./bench.sh run tpch
-**********
-* Commands
-**********
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Commands
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
data: Generates or downloads data needed for benchmarking
run: Runs the named benchmark
compare: Compares fastest results from benchmark runs
compare_detail: Compares minimum, average (±stddev), and maximum results from benchmark runs
venv: Creates new venv (unless already exists) and installs compare's requirements into it
-**********
-* Benchmarks
-**********
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Benchmarks
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+# Run all of the following benchmarks
all(default): Data/Run/Compare for all benchmarks
+
+# TPC-H Benchmarks
tpch: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), single parquet file per table, hash join
tpch_csv: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), single csv file per table, hash join
tpch_mem: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory
tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, hash join
tpch_csv10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single csv file per table, hash join
tpch_mem10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), query from memory
-cancellation: How long cancelling a query takes
-parquet: Benchmark of parquet reader's filtering speed
-sort: Benchmark of sorting speed
-sort_tpch: Benchmark of sorting speed for end-to-end sort queries on TPCH dataset
+
+# Extended TPC-H Benchmarks
+sort_tpch: Benchmark of sorting speed for end-to-end sort queries on TPC-H dataset (SF=1)
+topk_tpch: Benchmark of top-k (sorting with limit) queries on TPC-H dataset (SF=1)
+external_aggr: External aggregation benchmark on TPC-H dataset (SF=1)
+
+# ClickBench Benchmarks
clickbench_1: ClickBench queries against a single parquet file
clickbench_partitioned: ClickBench queries against a partitioned (100 files) parquet
clickbench_extended: ClickBench \"inspired\" queries against a single parquet (DataFusion specific)
-external_aggr: External aggregation benchmark
+
+# H2O.ai Benchmarks (Group By, Join, Window)
h2o_small: h2oai benchmark with small dataset (1e7 rows) for groupby, default file format is csv
h2o_medium: h2oai benchmark with medium dataset (1e8 rows) for groupby, default file format is csv
h2o_big: h2oai benchmark with large dataset (1e9 rows) for groupby, default file format is csv
@@ -100,11 +107,18 @@ h2o_big_join: h2oai benchmark with large dataset (1e9 rows) for join,
h2o_small_window: Extended h2oai benchmark with small dataset (1e7 rows) for window, default file format is csv
h2o_medium_window: Extended h2oai benchmark with medium dataset (1e8 rows) for window, default file format is csv
h2o_big_window: Extended h2oai benchmark with large dataset (1e9 rows) for window, default file format is csv
+
+# Join Order Benchmark (IMDB)
imdb: Join Order Benchmark (JOB) using the IMDB dataset converted to parquet
-**********
-* Supported Configuration (Environment Variables)
-**********
+# Micro-Benchmarks (specific operators and features)
+cancellation: How long cancelling a query takes
+parquet: Benchmark of parquet reader's filtering speed
+sort: Benchmark of sorting speed
+
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
+Supported Configuration (Environment Variables)
+━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DATA_DIR directory to store datasets
CARGO_COMMAND command that runs the benchmark binary
DATAFUSION_DIR directory to use (default $DATAFUSION_DIR)
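For context, usage banners like the one reworked in this patch are commonly emitted from a heredoc inside a `usage` function. Below is a minimal, hypothetical sketch of that pattern — it mirrors the banner style and commands shown in the diff above, but is not the actual `benchmarks/bench.sh` implementation:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a usage() function built from a heredoc,
# echoing the command/example layout in the patched banner.
# Not the real benchmarks/bench.sh code.
usage() {
    # Unquoted EOF so $0 expands to the invoking script name.
    cat <<EOF
Usage:
$0 data                          # generate or download benchmark data
$0 run <benchmark>               # run the named benchmark
$0 compare <branch1> <branch2>   # compare fastest results from two runs

Examples:
# Run the 'tpch' benchmark on the checkout in /source/datafusion
DATAFUSION_DIR=/source/datafusion $0 run tpch
EOF
}

usage
```

Keeping the banner in a single heredoc, as above, is what makes edits like this commit straightforward: section dividers and entries are plain text, so reorganizing the listing requires no changes to the surrounding script logic.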
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]