This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/main by this push:
     new 414e7a36 docs: Add benchmarking guide (#444)
414e7a36 is described below

commit 414e7a36a7aa8340c0ebf85a749e4306c8376a19
Author: Andy Grove <[email protected]>
AuthorDate: Fri May 17 16:24:48 2024 -0600

    docs: Add benchmarking guide (#444)
    
    * add benchmarking guide
    
    * add ASF header
---
 docs/source/contributor-guide/benchmarking.md | 62 +++++++++++++++++++++++++++
 docs/source/index.rst                         |  1 +
 2 files changed, 63 insertions(+)

diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md
new file mode 100644
index 00000000..502b35c2
--- /dev/null
+++ b/docs/source/contributor-guide/benchmarking.md
@@ -0,0 +1,62 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Comet Benchmarking Guide
+
+To track progress on performance, we regularly run benchmarks derived from TPC-H and TPC-DS. Benchmarking scripts are
+available in the [DataFusion Benchmarks](https://github.com/apache/datafusion-benchmarks) GitHub repository.
+
+Here is an example command for running the benchmarks. This command will need to be adapted based on the Spark
+environment and location of data files.
+
+This command assumes that `datafusion-benchmarks` is checked out in a parallel directory to `datafusion-comet`.
+
+```shell
+$SPARK_HOME/bin/spark-submit \
+    --master "local[*]" \
+    --conf spark.driver.memory=8G \
+    --conf spark.executor.memory=64G \
+    --conf spark.executor.cores=16 \
+    --conf spark.cores.max=16 \
+    --conf spark.eventLog.enabled=true \
+    --conf spark.sql.autoBroadcastJoinThreshold=-1 \
+    --jars $COMET_JAR \
+    --conf spark.driver.extraClassPath=$COMET_JAR \
+    --conf spark.executor.extraClassPath=$COMET_JAR \
+    --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
+    --conf spark.comet.enabled=true \
+    --conf spark.comet.exec.enabled=true \
+    --conf spark.comet.exec.all.enabled=true \
+    --conf spark.comet.cast.allowIncompatible=true \
+    --conf spark.comet.explainFallback.enabled=true \
+    --conf spark.comet.parquet.io.enabled=false \
+    --conf spark.comet.batchSize=8192 \
+    --conf spark.comet.columnar.shuffle.enabled=false \
+    --conf spark.comet.exec.shuffle.enabled=true \
+    --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
+    --conf spark.sql.adaptive.coalescePartitions.enabled=false \
+    --conf spark.comet.shuffle.enforceMode.enabled=true \
+    ../datafusion-benchmarks/runners/datafusion-comet/tpcbench.py \
+    --benchmark tpch \
+    --data /mnt/bigdata/tpch/sf100-parquet/ \
+    --queries ../datafusion-benchmarks/tpch/queries
+```
+
+Comet performance can be compared to regular Spark performance by running the benchmark twice, once with
+`spark.comet.enabled` set to `true` and once with it set to `false`.
\ No newline at end of file
diff --git a/docs/source/index.rst b/docs/source/index.rst
index eb42950b..819f7201 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -58,6 +58,7 @@ as a native runtime to achieve improvement in terms of query efficiency and quer
    Comet Plugin Overview <contributor-guide/plugin_overview>
    Development Guide <contributor-guide/development>
    Debugging Guide <contributor-guide/debugging>
+   Benchmarking Guide <contributor-guide/benchmarking>
    Profiling Native Code <contributor-guide/profiling_native_code>
    Github and Issue Tracker <https://github.com/apache/datafusion-comet>
 


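Note (not part of the commit): the guide's closing advice, running the benchmark once with `spark.comet.enabled=true` and once with `false`, could be scripted roughly as follows. This is only a sketch under stated assumptions. The `run_benchmark` function and the `DRY_RUN` variable are hypothetical helpers, the `/opt/spark` and `comet.jar` defaults are placeholders, and the `--conf` list is trimmed for brevity, so a real run would carry over the full set of options from the command in the guide. By default the script only prints the two commands so they can be inspected.

```shell
#!/bin/sh
# Sketch: build the benchmark command twice, changing only spark.comet.enabled.
# DRY_RUN and run_benchmark are illustrative names, not part of Comet or Spark.

run_benchmark() {
    comet_enabled="$1"

    # Reuse the positional parameters as the argument list of the command.
    # SPARK_HOME and COMET_JAR fall back to placeholder values if unset.
    set -- \
        "${SPARK_HOME:-/opt/spark}/bin/spark-submit" \
        --master "local[*]" \
        --jars "${COMET_JAR:-comet.jar}" \
        --conf "spark.driver.extraClassPath=${COMET_JAR:-comet.jar}" \
        --conf "spark.executor.extraClassPath=${COMET_JAR:-comet.jar}" \
        --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \
        --conf "spark.comet.enabled=${comet_enabled}" \
        ../datafusion-benchmarks/runners/datafusion-comet/tpcbench.py \
        --benchmark tpch \
        --data /mnt/bigdata/tpch/sf100-parquet/ \
        --queries ../datafusion-benchmarks/tpch/queries

    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}

# One pass with Comet enabled, one with regular Spark.
for enabled in true false; do
    run_benchmark "$enabled"
done
```

Setting `DRY_RUN=0` would execute the two runs back to back. Keeping every other option identical between the runs is what makes the comparison meaningful.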