This is an automated email from the ASF dual-hosted git repository.
guanmingchiu pushed a commit to branch dev-qdp
in repository https://gitbox.apache.org/repos/asf/mahout.git
The following commit(s) were added to refs/heads/dev-qdp by this push:
new dd3856b78 [QDP] Rename benchmark throughput script and refresh docs (#776)
dd3856b78 is described below
commit dd3856b784607d34446c80243e4b1afc9809e49d
Author: Ping <[email protected]>
AuthorDate: Wed Dec 31 22:35:17 2025 +0800
[QDP] Rename benchmark throughput script and refresh docs (#776)
Signed-off-by: 400Ping <[email protected]>
---
qdp/DEVELOPMENT.md | 4 +-
qdp/Makefile | 2 +-
qdp/qdp-python/benchmark/README.md | 83 +++++++++++++++++++++-
.../qdp-python/benchmark/benchmark_throughput.md | 10 ++-
...oader_throughput.py => benchmark_throughput.py} | 2 +-
5 files changed, 93 insertions(+), 8 deletions(-)
diff --git a/qdp/DEVELOPMENT.md b/qdp/DEVELOPMENT.md
index e8664802e..d28a3ffcf 100644
--- a/qdp/DEVELOPMENT.md
+++ b/qdp/DEVELOPMENT.md
@@ -168,7 +168,7 @@ You can also run individual tests manually from the `qdp-python/benchmark/` dire
```sh
# Benchmark test for dataloader throughput
-python benchmark_dataloader_throughput.py
+python benchmark_throughput.py
# E2E test
python benchmark_e2e.py
@@ -194,7 +194,7 @@ A: Check available GPUs with `nvidia-smi`. Verify GPU visibility with `echo $CUD
### Q: Benchmark tests fail or produce unexpected results
-A: Ensure all dependencies are installed with `uv pip install -r benchmark/requirements.txt`. Check GPU memory availability using `nvidia-smi`. If you don't need qiskit/pennylane comparisons, uninstall them as mentioned in the [E2e test section](#e2e-tests).
+A: Ensure all dependencies are installed with `uv sync --group benchmark` (from `qdp/qdp-python`). Check GPU memory availability using `nvidia-smi`. If you don't need qiskit/pennylane comparisons, uninstall them as mentioned in the [E2e test section](#e2e-tests).
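+
+If you prefer checking free GPU memory from Python instead of `nvidia-smi`, a
+minimal sketch (assuming PyTorch is installed in the benchmark environment) is:
+
+```python
+import torch
+
+if torch.cuda.is_available():
+    free, total = torch.cuda.mem_get_info()  # bytes, current CUDA device
+    print(f"free: {free / 1e9:.1f} GB / total: {total / 1e9:.1f} GB")
+else:
+    print("No CUDA device visible")
+```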
### Q: Pre-commit hooks fail
diff --git a/qdp/Makefile b/qdp/Makefile
index 51f37a551..53572ccf8 100644
--- a/qdp/Makefile
+++ b/qdp/Makefile
@@ -48,7 +48,7 @@ install_benchmark:
benchmark: install install_benchmark
@echo "Running e2e benchmark tests..."
uv run python qdp-python/benchmark/benchmark_e2e.py
- uv run python qdp-python/benchmark/benchmark_dataloader_throughput.py
+ uv run python qdp-python/benchmark/benchmark_throughput.py
run_nvtx_profile:
$(eval EXAMPLE ?= nvtx_profile)
diff --git a/qdp/qdp-python/benchmark/README.md b/qdp/qdp-python/benchmark/README.md
index f8f413d41..6fcef290e 100644
--- a/qdp/qdp-python/benchmark/README.md
+++ b/qdp/qdp-python/benchmark/README.md
@@ -1,5 +1,86 @@
-<!-- TODO: benchmark docs -->
+# Benchmarks
+This directory contains Python benchmarks for Mahout QDP. There are two main
+scripts:
+
+- `benchmark_e2e.py`: end-to-end latency from disk to GPU VRAM (includes IO,
+ normalization, encoding, transfer, and a dummy forward pass).
+- `benchmark_throughput.py`: DataLoader-style throughput benchmark
+ that measures vectors/sec across Mahout, PennyLane, and Qiskit.
+
+## Quick Start
+
+From the repo root:
+
+```bash
+cd qdp
+make benchmark
+```
+
+This installs the QDP Python package (if needed), installs benchmark
+dependencies, and runs both benchmarks.
+
+## Manual Setup
+
+```bash
+cd qdp/qdp-python
+uv sync --group benchmark
+```
+
+Then run benchmarks with `uv run python ...` or activate the virtual
+environment and use `python ...`.
+
+## E2E Benchmark (Disk -> GPU)
+
+```bash
+cd qdp/qdp-python/benchmark
+python benchmark_e2e.py
+```
+
+Additional options:
+
+```bash
+python benchmark_e2e.py --qubits 16 --samples 200 --frameworks mahout-parquet mahout-arrow
+python benchmark_e2e.py --frameworks all
+```
+
+Notes:
+
+- `--frameworks` accepts a space-separated list or `all`.
+ Options: `mahout-parquet`, `mahout-arrow`, `pennylane`, `qiskit`.
+- The script writes `final_benchmark_data.parquet` and
+  `final_benchmark_data.arrow` in the current working directory and overwrites
+  them on each run (a quick way to inspect them is sketched after this list).
+- If multiple frameworks run, the script compares output states for
+ correctness at the end.
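+
+As a quick sanity check of the files an E2E run leaves behind, the following
+minimal sketch reads the Parquet output with `pyarrow` (assumed to be available
+in the benchmark environment, since the script itself writes Parquet/Arrow
+files):
+
+```python
+import pyarrow.parquet as pq
+
+# File written by benchmark_e2e.py into the current working directory.
+table = pq.read_table("final_benchmark_data.parquet")
+print(table.schema)    # column names and types
+print(table.num_rows)  # rows in the generated benchmark data
+```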
+
+## DataLoader Throughput Benchmark
+
+This benchmark simulates a typical QML training loop by continuously loading
+batches of 64 vectors (default). The goal is to demonstrate that QDP can
+saturate GPU utilization and avoid the "starvation" often seen in hybrid
+training loops.
+
+See `qdp/qdp-python/benchmark/benchmark_throughput.md` for details and example
+output.
+
+```bash
+cd qdp/qdp-python/benchmark
+python benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
+python benchmark_throughput.py --frameworks mahout,pennylane
+```
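+
+The core pattern the script measures is a CPU-side producer feeding a bounded
+prefetch queue while a consumer drains it and reports vectors/sec. The
+following is a minimal, framework-free sketch of that pattern (a dummy CPU
+reduction stands in for the GPU encoding step; it is not the benchmark's
+actual implementation):
+
+```python
+import queue
+import threading
+import time
+
+import numpy as np
+
+QUBITS, BATCH_SIZE, BATCHES, PREFETCH = 16, 64, 200, 16
+
+
+def producer(q: queue.Queue) -> None:
+    dim = 2 ** QUBITS
+    for _ in range(BATCHES):
+        q.put(np.random.rand(BATCH_SIZE, dim))  # blocks once PREFETCH batches are queued
+    q.put(None)  # sentinel: no more batches
+
+
+def main() -> None:
+    q: queue.Queue = queue.Queue(maxsize=PREFETCH)
+    threading.Thread(target=producer, args=(q,), daemon=True).start()
+
+    start, vectors = time.perf_counter(), 0
+    while (batch := q.get()) is not None:
+        _ = batch.sum()  # real benchmark: normalize + amplitude-encode on GPU
+        vectors += batch.shape[0]
+    elapsed = time.perf_counter() - start
+    print(f"{vectors / elapsed:,.0f} vectors/sec (dummy consumer)")
+
+
+if __name__ == "__main__":
+    main()
+```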
+
+Notes:
+
+- `--frameworks` is a comma-separated list or `all`.
+ Options: `mahout`, `pennylane`, `qiskit`.
+- Throughput is reported in vectors/sec (higher is better).
+
+## Dependency Notes
+
+- Qiskit and PennyLane are optional. If they are not installed, their benchmark
+  legs are skipped automatically (see the import-guard sketch after this list).
+- For Mahout-only runs, you can uninstall the competitor frameworks:
+ `uv pip uninstall qiskit pennylane`.
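+
+The skipping behaviour follows the usual optional-import pattern; a minimal
+sketch (not the scripts' exact code) looks like this:
+
+```python
+# Detect optional frameworks and only register the legs that can run.
+try:
+    import pennylane  # noqa: F401
+    HAS_PENNYLANE = True
+except ImportError:
+    HAS_PENNYLANE = False
+
+frameworks = ["mahout"] + (["pennylane"] if HAS_PENNYLANE else [])
+print("benchmark legs:", frameworks)
+```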
### We can also run benchmarks on Colab notebooks (without owning a GPU)
diff --git a/docs/benchmarks/dataloader_throughput.md b/qdp/qdp-python/benchmark/benchmark_throughput.md
similarity index 86%
rename from docs/benchmarks/dataloader_throughput.md
rename to qdp/qdp-python/benchmark/benchmark_throughput.md
index 242340c26..ba26f0e60 100644
--- a/docs/benchmarks/dataloader_throughput.md
+++ b/qdp/qdp-python/benchmark/benchmark_throughput.md
@@ -2,6 +2,10 @@
This benchmark mirrors the `qdp-core/examples/dataloader_throughput.rs`
pipeline and compares Mahout (QDP) against PennyLane and Qiskit on the same
workload. It streams batches from a CPU-side producer, encodes amplitude states
on GPU, and reports vectors-per-second.
+Goal: simulate a typical QML training loop by continuously loading batches of
+64 vectors (default), showing that QDP can keep GPU utilization high and avoid
+the "starvation" often seen in hybrid training loops.
+
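+In this context, "encoding amplitude states" means normalizing each
+length-`2^n` input vector so that its squared amplitudes sum to 1. A minimal
+CPU-side illustration (the benchmark itself performs this step on the GPU):
+
+```python
+import numpy as np
+
+n_qubits = 16
+x = np.random.rand(2 ** n_qubits)           # one raw input vector
+state = x / np.linalg.norm(x)               # amplitude-encoded state vector
+assert np.isclose(np.sum(state ** 2), 1.0)  # squared amplitudes sum to 1
+```
+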
## Workload
- Qubits: 16 (vector length `2^16`)
@@ -15,11 +19,11 @@ This benchmark mirrors the `qdp-core/examples/dataloader_throughput.rs` pipeline
# QDP-only Rust example
cargo run -p qdp-core --example dataloader_throughput --release
-# Cross-framework comparison (requires deps in qdp/benchmark/requirements.txt)
-python qdp/benchmark/benchmark_dataloader_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
+# Cross-framework comparison (requires benchmark deps)
+python qdp/qdp-python/benchmark/benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64 --prefetch 16
# Run only Mahout + PennyLane legs
-python qdp/benchmark/benchmark_dataloader_throughput.py --frameworks mahout,pennylane
+python qdp/qdp-python/benchmark/benchmark_throughput.py --frameworks mahout,pennylane
```
## Example Output
diff --git a/qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py b/qdp/qdp-python/benchmark/benchmark_throughput.py
similarity index 99%
rename from qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py
rename to qdp/qdp-python/benchmark/benchmark_throughput.py
index 9ce974084..0d7916fec 100644
--- a/qdp/qdp-python/benchmark/benchmark_dataloader_throughput.py
+++ b/qdp/qdp-python/benchmark/benchmark_throughput.py
@@ -24,7 +24,7 @@ The workload mirrors the `qdp-core/examples/dataloader_throughput.rs` pipeline:
- Encode vectors into amplitude states on GPU and run a tiny consumer op.
Run:
- python qdp/benchmark/benchmark_dataloader_throughput.py --qubits 16 --batches 200 --batch-size 64
+ python qdp/qdp-python/benchmark/benchmark_throughput.py --qubits 16 --batches 200 --batch-size 64
"""
import argparse