westonpace opened a new pull request #12586:
URL: https://github.com/apache/arrow/pull/12586


   This PR is based on #12537 and will need to remain in draft until that is 
merged.  In addition, I'd like to fit in a few more builtin queries in this 
initial PR.
   
   This PR adds a standalone query-testing executable, `query_tester`.
   
   The tool takes a number of command line options today:
   
   ```
   Usage: query_tester [options] query 
   
   Positional arguments:
   query                name of the query to run [required]
   
   Optional arguments:
   -h --help            shows help message and exits
   -v --version         prints version information and exits
   --num-iterations     [default: 1]
   --cpu-threads        size to use for the CPU thread pool, default controlled by Arrow
   --io-threads         size to use for the I/O thread pool, default controlled by Arrow
   --validate           if set the program will validate the query results [default: false] (not yet implemented)
   ```
   
   The tool will first look for a Substrait query in the `queries` folder 
(there is an example of TPC-H Q1 in JSON format).  At the moment this isn't 
very useful as our Substrait support is very limited.
   
   The tool will then look for builtin queries.  I'd like to add builtin versions of all of the TPC-H queries.
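
The lookup order described above (Substrait file first, builtin query second) can be sketched roughly as follows. This is only an illustration in Python, not the tool's actual C++ code: `resolve_query`, `BUILTIN_QUERIES`, and the `<name>.json` file naming are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical registry of builtin queries; "tpch-1" is the only name
# that appears in this PR description.
BUILTIN_QUERIES = {"tpch-1"}


def resolve_query(name, queries_dir, builtins=BUILTIN_QUERIES):
    """Return ("substrait", path) or ("builtin", name); raise if neither exists."""
    # Assumption: a Substrait plan is stored as <name>.json in the queries folder.
    candidate = Path(queries_dir) / f"{name}.json"
    if candidate.is_file():
        return ("substrait", candidate)
    # Fall back to the builtin queries when no Substrait file matches.
    if name in builtins:
        return ("builtin", name)
    raise KeyError(f"unknown query: {name}")


# Small demonstration using a temporary folder in place of `queries`.
with tempfile.TemporaryDirectory() as tmp:
    without_file = resolve_query("tpch-1", tmp)
    (Path(tmp) / "tpch-1.json").write_text("{}")
    with_file_kind = resolve_query("tpch-1", tmp)[0]
```

With an empty folder the name resolves to the builtin query; once a matching JSON file exists, the Substrait file takes precedence.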
   
   A `datasets` folder is also created.  In the future I'd like to add support for downloading remote datasets to this folder.
   
   The current output simply prints a few statistics:
   
   ```
   (conbench3) pace@pace-desktop:~/dev/arrow/dev/qtester/debug-build$ ./query_tester tpch-1 --num-iterations 10
   Average       Duration: 1.19292s (+/- 0.00569406s)
   Average Output  Rows/S: 3.35311rps
   Average Output Bytes/S: 409.08bps
   ```
   
   Note that `Output Rows/S` and `Output Bytes/S` are not very useful for TPC-H queries, which aggregate most of their data and so produce very little output for the amount of work done.  I've prototyped a much more exhaustive breakdown of time spent by intercepting OT events, but I'd like to save that work for a future PR.
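
For readers who want to reproduce the summary lines, here is a minimal sketch of how such statistics could be derived from per-iteration timings. It assumes the `+/-` figure is a sample standard deviation and that the throughput figures divide the output size by the mean duration; the PR text does not state either, and `summarize` is a hypothetical helper rather than anything in the tool.

```python
import statistics


def summarize(durations_s, output_rows, output_bytes):
    """Summarize repeated runs of one query.

    durations_s  -- wall-clock seconds for each iteration
    output_rows  -- rows produced by a single run
    output_bytes -- bytes produced by a single run
    """
    mean = statistics.mean(durations_s)
    # Assumption: the reported spread is the sample standard deviation.
    spread = statistics.stdev(durations_s) if len(durations_s) > 1 else 0.0
    return {
        "mean_s": mean,
        "spread_s": spread,
        # Assumption: throughput is output size over mean duration.
        "rows_per_s": output_rows / mean,
        "bytes_per_s": output_bytes / mean,
    }


stats = summarize([1.0, 2.0, 3.0], output_rows=4, output_bytes=400)
```

Because an aggregating query emits only a handful of rows regardless of input size, `rows_per_s` computed this way mostly tracks the duration, which is why these two figures say little about the work done.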



