[ https://issues.apache.org/jira/browse/ARROW-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-8782:
----------------------------------
    Labels: pull-request-available  (was: )

> [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set
> -------------------------------------------------------------
>
>                 Key: ARROW-8782
>                 URL: https://issues.apache.org/jira/browse/ARROW-8782
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I plan on adding a new benchmarks folder beneath the datafusion crate,
> containing benchmarks based on the NYC Taxi data set. The benchmark will be a
> CLI and will support running a number of different queries against CSV and
> Parquet.
>
> The README will contain instructions for downloading the data set.
>
> The benchmark will produce CSV files containing results.
>
> These benchmarks will allow us to manually verify performance before major
> releases and on an ongoing basis as we make changes to
> Arrow/Parquet/DataFusion.
>
> I will be basing this on existing benchmarks I recently built in Ballista [1]
> (I am the only contributor to these benchmarks so far).
>
> A Dockerfile will be provided, making it easy to restrict CPU and RAM when
> running these benchmarks.
>
> [1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks

--
This message was sent by Atlassian Jira
(v8.3.4#803005)