[ 
https://issues.apache.org/jira/browse/ARROW-8782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106316#comment-17106316
 ] 

Andy Grove commented on ARROW-8782:
-----------------------------------

[~wesm] I know you have opinions on benchmarks so I wanted to make sure that 
you saw this and had the chance to comment. There are a couple of things 
specifically that I wanted your opinion on.
 # Could you confirm that it is OK to reference public data sets like this from 
the repo? (We wouldn't be including any data files, just instructions on how to 
download them.)
 # Do you think there would be value in having common data sets like this in 
the future, so that we can use them across implementations to get an idea of 
comparative performance and to have examples that are similar between 
implementations?

> [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set
> -------------------------------------------------------------
>
>                 Key: ARROW-8782
>                 URL: https://issues.apache.org/jira/browse/ARROW-8782
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust, Rust - DataFusion
>            Reporter: Andy Grove
>            Assignee: Andy Grove
>            Priority: Major
>             Fix For: 1.0.0
>
>
> I plan on adding a new benchmarks folder beneath the datafusion crate, 
> containing benchmarks based on the NYC Taxi data set. The benchmark will be a 
> CLI and will support running a number of different queries against CSV and 
> Parquet.
> The README will contain instructions for downloading the data set.
> The benchmark will produce CSV files containing results.
> These benchmarks will allow us to manually verify performance before major 
> releases and on an ongoing basis as we make changes to 
> Arrow/Parquet/DataFusion.
> I will be basing this on existing benchmarks I recently built in Ballista [1] 
> (I am the only contributor to these benchmarks so far).
> A Dockerfile will be provided, making it easy to restrict CPU and RAM when 
> running these benchmarks.
> [1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
