andygrove commented on PR #2632: URL: https://github.com/apache/datafusion-comet/pull/2632#issuecomment-3437685842
> I don't think that we should have a combined fuzz-testing-and-tpc-benchmark tool. They serve quite different purposes. I think it would be better to move the DataFrame comparison logic into a shared class somewhere and then update our benchmarking tool to be able to use it.
>
> This probably means that we need to convert our benchmark script from Python to Scala.
>
> Another option would be to update the existing Python benchmark script to save query results to Parquet, and then implement a command-line tool for comparing the Parquet files produced from the Spark and Comet runs.

I created https://github.com/apache/datafusion-comet/pull/2640 to add a new option to the benchmark script, to write query results to Parquet.
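For illustration only, the comparison step of such a command-line tool might look roughly like the sketch below. It is not code from either PR. The names `normalize` and `compare_results` and the `ndigits` rounding tolerance are hypothetical, and it assumes each run's results have already been loaded from Parquet into a list of row tuples (the loading step is left out, since it would need a Parquet reader such as pyarrow).

```python
import math
from collections import Counter


def normalize(row, ndigits=6):
    """Round floats so tiny numeric differences between engines are ignored.

    Hypothetical helper: NaN is mapped to a marker string because
    NaN != NaN would otherwise make identical rows look different.
    """
    out = []
    for value in row:
        if isinstance(value, float):
            out.append("NaN" if math.isnan(value) else round(value, ndigits))
        else:
            out.append(value)
    return tuple(out)


def compare_results(spark_rows, comet_rows, ndigits=6):
    """Compare two result sets as order-insensitive multisets of rows.

    Returns (only_in_spark, only_in_comet) as Counters.
    Both being empty means the two runs produced the same rows.
    """
    spark = Counter(normalize(r, ndigits) for r in spark_rows)
    comet = Counter(normalize(r, ndigits) for r in comet_rows)
    # Counter subtraction keeps only rows with a positive surplus.
    return spark - comet, comet - spark


if __name__ == "__main__":
    # Assumed input: rows already read from the two Parquet outputs.
    spark_rows = [(1, "a", 0.1 + 0.2), (2, "b", None)]
    comet_rows = [(2, "b", None), (1, "a", 0.3)]
    only_spark, only_comet = compare_results(spark_rows, comet_rows)
    print("match" if not only_spark and not only_comet else "mismatch")
```

Comparing multisets rather than ordered lists avoids false mismatches for queries without an `ORDER BY`. Rounding is a crude tolerance and can still split values that sit right on a rounding boundary, so a real tool would probably want a smarter numeric comparison.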
