zhuqi-lucas commented on issue #16710: URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3086584860
> > > > Updated: our benchmark uses the DataFusion internal source rather than datafusion-python; I am not sure whether that makes a difference.
> > >
> > > The results are similar when running with datafusion-python as well.
> >
> > Interesting [@UBarney](https://github.com/UBarney), so the next step is to see whether the dataset is different.
>
> [@zhuqi-lucas](https://github.com/zhuqi-lucas) Agreed. The selectivity of the join condition on a specific dataset could affect performance. Perhaps we can start by comparing the number of rows in the result sets returned by these queries.

@UBarney Maybe we can try optimizing against our current dataset first, since we don't have the other dataset to compare with, only our own benchmark data.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org