Re: [I] Optimize the join operators [datafusion]

via GitHub Thu, 17 Jul 2025 20:33:03 -0700


zhuqi-lucas commented on issue #16710:
URL: https://github.com/apache/datafusion/issues/16710#issuecomment-3086584860


   > > > > Updated: our benchmark uses the DataFusion internal source rather than datafusion-python; I am not sure if that will make a difference.
   > > > 
   > > > 
   > > > The results are similar when running with datafusion-python as well.
   > > 
   > > 
   > > Interesting, [@UBarney](https://github.com/UBarney), so the next step is to see if the dataset is different.
   > 
   > [@zhuqi-lucas](https://github.com/zhuqi-lucas) Agreed. The selectivity of 
the join condition on a specific dataset could affect performance. Perhaps we 
can start by comparing the number of rows in the result sets returned by these 
queries.
   
   @UBarney Maybe we can try to optimize based on our current dataset first, since 
we don't have another dataset to compare against besides our benchmark data.
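
   As a rough illustration of the row-count comparison suggested above, here is a minimal sketch in plain Python. It is not DataFusion code: `join_selectivity` is a hypothetical helper, and the nested loop is only meant to show how the fraction of matching row pairs can differ between two datasets of the same size.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[int, ...]


def join_selectivity(
    left: Sequence[Row],
    right: Sequence[Row],
    predicate: Callable[[Row, Row], bool],
) -> float:
    """Return the fraction of the left x right cross product that matches.

    Hypothetical helper for illustration only: it evaluates the join
    condition on every pair of rows, which is fine for tiny samples but
    far too slow for real benchmark data.
    """
    if not left or not right:
        # An empty input produces no pairs, so report zero selectivity.
        return 0.0
    matches = sum(1 for l in left for r in right if predicate(l, r))
    return matches / (len(left) * len(right))
```

   For example, with `left = [(i,) for i in range(100)]` and the condition `l[0] < r[0]`, a right side of `[(i,) for i in range(100)]` and a right side of `[(i % 10,) for i in range(100)]` have the same row count but produce very different numbers of result rows, which is the kind of difference we would want to rule out between datasets.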


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

