alamb commented on issue #17025:
URL: https://github.com/apache/datafusion/issues/17025#issuecomment-3151817573

   > To address this problem, I believe we should add a new configuration option 
that allows users to disable metrics collection. By setting an environment 
variable or a configuration flag, users can choose to bypass the metrics system 
entirely. This change will significantly reduce the overhead associated with 
metrics, leading to improved performance for workloads involving small data 
batches or empty data sources.
   > 
   > I believe this solution offers a practical way to balance the need for 
performance with the utility of having detailed metrics, giving users the 
flexibility to optimize DataFusion for their specific use cases.
   
   I think turning off metrics might be treating the symptom of the problem 
rather than the underlying cause.
   
   By default, DataFusion is designed to maximize parallelism across all cores: 
if you have 32 cores, each plan will by default split the data into 32 
partitions.
   
   However, if your data is small, the overhead of splitting it for parallel 
execution can dominate the actual computation, which may be what you are 
seeing.
   
   You can find out how many partitions (and therefore cores) a query uses by 
running `EXPLAIN ..` and looking at the plan.
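
   For example, in `datafusion-cli`, a minimal sketch assuming a table named `my_table` with a column `group_col` is already registered (both names are hypothetical placeholders, not from this issue). Partition counts show up in the physical plan, for instance on `RepartitionExec` operators.

   ```sql
   -- Hypothetical table and column; substitute your own query.
   -- EXPLAIN prints the logical and physical plans without running the query.
   EXPLAIN SELECT group_col, count(*) FROM my_table GROUP BY group_col;
   ```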
   
   You can then see how much faster or slower the query runs if you limit it to 
a single partition via a [config 
setting](https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings).
   
   Specifically, set `datafusion.execution.target_partitions` to 1.
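
   A sketch of that comparison in `datafusion-cli`, again assuming the hypothetical `my_table` / `group_col` placeholders: run the query once with the default setting, then change the setting for the session and run it again, comparing the elapsed time the CLI reports after each query.

   ```sql
   -- Restrict this session to a single partition (no repartitioning for parallelism)
   SET datafusion.execution.target_partitions = 1;

   -- Re-run the same (hypothetical) query and compare its elapsed time
   SELECT group_col, count(*) FROM my_table GROUP BY group_col;
   ```

   When embedding DataFusion from Rust rather than using the CLI, the same option can be set on the session, e.g. with `SessionConfig::new().with_target_partitions(1)`.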
   
   
   If this helps, please let us know. It would be good to document this, as you 
are not the first person to notice something like this.

