alamb commented on issue #17025: URL: https://github.com/apache/datafusion/issues/17025#issuecomment-3151817573
> To address this problem, I believe we should add a new configuration option that allows users to disable metrics collection. By setting an environment variable or a configuration flag, users can choose to bypass the metrics system entirely. This change will significantly reduce the overhead associated with metrics, leading to improved performance for workloads involving small data batches or empty data sources.
>
> I believe this solution offers a practical way to balance the need for performance with the utility of having detailed metrics, giving users the flexibility to optimize DataFusion for their specific use cases.

I think turning off metrics might be treating the symptom of the problem rather than the underlying cause.

By default, DataFusion is designed to maximize parallelism across all cores, so if you have 32 cores, each plan will by default split the data into 32 partitions. However, if your data is small, the overhead of splitting to enable parallelization can dominate the actual computation, which is what you might be seeing.

You can find out how many cores (partitions) are being used by running `EXPLAIN ..` and looking at the plan.

You can see how much faster or slower the plan runs if you limit it to a single partition via a [config setting](https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings). Specifically, set `datafusion.execution.target_partitions` to 1.

If this helps, can you please let us know? It would be good to document this, as you are not the first person to notice something like this.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
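For reference, the single-partition experiment suggested in the comment could look roughly like the following in a SQL session. This is a minimal sketch, assuming a `datafusion-cli` style session (where `SHOW` is available); `my_table` and the query are hypothetical placeholders, not taken from the issue.

```sql
-- Inspect the current value; by default it follows the number of CPU cores.
SHOW datafusion.execution.target_partitions;

-- Limit plans in this session to a single partition.
SET datafusion.execution.target_partitions = 1;

-- `my_table` is a hypothetical placeholder; substitute the slow query.
-- Compare this plan, and the query's run time, with and without the SET above.
EXPLAIN SELECT count(*) FROM my_table;
```

If the query is noticeably faster with one partition, that would suggest repartitioning overhead, not metrics collection, is the main cost on small inputs.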
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org