This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-staging in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
     new 4c90750  Commit build products
4c90750 is described below

commit 4c907500ee5c196cd6d1369dba3fef804344e08e
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Thu Mar 20 20:24:14 2025 +0000

    Commit build products
---
 blog/2025/03/20/datafusion-comet-0.7.0/index.html | 9 ++++-----
 blog/feeds/all-en.atom.xml                        | 9 ++++-----
 blog/feeds/blog.atom.xml                          | 9 ++++-----
 blog/feeds/pmc.atom.xml                           | 9 ++++-----
 blog/images/comet-0.7.0/performance.png           | Bin 0 -> 34242 bytes
 5 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/blog/2025/03/20/datafusion-comet-0.7.0/index.html b/blog/2025/03/20/datafusion-comet-0.7.0/index.html
index f3949d7..5db2ca1 100644
--- a/blog/2025/03/20/datafusion-comet-0.7.0/index.html
+++ b/blog/2025/03/20/datafusion-comet-0.7.0/index.html
@@ -73,8 +73,8 @@ contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/m
 <h3>Performance</h3>
 <p>Comet 0.7.0 has improved performance compared to the previous release due to improvements in the native shuffle implementation and performance improvements in DataFusion 46.</p>
-<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>2.2x speedup</strong> compared to Spark using the same CPU and RAM. Even
-with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
+<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>greater than 2x speedup</strong> compared to Spark using the same
+CPU and RAM. Even with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
 <p><img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" width="100%"/></p>
 <p><em>These benchmarks were performed on a Linux workstation with PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data
 stored locally in Parquet format on NVMe storage. Spark was running in Kubernetes with hard memory limits.</em></p>
@@ -93,9 +93,8 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete
 </ul>
 <h2>Improved Hash Join Performance</h2>
 <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet
-will now do a better job of picking the optimal build side (thanks to [@hayman42] for suggesting this, and thanks to the
-[Apache Gluten(incubating)] project for the inspiration in implementing this feature).</p>
-<p><a href="https://github.com/hayman42">@hayman42</a></p>
+will now do a better job of picking the optimal build side (thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the
+<a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature).</p>
 <h2>Experimental Support for DataFusion’s DataSourceExec</h2>
 <p>It is now possible to configure Comet to use DataFusion’s <code>DataSourceExec</code> instead of Comet’s current Parquet reader. Support should still be considered experimental, but most of Comet’s unit tests are now passing with the new reader.
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 9ab59a2..01a65f5 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -50,8 +50,8 @@ contributors. See the <a href="https://github.com/apache/datafusion-comet/blo
 <h3>Performance</h3>
 <p>Comet 0.7.0 has improved performance compared to the previous release due to improvements in the native shuffle implementation and performance improvements in DataFusion 46.</p>
-<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>2.2x speedup</strong> compared to Spark using the same CPU and RAM. Even
-with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
+<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>greater than 2x speedup</strong> compared to Spark using the same
+CPU and RAM. Even with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
 <p><img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" width="100%"/></p>
 <p><em>These benchmarks were performed on a Linux workstation with PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data
 stored locally in Parquet format on NVMe storage. Spark was running in Kubernetes with hard memory limits.</em></p>
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete
 </ul>
 <h2>Improved Hash Join Performance</h2>
 <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet
-will now do a better job of picking the optimal build side (thanks to [@hayman42] for suggesting this, and thanks to the
-[Apache Gluten(incubating)] project for the inspiration in implementing this feature).</p>
-<p><a href="https://github.com/hayman42">@hayman42</a></p>
+will now do a better job of picking the optimal build side (thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the
+<a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature).</p>
 <h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2>
 <p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader.
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index a7a1e4f..bb2e0ef 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -50,8 +50,8 @@ contributors. See the <a href="https://github.com/apache/datafusion-comet/blo
 <h3>Performance</h3>
 <p>Comet 0.7.0 has improved performance compared to the previous release due to improvements in the native shuffle implementation and performance improvements in DataFusion 46.</p>
-<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>2.2x speedup</strong> compared to Spark using the same CPU and RAM. Even
-with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
+<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>greater than 2x speedup</strong> compared to Spark using the same
+CPU and RAM. Even with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
 <p><img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" width="100%"/></p>
 <p><em>These benchmarks were performed on a Linux workstation with PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data
 stored locally in Parquet format on NVMe storage. Spark was running in Kubernetes with hard memory limits.</em></p>
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete
 </ul>
 <h2>Improved Hash Join Performance</h2>
 <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet
-will now do a better job of picking the optimal build side (thanks to [@hayman42] for suggesting this, and thanks to the
-[Apache Gluten(incubating)] project for the inspiration in implementing this feature).</p>
-<p><a href="https://github.com/hayman42">@hayman42</a></p>
+will now do a better job of picking the optimal build side (thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the
+<a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature).</p>
 <h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2>
 <p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader.
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index 3ca7d1c..64778cb 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -50,8 +50,8 @@ contributors. See the <a href="https://github.com/apache/datafusion-comet/blo
 <h3>Performance</h3>
 <p>Comet 0.7.0 has improved performance compared to the previous release due to improvements in the native shuffle implementation and performance improvements in DataFusion 46.</p>
-<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>2.2x speedup</strong> compared to Spark using the same CPU and RAM. Even
-with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
+<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>greater than 2x speedup</strong> compared to Spark using the same
+CPU and RAM. Even with <strong>half the resources</strong>, Comet still provides a measurable performance improvement.</p>
 <p><img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" width="100%"/></p>
 <p><em>These benchmarks were performed on a Linux workstation with PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data
 stored locally in Parquet format on NVMe storage. Spark was running in Kubernetes with hard memory limits.</em></p>
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was running in Kubernete
 </ul>
 <h2>Improved Hash Join Performance</h2>
 <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting to replace sort-merge joins with hash joins, Comet
-will now do a better job of picking the optimal build side (thanks to [@hayman42] for suggesting this, and thanks to the
-[Apache Gluten(incubating)] project for the inspiration in implementing this feature).</p>
-<p><a href="https://github.com/hayman42">@hayman42</a></p>
+will now do a better job of picking the optimal build side (thanks to <a href="https://github.com/hayman42">@hayman42</a> for suggesting this, and thanks to the
+<a href="https://github.com/apache/incubator-gluten/">Apache Gluten(incubating)</a> project for the inspiration in implementing this feature).</p>
 <h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2>
 <p>It is now possible to configure Comet to use DataFusion&rsquo;s <code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. Support should still be considered experimental, but most of Comet&rsquo;s unit tests are now passing with the new reader.
diff --git a/blog/images/comet-0.7.0/performance.png b/blog/images/comet-0.7.0/performance.png
new file mode 100644
index 0000000..20b7cf7
Binary files /dev/null and b/blog/images/comet-0.7.0/performance.png differ
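The hash join hunks in this commit describe how Comet picks the build side when `spark.comet.exec.replaceSortMergeJoin` replaces a sort-merge join with a hash join. The Python sketch below shows what "picking the build side" means. It is a generic, hypothetical illustration and not the Comet or Apache Gluten implementation. The helper names `choose_build_side` and `hash_join` are invented for this sketch. Its heuristic is also an assumption: it builds the in-memory hash table on whichever input has fewer rows. Real planners typically compare estimated sizes from statistics rather than counting rows.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Tuple

Row = Dict[str, Any]


def choose_build_side(left: List[Row], right: List[Row]) -> str:
    """Hypothetical heuristic: build the hash table on the smaller input."""
    return "left" if len(left) <= len(right) else "right"


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Inner equi-join on `key`; output pairs are always (left_row, right_row)."""
    build_side = choose_build_side(left, right)
    build, probe = (left, right) if build_side == "left" else (right, left)

    # Build phase: index every build-side row by its join key.
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: stream the other (larger) side once and look up matches.
    result: List[Tuple[Row, Row]] = []
    for row in probe:
        for match in table.get(row[key], []):
            # Restore (left, right) ordering regardless of which side was built.
            result.append((match, row) if build_side == "left" else (row, match))
    return result


# Usage: six "orders" joined to two "customers"; customers is the smaller input.
orders = [{"id": i, "cust": i % 3} for i in range(6)]
customers = [{"cust": 0, "name": "a"}, {"cust": 1, "name": "b"}]
pairs = hash_join(orders, customers, "cust")
print(len(pairs))  # → 4
```

Building on the smaller input keeps the hash table small, and the larger input is streamed only once during the probe phase. The join result does not depend on which side is built, which is why the sketch restores the (left, right) ordering before returning.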