(datafusion-site) branch asf-staging updated: Commit build products

github-bot Thu, 20 Mar 2025 13:24:30 -0700

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git



The following commit(s) were added to refs/heads/asf-staging by this push:
     new 4c90750  Commit build products
4c90750 is described below

commit 4c907500ee5c196cd6d1369dba3fef804344e08e
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Thu Mar 20 20:24:14 2025 +0000

    Commit build products
---
 blog/2025/03/20/datafusion-comet-0.7.0/index.html |   9 ++++-----
 blog/feeds/all-en.atom.xml                        |   9 ++++-----
 blog/feeds/blog.atom.xml                          |   9 ++++-----
 blog/feeds/pmc.atom.xml                           |   9 ++++-----
 blog/images/comet-0.7.0/performance.png           | Bin 0 -> 34242 bytes
 5 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/blog/2025/03/20/datafusion-comet-0.7.0/index.html 
b/blog/2025/03/20/datafusion-comet-0.7.0/index.html
index f3949d7..5db2ca1 100644
--- a/blog/2025/03/20/datafusion-comet-0.7.0/index.html
+++ b/blog/2025/03/20/datafusion-comet-0.7.0/index.html
@@ -73,8 +73,8 @@ contributors. See the <a 
href="https://github.com/apache/datafusion-comet/blob/m
 <h3>Performance</h3>
 <p>Comet 0.7.0 has improved performance compared to the previous release due 
to improvements in the native shuffle 
 implementation and performance improvements in DataFusion 46.</p>
-<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>2.2x 
speedup</strong> compared to Spark using the same CPU and RAM. Even 
-with <strong>half the resources</strong>, Comet still provides a measurable 
performance improvement.</p>
+<p>For single-node TPC-H at 100 GB, Comet now delivers a <strong>greater than 
2x speedup</strong> compared to Spark using the same 
+CPU and RAM. Even with <strong>half the resources</strong>, Comet still 
provides a measurable performance improvement.</p>
 <p><img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" 
class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" 
width="100%"/></p>
 <p><em>These benchmarks were performed on a Linux workstation with PCIe 5, AMD 
7950X CPU (16 cores), 128 GB RAM, and data 
 stored locally in Parquet format on NVMe storage. Spark was running in 
Kubernetes with hard memory limits.</em></p>
@@ -93,9 +93,8 @@ stored locally in Parquet format on NVMe storage. Spark was 
running in Kubernete
 </ul>
 <h2>Improved Hash Join Performance</h2>
 <p>When using the <code>spark.comet.exec.replaceSortMergeJoin</code> setting 
to replace sort-merge joins with hash joins, Comet 
-will now do a better job of picking the optimal build side (thanks to 
[@hayman42] for suggesting this, and thanks to the 
-[Apache Gluten(incubating)] project for the inspiration in implementing this 
feature).</p>
-<p><a href="https://github.com/hayman42">@hayman42</a></p>
+will now do a better job of picking the optimal build side (thanks to <a 
href="https://github.com/hayman42">@hayman42</a> for suggesting this, and 
thanks to the 
+<a href="https://github.com/apache/incubator-gluten/">Apache 
Gluten(incubating)</a> project for the inspiration in implementing this 
feature).</p>
 <h2>Experimental Support for DataFusion&rsquo;s DataSourceExec</h2>
 <p>It is now possible to configure Comet to use DataFusion&rsquo;s 
<code>DataSourceExec</code> instead of Comet&rsquo;s current Parquet reader. 
 Support should still be considered experimental, but most of Comet&rsquo;s 
unit tests are now passing with the new reader. 
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 9ab59a2..01a65f5 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -50,8 +50,8 @@ contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blo
 &lt;h3&gt;Performance&lt;/h3&gt;
 &lt;p&gt;Comet 0.7.0 has improved performance compared to the previous release 
due to improvements in the native shuffle 
 implementation and performance improvements in DataFusion 46.&lt;/p&gt;
-&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;2.2x speedup&lt;/strong&gt; compared to Spark using the same CPU 
and RAM. Even 
-with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet still provides a 
measurable performance improvement.&lt;/p&gt;
+&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;greater than 2x speedup&lt;/strong&gt; compared to Spark using 
the same 
+CPU and RAM. Even with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet 
still provides a measurable performance improvement.&lt;/p&gt;
 &lt;p&gt;&lt;img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" 
class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" 
width="100%"/&gt;&lt;/p&gt;
 &lt;p&gt;&lt;em&gt;These benchmarks were performed on a Linux workstation with 
PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data 
 stored locally in Parquet format on NVMe storage. Spark was running in 
Kubernetes with hard memory limits.&lt;/em&gt;&lt;/p&gt;
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was 
running in Kubernete
 &lt;/ul&gt;
 &lt;h2&gt;Improved Hash Join Performance&lt;/h2&gt;
 &lt;p&gt;When using the 
&lt;code&gt;spark.comet.exec.replaceSortMergeJoin&lt;/code&gt; setting to 
replace sort-merge joins with hash joins, Comet 
-will now do a better job of picking the optimal build side (thanks to 
[@hayman42] for suggesting this, and thanks to the 
-[Apache Gluten(incubating)] project for the inspiration in implementing this 
feature).&lt;/p&gt;
-&lt;p&gt;&lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt;&lt;/p&gt;
+will now do a better job of picking the optimal build side (thanks to &lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt; for suggesting this, 
and thanks to the 
+&lt;a href="https://github.com/apache/incubator-gluten/"&gt;Apache 
Gluten(incubating)&lt;/a&gt; project for the inspiration in implementing this 
feature).&lt;/p&gt;
 &lt;h2&gt;Experimental Support for DataFusion&amp;rsquo;s 
DataSourceExec&lt;/h2&gt;
 &lt;p&gt;It is now possible to configure Comet to use DataFusion&amp;rsquo;s 
&lt;code&gt;DataSourceExec&lt;/code&gt; instead of Comet&amp;rsquo;s current 
Parquet reader. 
 Support should still be considered experimental, but most of Comet&amp;rsquo;s 
unit tests are now passing with the new reader. 
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index a7a1e4f..bb2e0ef 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -50,8 +50,8 @@ contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blo
 &lt;h3&gt;Performance&lt;/h3&gt;
 &lt;p&gt;Comet 0.7.0 has improved performance compared to the previous release 
due to improvements in the native shuffle 
 implementation and performance improvements in DataFusion 46.&lt;/p&gt;
-&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;2.2x speedup&lt;/strong&gt; compared to Spark using the same CPU 
and RAM. Even 
-with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet still provides a 
measurable performance improvement.&lt;/p&gt;
+&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;greater than 2x speedup&lt;/strong&gt; compared to Spark using 
the same 
+CPU and RAM. Even with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet 
still provides a measurable performance improvement.&lt;/p&gt;
 &lt;p&gt;&lt;img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" 
class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" 
width="100%"/&gt;&lt;/p&gt;
 &lt;p&gt;&lt;em&gt;These benchmarks were performed on a Linux workstation with 
PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data 
 stored locally in Parquet format on NVMe storage. Spark was running in 
Kubernetes with hard memory limits.&lt;/em&gt;&lt;/p&gt;
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was 
running in Kubernete
 &lt;/ul&gt;
 &lt;h2&gt;Improved Hash Join Performance&lt;/h2&gt;
 &lt;p&gt;When using the 
&lt;code&gt;spark.comet.exec.replaceSortMergeJoin&lt;/code&gt; setting to 
replace sort-merge joins with hash joins, Comet 
-will now do a better job of picking the optimal build side (thanks to 
[@hayman42] for suggesting this, and thanks to the 
-[Apache Gluten(incubating)] project for the inspiration in implementing this 
feature).&lt;/p&gt;
-&lt;p&gt;&lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt;&lt;/p&gt;
+will now do a better job of picking the optimal build side (thanks to &lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt; for suggesting this, 
and thanks to the 
+&lt;a href="https://github.com/apache/incubator-gluten/"&gt;Apache 
Gluten(incubating)&lt;/a&gt; project for the inspiration in implementing this 
feature).&lt;/p&gt;
 &lt;h2&gt;Experimental Support for DataFusion&amp;rsquo;s 
DataSourceExec&lt;/h2&gt;
 &lt;p&gt;It is now possible to configure Comet to use DataFusion&amp;rsquo;s 
&lt;code&gt;DataSourceExec&lt;/code&gt; instead of Comet&amp;rsquo;s current 
Parquet reader. 
 Support should still be considered experimental, but most of Comet&amp;rsquo;s 
unit tests are now passing with the new reader. 
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index 3ca7d1c..64778cb 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -50,8 +50,8 @@ contributors. See the &lt;a 
href="https://github.com/apache/datafusion-comet/blo
 &lt;h3&gt;Performance&lt;/h3&gt;
 &lt;p&gt;Comet 0.7.0 has improved performance compared to the previous release 
due to improvements in the native shuffle 
 implementation and performance improvements in DataFusion 46.&lt;/p&gt;
-&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;2.2x speedup&lt;/strong&gt; compared to Spark using the same CPU 
and RAM. Even 
-with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet still provides a 
measurable performance improvement.&lt;/p&gt;
+&lt;p&gt;For single-node TPC-H at 100 GB, Comet now delivers a 
&lt;strong&gt;greater than 2x speedup&lt;/strong&gt; compared to Spark using 
the same 
+CPU and RAM. Even with &lt;strong&gt;half the resources&lt;/strong&gt;, Comet 
still provides a measurable performance improvement.&lt;/p&gt;
 &lt;p&gt;&lt;img alt="Chart showing TPC-H benchmark results for Comet 0.7.0" 
class="img-responsive" src="/blog/images/comet-0.7.0/performance.png" 
width="100%"/&gt;&lt;/p&gt;
 &lt;p&gt;&lt;em&gt;These benchmarks were performed on a Linux workstation with 
PCIe 5, AMD 7950X CPU (16 cores), 128 GB RAM, and data 
 stored locally in Parquet format on NVMe storage. Spark was running in 
Kubernetes with hard memory limits.&lt;/em&gt;&lt;/p&gt;
@@ -70,9 +70,8 @@ stored locally in Parquet format on NVMe storage. Spark was 
running in Kubernete
 &lt;/ul&gt;
 &lt;h2&gt;Improved Hash Join Performance&lt;/h2&gt;
 &lt;p&gt;When using the 
&lt;code&gt;spark.comet.exec.replaceSortMergeJoin&lt;/code&gt; setting to 
replace sort-merge joins with hash joins, Comet 
-will now do a better job of picking the optimal build side (thanks to 
[@hayman42] for suggesting this, and thanks to the 
-[Apache Gluten(incubating)] project for the inspiration in implementing this 
feature).&lt;/p&gt;
-&lt;p&gt;&lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt;&lt;/p&gt;
+will now do a better job of picking the optimal build side (thanks to &lt;a 
href="https://github.com/hayman42"&gt;@hayman42&lt;/a&gt; for suggesting this, 
and thanks to the 
+&lt;a href="https://github.com/apache/incubator-gluten/"&gt;Apache 
Gluten(incubating)&lt;/a&gt; project for the inspiration in implementing this 
feature).&lt;/p&gt;
 &lt;h2&gt;Experimental Support for DataFusion&amp;rsquo;s 
DataSourceExec&lt;/h2&gt;
 &lt;p&gt;It is now possible to configure Comet to use DataFusion&amp;rsquo;s 
&lt;code&gt;DataSourceExec&lt;/code&gt; instead of Comet&amp;rsquo;s current 
Parquet reader. 
 Support should still be considered experimental, but most of Comet&amp;rsquo;s 
unit tests are now passing with the new reader. 
diff --git a/blog/images/comet-0.7.0/performance.png 
b/blog/images/comet-0.7.0/performance.png
new file mode 100644
index 0000000..20b7cf7
Binary files /dev/null and b/blog/images/comet-0.7.0/performance.png differ

