This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
new 89cceee Commit build products
89cceee is described below
commit 89cceee21ace3105f682d2f98e79845ffa01aa91
Author: Build Pelican (action) <[email protected]>
AuthorDate: Thu Nov 20 15:38:04 2025 +0000
Commit build products
---
 blog/2025/11/25/datafusion-51.0.0/index.html | 27 ++++++++++++++-------------
 blog/author/pmc.html                         |  3 +--
 blog/category/blog.html                      |  3 +--
 blog/feed.xml                                |  3 +--
 blog/feeds/all-en.atom.xml                   | 26 +++++++++++++-------------
 blog/feeds/blog.atom.xml                     | 26 +++++++++++++-------------
 blog/feeds/pmc.atom.xml                      | 26 +++++++++++++-------------
 blog/feeds/pmc.rss.xml                       |  3 +--
 blog/index.html                              |  3 +--
 9 files changed, 58 insertions(+), 62 deletions(-)
diff --git a/blog/2025/11/25/datafusion-51.0.0/index.html b/blog/2025/11/25/datafusion-51.0.0/index.html
index ac5a65d..4b4de5d 100644
--- a/blog/2025/11/25/datafusion-51.0.0/index.html
+++ b/blog/2025/11/25/datafusion-51.0.0/index.html
@@ -59,7 +59,7 @@
<li><a href="#io-profiling-in-datafusion-cli">I/O Profiling in
datafusion-cli</a></li>
<li><a href="#describe-query">DESCRIBE &lt;query&gt;</a></li>
<li><a href="#named-arguments-in-sql-functions">Named arguments in SQL
functions</a></li>
-<li><a href="#metrics-improvement">Metrics improvement</a></li>
+<li><a href="#metrics-improvements">Metrics improvements</a></li>
</ul>
</li>
<li><a href="#upgrade-guide-and-changelog">Upgrade Guide and Changelog</a></li>
@@ -94,6 +94,8 @@ some of the major improvements since <a
href="https://datafusion.apache.org/blog
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
+<p>We continue to make significant performance improvements in DataFusion,
both in
+the core engine and in the Parquet reader.</p>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
<p><strong>Figure 1</strong>: Average and median normalized query execution
times for ClickBench queries for DataFusion 51.0.0 compared to previous
releases.
Query times are normalized using the ClickBench definition. See the
@@ -102,10 +104,10 @@ for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster <code>CASE</code> expression
evaluation<a class="headerlink" href="#faster-case-expression-evaluation"
title="Permanent link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
-scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>
+scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>,
and <a href="https://github.com/petern48">petern48</a> for leading this
effort. We hope to share more details on our
implementation in a future post.</p>
-<p><strong>Fewer object store round-trips for Parquet by Default</strong></p>
+<p><strong>Fewer object store round-trips for Parquet by default</strong></p>
<p>DataFusion now sets a default <code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
@@ -113,16 +115,15 @@ typically drop from five requests to four per file,
cutting latency and transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata parsing<a
class="headerlink" href="#faster-parquet-metadata-parsing" title="Permanent
link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader from
-<a href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster parsing Parquet metadata. This is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which parses Parquet metadata significantly faster. This is
especially beneficial for workloads with many small Parquet files and scenarios
where startup time or low latency is important. You can read more about the
upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
-in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a> blog.</p>
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a> blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in Arrow/Parquet 57"
class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
<p><strong>Figure 2</strong>: Metadata parsing performance improvements in
Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for Remote
Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
-<p>DataFusion by default now fetches the last 512KB (configurable) of Parquet
files
+<p>By default, DataFusion now fetches the last 512KB (configurable) of Parquet
files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>). This will
typically avoid two distinct I/O requests for each Parquet file. While this
setting has existed in DataFusion for many years, it was not previously enabled
@@ -212,10 +213,10 @@ functions benefit from the same syntax. Thanks to <a
href="https://github.com/ti
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt; 3.0, base =&gt;
2.0);
</code></pre>
-<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
+<h3 id="metrics-improvements">Metrics improvements<a class="headerlink"
href="#metrics-improvements" title="Permanent link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
-You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can learn more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading this
effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
@@ -228,7 +229,7 @@ You can find more about these new metrics in the <a
href="https://datafusion.apa
<li><strong>NestedLoopJoinExec</strong>: adds a <code>selectivity</code>
metric (<code>output_rows / (left_rows * right_rows)</code>) to show how many
combinations actually pass the join condition.</li>
<li>Several display formatting improvements were added to make <code>EXPLAIN
ANALYZE</code> output easier to read.</li>
</ul>
-<p>For example, the following query</p>
+<p>For example, the following query:</p>
<pre><code class="language-sql">set datafusion.explain.analyze_level = summary
explain analyze
@@ -323,7 +324,7 @@ can find out how to reach us on the <a
href="https://datafusion.apache.org/contr
<li><a href="#io-profiling-in-datafusion-cli">I/O Profiling in
datafusion-cli</a></li>
<li><a href="#describe-query">DESCRIBE &lt;query&gt;</a></li>
<li><a href="#named-arguments-in-sql-functions">Named arguments in SQL
functions</a></li>
-<li><a href="#metrics-improvement">Metrics improvement</a></li>
+<li><a href="#metrics-improvements">Metrics improvements</a></li>
</ul>
</li>
<li><a href="#upgrade-guide-and-changelog">Upgrade Guide and Changelog</a></li>
diff --git a/blog/author/pmc.html b/blog/author/pmc.html
index 9a8aa92..17352bd 100644
--- a/blog/author/pmc.html
+++ b/blog/author/pmc.html
@@ -53,8 +53,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/blog
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p> </div><!-- /.entry-content -->
+<p>We continue …</p> </div><!-- /.entry-content -->
</article></li>
<li><article class="hentry">
<header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0"
rel="bookmark" title="Permalink to Apache DataFusion Comet 0.11.0
Release">Apache DataFusion Comet 0.11.0 Release</a></h2> </header>
diff --git a/blog/category/blog.html b/blog/category/blog.html
index 065b364..ca7402f 100644
--- a/blog/category/blog.html
+++ b/blog/category/blog.html
@@ -54,8 +54,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/blog
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p> </div><!-- /.entry-content -->
+<p>We continue …</p> </div><!-- /.entry-content -->
</article></li>
<li><article class="hentry">
<header> <h2 class="entry-title"><a
href="https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0"
rel="bookmark" title="Permalink to Apache DataFusion Comet 0.11.0
Release">Apache DataFusion Comet 0.11.0 Release</a></h2> </header>
diff --git a/blog/feed.xml b/blog/feed.xml
index 9d01a04..9402597 100644
--- a/blog/feed.xml
+++ b/blog/feed.xml
@@ -24,8 +24,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1
…</strong></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
+<p>We continue …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index c4afbbf..482cdf5 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -24,8 +24,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
+<p>We continue …</p></summary><content type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -50,6 +49,8 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
+<p>We continue to make significant performance improvements in
DataFusion, both in
+the core engine and in the Parquet reader.</p>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
Query times are normalized using the ClickBench definition. See the
@@ -58,10 +59,10 @@ for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
-scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>
+scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>,
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
-<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
+<p><strong>Fewer object store round-trips for Parquet by
default</strong></p>
<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
@@ -69,16 +70,15 @@ typically drop from five requests to four per file, cutting
latency and transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster parsing Parquet metadata. This
is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which parses Parquet metadata significantly faster. This is
especially beneficial for workloads with many small Parquet files and scenarios
where startup time or low latency is important. You can read more about the
upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
-in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
-<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
+<p>By default, DataFusion now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
typically avoid two distinct I/O requests for each Parquet file. While this
setting has existed in DataFusion for many years, it was not previously enabled
@@ -168,10 +168,10 @@ functions benefit from the same syntax. Thanks to <a
href="https://github.com
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
-<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
+<h3 id="metrics-improvements">Metrics improvements<a
class="headerlink" href="#metrics-improvements" title="Permanent
link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
-You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can learn more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
@@ -184,7 +184,7 @@ You can find more about these new metrics in the <a
href="https://datafusion.
<li><strong>NestedLoopJoinExec</strong>: adds a
<code>selectivity</code> metric (<code>output_rows /
(left_rows * right_rows)</code>) to show how many combinations actually
pass the join condition.</li>
<li>Several display formatting improvements were added to make
<code>EXPLAIN ANALYZE</code> output easier to read.</li>
</ul>
-<p>For example, the following query</p>
+<p>For example, the following query:</p>
<pre><code class="language-sql">set
datafusion.explain.analyze_level = summary
explain analyze
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index ce68dfc..8ae26c1 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -24,8 +24,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
+<p>We continue …</p></summary><content type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -50,6 +49,8 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
+<p>We continue to make significant performance improvements in
DataFusion, both in
+the core engine and in the Parquet reader.</p>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
Query times are normalized using the ClickBench definition. See the
@@ -58,10 +59,10 @@ for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
-scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>
+scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>,
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
-<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
+<p><strong>Fewer object store round-trips for Parquet by
default</strong></p>
<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
@@ -69,16 +70,15 @@ typically drop from five requests to four per file, cutting
latency and transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster parsing Parquet metadata. This
is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which parses Parquet metadata significantly faster. This is
especially beneficial for workloads with many small Parquet files and scenarios
where startup time or low latency is important. You can read more about the
upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
-in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
-<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
+<p>By default, DataFusion now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
typically avoid two distinct I/O requests for each Parquet file. While this
setting has existed in DataFusion for many years, it was not previously enabled
@@ -168,10 +168,10 @@ functions benefit from the same syntax. Thanks to <a
href="https://github.com
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
-<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
+<h3 id="metrics-improvements">Metrics improvements<a
class="headerlink" href="#metrics-improvements" title="Permanent
link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
-You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can learn more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
@@ -184,7 +184,7 @@ You can find more about these new metrics in the <a
href="https://datafusion.
<li><strong>NestedLoopJoinExec</strong>: adds a
<code>selectivity</code> metric (<code>output_rows /
(left_rows * right_rows)</code>) to show how many combinations actually
pass the join condition.</li>
<li>Several display formatting improvements were added to make
<code>EXPLAIN ANALYZE</code> output easier to read.</li>
</ul>
-<p>For example, the following query</p>
+<p>For example, the following query:</p>
<pre><code class="language-sql">set
datafusion.explain.analyze_level = summary
explain analyze
diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml
index 06c1af5..7eefc86 100644
--- a/blog/feeds/pmc.atom.xml
+++ b/blog/feeds/pmc.atom.xml
@@ -24,8 +24,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p></summary><content
type="html"><!--
+<p>We continue …</p></summary><content type="html"><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
@@ -50,6 +49,8 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
+<p>We continue to make significant performance improvements in
DataFusion, both in
+the core engine and in the Parquet reader.</p>
<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
<p><strong>Figure 1</strong>: Average and median normalized
query execution times for ClickBench queries for DataFusion 51.0.0 compared to
previous releases.
Query times are normalized using the ClickBench definition. See the
@@ -58,10 +59,10 @@ for more details.</p>
<h3 id="faster-case-expression-evaluation">Faster
<code>CASE</code> expression evaluation<a class="headerlink"
href="#faster-case-expression-evaluation" title="Permanent
link">¶</a></h3>
<p>This release builds on the <a
href="https://github.com/apache/datafusion/issues/18075">CASE performance
epic</a> with significant improvements.
Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
-scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>
+scattering, speeding up common ETL patterns. Thanks to <a
href="https://github.com/pepijnve">pepijnve</a>, <a
href="https://github.com/chenkovsky">chenkovsky</a>,
and <a href="https://github.com/petern48">petern48</a> for leading
this effort. We hope to share more details on our
implementation in a future post.</p>
-<p><strong>Fewer object store round-trips for Parquet by
Default</strong></p>
+<p><strong>Fewer object store round-trips for Parquet by
default</strong></p>
<p>DataFusion now sets a default
<code>metadata_size_hint</code> for <a
href="https://parquet.apache.org/">Apache Parquet</a> scans
(<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>),
avoiding the extra
“last 8‑byte” request many clouds require to read file footers. Remote scans
@@ -69,16 +70,15 @@ typically drop from five requests to four per file, cutting
latency and transfer
costs without any application changes. Thanks to <a
href="https://github.com/zhuqi-lucas">zhuqi-lucas</a> for leading this
effort.</p>
<h3 id="faster-parquet-metadata-parsing">Faster Parquet metadata
parsing<a class="headerlink" href="#faster-parquet-metadata-parsing"
title="Permanent link">¶</a></h3>
-<p>DataFusion 51 also includes the latest Parquet reader from
-<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which is significantly faster parsing Parquet metadata. This
is
+<p>DataFusion 51 also includes the latest Parquet reader from
+<a
href="https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/">Arrow Rust
57.0.0</a>, which parses Parquet metadata significantly faster. This is
especially beneficial for workloads with many small Parquet files and scenarios
where startup time or low latency is important. You can read more about the
upstream work by
-<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements
-in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
+<a href="https://github.com/etseidl">etseidl</a> and <a
href="https://github.com/jhorstmann">jhorstmann</a> that enabled these
improvements in the <a
href="https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/">Faster
Apache Parquet Footer Metadata Using a Custom Thrift Parser</a>
blog.</p>
<p><img alt="Metadata Parsing Performance Improvements in
Arrow/Parquet 57" class="img-responsive"
src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
width="100%"/></p>
<p><strong>Figure 2</strong>: Metadata parsing performance
improvements in Arrow/Parquet 57.0.0. </p>
<h3 id="better-defaults-for-remote-parquet-reads">Better Defaults for
Remote Parquet Reads<a class="headerlink"
href="#better-defaults-for-remote-parquet-reads" title="Permanent
link">¶</a></h3>
-<p>DataFusion by default now fetches the last 512KB (configurable) of
Parquet files
+<p>By default, DataFusion now fetches the last 512KB (configurable) of
Parquet files
so the first request usually includes the full footer (<a
href="https://github.com/apache/datafusion/issues/18118">#18118</a>).
This will
typically avoid two distinct I/O requests for each Parquet file. While this
setting has existed in DataFusion for many years, it was not previously enabled
@@ -168,10 +168,10 @@ functions benefit from the same syntax. Thanks to <a
href="https://github.com
<p>For example, you can pass arguments to functions like this:</p>
<pre><code class="language-sql">SELECT power(exponent =&gt;
3.0, base =&gt; 2.0);
</code></pre>
-<h3 id="metrics-improvement">Metrics improvement<a class="headerlink"
href="#metrics-improvement" title="Permanent link">¶</a></h3>
+<h3 id="metrics-improvements">Metrics improvements<a
class="headerlink" href="#metrics-improvements" title="Permanent
link">¶</a></h3>
<p>The output of <a
href="https://datafusion.apache.org/user-guide/sql/explain.html#explain-analyze">EXPLAIN
ANALYZE</a> has been improved to include more metrics
-about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
-You can find more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
+about execution time and memory usage of each operator (<a
href="https://github.com/apache/datafusion/issues/18217">#18217</a>).
+You can learn more about these new metrics in the <a
href="https://datafusion.apache.org/user-guide/metrics.html">metrics user
guide</a>. Thanks to
<a href="https://github.com/2010YOUY01">2010YOUY01</a> for leading
this effort.</p>
<p>The <code>51.0.0</code> release adds:</p>
<ul>
@@ -184,7 +184,7 @@ You can find more about these new metrics in the <a
href="https://datafusion.
<li><strong>NestedLoopJoinExec</strong>: adds a
<code>selectivity</code> metric (<code>output_rows /
(left_rows * right_rows)</code>) to show how many combinations actually
pass the join condition.</li>
<li>Several display formatting improvements were added to make
<code>EXPLAIN ANALYZE</code> output easier to read.</li>
</ul>
-<p>For example, the following query</p>
+<p>For example, the following query:</p>
<pre><code class="language-sql">set
datafusion.explain.analyze_level = summary
explain analyze
diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml
index f953959..f274e27 100644
--- a/blog/feeds/pmc.rss.xml
+++ b/blog/feeds/pmc.rss.xml
@@ -24,8 +24,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/b
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1
…</strong></p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
+<p>We continue …</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 25
Nov 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-11-25:/blog/2025/11/25/datafusion-51.0.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.11.0
Release</title><link>https://datafusion.apache.org/blog/2025/10/21/datafusion-comet-0.11.0</link><description><!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/blog/index.html b/blog/index.html
index 19e9154..fd7df6e 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -78,8 +78,7 @@ some of the major improvements since <a
href="https://datafusion.apache.org/blog
changes is available in the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md">changelog</a>.
Thanks to the <a
href="https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits">128
contributors</a> for
making this release possible.</p>
<h2 id="performance-improvements">Performance Improvements 🚀<a
class="headerlink" href="#performance-improvements" title="Permanent
link">¶</a></h2>
-<p><img alt="Performance over time" class="img-responsive"
src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
width="100%"/></p>
-<p><strong>Figure 1 …</strong></p></p>
+<p>We continue …</p></p>
<footer>
<ul class="actions">
<div style="text-align: right"><a
href="/blog/2025/11/25/datafusion-51.0.0" class="button medium">Continue
Reading</a></div>