This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 27aaba6  Commit build products
27aaba6 is described below

commit 27aaba6947c0eafc28f53291c248321847906562
Author: Build Pelican (action) <[email protected]>
AuthorDate: Mon Jul 28 10:29:42 2025 +0000

    Commit build products
---
 output/2025/07/28/datafusion-49.0.0/index.html     | 370 +++++++++++++++++++++
 output/author/pmc.html                             |  37 ++-
 output/category/blog.html                          |  33 ++
 output/feed.xml                                    |  25 +-
 output/feeds/all-en.atom.xml                       | 308 ++++++++++++++++-
 output/feeds/blog.atom.xml                         | 308 ++++++++++++++++-
 output/feeds/pmc.atom.xml                          | 308 ++++++++++++++++-
 output/feeds/pmc.rss.xml                           |  25 +-
 .../performance_over_time_clickbench.png           | Bin 0 -> 64862 bytes
 .../performance_over_time_planning.png             | Bin 0 -> 11393 bytes
 output/index.html                                  |  42 +++
 11 files changed, 1449 insertions(+), 7 deletions(-)

diff --git a/output/2025/07/28/datafusion-49.0.0/index.html 
b/output/2025/07/28/datafusion-49.0.0/index.html
new file mode 100644
index 0000000..70a0815
--- /dev/null
+++ b/output/2025/07/28/datafusion-49.0.0/index.html
@@ -0,0 +1,370 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+  <head>
+    <meta charset="utf-8">
+    <meta http-equiv="x-ua-compatible" content="ie=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Apache DataFusion 49.0.0 Released - Apache DataFusion Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script>  </head>
+  <body class="d-flex flex-column h-100">
+  <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth 
navbar example">
+    <div class="container-fluid">
+        <a class="navbar-brand" href="/blog"><img 
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache 
DataFusion Blog</a>
+        <button class="navbar-toggler" type="button" data-bs-toggle="collapse" 
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false" 
aria-label="Toggle navigation">
+            <span class="navbar-toggler-icon"></span>
+        </button>
+
+        <div class="collapse navbar-collapse" id="navbarADP">
+            <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/about.html">About</a>
+                </li>
+                <li class="nav-item">
+                    <a class="nav-link" href="/blog/feed.xml">RSS</a>
+                </li>
+            </ul>
+        </div>
+    </div>
+</nav>    
+
+
+<!-- article contents -->
+<div id="contents">
+    <div class="bg-white p-5 rounded">
+        <div class="col-sm-8 mx-auto">
+          <h1>
+            Apache DataFusion 49.0.0 Released
+          </h1>
+            <p>Posted on: Mon 28 July 2025 by pmc</p>
+            <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+<h2>Introduction</h2>
+<p>We are proud to announce the release of <a 
href="https://crates.io/crates/datafusion/49.0.0">DataFusion 49.0.0</a>. This 
blog post highlights some of
+the major improvements since the release of <a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion
 48.0.0</a>. The complete list of changes is available in the <a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p>
+<h2>Performance Improvements 🚀</h2>
+<p>DataFusion continues to focus on enhancing performance, as shown in the 
ClickBench and other results. </p>
+<p><img alt="ClickBench performance results over time for DataFusion" 
class="img-responsive" 
src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png" 
width="100%"/></p>
+<p><strong>Figure 1</strong>: ClickBench performance improvements over time.
+Average and median normalized query execution times for ClickBench queries for 
each git revision. 
+Query times are normalized using the ClickBench definition. Data and 
definitions are available on the 
+<a href="https://alamb.github.io/datafusion-benchmarking/">DataFusion 
Benchmarking Page</a>. </p>
+<!--
+NOTE: Andrew is working on gathering these numbers
+
+<img
+src="/blog/images/datafusion-49.0.0/performance_over_time_planning.png"
+width="80%"
+class="img-responsive"
+alt="Planning benchmark performance results over time for DataFusion"
+/>
+
+**Figure 2**: Planning benchmark performance improved XXX between DataFusion 
48.0.1 and DataFusion 49.0.0. Chart source: TODO
+-->
+<p>Here are some noteworthy optimizations added since DataFusion 48:</p>
+<p><strong>Equivalence system upgrade:</strong> The lower levels of the 
equivalence system, which is used to implement the
+  optimizations described in <a 
href="https://datafusion.apache.org/blog/2025/03/11/ordering-analysis">Using 
Ordering for Better Plans</a>, were rewritten, leading to
+  much faster planning times, especially for queries with a <a 
href="https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229">large
 number of columns</a>. This change also prepares
+  the way for more sophisticated sort-based optimizations in the future. (PR 
<a href="https://github.com/apache/datafusion/pull/16217">#16217</a> by <a 
href="https://github.com/ozankabak">ozankabak</a>).</p>
+<p><strong>Dynamic Filters and TopK pushdown</strong></p>
+<p>DataFusion now supports dynamic filters, which become more selective as a 
query executes,
+and physical filter pushdown. Together, these features improve the performance 
of
+queries that use <code>LIMIT</code> and <code>ORDER BY</code> clauses, such as 
the following:</p>
+<pre><code class="language-sql">SELECT *
+FROM data
+ORDER BY timestamp DESC
+LIMIT 10
+</code></pre>
+<p>While the query above is simple, without dynamic filtering or knowing that 
the data
+is already sorted by <code>timestamp</code>, a query engine must decode 
<em>all</em> of the data to
+find the top 10 values. With the dynamic filters system, DataFusion applies an
+increasingly selective filter during query execution. It checks the 
<strong>current</strong>
+top 10 values of the <code>timestamp</code> column <strong>before</strong> 
opening files or reading
+Parquet Row Groups and Data Pages, which can skip older data very quickly.</p>
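+<p>The skipping idea can be illustrated with a minimal, self-contained Rust 
sketch. It is not DataFusion's implementation: <code>FileChunk</code>, its 
<code>max_value</code> field (standing in for a statistic such as a Parquet 
row group maximum), and <code>top_k_with_dynamic_filter</code> are hypothetical 
names chosen for this example, and plain integers stand in for timestamps.</p>

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a file or row group: `max_value` is known up
/// front (as a statistic would be), while reading `rows` models decoding.
struct FileChunk {
    max_value: i64,
    rows: Vec<i64>,
}

/// Returns the `k` largest values in descending order, together with the
/// number of chunks that actually had to be decoded.
fn top_k_with_dynamic_filter(chunks: &[FileChunk], k: usize) -> (Vec<i64>, usize) {
    // Min-heap of the current top `k`; its smallest element is the threshold.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::new();
    let mut decoded = 0;

    for chunk in chunks {
        // The "dynamic filter": once `k` values are held, a chunk whose
        // maximum does not exceed the current threshold cannot contribute.
        let threshold = if heap.len() == k {
            heap.peek().map(|r| r.0)
        } else {
            None
        };
        if let Some(t) = threshold {
            if chunk.max_value <= t {
                continue; // skipped without decoding
            }
        }

        decoded += 1;
        for &v in &chunk.rows {
            if heap.len() < k {
                heap.push(Reverse(v));
            } else if heap.peek().map_or(false, |r| v > r.0) {
                // `v` beats the smallest retained value, so replace it.
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }

    let mut result: Vec<i64> = heap.into_iter().map(|Reverse(v)| v).collect();
    result.sort_unstable_by(|a, b| b.cmp(a));
    (result, decoded)
}

fn main() {
    let chunks = vec![
        FileChunk { max_value: 90, rows: vec![90, 85, 70] },
        FileChunk { max_value: 60, rows: vec![60, 10, 5] },
        FileChunk { max_value: 95, rows: vec![95, 20] },
    ];
    let (top, decoded) = top_k_with_dynamic_filter(&chunks, 2);
    println!("top = {:?}, decoded {} of {} chunks", top, decoded, chunks.len());
}
```

+<p>In this sketch the second chunk is never decoded, because its maximum 
(60) is already below the threshold (85) established by the first chunk. The 
filter tightens as better values are found, which is the property the 
paragraph above describes.</p>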
+<p>Dynamic predicates are a common feature of advanced engines; see, for 
example, <a 
href="https://docs.starburst.io/latest/admin/dynamic-filtering.html">Dynamic
+Filters in Starburst</a> and <a 
href="https://www.snowflake.com/en/engineering-blog/optimizing-top-k-aggregation-snowflake/">Top-K
 Aggregation Optimization at Snowflake</a>. The
+technique drastically improves query performance (we've seen over a 1.5x
+improvement for some TPC-H-style queries), especially in combination with late
+materialization and columnar file formats such as Parquet. We <a 
href="https://github.com/apache/datafusion/issues/15513">plan to write a
+blog post</a> explaining the details of this optimization in the future, and 
we expect to
+use the same mechanism to implement additional optimizations such as <a 
href="https://github.com/apache/datafusion/issues/7955">Sideways
+Information Passing for joins</a> (Issue
+<a href="https://github.com/apache/datafusion/issues/15037">#15037</a>, PR
+<a href="https://github.com/apache/datafusion/pull/15770">#15770</a> by
+<a href="https://github.com/adriangb">adriangb</a>).</p>
+<h2>Community Growth  📈</h2>
+<p>The last few months, between <code>46.0.0</code> and <code>49.0.0</code>, 
have seen our community grow:</p>
+<ol>
+<li>New PMC members and committers: <a 
href="https://github.com/berkaysynnada">berkay</a>, <a 
href="https://github.com/xudong963">xudong963</a> and <a 
href="https://github.com/timsaucer">timsaucer</a> joined the PMC.
+   <a href="https://github.com/blaginin">blaginin</a>, <a 
href="https://github.com/milenkovicm">milenkovicm</a>, <a 
href="https://github.com/adriangb">adriangb</a> and <a 
href="https://github.com/kosiew">kosiew</a> joined as committers. See the <a 
href="https://lists.apache.org/[email protected]">mailing 
list</a> for more details.</li>
+<li>In the <a href="https://github.com/apache/arrow-datafusion">core 
DataFusion repo</a> alone, we reviewed and accepted over 850 PRs from 172 
different
+   contributors, created 669 issues, and closed 379 of them 🚀. All changes 
are listed in the detailed
+   <a 
href="https://github.com/apache/datafusion/tree/main/dev/changelog">changelogs</a>.</li>
+<li>DataFusion published a number of blog posts, including <a 
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions">User
 defined Window Functions</a>, <a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one">Optimizing
 SQL (and DataFrames)
+   in DataFusion part 1</a>, <a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two">part
 2</a>, <a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation">Using Rust 
async for Query Execution and Cancelling Long-Running Queries</a>, and
+   <a 
href="https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/">Embedding
 User-Defined Indexes in Apache Parquet Files</a>.</li>
+</ol>
+<!--
+# Unique committers
+$ git shortlog -sn 46.0.0..49.0.0-rc1  .| wc -l
+     172
+# commits
+$ git log --pretty=oneline 46.0.0..49.0.0-rc1 . | wc -l
+     884
+
+
+https://crates.io/crates/datafusion/49.0.0
+DataFusion 49 released July 25, 2025
+
+https://crates.io/crates/datafusion/46.0.0
+DataFusion 46 released March 7, 2025
+
+Issues created in this time: 290 open, 379 closed = 669 total
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2025-03-07..2025-07-25
+
+Issues closed: 508
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2025-03-07..2025-07-25
+
+PRs merged in this time 874
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2025-03-07..2025-07-25
+
+-->
+<h2>New Features ✨</h2>
+<h3>Async User-Defined Functions</h3>
+<p>It is now possible to write <code>async</code> User-Defined Functions
+(UDFs) in DataFusion that perform asynchronous
+operations, such as network requests or database queries, without blocking the
+execution of the query. This enables new use cases, such as
+integrating with large language models (LLMs) or other external services, and 
we can't
+wait to see what the community builds with it.</p>
+<p>See the <a 
href="https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html">documentation</a>
 for more details and the <a 
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/async_udf.rs">async
 UDF example</a> for
+working code. </p>
+<p>You could, for example, implement a function <code>ask_llm</code> that asks 
a large language model
+(LLM) service a question based on the content of two columns.</p>
+<pre><code class="language-sql">SELECT * 
+FROM animal a
+WHERE ask_llm(a.name, 'Is this animal furry?')
+</code></pre>
+<p>The implementation of an async UDF is almost identical to a normal
+UDF, except that it must implement the <code>AsyncScalarUDFImpl</code> trait 
in addition to <code>ScalarUDFImpl</code> and
+provide an <code>async</code> implementation via 
<code>invoke_async_with_args</code>:</p>
+<pre><code class="language-rust">#[derive(Debug)]
+struct AskLLM {
+    signature: Signature,
+}
+
+#[async_trait]
+impl AsyncScalarUDFImpl for AskLLM {
+    /// The `invoke_async_with_args` method is similar to `invoke_with_args`,
+    /// but it returns a `Future` that resolves to the result.
+    ///
+    /// Since this signature is `async`, it can do any `async` operations, such
+    /// as network requests.
+    async fn invoke_async_with_args(
+        &amp;self,
+        args: ScalarFunctionArgs,
+        options: &amp;ConfigOptions,
+    ) -&gt; Result&lt;ArrayRef&gt; {
+        // Converts the arguments to arrays for simplicity.
+        let args = ColumnarValue::values_to_arrays(&amp;args.args)?;
+        let [column_of_interest, question] = take_function_args(self.name(), 
args)?;
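+        let client = Client::new();
+
+        // Build the request body from the two input arrays.
+        // `build_llm_request`, `URI`, and `get_llm_headers` are hypothetical
+        // placeholders that are not defined in this snippet.
+        let req = build_llm_request(&amp;column_of_interest, &amp;question);
+
+        // Make a network request to a hypothetical LLM service
+        let res = client
+            .post(URI)
+            .headers(get_llm_headers(options))
+            .json(&amp;req)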
+            .send()
+            .await?
+            .json::&lt;LLMResponse&gt;()
+            .await?;
+
+        let results = extract_results_from_llm_response(&amp;res);
+
+        Ok(Arc::new(results))
+    }
+}
+</code></pre>
+<p>(Issue <a href="https://github.com/apache/datafusion/issues/6518">#6518</a>,
+<a href="https://github.com/apache/datafusion/pull/14837">PR #14837</a> from
+<a href="https://github.com/goldmedal">goldmedal</a> 🏆)</p>
+<h3>Better Cancellation for Certain Long-Running Queries</h3>
+<p>In rare cases, it was previously not possible to cancel long-running 
queries,
+leading to unresponsiveness. Other projects would likely have fixed this issue
+by treating the symptom, but <a 
href="https://github.com/pepijnve">pepijnve</a> and the DataFusion community 
worked together to
+treat the root cause. The general solution required a deep understanding of the
+DataFusion execution engine, Rust <code>Streams</code>, and the tokio 
cooperative
+scheduling model. The <a 
href="https://github.com/apache/datafusion/pull/16398">resulting PR</a> is a 
model of careful
+community engineering and a great example of using Rust's <code>async</code> 
ecosystem
+to implement complex functionality. It even resulted in a <a 
href="https://github.com/tokio-rs/tokio/pull/7405">contribution upstream to 
tokio</a>
+(since accepted). See the <a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation">blog post</a> 
for more details.</p>
+<h3>Metadata for User-Defined Types such as <code>Variant</code> and 
<code>Geometry</code></h3>
+<p>User-defined types have been <a 
href="https://github.com/apache/datafusion/issues/12644">a long-requested 
feature</a>, and this release provides
+the low-level APIs to support them efficiently:</p>
+<ol>
+<li>Metadata handling in PRs <a 
href="https://github.com/apache/datafusion/pull/15646">#15646</a> and <a 
href="https://github.com/apache/datafusion/pull/16170">#16170</a> from <a 
href="https://github.com/timsaucer">timsaucer</a></li>
+<li>Pushdown of filters and expressions (see "Dynamic Filters and TopK 
pushdown" section above)</li>
+</ol>
+<p>We still have some work to do to fully support user-defined types, 
specifically
+in documentation and testing, and we would
+love your help in this area. If you are interested in contributing,
+please see <a href="https://github.com/apache/datafusion/issues/12644">issue 
#12644</a>.</p>
+<h3>Parquet Modular Encryption</h3>
+<p>DataFusion now supports reading and writing encrypted <a 
href="https://parquet.apache.org/">Apache Parquet</a> files with <a 
href="https://parquet.apache.org/docs/file-format/data-pages/encryption/">modular
+encryption</a>. This allows users to encrypt specific columns in a Parquet file
+using different keys, while still being able to read data without needing to
+decrypt the entire file.</p>
+<p>Here is an example of how to configure DataFusion to write and read an 
encrypted Parquet
+table with two columns, <code>double_field</code> and 
<code>float_field</code>, using modular
+encryption. The <code>file_encryption</code> options apply when writing and the 
<code>file_decryption</code> options when reading:</p>
+<pre><code class="language-sql">CREATE EXTERNAL TABLE encrypted_parquet_table
+(
+double_field double,
+float_field float
+)
+STORED AS PARQUET LOCATION 'pq/' OPTIONS (
+    -- encryption
+    'format.crypto.file_encryption.encrypt_footer' 'true',
+    'format.crypto.file_encryption.footer_key_as_hex' 
'30313233343536373839303132333435',  -- b"0123456789012345"
+    'format.crypto.file_encryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_encryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+    -- decryption
+    'format.crypto.file_decryption.footer_key_as_hex' 
'30313233343536373839303132333435', -- b"0123456789012345"
+    'format.crypto.file_decryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_decryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+);
+</code></pre>
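+<p>The <code>*_key_as_hex</code> values above are the hex encoding of the raw 
key bytes shown in the trailing SQL comments. As a small illustration, the 
following Rust sketch reproduces that encoding using only the standard 
library; <code>key_as_hex</code> is a hypothetical helper written for this 
example and is not a DataFusion API.</p>

```rust
/// Hex-encodes raw key bytes as two lowercase hex digits per byte.
/// Hypothetical helper for illustration; not part of DataFusion.
fn key_as_hex(key: &[u8]) -> String {
    key.iter().map(|byte| format!("{:02x}", byte)).collect()
}

fn main() {
    // The 16-byte example keys from the SQL comments above.
    let footer_key = b"0123456789012345";
    let double_field_key = b"1234567890123450";

    println!("footer key:       {}", key_as_hex(footer_key));
    println!("double_field key: {}", key_as_hex(double_field_key));
}
```

+<p>The example keys are for demonstration only; real keys should be generated 
randomly and kept out of SQL scripts.</p>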
+<p>(<a href="https://github.com/apache/datafusion/issues/15216">Issue 
#15216</a>,
+<a href="https://github.com/apache/datafusion/pull/16351">PR #16351</a>
+from <a href="https://github.com/corwinjoy">corwinjoy</a> and <a 
href="https://github.com/adamreeve">adamreeve</a>)</p>
+<h3>Support for <code>WITHIN GROUP</code> for Ordered-Set Aggregate 
Functions</h3>
+<p>DataFusion now supports the <code>WITHIN GROUP</code> clause for <a 
href="https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE">ordered-set
 aggregate
+functions</a> such as <code>approx_percentile_cont</code>, 
<code>percentile_cont</code>, and
+<code>percentile_disc</code>, which allows users to specify the ordering of 
the values that the aggregate is computed over.</p>
+<p>For example, the following query computes the 50th percentile of the 
<code>temperature</code> column
+in the <code>city_data</code> table by ordering the rows by 
<code>temperature</code>:</p>
+<pre><code class="language-sql">SELECT
+    percentile_disc(0.5) WITHIN GROUP (ORDER BY temperature) AS median_temperature
+FROM city_data;
+</code></pre>
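+<p>To make the role of the ordering concrete, here is a minimal Rust sketch of 
a discrete percentile following the usual SQL definition (the first value in 
the sorted input whose cumulative position reaches the requested fraction). It 
is an illustration only, not DataFusion's implementation; the function name 
and the sample temperatures are assumptions made for this example.</p>

```rust
/// Discrete percentile: the first value in sorted order whose 1-based rank `r`
/// satisfies `r / n >= fraction`. Returns `None` for empty input or for a
/// fraction outside `[0, 1]`.
fn percentile_disc(values: &[f64], fraction: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&fraction) {
        return None;
    }
    let mut sorted = values.to_vec();
    // This sort plays the role of the ORDER BY inside WITHIN GROUP.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // Smallest rank with rank / n >= fraction, clamped to at least 1.
    // Note: floating-point rounding in `fraction * n` can shift the rank for
    // fractions that are not exactly representable.
    let rank = ((fraction * n as f64).ceil() as usize).max(1);
    Some(sorted[rank - 1])
}

fn main() {
    // Hypothetical temperature readings, deliberately unsorted.
    let temperatures = [21.5, 18.0, 25.0, 19.5, 23.0];
    println!("{:?}", percentile_disc(&temperatures, 0.5));
}
```

+<p>Because the result is always one of the input values, the column named in 
<code>ORDER BY</code> determines which column the percentile is taken from.</p>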
+<p>(Issue <a 
href="https://github.com/apache/datafusion/issues/11732">#11732</a>, 
+PR <a href="https://github.com/apache/datafusion/pull/13511">#13511</a>,
+by <a href="https://github.com/Garamda">Garamda</a>)</p>
+<h3>Compressed Spill Files</h3>
+<p>DataFusion now supports compressing the files written to disk when spilling
+larger-than-memory datasets while sorting and grouping. Using compression
+can significantly reduce the
+size of the intermediate files and improve performance when reading them back 
into memory.</p>
+<p>(Issue <a 
href="https://github.com/apache/datafusion/issues/16130">#16130</a>,
+PR <a href="https://github.com/apache/datafusion/pull/16268">#16268</a>
+by <a href="https://github.com/ding-young">ding-young</a>)</p>
+<h3>Support for <code>REGEXP_INSTR</code> function</h3>
+<p>DataFusion now supports the <code>REGEXP_INSTR</code> function, which 
returns the 1-based position of a
+regular expression match within a string.</p>
+<p>For example, to find the position of the first match of the regular 
expression
+<code>C(.)(..)</code> in the string <code>ABCDEF</code>, you can use:</p>
+<pre><code class="language-sql">&gt; SELECT regexp_instr('ABCDEF', 'C(.)(..)');
++---------------------------------------------------------------+
+| regexp_instr(Utf8("ABCDEF"),Utf8("C(.)(..)"))                 |
++---------------------------------------------------------------+
+| 3                                                             |
++---------------------------------------------------------------+
+</code></pre>
+<p>(<a href="https://github.com/apache/datafusion/issues/13009">Issue 
#13009</a>,
+<a href="https://github.com/apache/datafusion/pull/15928">PR #15928</a>
+by <a href="https://github.com/nirnayroy">nirnayroy</a>)</p>
+<h2>Upgrade Guide and Changelog</h2>
+<p>Upgrading to 49.0.0 should be straightforward for most users. Please review 
the
+<a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html">Upgrade 
Guide</a>
+for details on breaking changes and code snippets to help with the transition.
+Recently, some users have reported success automatically upgrading DataFusion 
by
+pairing AI tools with the upgrade guide. For a comprehensive list of all 
changes,
+please refer to the <a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p>
+<h2>About DataFusion</h2>
+<p><a href="https://datafusion.apache.org/">Apache DataFusion</a> is an 
extensible query engine, written in <a 
href="https://www.rust-lang.org/">Rust</a>, that
+uses <a href="https://arrow.apache.org">Apache Arrow</a> as its in-memory 
format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While <a 
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals">DataFusion&rsquo;s
 primary design
+goal</a> is to accelerate the creation of other data-centric systems, it 
provides a
+reasonable experience directly out of the box as a <a 
href="https://datafusion.apache.org/user-guide/dataframe.html">dataframe 
library</a>,
+<a href="https://datafusion.apache.org/python/">Python library</a>, and 
command-line SQL tool.</p>
+<p>DataFusion's core thesis is that as a community, together we can build much 
more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation and focus on
+what makes our projects unique.</p>
+<h2>How to Get Involved</h2>
+<p>DataFusion is not a project built or driven by a single person, company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.</p>
+<p>If you are interested in joining us, we would love to have you. You can try 
out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is <a 
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22">here</a>,
 and you
+can find out how to reach us on the <a 
href="https://datafusion.apache.org/contributor-guide/communication.html">communication
 doc</a>.</p>
+
+    <!--
+        Enable giscus comments: Allows comments on the blogs posted as
+        https://github.com/apache/datafusion-site/discussions
+
+        More details on https://github.com/apache/datafusion-site/issues/80
+    -->
+    <div id="article_comments">
+        <div id="comment_thread"></div>
+
+        <script src="https://giscus.app/client.js"
+            data-repo="apache/datafusion-site"
+            data-repo-id="R_kgDOL8FTzw"
+            data-category="Announcements"
+            data-category-id="DIC_kwDOL8FTz84Csqua"
+            data-mapping="title"
+            data-strict="1"
+            data-reactions-enabled="1"
+            data-emit-metadata="0"
+            data-input-position="bottom"
+            data-theme="preferred_color_scheme"
+            data-lang="en"
+            data-loading="lazy"
+            crossorigin="anonymous"
+            async>
+        </script>
+    </div>          </div>
+      </div>
+    </div>    
+    <!-- footer -->
+    <div class="row g-0">
+      <div class="col-12">
+        <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+          Copyright 2025, <a href="https://www.apache.org/">The Apache 
Software Foundation</a>, Licensed under the <a 
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 
2.0</a>.<br/>
+          Apache&reg; and the Apache feather logo are trademarks of The Apache 
Software Foundation.
+        </p>
+      </div>
+    </div>
+    <script src="/blog/js/bootstrap.bundle.min.js"></script>  </main>
+  </body>
+</html>
diff --git a/output/author/pmc.html b/output/author/pmc.html
index cef6e50..10e3b83 100644
--- a/output/author/pmc.html
+++ b/output/author/pmc.html
@@ -1,7 +1,7 @@
 <!DOCTYPE html>
 <html lang="en">
 <head>
-        <title>Apache DataFusion Blog - Articles by PMC</title>
+        <title>Apache DataFusion Blog - Articles by pmc</title>
         <meta charset="utf-8" />
         <meta name="generator" content="Pelican" />
         <link href="https://datafusion.apache.org/blog/feed.xml" 
type="application/rss+xml" rel="alternate" title="Apache DataFusion Blog RSS 
Feed" />
@@ -17,9 +17,42 @@
             <li><a 
href="https://datafusion.apache.org/blog/category/blog.html">blog</a></li>
         </ul></nav><!-- /#menu -->
 <section id="content">
-<h2>Articles by PMC</h2>
+<h2>Articles by pmc</h2>
 
 <ol id="post-list">
+        <li><article class="hentry">
+                <header> <h2 class="entry-title"><a 
href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" 
rel="bookmark" title="Permalink to Apache DataFusion 49.0.0 Released">Apache 
DataFusion 49.0.0 Released</a></h2> </header>
+                <footer class="post-info">
+                    <time class="published" 
datetime="2025-07-28T00:00:00+00:00"> Mon 28 July 2025 </time>
+                    <address class="vcard author">By
+                        <a class="url fn" 
href="https://datafusion.apache.org/blog/author/pmc.html">pmc</a>
+                    </address>
+                </footer><!-- /.post-info -->
+                <div class="entry-content"> <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+<h2>Introduction</h2>
+<p>We are proud to announce the release of <a 
href="https://crates.io/crates/datafusion/49.0.0">DataFusion 49.0.0</a>. This 
blog post highlights some of
+the major improvements since the release of <a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion
 48.0.0</a>. The complete list of changes is available in the <a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p>
+<h2>Performance Improvements 🚀</h2>
+<p>DataFusion continues to focus on enhancing performance, as …</p> </div><!-- 
/.entry-content -->
+        </article></li>
         <li><article class="hentry">
                 <header> <h2 class="entry-title"><a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" 
rel="bookmark" title="Permalink to Apache DataFusion 48.0.0 Released">Apache 
DataFusion 48.0.0 Released</a></h2> </header>
                 <footer class="post-info">
diff --git a/output/category/blog.html b/output/category/blog.html
index 36077cc..4239d8f 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -21,6 +21,39 @@
 <h2>Articles in the blog category</h2>
 
 <ol id="post-list">
+        <li><article class="hentry">
+                <header> <h2 class="entry-title"><a 
href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" 
rel="bookmark" title="Permalink to Apache DataFusion 49.0.0 Released">Apache 
DataFusion 49.0.0 Released</a></h2> </header>
+                <footer class="post-info">
+                    <time class="published" 
datetime="2025-07-28T00:00:00+00:00"> Mon 28 July 2025 </time>
+                    <address class="vcard author">By
+                        <a class="url fn" 
href="https://datafusion.apache.org/blog/author/pmc.html">pmc</a>
+                    </address>
+                </footer><!-- /.post-info -->
+                <div class="entry-content"> <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+<h2>Introduction</h2>
+<p>We are proud to announce the release of <a 
href="https://crates.io/crates/datafusion/49.0.0">DataFusion 49.0.0</a>. This 
blog post highlights some of
+the major improvements since the release of <a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion
 48.0.0</a>. The complete list of changes is available in the <a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p>
+<h2>Performance Improvements 🚀</h2>
+<p>DataFusion continues to focus on enhancing performance, as …</p> </div><!-- 
/.entry-content -->
+        </article></li>
         <li><article class="hentry">
                 <header> <h2 class="entry-title"><a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" 
rel="bookmark" title="Permalink to Apache DataFusion 48.0.0 Released">Apache 
DataFusion 48.0.0 Released</a></h2> </header>
                 <footer class="post-info">
diff --git a/output/feed.xml b/output/feed.xml
index df10db0..70813e5 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,28 @@
 <?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Wed,
 16 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
48.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0</link><description>&lt;!--
+<rss version="2.0"><channel><title>Apache DataFusion 
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 28 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
49.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Mon, 28 
Jul 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-07-28:/blog/2025/07/28/datafusion-49.0.0</guid><category>blog</category></item><item><title>Apache
 DataFusion 48.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0</link><descriptio
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 3d9c139..0ac20fa 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,311 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-16T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 48.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" 
rel="alterna [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion 
Blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-28T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 49.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" 
rel="alterna [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as shown in 
the ClickBench and other results. &lt;/p&gt;
+&lt;p&gt;&lt;img alt="ClickBench performance results over time for DataFusion" 
class="img-responsive" 
src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png" 
width="100%"/&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Figure 1&lt;/strong&gt;: ClickBench performance 
improvements over time.
+Average and median normalized query execution times for ClickBench queries at 
each git revision. 
+Query times are normalized using the ClickBench definition. Data and 
definitions are available on the 
+&lt;a href="https://alamb.github.io/datafusion-benchmarking/"&gt;DataFusion 
Benchmarking Page&lt;/a&gt;. &lt;/p&gt;
+&lt;!--
+NOTE: Andrew is working on gathering these numbers
+
+&lt;img
+src="/blog/images/datafusion-49.0.0/performance_over_time_planning.png"
+width="80%"
+class="img-responsive"
+alt="Planning benchmark performance results over time for DataFusion"
+/&gt;
+
+**Figure 2**: Planning benchmark performance improved XXX between DataFusion 
48.0.1 and DataFusion 49.0.0. Chart source: TODO
+--&gt;
+&lt;p&gt;Here are some noteworthy optimizations added since DataFusion 
48:&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Equivalence system upgrade:&lt;/strong&gt; The lower 
levels of the equivalence system, which is used to implement the
+  optimizations described in &lt;a 
href="https://datafusion.apache.org/blog/2025/03/11/ordering-analysis"&gt;Using 
Ordering for Better Plans&lt;/a&gt;, were rewritten, leading to
+  much faster planning times, especially for queries with a &lt;a 
href="https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229"&gt;large
 number of columns&lt;/a&gt;. This change also prepares
+  the way for more sophisticated sort-based optimizations in the future. (PR 
&lt;a 
href="https://github.com/apache/datafusion/pull/16217"&gt;#16217&lt;/a&gt; by 
&lt;a href="https://github.com/ozankabak"&gt;ozankabak&lt;/a&gt;).&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Dynamic Filters and TopK 
pushdown&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;DataFusion now supports physical filter pushdown and dynamic filters, 
which become more selective
+during query execution. Together, these features improve the performance 
of
+queries that use &lt;code&gt;LIMIT&lt;/code&gt; and &lt;code&gt;ORDER 
BY&lt;/code&gt; clauses, such as the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT *
+FROM data
+ORDER BY timestamp DESC
+LIMIT 10
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;While the query above is simple, without dynamic filtering or knowing 
that the data
+is already sorted by &lt;code&gt;timestamp&lt;/code&gt;, a query engine must 
decode &lt;em&gt;all&lt;/em&gt; of the data to
+find the top 10 values. With the dynamic filters system, DataFusion applies an
+increasingly selective filter during query execution. It checks the 
&lt;strong&gt;current&lt;/strong&gt;
+top 10 values of the &lt;code&gt;timestamp&lt;/code&gt; column 
&lt;strong&gt;before&lt;/strong&gt; opening files or reading
+Parquet Row Groups and Data Pages, which can skip older data very 
quickly.&lt;/p&gt;
+&lt;p&gt;Dynamic predicates are a common feature of advanced engines; see, for 
example, &lt;a 
href="https://docs.starburst.io/latest/admin/dynamic-filtering.html"&gt;Dynamic
+Filters in Starburst&lt;/a&gt; and &lt;a 
href="https://www.snowflake.com/en/engineering-blog/optimizing-top-k-aggregation-snowflake/"&gt;Top-K
 Aggregation Optimization at Snowflake&lt;/a&gt;. The
+technique drastically improves query performance (we've seen over a 1.5x
+improvement for some TPC-H-style queries), especially in combination with late
+materialization and columnar file formats such as Parquet. We &lt;a 
href="https://github.com/apache/datafusion/issues/15513"&gt;plan to write a
+blog post&lt;/a&gt; explaining the details of this optimization in the future, 
and we expect to
+use the same mechanism to implement additional optimizations such as &lt;a 
href="https://github.com/apache/datafusion/issues/7955"&gt;Sideways
+Information Passing for joins&lt;/a&gt; (Issue
+&lt;a 
href="https://github.com/apache/datafusion/issues/15037"&gt;#15037&lt;/a&gt;, PR
+&lt;a 
href="https://github.com/apache/datafusion/pull/15770"&gt;#15770&lt;/a&gt; by
+&lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt;).&lt;/p&gt;
+&lt;h2&gt;Community Growth  📈&lt;/h2&gt;
+&lt;p&gt;The last few months, between &lt;code&gt;46.0.0&lt;/code&gt; and 
&lt;code&gt;49.0.0&lt;/code&gt;, have seen our community grow:&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;New PMC members and committers: &lt;a 
href="https://github.com/berkaysynnada"&gt;berkay&lt;/a&gt;, &lt;a 
href="https://github.com/xudong963"&gt;xudong963&lt;/a&gt; and &lt;a 
href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt; joined the PMC.
+   &lt;a href="https://github.com/blaginin"&gt;blaginin&lt;/a&gt;, &lt;a 
href="https://github.com/milenkovicm"&gt;milenkovicm&lt;/a&gt;, &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; and &lt;a 
href="https://github.com/kosiew"&gt;kosiew&lt;/a&gt; joined as committers. See 
the &lt;a 
href="https://lists.apache.org/[email protected]"&gt;mailing 
list&lt;/a&gt; for more details.&lt;/li&gt;
+&lt;li&gt;In the &lt;a 
href="https://github.com/apache/arrow-datafusion"&gt;core DataFusion 
repo&lt;/a&gt; alone, we reviewed and accepted over 850 PRs from 172 different
+   contributors, created 669 issues, and closed 379 of them 🚀. All changes 
are listed in the detailed
+   &lt;a 
href="https://github.com/apache/datafusion/tree/main/dev/changelog"&gt;changelogs&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;DataFusion published a number of blog posts, including &lt;a 
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions"&gt;User
 defined Window Functions&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one"&gt;Optimizing
 SQL (and DataFrames)
+   in DataFusion part 1&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two"&gt;part
 2&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;Using Rust 
async for Query Execution and Cancelling Long-Running Queries&lt;/a&gt;, and
+   &lt;a 
href="https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/"&gt;Embedding
 User-Defined Indexes in Apache Parquet Files&lt;/a&gt;.&lt;/li&gt;
+&lt;/ol&gt;
+&lt;!--
+# Unique committers
+$ git shortlog -sn 46.0.0..49.0.0-rc1 . | wc -l
+     172
+# commits
+$ git log --pretty=oneline 46.0.0..49.0.0-rc1 . | wc -l
+     884
+
+
+https://crates.io/crates/datafusion/49.0.0
+DataFusion 49 released July 25, 2025
+
+https://crates.io/crates/datafusion/46.0.0
+DataFusion 46 released March 7, 2025
+
+Issues created in this time: 290 open, 379 closed = 669 total
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2025-03-07..2025-07-25
+
+Issues closed: 508
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2025-03-07..2025-07-25
+
+PRs merged in this time 874
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2025-03-07..2025-07-25
+
+--&gt;
+&lt;h2&gt;New Features ✨&lt;/h2&gt;
+&lt;h3&gt;Async User-Defined Functions&lt;/h3&gt;
+&lt;p&gt;It is now possible to write &lt;code&gt;async&lt;/code&gt; 
User-Defined Functions
+(UDFs) in DataFusion that perform asynchronous
+operations, such as network requests or database queries, without blocking the
+execution of the query. This enables new use cases, such as
+integrating with large language models (LLMs) or other external services, and 
we can't
+wait to see what the community builds with it.&lt;/p&gt;
+&lt;p&gt;See the &lt;a 
href="https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html"&gt;documentation&lt;/a&gt;
 for more details and the &lt;a 
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/async_udf.rs"&gt;async
 UDF example&lt;/a&gt; for
+working code. &lt;/p&gt;
+&lt;p&gt;You could, for example, implement a function 
&lt;code&gt;ask_llm&lt;/code&gt; that asks a large language model
+(LLM) service a question about each value in a column:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT * 
+FROM animal a
+WHERE ask_llm(a.name, 'Is this animal furry?')
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The implementation of an async UDF is almost identical to a normal
+UDF, except that it must implement the 
&lt;code&gt;AsyncScalarUDFImpl&lt;/code&gt; trait in addition to 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and
+provide an &lt;code&gt;async&lt;/code&gt; implementation via 
&lt;code&gt;invoke_async_with_args&lt;/code&gt;:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;#[derive(Debug)]
+struct AskLLM {
+    signature: Signature,
+}
+
+#[async_trait]
+impl AsyncScalarUDFImpl for AskLLM {
+    /// The `invoke_async_with_args` method is similar to `invoke_with_args`,
+    /// but it returns a `Future` that resolves to the result.
+    ///
+    /// Since this signature is `async`, it can do any `async` operations, such
+    /// as network requests.
+    async fn invoke_async_with_args(
+        &amp;amp;self,
+        args: ScalarFunctionArgs,
+        options: &amp;amp;ConfigOptions,
+    ) -&amp;gt; Result&amp;lt;ArrayRef&amp;gt; {
+        // Converts the arguments to arrays for simplicity.
+        let args = ColumnarValue::values_to_arrays(&amp;amp;args.args)?;
+        let [column_of_interest, question] = take_function_args(self.name(), 
args)?;
+        let client = Client::new();
+
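+        // Build the request body (`build_llm_request` is a hypothetical helper)
+        let req = build_llm_request(&amp;amp;column_of_interest, &amp;amp;question);
+
+        // Make a network request to a hypothetical LLM service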
+        let res = client
+            .post(URI)
+            .headers(get_llm_headers(options))
+            .json(&amp;amp;req)
+            .send()
+            .await?
+            .json::&amp;lt;LLMResponse&amp;gt;()
+            .await?;
+
+        let results = extract_results_from_llm_response(&amp;amp;res);
+
+        Ok(Arc::new(results))
+    }
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/6518"&gt;#6518&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/14837"&gt;PR 
#14837&lt;/a&gt; from
+&lt;a href="https://github.com/goldmedal"&gt;goldmedal&lt;/a&gt; 🏆)&lt;/p&gt;
+&lt;h3&gt;Better Cancellation for Certain Long-Running Queries&lt;/h3&gt;
+&lt;p&gt;In rare cases, it was previously not possible to cancel long-running 
queries,
+leading to unresponsiveness. Other projects would likely have fixed this issue
+by treating the symptom, but &lt;a 
href="https://github.com/pepijnve"&gt;pepijnve&lt;/a&gt; and the DataFusion 
community worked together to
+treat the root cause. The general solution required a deep understanding of the
+DataFusion execution engine, Rust &lt;code&gt;Streams&lt;/code&gt;, and the 
tokio cooperative
+scheduling model. The &lt;a 
href="https://github.com/apache/datafusion/pull/16398"&gt;resulting 
PR&lt;/a&gt; is a model of careful
+community engineering and a great example of using Rust's 
&lt;code&gt;async&lt;/code&gt; ecosystem
+to implement complex functionality. It even resulted in a &lt;a 
href="https://github.com/tokio-rs/tokio/pull/7405"&gt;contribution upstream to 
tokio&lt;/a&gt;
+(since accepted). See the &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;blog 
post&lt;/a&gt; for more details.&lt;/p&gt;
+&lt;h3&gt;Metadata for User Defined Types such as 
&lt;code&gt;Variant&lt;/code&gt; and 
&lt;code&gt;Geometry&lt;/code&gt;&lt;/h3&gt;
+&lt;p&gt;User-defined types have been &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;a long-requested 
feature&lt;/a&gt;, and this release provides
+the low-level APIs to support them efficiently.&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;Metadata handling in PRs &lt;a 
href="https://github.com/apache/datafusion/pull/15646"&gt;#15646&lt;/a&gt; and 
&lt;a 
href="https://github.com/apache/datafusion/pull/16170"&gt;#16170&lt;/a&gt; from 
&lt;a href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;Pushdown of filters and expressions (see "Dynamic Filters and TopK 
pushdown" section above)&lt;/li&gt;
+&lt;/ol&gt;
+&lt;p&gt;We still have some work to do to fully support user-defined types, 
specifically
+in documentation and testing, and we would
+love your help in this area. If you are interested in contributing,
+please see &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;issue 
#12644&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;Parquet Modular Encryption&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports reading and writing encrypted &lt;a 
href="https://parquet.apache.org/"&gt;Apache Parquet&lt;/a&gt; files with &lt;a 
href="https://parquet.apache.org/docs/file-format/data-pages/encryption/"&gt;modular
+encryption&lt;/a&gt;. This allows users to encrypt specific columns in a 
Parquet file
+using different keys, while still being able to read data without needing to
+decrypt the entire file.&lt;/p&gt;
+&lt;p&gt;Here is an example of how to configure DataFusion to read and write an 
encrypted Parquet
+table with two columns, &lt;code&gt;double_field&lt;/code&gt; and 
&lt;code&gt;float_field&lt;/code&gt;, using modular
+encryption:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;CREATE EXTERNAL TABLE 
encrypted_parquet_table
+(
+double_field double,
+float_field float
+)
+STORED AS PARQUET LOCATION 'pq/' OPTIONS (
+    -- encryption
+    'format.crypto.file_encryption.encrypt_footer' 'true',
+    'format.crypto.file_encryption.footer_key_as_hex' 
'30313233343536373839303132333435',  -- b"0123456789012345"
+    'format.crypto.file_encryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_encryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+    -- decryption
+    'format.crypto.file_decryption.footer_key_as_hex' 
'30313233343536373839303132333435', -- b"0123456789012345"
+    'format.crypto.file_decryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_decryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+);
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/15216"&gt;Issue 
#15216&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/16351"&gt;PR 
#16351&lt;/a&gt;
+from &lt;a href="https://github.com/corwinjoy"&gt;corwinjoy&lt;/a&gt; and 
&lt;a href="https://github.com/adamreeve"&gt;adamreeve&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;WITHIN GROUP&lt;/code&gt; for Ordered-Set 
Aggregate Functions&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;WITHIN GROUP&lt;/code&gt; 
clause for &lt;a 
href="https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE"&gt;ordered-set
 aggregate
+functions&lt;/a&gt; such as &lt;code&gt;approx_percentile_cont&lt;/code&gt;, 
&lt;code&gt;percentile_cont&lt;/code&gt;, and
+&lt;code&gt;percentile_disc&lt;/code&gt;, which lets users specify the order 
of the values being aggregated.&lt;/p&gt;
+&lt;p&gt;For example, the following query computes the 50th percentile 
(median) of the &lt;code&gt;temperature&lt;/code&gt; column
+in the &lt;code&gt;city_data&lt;/code&gt; table:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT
+    percentile_disc(0.5) WITHIN GROUP (ORDER BY temperature) AS median_temperature
+FROM city_data;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/11732"&gt;#11732&lt;/a&gt;, 
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/13511"&gt;#13511&lt;/a&gt;,
+by &lt;a href="https://github.com/Garamda"&gt;Garamda&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Compressed Spill Files&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports compressing the files written to disk when 
spilling
+larger-than-memory datasets while sorting and grouping. Using compression
+can significantly reduce the
+size of the intermediate files and improve performance when reading them back 
into memory.&lt;/p&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/16130"&gt;#16130&lt;/a&gt;,
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/16268"&gt;#16268&lt;/a&gt;
+by &lt;a 
href="https://github.com/ding-young"&gt;ding-young&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;REGEXP_INSTR&lt;/code&gt; function&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;REGEXP_INSTR&lt;/code&gt; 
function, which returns the position of a
+regular expression match within a string.&lt;/p&gt;
+&lt;p&gt;For example, to find the position of the first match of the regular 
expression
+&lt;code&gt;C(.)(..)&lt;/code&gt; in the string 
&lt;code&gt;ABCDEF&lt;/code&gt;, you can use:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;&amp;gt; SELECT 
regexp_instr('ABCDEF', 'C(.)(..)');
++---------------------------------------------------------------+
+| regexp_instr(Utf8("ABCDEF"),Utf8("C(.)(..)"))                 |
++---------------------------------------------------------------+
+| 3                                                             |
++---------------------------------------------------------------+
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/13009"&gt;Issue 
#13009&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/15928"&gt;PR 
#15928&lt;/a&gt;
+by &lt;a href="https://github.com/nirnayroy"&gt;nirnayroy&lt;/a&gt;)&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 49.0.0 should be straightforward for most users. Please 
review the
+&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;
+for details on breaking changes and code snippets to help with the transition.
+Recently, some users have reported success automatically upgrading DataFusion 
by
+pairing AI tools with the upgrade guide. For a comprehensive list of all 
changes,
+please refer to the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;About DataFusion&lt;/h2&gt;
+&lt;p&gt;&lt;a href="https://datafusion.apache.org/"&gt;Apache 
DataFusion&lt;/a&gt; is an extensible query engine, written in &lt;a 
href="https://www.rust-lang.org/"&gt;Rust&lt;/a&gt;, that
+uses &lt;a href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt; as its 
in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While &lt;a 
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals"&gt;DataFusion&amp;rsquo;s
 primary design
+goal&lt;/a&gt; is to accelerate the creation of other data-centric systems, it 
provides a
+reasonable experience directly out of the box as a &lt;a 
href="https://datafusion.apache.org/user-guide/dataframe.html"&gt;dataframe 
library&lt;/a&gt;,
+&lt;a href="https://datafusion.apache.org/python/"&gt;Python 
library&lt;/a&gt;, and command-line SQL tool.&lt;/p&gt;
+&lt;p&gt;DataFusion's core thesis is that as a community, together we can 
build much more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation and focus on
+what makes our projects unique.&lt;/p&gt;
+&lt;h2&gt;How to Get Involved&lt;/h2&gt;
+&lt;p&gt;DataFusion is not a project built or driven by a single person, 
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.&lt;/p&gt;
+&lt;p&gt;If you are interested in joining us, we would love to have you. You 
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is &lt;a 
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;here&lt;/a&gt;,
 and you
+can find out how to reach us on the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion 48.0.0 
Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" 
rel="alternate"></link><published>2025-07-16T00:00:00+00:00</published><updated>2025-07-16T00:00:00+00:00</updated><author><name>PMC</name></author><id>
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index bb30323..c4ad548 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,311 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-16T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 48.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" rel="al 
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - 
blog</title><link href="https://datafusion.apache.org/blog/" 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-28T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 49.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" rel="al 
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as shown in 
the ClickBench and other results. &lt;/p&gt;
+&lt;p&gt;&lt;img alt="ClickBench performance results over time for DataFusion" 
class="img-responsive" 
src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png" 
width="100%"/&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Figure 1&lt;/strong&gt;: ClickBench performance 
improvements over time.
+Average and median normalized query execution times for ClickBench queries for 
each git revision. 
+Query times are normalized using the ClickBench definition. Data and 
definitions on the 
+&lt;a href="https://alamb.github.io/datafusion-benchmarking/"&gt;DataFusion 
Benchmarking Page&lt;/a&gt;. &lt;/p&gt;
+&lt;!--
+NOTE: Andrew is working on gathering these numbers
+
+&lt;img
+src="/blog/images/datafusion-49.0.0/performance_over_time_planning.png"
+width="80%"
+class="img-responsive"
+alt="Planning benchmark performance results over time for DataFusion"
+/&gt;
+
+**Figure 2**: Planning benchmark performance improved XXX between DataFusion 
48.0.1 and DataFusion 49.0.0. Chart source: TODO
+--&gt;
+&lt;p&gt;Here are some noteworthy optimizations added since DataFusion 
48:&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Equivalence system upgrade:&lt;/strong&gt; The lower 
levels of the equivalence system, which is used to implement the
+  optimizations described in &lt;a 
href="https://datafusion.apache.org/blog/2025/03/11/ordering-analysis"&gt;Using 
Ordering for Better Plans&lt;/a&gt;, were rewritten, leading to
+  much faster planning times, especially for queries with a &lt;a 
href="https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229"&gt;large
 number of columns&lt;/a&gt;. This change also prepares
+  the way for more sophisticated sort-based optimizations in the future. (PR 
&lt;a 
href="https://github.com/apache/datafusion/pull/16217"&gt;#16217&lt;/a&gt; by 
&lt;a href="https://github.com/ozankabak"&gt;ozankabak&lt;/a&gt;).&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Dynamic Filters and TopK 
pushdown&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;DataFusion now supports dynamic filters, which become more selective during 
query execution,
+and physical filter pushdown. Together, these features improve the performance 
of
+queries that use &lt;code&gt;LIMIT&lt;/code&gt; and &lt;code&gt;ORDER 
BY&lt;/code&gt; clauses, such as the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT *
+FROM data
+ORDER BY timestamp DESC
+LIMIT 10
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;While the query above is simple, without dynamic filtering or knowing 
that the data
+is already sorted by &lt;code&gt;timestamp&lt;/code&gt;, a query engine must 
decode &lt;em&gt;all&lt;/em&gt; of the data to
+find the top 10 values. With the dynamic filters system, DataFusion applies an
+increasingly selective filter during query execution. It checks the 
&lt;strong&gt;current&lt;/strong&gt;
+top 10 values of the &lt;code&gt;timestamp&lt;/code&gt; column 
&lt;strong&gt;before&lt;/strong&gt; opening files or reading
+Parquet Row Groups and Data Pages, which can skip older data very 
quickly.&lt;/p&gt;
+&lt;p&gt;Dynamic predicates are a common feature of advanced engines such as 
&lt;a 
href="https://docs.starburst.io/latest/admin/dynamic-filtering.html"&gt;Dynamic
+Filters in Starburst&lt;/a&gt; and &lt;a 
href="https://www.snowflake.com/en/engineering-blog/optimizing-top-k-aggregation-snowflake/"&gt;Top-K
 Aggregation Optimization at Snowflake&lt;/a&gt;. The
+technique drastically improves query performance (we've seen over a 1.5x
+improvement for some TPC-H-style queries), especially in combination with late
+materialization and columnar file formats such as Parquet. We &lt;a 
href="https://github.com/apache/datafusion/issues/15513"&gt;plan to write a
+blog post&lt;/a&gt; explaining the details of this optimization in the future, 
and we expect to
+use the same mechanism to implement additional optimizations such as &lt;a 
href="https://github.com/apache/datafusion/issues/7955"&gt;Sideways
+Information Passing for joins&lt;/a&gt; (Issue
+&lt;a 
href="https://github.com/apache/datafusion/issues/15037"&gt;#15037&lt;/a&gt; PR
+&lt;a 
href="https://github.com/apache/datafusion/pull/15770"&gt;#15770&lt;/a&gt; by
+&lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt;).&lt;/p&gt;
+&lt;h2&gt;Community Growth  📈&lt;/h2&gt;
+&lt;p&gt;The last few months, between &lt;code&gt;46.0.0&lt;/code&gt; and 
&lt;code&gt;49.0.0&lt;/code&gt;, have seen our community grow:&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;New PMC members and committers: &lt;a 
href="https://github.com/berkaysynnada"&gt;berkay&lt;/a&gt;, &lt;a 
href="https://github.com/xudong963"&gt;xudong963&lt;/a&gt; and &lt;a 
href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt; joined the PMC.
+   &lt;a href="https://github.com/blaginin"&gt;blaginin&lt;/a&gt;, &lt;a 
href="https://github.com/milenkovicm"&gt;milenkovicm&lt;/a&gt;, &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; and &lt;a 
href="https://github.com/kosiew"&gt;kosiew&lt;/a&gt; joined as committers. See 
the &lt;a 
href="https://lists.apache.org/[email protected]"&gt;mailing 
list&lt;/a&gt; for more details.&lt;/li&gt;
+&lt;li&gt;In the &lt;a 
href="https://github.com/apache/arrow-datafusion"&gt;core DataFusion 
repo&lt;/a&gt; alone, we reviewed and accepted over 850 PRs from 172 different
+   contributors, created 669 issues, and closed 379 of them 🚀. All changes 
are listed in the detailed
+   &lt;a 
href="https://github.com/apache/datafusion/tree/main/dev/changelog"&gt;changelogs&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;DataFusion published a number of blog posts, including &lt;a 
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions"&gt;User
 defined Window Functions&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one"&gt;Optimizing
 SQL (and DataFrames)
+   in DataFusion part 1&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two"&gt;part
 2&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;Using Rust 
async for Query Execution and Cancelling Long-Running Queries&lt;/a&gt;, and
+   &lt;a 
href="https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/"&gt;Embedding
 User-Defined Indexes in Apache Parquet Files&lt;/a&gt;.&lt;/li&gt;
+&lt;/ol&gt;
+&lt;!--
+# Unique committers
+$ git shortlog -sn 46.0.0..49.0.0-rc1  .| wc -l
+     172
+# commits
+$ git log --pretty=oneline 46.0.0..49.0.0-rc1 . | wc -l
+     884
+
+
+https://crates.io/crates/datafusion/49.0.0
+DataFusion 49 released July 25, 2025
+
+https://crates.io/crates/datafusion/46.0.0
+DataFusion 46 released March 7, 2025
+
+Issues created in this time: 290 open, 379 closed = 669 total
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2025-03-07..2025-07-25
+
+Issues closed: 508
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2025-03-07..2025-07-25
+
+PRs merged in this time 874
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2025-03-07..2025-07-25
+
+--&gt;
+&lt;h2&gt;New Features ✨&lt;/h2&gt;
+&lt;h3&gt;Async User-Defined Functions&lt;/h3&gt;
+&lt;p&gt;It is now possible to write &lt;code&gt;async&lt;/code&gt; 
User-Defined Functions
+(UDFs) in DataFusion that perform asynchronous
+operations, such as network requests or database queries, without blocking the
+execution of the query. This enables new use cases, such as
+integrating with large language models (LLMs) or other external services, and 
we can't
+wait to see what the community builds with it.&lt;/p&gt;
+&lt;p&gt;See the &lt;a 
href="https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html"&gt;documentation&lt;/a&gt;
 for more details and the &lt;a 
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/async_udf.rs"&gt;async
 UDF example&lt;/a&gt; for
+working code. &lt;/p&gt;
+&lt;p&gt;You could, for example, implement a function 
&lt;code&gt;ask_llm&lt;/code&gt; that asks a large language model
+(LLM) service a question about the contents of a column.&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT * 
+FROM animal a
+WHERE ask_llm(a.name, 'Is this animal furry?')
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The implementation of an async UDF is almost identical to a normal
+UDF, except that it must implement the 
&lt;code&gt;AsyncScalarUDFImpl&lt;/code&gt; trait in addition to 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and
+provide an &lt;code&gt;async&lt;/code&gt; implementation via 
&lt;code&gt;invoke_async_with_args&lt;/code&gt;:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;#[derive(Debug)]
+struct AskLLM {
+    signature: Signature,
+}
+
+#[async_trait]
+impl AsyncScalarUDFImpl for AskLLM {
+    /// The `invoke_async_with_args` method is similar to `invoke_with_args`,
+    /// but it returns a `Future` that resolves to the result.
+    ///
+    /// Since this signature is `async`, it can do any `async` operations, such
+    /// as network requests.
+    async fn invoke_async_with_args(
+        &amp;amp;self,
+        args: ScalarFunctionArgs,
+        options: &amp;amp;ConfigOptions,
+    ) -&amp;gt; Result&amp;lt;ArrayRef&amp;gt; {
+        // Converts the arguments to arrays for simplicity.
+        let args = ColumnarValue::values_to_arrays(&amp;amp;args.args)?;
+        let [column_of_interest, question] = take_function_args(self.name(), 
args)?;
+        let client = Client::new();
+
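+        // Build the request body from the inputs; `build_llm_request` is a
+        // hypothetical helper, like the other helpers in this example
+        let req = build_llm_request(&amp;amp;column_of_interest, &amp;amp;question);
+
+        // Make a network request to a hypothetical LLM service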
+        let res = client
+            .post(URI)
+            .headers(get_llm_headers(options))
+            .json(&amp;amp;req)
+            .send()
+            .await?
+            .json::&amp;lt;LLMResponse&amp;gt;()
+            .await?;
+
+        let results = extract_results_from_llm_response(&amp;amp;res);
+
+        Ok(Arc::new(results))
+    }
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/6518"&gt;#6518&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/14837"&gt;PR 
#14837&lt;/a&gt; from
+&lt;a href="https://github.com/goldmedal"&gt;goldmedal&lt;/a&gt; 🏆)&lt;/p&gt;
+&lt;h3&gt;Better Cancellation for Certain Long-Running Queries&lt;/h3&gt;
+&lt;p&gt;In rare cases, it was previously not possible to cancel long-running 
queries,
+leading to unresponsiveness. Other projects would likely have fixed this issue
+by treating the symptom, but &lt;a 
href="https://github.com/pepijnve"&gt;pepijnve&lt;/a&gt; and the DataFusion 
community worked together to
+treat the root cause. The general solution required a deep understanding of the
+DataFusion execution engine, Rust &lt;code&gt;Streams&lt;/code&gt;, and the 
tokio cooperative
+scheduling model. The &lt;a 
href="https://github.com/apache/datafusion/pull/16398"&gt;resulting 
PR&lt;/a&gt; is a model of careful
+community engineering and a great example of using Rust's 
&lt;code&gt;async&lt;/code&gt; ecosystem
+to implement complex functionality. It even resulted in a &lt;a 
href="https://github.com/tokio-rs/tokio/pull/7405"&gt;contribution upstream to 
tokio&lt;/a&gt;
+(since accepted). See the &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;blog 
post&lt;/a&gt; for more details.&lt;/p&gt;
+&lt;h3&gt;Metadata for User Defined Types such as 
&lt;code&gt;Variant&lt;/code&gt; and 
&lt;code&gt;Geometry&lt;/code&gt;&lt;/h3&gt;
+&lt;p&gt;User-defined types have been &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;a long-requested 
feature&lt;/a&gt;, and this release provides
+the low-level APIs to support them efficiently.&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;Metadata handling in PRs &lt;a 
href="https://github.com/apache/datafusion/pull/15646"&gt;#15646&lt;/a&gt; and 
&lt;a 
href="https://github.com/apache/datafusion/pull/16170"&gt;#16170&lt;/a&gt; from 
&lt;a href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;Pushdown of filters and expressions (see "Dynamic Filters and TopK 
pushdown" section above)&lt;/li&gt;
+&lt;/ol&gt;
+&lt;p&gt;We still have some work to do to fully support user-defined types, 
specifically
+in documentation and testing, and we would
+love your help in this area. If you are interested in contributing,
+please see &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;issue 
#12644&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;Parquet Modular Encryption&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports reading and writing encrypted &lt;a 
href="https://parquet.apache.org/"&gt;Apache Parquet&lt;/a&gt; files with &lt;a 
href="https://parquet.apache.org/docs/file-format/data-pages/encryption/"&gt;modular
+encryption&lt;/a&gt;. This allows users to encrypt specific columns in a 
Parquet file
+using different keys, while still being able to read data without needing to
+decrypt the entire file.&lt;/p&gt;
+&lt;p&gt;Here is an example of how to configure DataFusion to read an 
encrypted Parquet
+table with two columns, &lt;code&gt;double_field&lt;/code&gt; and 
&lt;code&gt;float_field&lt;/code&gt;, using modular
+encryption:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;CREATE EXTERNAL TABLE 
encrypted_parquet_table
+(
+double_field double,
+float_field float
+)
+STORED AS PARQUET LOCATION 'pq/' OPTIONS (
+    -- encryption
+    'format.crypto.file_encryption.encrypt_footer' 'true',
+    'format.crypto.file_encryption.footer_key_as_hex' 
'30313233343536373839303132333435',  -- b"0123456789012345"
+    'format.crypto.file_encryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_encryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+    -- decryption
+    'format.crypto.file_decryption.footer_key_as_hex' 
'30313233343536373839303132333435', -- b"0123456789012345"
+    'format.crypto.file_decryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_decryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+);
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/15216"&gt;Issue 
#15216&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/16351"&gt;PR 
#16351&lt;/a&gt;
+from &lt;a href="https://github.com/corwinjoy"&gt;corwinjoy&lt;/a&gt; and 
&lt;a href="https://github.com/adamreeve"&gt;adamreeve&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;WITHIN GROUP&lt;/code&gt; for Ordered-Set 
Aggregate Functions&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;WITHIN GROUP&lt;/code&gt; 
clause for &lt;a 
href="https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE"&gt;ordered-set
 aggregate
+functions&lt;/a&gt; such as &lt;code&gt;approx_percentile_cont&lt;/code&gt;, 
&lt;code&gt;percentile_cont&lt;/code&gt;, and
+&lt;code&gt;percentile_disc&lt;/code&gt;, which allows users to specify the 
precise order.&lt;/p&gt;
+&lt;p&gt;For example, the following query computes the 50th percentile of the 
&lt;code&gt;temperature&lt;/code&gt; column
+in the &lt;code&gt;city_data&lt;/code&gt; table. The &lt;code&gt;ORDER BY&lt;/code&gt; expression inside 
&lt;code&gt;WITHIN GROUP&lt;/code&gt; names the values the percentile is taken over:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT
+    percentile_disc(0.5) WITHIN GROUP (ORDER BY temperature) AS median_temperature
+FROM city_data;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/11732"&gt;#11732&lt;/a&gt;, 
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/13511"&gt;#13511&lt;/a&gt;,
+by &lt;a href="https://github.com/Garamda"&gt;Garamda&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Compressed Spill Files&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports compressing the files written to disk when 
spilling
+larger-than-memory datasets while sorting and grouping. Using compression
+can significantly reduce the
+size of the intermediate files and improve performance when reading them back 
into memory.&lt;/p&gt;
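+&lt;p&gt;As a minimal sketch, assuming the setting is exposed as 
&lt;code&gt;datafusion.execution.spill_compression&lt;/code&gt; and accepts 
&lt;code&gt;zstd&lt;/code&gt; (please check the configuration reference for your version),
+spill compression could be enabled for a session like this:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;-- Assumed option name and value: compress spill files with zstd
+SET datafusion.execution.spill_compression = 'zstd';
+&lt;/code&gt;&lt;/pre&gt;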
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/16130"&gt;#16130&lt;/a&gt;,
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/16268"&gt;#16268&lt;/a&gt;
+by &lt;a 
href="https://github.com/ding-young"&gt;ding-young&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;REGEXP_INSTR&lt;/code&gt; function&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;REGEXP_INSTR&lt;/code&gt; 
function, which returns the position of a
+regular expression match within a string.&lt;/p&gt;
+&lt;p&gt;For example, to find the position of the first match of the regular 
expression
+&lt;code&gt;C(.)(..)&lt;/code&gt; in the string 
&lt;code&gt;ABCDEF&lt;/code&gt;, you can use:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;&amp;gt; SELECT 
regexp_instr('ABCDEF', 'C(.)(..)');
++---------------------------------------------------------------+
+| regexp_instr(Utf8("ABCDEF"),Utf8("C(.)(..)"))                 |
++---------------------------------------------------------------+
+| 3                                                             |
++---------------------------------------------------------------+
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/13009"&gt;Issue 
#13009&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/15928"&gt;PR 
#15928&lt;/a&gt;
+by &lt;a href="https://github.com/nirnayroy"&gt;nirnayroy&lt;/a&gt;)&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 49.0.0 should be straightforward for most users. Please 
review the
+&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;
+for details on breaking changes and code snippets to help with the transition.
+Recently, some users have reported success automatically upgrading DataFusion 
by
+pairing AI tools with the upgrade guide. For a comprehensive list of all 
changes,
+please refer to the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;About DataFusion&lt;/h2&gt;
+&lt;p&gt;&lt;a href="https://datafusion.apache.org/"&gt;Apache 
DataFusion&lt;/a&gt; is an extensible query engine, written in &lt;a 
href="https://www.rust-lang.org/"&gt;Rust&lt;/a&gt;, that
+uses &lt;a href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt; as its 
in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While &lt;a 
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals"&gt;DataFusion&amp;rsquo;s
 primary design
+goal&lt;/a&gt; is to accelerate the creation of other data-centric systems, it 
provides a
+reasonable experience directly out of the box as a &lt;a 
href="https://datafusion.apache.org/user-guide/dataframe.html"&gt;dataframe 
library&lt;/a&gt;,
+&lt;a href="https://datafusion.apache.org/python/"&gt;Python 
library&lt;/a&gt;, and command-line SQL tool.&lt;/p&gt;
+&lt;p&gt;DataFusion's core thesis is that as a community, together we can 
build much more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation and focus on
+what makes our projects unique.&lt;/p&gt;
+&lt;h2&gt;How to Get Involved&lt;/h2&gt;
+&lt;p&gt;DataFusion is not a project built or driven by a single person, 
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.&lt;/p&gt;
+&lt;p&gt;If you are interested in joining us, we would love to have you. You 
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is &lt;a 
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;here&lt;/a&gt;,
 and you
+can find out how to reach us on the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion 48.0.0 
Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0"; 
rel="alternate"></link><published>2025-07-16T00:00:00+00:00</published><updated>2025-07-16T00:00:00+00:00</updated><author><name>PMC</name></author><id>
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/pmc.atom.xml b/output/feeds/pmc.atom.xml
index 5f0238d..e9ea368 100644
--- a/output/feeds/pmc.atom.xml
+++ b/output/feeds/pmc.atom.xml
@@ -1,5 +1,311 @@
 <?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion Blog - 
PMC</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-16T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 48.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0"; 
rel="alte [...]
+<feed xmlns="http://www.w3.org/2005/Atom";><title>Apache DataFusion Blog - 
pmc</title><link href="https://datafusion.apache.org/blog/"; 
rel="alternate"></link><link 
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"; 
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-07-28T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
 DataFusion 49.0.0 Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0"; 
rel="alte [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as 
…&lt;/p&gt;</summary><content type="html">&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as shown in 
the ClickBench and other results. &lt;/p&gt;
+&lt;p&gt;&lt;img alt="ClickBench performance results over time for DataFusion" 
class="img-responsive" 
src="/blog/images/datafusion-49.0.0/performance_over_time_clickbench.png" 
width="100%"/&gt;&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Figure 1&lt;/strong&gt;: ClickBench performance 
improvements over time.
+Average and median normalized query execution times for ClickBench queries for 
each git revision. 
+Query times are normalized using the ClickBench definition. Data and 
definitions on the 
+&lt;a href="https://alamb.github.io/datafusion-benchmarking/"&gt;DataFusion 
Benchmarking Page&lt;/a&gt;. &lt;/p&gt;
+&lt;!--
+NOTE: Andrew is working on gathering these numbers
+
+&lt;img
+src="/blog/images/datafusion-49.0.0/performance_over_time_planning.png"
+width="80%"
+class="img-responsive"
+alt="Planning benchmark performance results over time for DataFusion"
+/&gt;
+
+**Figure 2**: Planning benchmark performance improved XXX between DataFusion 
48.0.1 and DataFusion 49.0.0. Chart source: TODO
+--&gt;
+&lt;p&gt;Here are some noteworthy optimizations added since DataFusion 
48:&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Equivalence system upgrade:&lt;/strong&gt; The lower 
levels of the equivalence system, which is used to implement the
+  optimizations described in &lt;a 
href="https://datafusion.apache.org/blog/2025/03/11/ordering-analysis"&gt;Using 
Ordering for Better Plans&lt;/a&gt;, were rewritten, leading to
+  much faster planning times, especially for queries with a &lt;a 
href="https://github.com/apache/datafusion/pull/16217#pullrequestreview-2891941229"&gt;large
 number of columns&lt;/a&gt;. This change also prepares
+  the way for more sophisticated sort-based optimizations in the future. (PR 
&lt;a 
href="https://github.com/apache/datafusion/pull/16217"&gt;#16217&lt;/a&gt; by 
&lt;a href="https://github.com/ozankabak"&gt;ozankabak&lt;/a&gt;).&lt;/p&gt;
+&lt;p&gt;&lt;strong&gt;Dynamic Filters and TopK 
pushdown&lt;/strong&gt;&lt;/p&gt;
+&lt;p&gt;DataFusion now supports dynamic filters, which become more selective during 
query execution,
+and physical filter pushdown. Together, these features improve the performance 
of
+queries that use &lt;code&gt;LIMIT&lt;/code&gt; and &lt;code&gt;ORDER 
BY&lt;/code&gt; clauses, such as the following:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT *
+FROM data
+ORDER BY timestamp DESC
+LIMIT 10
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;While the query above is simple, without dynamic filtering or knowing 
that the data
+is already sorted by &lt;code&gt;timestamp&lt;/code&gt;, a query engine must 
decode &lt;em&gt;all&lt;/em&gt; of the data to
+find the top 10 values. With the dynamic filters system, DataFusion applies an
+increasingly selective filter during query execution. It checks the 
&lt;strong&gt;current&lt;/strong&gt;
+top 10 values of the &lt;code&gt;timestamp&lt;/code&gt; column 
&lt;strong&gt;before&lt;/strong&gt; opening files or reading
+Parquet Row Groups and Data Pages, which can skip older data very 
quickly.&lt;/p&gt;
+&lt;p&gt;Dynamic predicates are a common feature of advanced engines such as 
&lt;a 
href="https://docs.starburst.io/latest/admin/dynamic-filtering.html"&gt;Dynamic
+Filters in Starburst&lt;/a&gt; and &lt;a 
href="https://www.snowflake.com/en/engineering-blog/optimizing-top-k-aggregation-snowflake/"&gt;Top-K
 Aggregation Optimization at Snowflake&lt;/a&gt;. The
+technique drastically improves query performance (we've seen over a 1.5x
+improvement for some TPC-H-style queries), especially in combination with late
+materialization and columnar file formats such as Parquet. We &lt;a 
href="https://github.com/apache/datafusion/issues/15513"&gt;plan to write a
+blog post&lt;/a&gt; explaining the details of this optimization in the future, 
and we expect to
+use the same mechanism to implement additional optimizations such as &lt;a 
href="https://github.com/apache/datafusion/issues/7955"&gt;Sideways
+Information Passing for joins&lt;/a&gt; (Issue
+&lt;a 
href="https://github.com/apache/datafusion/issues/15037"&gt;#15037&lt;/a&gt; PR
+&lt;a 
href="https://github.com/apache/datafusion/pull/15770"&gt;#15770&lt;/a&gt; by
+&lt;a href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt;).&lt;/p&gt;
+&lt;h2&gt;Community Growth  📈&lt;/h2&gt;
+&lt;p&gt;The last few months, between &lt;code&gt;46.0.0&lt;/code&gt; and 
&lt;code&gt;49.0.0&lt;/code&gt;, have seen our community grow:&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;New PMC members and committers: &lt;a 
href="https://github.com/berkaysynnada"&gt;berkay&lt;/a&gt;, &lt;a 
href="https://github.com/xudong963"&gt;xudong963&lt;/a&gt; and &lt;a 
href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt; joined the PMC.
+   &lt;a href="https://github.com/blaginin"&gt;blaginin&lt;/a&gt;, &lt;a 
href="https://github.com/milenkovicm"&gt;milenkovicm&lt;/a&gt;, &lt;a 
href="https://github.com/adriangb"&gt;adriangb&lt;/a&gt; and &lt;a 
href="https://github.com/kosiew"&gt;kosiew&lt;/a&gt; joined as committers. See 
the &lt;a 
href="https://lists.apache.org/[email protected]"&gt;mailing 
list&lt;/a&gt; for more details.&lt;/li&gt;
+&lt;li&gt;In the &lt;a 
href="https://github.com/apache/arrow-datafusion"&gt;core DataFusion 
repo&lt;/a&gt; alone, we reviewed and accepted over 850 PRs from 172 different
+   contributors, created 669 issues, and closed 379 of them 🚀. All changes 
are listed in the detailed
+   &lt;a 
href="https://github.com/apache/datafusion/tree/main/dev/changelog"&gt;changelogs&lt;/a&gt;.&lt;/li&gt;
+&lt;li&gt;DataFusion published a number of blog posts, including &lt;a 
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions"&gt;User
 defined Window Functions&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one"&gt;Optimizing
 SQL (and DataFrames)
+   in DataFusion part 1&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two"&gt;part
 2&lt;/a&gt;, &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;Using Rust 
async for Query Execution and Cancelling Long-Running Queries&lt;/a&gt;, and
+   &lt;a 
href="https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/"&gt;Embedding
 User-Defined Indexes in Apache Parquet Files&lt;/a&gt;.&lt;/li&gt;
+&lt;/ol&gt;
+&lt;!--
+# Unique committers
+$ git shortlog -sn 46.0.0..49.0.0-rc1  .| wc -l
+     172
+# commits
+$ git log --pretty=oneline 46.0.0..49.0.0-rc1 . | wc -l
+     884
+
+
+https://crates.io/crates/datafusion/49.0.0
+DataFusion 49 released July 25, 2025
+
+https://crates.io/crates/datafusion/46.0.0
+DataFusion 46 released March 7, 2025
+
+Issues created in this time: 290 open, 379 closed = 669 total
+https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A2025-03-07..2025-07-25
+
+Issues closed: 508
+https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+closed%3A2025-03-07..2025-07-25
+
+PRs merged in this time 874
+https://github.com/apache/arrow-datafusion/pulls?q=is%3Apr+merged%3A2025-03-07..2025-07-25
+
+--&gt;
+&lt;h2&gt;New Features ✨&lt;/h2&gt;
+&lt;h3&gt;Async User-Defined Functions&lt;/h3&gt;
+&lt;p&gt;It is now possible to write &lt;code&gt;async&lt;/code&gt; 
User-Defined Functions
+(UDFs) in DataFusion that perform asynchronous
+operations, such as network requests or database queries, without blocking the
+execution of the query. This enables new use cases, such as
+integrating with large language models (LLMs) or other external services, and 
we can't
+wait to see what the community builds with it.&lt;/p&gt;
+&lt;p&gt;See the &lt;a 
href="https://datafusion.apache.org/library-user-guide/functions/adding-udfs.html"&gt;documentation&lt;/a&gt;
 for more details and the &lt;a 
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/async_udf.rs"&gt;async
 UDF example&lt;/a&gt; for
+working code.&lt;/p&gt;
+&lt;p&gt;You could, for example, implement a function 
&lt;code&gt;ask_llm&lt;/code&gt; that asks a large language model
+(LLM) service a question about the values in a column.&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT * 
+FROM animal a
+WHERE ask_llm(a.name, 'Is this animal furry?')
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;The implementation of an async UDF is almost identical to a normal
+UDF, except that it must implement the 
&lt;code&gt;AsyncScalarUDFImpl&lt;/code&gt; trait in addition to 
&lt;code&gt;ScalarUDFImpl&lt;/code&gt; and
+provide an &lt;code&gt;async&lt;/code&gt; implementation via 
&lt;code&gt;invoke_async_with_args&lt;/code&gt;:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-rust"&gt;#[derive(Debug)]
+struct AskLLM {
+    signature: Signature,
+}
+
+#[async_trait]
+impl AsyncScalarUDFImpl for AskLLM {
+    /// The `invoke_async_with_args` method is similar to `invoke_with_args`,
+    /// but it returns a `Future` that resolves to the result.
+    ///
+    /// Since this signature is `async`, it can do any `async` operations, such
+    /// as network requests.
+    async fn invoke_async_with_args(
+        &amp;amp;self,
+        args: ScalarFunctionArgs,
+        options: &amp;amp;ConfigOptions,
+    ) -&amp;gt; Result&amp;lt;ArrayRef&amp;gt; {
+        // Converts the arguments to arrays for simplicity.
+        let args = ColumnarValue::values_to_arrays(&amp;amp;args.args)?;
+        let [column_of_interest, question] = take_function_args(self.name(), 
args)?;
+        let client = Client::new();
+
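+        // Build the request body from the arguments (`build_llm_request` is a
+        // hypothetical helper, like the other LLM helpers in this example)
+        let req = build_llm_request(&amp;amp;column_of_interest, &amp;amp;question);
+
+        // Make a network request to a hypothetical LLM service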
+        let res = client
+            .post(URI)
+            .headers(get_llm_headers(options))
+            .json(&amp;amp;req)
+            .send()
+            .await?
+            .json::&amp;lt;LLMResponse&amp;gt;()
+            .await?;
+
+        let results = extract_results_from_llm_response(&amp;amp;res);
+
+        Ok(Arc::new(results))
+    }
+}
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/6518"&gt;#6518&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/14837"&gt;PR 
#14837&lt;/a&gt; from
+&lt;a href="https://github.com/goldmedal"&gt;goldmedal&lt;/a&gt; 🏆)&lt;/p&gt;
+&lt;h3&gt;Better Cancellation for Certain Long-Running Queries&lt;/h3&gt;
+&lt;p&gt;In rare cases, it was previously not possible to cancel long-running 
queries,
+leading to unresponsiveness. Other projects would likely have fixed this issue
+by treating the symptom, but &lt;a 
href="https://github.com/pepijnve"&gt;pepijnve&lt;/a&gt; and the DataFusion 
community worked together to
+treat the root cause. The general solution required a deep understanding of the
+DataFusion execution engine, Rust &lt;code&gt;Streams&lt;/code&gt;, and the 
tokio cooperative
+scheduling model. The &lt;a 
href="https://github.com/apache/datafusion/pull/16398"&gt;resulting 
PR&lt;/a&gt; is a model of careful
+community engineering and a great example of using Rust's 
&lt;code&gt;async&lt;/code&gt; ecosystem
+to implement complex functionality. It even resulted in a &lt;a 
href="https://github.com/tokio-rs/tokio/pull/7405"&gt;contribution upstream to 
tokio&lt;/a&gt;
+(since accepted). See the &lt;a 
href="https://datafusion.apache.org/blog/2025/06/30/cancellation"&gt;blog 
post&lt;/a&gt; for more details.&lt;/p&gt;
+&lt;h3&gt;Metadata for User Defined Types such as 
&lt;code&gt;Variant&lt;/code&gt; and 
&lt;code&gt;Geometry&lt;/code&gt;&lt;/h3&gt;
+&lt;p&gt;User-defined types have been &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;a long-requested 
feature&lt;/a&gt;, and this release provides
+the low-level APIs to support them efficiently.&lt;/p&gt;
+&lt;ol&gt;
+&lt;li&gt;Metadata handling in PRs &lt;a 
href="https://github.com/apache/datafusion/pull/15646"&gt;#15646&lt;/a&gt; and 
&lt;a 
href="https://github.com/apache/datafusion/pull/16170"&gt;#16170&lt;/a&gt; from 
&lt;a href="https://github.com/timsaucer"&gt;timsaucer&lt;/a&gt;&lt;/li&gt;
+&lt;li&gt;Pushdown of filters and expressions (see "Dynamic Filters and TopK 
pushdown" section above)&lt;/li&gt;
+&lt;/ol&gt;
+&lt;p&gt;We still have some work to do to fully support user-defined types, 
specifically
+in documentation and testing, and we would
+love your help in this area. If you are interested in contributing,
+please see &lt;a 
href="https://github.com/apache/datafusion/issues/12644"&gt;issue 
#12644&lt;/a&gt;.&lt;/p&gt;
+&lt;h3&gt;Parquet Modular Encryption&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports reading and writing encrypted &lt;a 
href="https://parquet.apache.org/"&gt;Apache Parquet&lt;/a&gt; files with &lt;a 
href="https://parquet.apache.org/docs/file-format/data-pages/encryption/"&gt;modular
+encryption&lt;/a&gt;. This allows users to encrypt specific columns in a 
Parquet file
+using different keys, while still being able to read data without needing to
+decrypt the entire file.&lt;/p&gt;
+&lt;p&gt;Here is an example of how to configure DataFusion to write and read an 
encrypted Parquet
+table with two columns, &lt;code&gt;double_field&lt;/code&gt; and 
&lt;code&gt;float_field&lt;/code&gt;, using modular
+encryption:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;CREATE EXTERNAL TABLE 
encrypted_parquet_table
+(
+double_field double,
+float_field float
+)
+STORED AS PARQUET LOCATION 'pq/' OPTIONS (
+    -- encryption
+    'format.crypto.file_encryption.encrypt_footer' 'true',
+    'format.crypto.file_encryption.footer_key_as_hex' 
'30313233343536373839303132333435',  -- b"0123456789012345"
+    'format.crypto.file_encryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_encryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+    -- decryption
+    'format.crypto.file_decryption.footer_key_as_hex' 
'30313233343536373839303132333435', -- b"0123456789012345"
+    'format.crypto.file_decryption.column_key_as_hex::double_field' 
'31323334353637383930313233343530', -- b"1234567890123450"
+    'format.crypto.file_decryption.column_key_as_hex::float_field' 
'31323334353637383930313233343531', -- b"1234567890123451"
+);
+&lt;/code&gt;&lt;/pre&gt;
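+&lt;p&gt;The hex strings above are the hex encoding of each 16-byte key. As a
+minimal sketch, assuming DataFusion's built-in &lt;code&gt;encode&lt;/code&gt; function with the
+&lt;code&gt;'hex'&lt;/code&gt; format is available in your session, the example footer key
+can be converted like this (never use such a guessable key for real data):&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;-- Hex-encode the 16-byte example footer key; this should yield
+-- 30313233343536373839303132333435, the footer_key_as_hex value used above.
+SELECT encode('0123456789012345', 'hex');
+&lt;/code&gt;&lt;/pre&gt;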
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/15216"&gt;Issue 
#15216&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/16351"&gt;PR 
#16351&lt;/a&gt;
+from &lt;a href="https://github.com/corwinjoy"&gt;corwinjoy&lt;/a&gt; and 
&lt;a href="https://github.com/adamreeve"&gt;adamreeve&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;WITHIN GROUP&lt;/code&gt; for Ordered-Set 
Aggregate Functions&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;WITHIN GROUP&lt;/code&gt; 
clause for &lt;a 
href="https://www.postgresql.org/docs/9.4/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE"&gt;ordered-set
 aggregate
+functions&lt;/a&gt; such as &lt;code&gt;approx_percentile_cont&lt;/code&gt;, 
&lt;code&gt;percentile_cont&lt;/code&gt;, and
+&lt;code&gt;percentile_disc&lt;/code&gt;, which lets users specify the 
ordering of the input values that the aggregate is computed over.&lt;/p&gt;
+&lt;p&gt;For example, the following query computes the 50th percentile of the 
&lt;code&gt;temperature&lt;/code&gt; column
+in the &lt;code&gt;city_data&lt;/code&gt; table, with the values ordered by 
&lt;code&gt;temperature&lt;/code&gt;:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;SELECT
+    percentile_disc(0.5) WITHIN GROUP (ORDER BY temperature) AS median_temperature
+FROM city_data;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/11732"&gt;#11732&lt;/a&gt;, 
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/13511"&gt;#13511&lt;/a&gt;,
+by &lt;a href="https://github.com/Garamda"&gt;Garamda&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Compressed Spill Files&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports compressing the files written to disk when 
spilling
+larger-than-memory datasets while sorting and grouping. Using compression
+can significantly reduce the
+size of the intermediate files and improve performance when reading them back 
into memory.&lt;/p&gt;
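+&lt;p&gt;As a minimal sketch, assuming the setting is exposed as
+&lt;code&gt;datafusion.execution.spill_compression&lt;/code&gt; and accepts &lt;code&gt;zstd&lt;/code&gt; as a codec
+(check the configuration reference for your version), spill compression could be
+enabled for a session like this:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;-- Assumed option name and value; verify both against the configuration docs.
+SET datafusion.execution.spill_compression = 'zstd';
+&lt;/code&gt;&lt;/pre&gt;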
+&lt;p&gt;(Issue &lt;a 
href="https://github.com/apache/datafusion/issues/16130"&gt;#16130&lt;/a&gt;,
+PR &lt;a 
href="https://github.com/apache/datafusion/pull/16268"&gt;#16268&lt;/a&gt;
+by &lt;a 
href="https://github.com/ding-young"&gt;ding-young&lt;/a&gt;)&lt;/p&gt;
+&lt;h3&gt;Support for &lt;code&gt;REGEXP_INSTR&lt;/code&gt; function&lt;/h3&gt;
+&lt;p&gt;DataFusion now supports the &lt;code&gt;REGEXP_INSTR&lt;/code&gt; 
function, which returns the position of a
+regular expression match within a string.&lt;/p&gt;
+&lt;p&gt;For example, to find the position of the first match of the regular 
expression
+&lt;code&gt;C(.)(..)&lt;/code&gt; in the string 
&lt;code&gt;ABCDEF&lt;/code&gt;, you can use:&lt;/p&gt;
+&lt;pre&gt;&lt;code class="language-sql"&gt;&amp;gt; SELECT 
regexp_instr('ABCDEF', 'C(.)(..)');
++---------------------------------------------------------------+
+| regexp_instr(Utf8("ABCDEF"),Utf8("C(.)(..)"))                 |
++---------------------------------------------------------------+
+| 3                                                             |
++---------------------------------------------------------------+
+&lt;/code&gt;&lt;/pre&gt;
+&lt;p&gt;(&lt;a 
href="https://github.com/apache/datafusion/issues/13009"&gt;Issue 
#13009&lt;/a&gt;,
+&lt;a href="https://github.com/apache/datafusion/pull/15928"&gt;PR 
#15928&lt;/a&gt;
+by &lt;a href="https://github.com/nirnayroy"&gt;nirnayroy&lt;/a&gt;)&lt;/p&gt;
+&lt;h2&gt;Upgrade Guide and Changelog&lt;/h2&gt;
+&lt;p&gt;Upgrading to 49.0.0 should be straightforward for most users. Please 
review the
+&lt;a 
href="https://datafusion.apache.org/library-user-guide/upgrading.html"&gt;Upgrade
 Guide&lt;/a&gt;
+for details on breaking changes and code snippets to help with the transition.
+Recently, some users have reported success automatically upgrading DataFusion 
by
+pairing AI tools with the upgrade guide. For a comprehensive list of all 
changes,
+please refer to the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;About DataFusion&lt;/h2&gt;
+&lt;p&gt;&lt;a href="https://datafusion.apache.org/"&gt;Apache 
DataFusion&lt;/a&gt; is an extensible query engine, written in &lt;a 
href="https://www.rust-lang.org/"&gt;Rust&lt;/a&gt;, that
+uses &lt;a href="https://arrow.apache.org"&gt;Apache Arrow&lt;/a&gt; as its 
in-memory format. DataFusion is used by developers to
+create new, fast, data-centric systems such as databases, dataframe libraries,
+and machine learning and streaming applications. While &lt;a 
href="https://datafusion.apache.org/user-guide/introduction.html#project-goals"&gt;DataFusion&amp;rsquo;s
 primary design
+goal&lt;/a&gt; is to accelerate the creation of other data-centric systems, it 
provides a
+reasonable experience directly out of the box as a &lt;a 
href="https://datafusion.apache.org/user-guide/dataframe.html"&gt;dataframe 
library&lt;/a&gt;,
+&lt;a href="https://datafusion.apache.org/python/"&gt;python 
library&lt;/a&gt;, and command-line SQL tool.&lt;/p&gt;
+&lt;p&gt;DataFusion's core thesis is that as a community, together we can 
build much more
+advanced technology than any of us as individuals or companies could do alone.
+Without DataFusion, highly performant vectorized query engines would remain
+the domain of a few large companies and world-class research institutions.
+With DataFusion, we can all build on top of a shared foundation and focus on
+what makes our projects unique.&lt;/p&gt;
+&lt;h2&gt;How to Get Involved&lt;/h2&gt;
+&lt;p&gt;DataFusion is not a project built or driven by a single person, 
company, or
+foundation. Rather, our community of users and contributors works together to
+build a shared technology that none of us could have built alone.&lt;/p&gt;
+&lt;p&gt;If you are interested in joining us, we would love to have you. You 
can try out
+DataFusion on some of your own data and projects and let us know how it goes,
+contribute suggestions, documentation, bug reports, or a PR with documentation,
+tests, or code. A list of open issues suitable for beginners is &lt;a 
href="https://github.com/apache/arrow-datafusion/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22"&gt;here&lt;/a&gt;,
 and you
+can find out how to reach us on the &lt;a 
href="https://datafusion.apache.org/contributor-guide/communication.html"&gt;communication
 doc&lt;/a&gt;.&lt;/p&gt;</content><category 
term="blog"></category></entry><entry><title>Apache DataFusion 48.0.0 
Released</title><link 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0" 
rel="alternate"></link><published>2025-07-16T00:00:00+00:00</published><updated>2025-07-16T00:00:00+00:00</updated><author><name>PMC</name></author><id>
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git a/output/feeds/pmc.rss.xml b/output/feeds/pmc.rss.xml
index bbd8f2c..0804813 100644
--- a/output/feeds/pmc.rss.xml
+++ b/output/feeds/pmc.rss.xml
@@ -1,5 +1,28 @@
 <?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog - 
PMC</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Wed,
 16 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
48.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0</link><description>&lt;!--
+<rss version="2.0"><channel><title>Apache DataFusion Blog - 
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Mon,
 28 Jul 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion 
49.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0</link><description>&lt;!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+--&gt;
+&lt;!-- see https://github.com/apache/datafusion/issues/16347 for details 
--&gt;
+&lt;h2&gt;Introduction&lt;/h2&gt;
+&lt;p&gt;We are proud to announce the release of &lt;a 
href="https://crates.io/crates/datafusion/49.0.0"&gt;DataFusion 
49.0.0&lt;/a&gt;. This blog post highlights some of
+the major improvements since the release of &lt;a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/"&gt;DataFusion
 48.0.0&lt;/a&gt;. The complete list of changes is available in the &lt;a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md"&gt;changelog&lt;/a&gt;.&lt;/p&gt;
+&lt;h2&gt;Performance Improvements 🚀&lt;/h2&gt;
+&lt;p&gt;DataFusion continues to focus on enhancing performance, as 
…&lt;/p&gt;</description><dc:creator 
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Mon, 28 
Jul 2025 00:00:00 +0000</pubDate><guid 
isPermaLink="false">tag:datafusion.apache.org,2025-07-28:/blog/2025/07/28/datafusion-49.0.0</guid><category>blog</category></item><item><title>Apache
 DataFusion 48.0.0 
Released</title><link>https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0</link><descriptio
 [...]
 {% comment %}
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
diff --git 
a/output/images/datafusion-49.0.0/performance_over_time_clickbench.png 
b/output/images/datafusion-49.0.0/performance_over_time_clickbench.png
new file mode 100644
index 0000000..adb9003
Binary files /dev/null and 
b/output/images/datafusion-49.0.0/performance_over_time_clickbench.png differ
diff --git a/output/images/datafusion-49.0.0/performance_over_time_planning.png 
b/output/images/datafusion-49.0.0/performance_over_time_planning.png
new file mode 100644
index 0000000..50cda90
Binary files /dev/null and 
b/output/images/datafusion-49.0.0/performance_over_time_planning.png differ
diff --git a/output/index.html b/output/index.html
index 10d6b89..2e6fff6 100644
--- a/output/index.html
+++ b/output/index.html
@@ -44,6 +44,48 @@
             <p><i>Here you can find the latest updates from DataFusion and 
related projects.</i></p>
 
 
+    <!-- Post -->
+    <div class="row">
+        <div class="callout">
+            <article class="post">
+                <header>
+                    <div class="title">
+                        <h1><a 
href="/blog/2025/07/28/datafusion-49.0.0">Apache DataFusion 49.0.0 
Released</a></h1>
+                        <p>Posted on: Mon 28 July 2025 by pmc</p>
+                        <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<!-- see https://github.com/apache/datafusion/issues/16347 for details -->
+<h2>Introduction</h2>
+<p>We are proud to announce the release of <a 
href="https://crates.io/crates/datafusion/49.0.0">DataFusion 49.0.0</a>. This 
blog post highlights some of
+the major improvements since the release of <a 
href="https://datafusion.apache.org/blog/2025/07/16/datafusion-48.0.0/">DataFusion
 48.0.0</a>. The complete list of changes is available in the <a 
href="https://github.com/apache/datafusion/blob/branch-49/dev/changelog/49.0.0.md">changelog</a>.</p>
+<h2>Performance Improvements 🚀</h2>
+<p>DataFusion continues to focus on enhancing performance, as …</p></p>
+                        <footer>
+                            <ul class="actions">
+                                <div style="text-align: right"><a 
href="/blog/2025/07/28/datafusion-49.0.0" class="button medium">Continue 
Reading</a></div>
+                            </ul>
+                            <ul class="stats">
+                            </ul>
+                        </footer>
+            </article>
+        </div>
+    </div>
     <!-- Post -->
     <div class="row">
         <div class="callout">

