nuno-faria commented on code in PR #162:
URL: https://github.com/apache/datafusion-site/pull/162#discussion_r3017573664
##########
content/blog/2026-03-25-datafusion-53.0.0.md:
##########
@@ -0,0 +1,397 @@
+---
+layout: post
+title: Apache DataFusion 53.0.0 Released
+date: 2026-03-25
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+We are proud to announce the release of [DataFusion 53.0.0]. This post highlights
+some of the major improvements since [DataFusion 52.0.0]. The complete list of
+changes is available in the [changelog]. Thanks to the [114 contributors] for
+making this release possible.
+
+[DataFusion 53.0.0]: https://crates.io/crates/datafusion/53.0.0
+[DataFusion 52.0.0]: https://datafusion.apache.org/blog/2026/01/12/datafusion-52.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-53/dev/changelog/53.0.0.md
+[114 contributors]: https://github.com/apache/datafusion/blob/branch-53/dev/changelog/53.0.0.md#credits
+
+## Performance Improvements 🚀
+
+<img
+src="/blog/images/datafusion-53.0.0/performance_over_time_clickbench.png"
+width="100%"
+class="img-fluid"
+alt="Performance over time"
+/>
+
+**Figure 1**: Average and median normalized execution times for DataFusion 53.0.0 on ClickBench queries, compared to previous releases.
+Query times are normalized using the ClickBench definition. See the
+[DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/)
+for more details.
+
+DataFusion 53 continues the project-wide focus on performance. This release
+reduces planning overhead, skips more unnecessary I/O, and pushes more work
+into earlier and cheaper stages of execution.
+
+### `LIMIT`-Aware Parquet Row Group Pruning
+
+DataFusion 53 includes a new optimization that makes Parquet pruning aware of
+`LIMIT`. This optimization is described in full in the [limit pruning blog post]. If
+DataFusion can prove that an entire row group matches the predicate, and those
+fully matching row groups contain enough rows to satisfy the `LIMIT`, partially
+matching row groups are skipped entirely.
+
+<figure>
+<img
+src="/blog/images/limit-pruning/pruning-pipeline.svg"
+width="80%"
+class="img-fluid"
+alt="Pruning pipeline with limit pruning highlighted"
+/>
+<figcaption><b>Figure 2</b>: Limit pruning is inserted between row group and page index pruning.</figcaption>
+</figure>
+
+Thanks to [@xudong963] for implementing this feature. Related PRs: [#18868]
+
+### Improved Filter Pushdown
+
+DataFusion 53 pushes filters down through more join types and through `UnionExec`,
+and expands support for pushing down [dynamic filters]. More
+pushdown means fewer rows flow into joins, repartitions, and later operators,
+which reduces CPU, memory, and I/O.
+
+For example:
+
+```sql
+SELECT *
+FROM t1
+LEFT JOIN t2 ON t1.k = t2.k
+WHERE t1.k = 1;
+```
+
+Now DataFusion can often transform the physical plan so the filter is applied
+to both sides before the join. This is especially important with [dynamic filters].
+
+<figure>
+<img
+src="/blog/images/datafusion-53.0.0/join-filter-pushdown.svg"
+width="80%"
+alt="Before and after diagram of filter pushdown around a hash join"
+class="img-fluid"
+/>
+<figcaption><b>Figure 3</b>: DataFusion 53 infers additional filters from join conditions and pushes them down in the plan.</figcaption>
+</figure>
Review Comment:
<img width="878" height="310" alt="Image"
src="https://github.com/user-attachments/assets/75c4b81e-512a-4882-bee7-cc2b5c6fada5"
/>
I think the "before" diagram is wrong: the filter on table `t1` was already pushed down in this case.
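One way to check the semantics behind Figure 3 is a minimal sketch using Python's built-in `sqlite3` module (not DataFusion), with small hypothetical tables `t1(k, v)` and `t2(k, v)`. It compares the example query with a hand-written rewrite in which `t1.k = 1` is applied to `t1` before the join and the inferred `t2.k = 1` is applied to `t2` before the join. It only shows that the rewrite preserves results; it says nothing about how DataFusion's optimizer derives or places these filters.

```python
import sqlite3


def run(conn, sql):
    # Sort so that row order does not affect the comparison.
    return sorted(conn.execute(sql).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (k INTEGER, v INTEGER)")
conn.execute("CREATE TABLE t2 (k INTEGER, v INTEGER)")
# Hypothetical sample data: k = 1 on both sides, k = 2 only in t1, k = 3 only in t2.
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 10), (1, 11), (2, 20)])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, 100), (3, 300)])

# The example query: the filter is written against the preserved (left) side.
original = """
    SELECT t1.k, t1.v, t2.k, t2.v
    FROM t1 LEFT JOIN t2 ON t1.k = t2.k
    WHERE t1.k = 1
"""

# Manual rewrite: t1.k = 1 filters t1 before the join, and the inferred
# t2.k = 1 (from t1.k = t2.k) filters t2 before the join.
rewritten = """
    SELECT a.k, a.v, b.k, b.v
    FROM (SELECT * FROM t1 WHERE k = 1) a
    LEFT JOIN (SELECT * FROM t2 WHERE k = 1) b ON a.k = b.k
"""

assert run(conn, original) == run(conn, rewritten)
print(run(conn, rewritten))  # [(1, 10, 1, 100), (1, 11, 1, 100)]
```

The inferred predicate is safe on the null-supplying side `t2` because a `t2` row with `k <> 1` can never match a `t1` row that survives `WHERE t1.k = 1`.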
##########
content/blog/2026-03-25-datafusion-53.0.0.md:
##########
Review Comment:
Also, if needed, a good example of the dynamic filters optimization would be this one, where a dynamic filter is pushed through a subquery with nested joins (previously it would not be pushed to either table):
```sql
> explain select *
from (
    select *
    from t1
    left anti join t2 on t1.k = t2.k
) a
join t1 b on a.k = b.k
where b.v = 1;
+---------------+-----------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                    |
+---------------+-----------------------------------------------------------------------------------------+
| physical_plan | ┌───────────────────────────┐                                                           |
|               | │      ProjectionExec       │                                                           |
|               | │   --------------------    │                                                           |
|               | │           k: k            │                                                           |
|               | │           v: v            │                                                           |
|               | └─────────────┬─────────────┘                                                           |
|               | ┌─────────────┴─────────────┐                                                           |
|               | │       HashJoinExec        │                                                           |
|               | │   --------------------    ├──────────────┐                                            |
|               | │        on: (k = k)        │              │                                            |
|               | └─────────────┬─────────────┘              │                                            |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐                              |
|               | │  CoalescePartitionsExec   ││       HashJoinExec        │                              |
|               | │                           ││   --------------------    │                              |
|               | │                           ││   join_type: RightAnti    ├──────────────┐               |
|               | │                           ││        on: (k = k)        │              │               |
|               | └─────────────┬─────────────┘└─────────────┬─────────────┘              │               |
|               | ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ |
|               | │        FilterExec         ││      DataSourceExec       ││      RepartitionExec      │ |
|               | │   --------------------    ││   --------------------    ││   --------------------    │ |
|               | │     predicate: v = 1      ││         files: 1          ││ partition_count(in->out): │ |
|               | │                           ││      format: parquet      ││          1 -> 12          │ |
|               | │                           ││                           ││                           │ |
|               | │                           ││        predicate:         ││   partitioning_scheme:    │ |
|               | │                           ││  DynamicFilter [ empty ]  ││    RoundRobinBatch(12)    │ |
|               | └─────────────┬─────────────┘└───────────────────────────┘└─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐ |
|               | │      RepartitionExec      │                             │      DataSourceExec       │ |
|               | │   --------------------    │                             │   --------------------    │ |
|               | │ partition_count(in->out): │                             │         files: 1          │ |
|               | │          1 -> 12          │                             │      format: parquet      │ |
|               | │                           │                             │                           │ |
|               | │   partitioning_scheme:    │                             │        predicate:         │ |
|               | │    RoundRobinBatch(12)    │                             │  DynamicFilter [ empty ]  │ |
|               | └─────────────┬─────────────┘                             └───────────────────────────┘ |
|               | ┌─────────────┴─────────────┐                                                           |
|               | │      DataSourceExec       │                                                           |
|               | │   --------------------    │                                                           |
|               | │         files: 1          │                                                           |
|               | │      format: parquet      │                                                           |
|               | │     predicate: v = 1      │                                                           |
|               | └───────────────────────────┘                                                           |
|               |                                                                                         |
+---------------+-----------------------------------------------------------------------------------------+
```
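The toy model below, written in plain Python, mirrors the shape of this query to show why pushing the top-level join's dynamic filter through the anti join into both scans is safe. The `DynamicFilter` class and all function names are hypothetical, not DataFusion APIs. The filter here is an exact set of build-side keys, and the predicate DataFusion actually builds may differ (for example, key bounds). The filter starts empty and keeps every row. It is only populated at execution time, after the build side of the top-level join has been read, which is presumably why a plain `EXPLAIN` prints `DynamicFilter [ empty ]`.

```python
class DynamicFilter:
    """Hypothetical runtime filter: empty (keeps every row) until populated."""

    def __init__(self):
        self.keys = None  # None means "empty": no build-side information yet

    def update(self, keys):
        self.keys = set(keys)

    def passes(self, key):
        return self.keys is None or key in self.keys


def scan(rows, dyn, stats):
    """Scan that drops rows rejected by the dynamic filter and counts what it emits."""
    kept = [r for r in rows if dyn.passes(r["k"])]
    stats["rows_emitted"] += len(kept)
    return kept


def left_anti_join(left, right):
    """Rows of `left` whose key has no match in `right`."""
    right_keys = {r["k"] for r in right}
    return [r for r in left if r["k"] not in right_keys]


def query(t1, t2, use_dynamic_filter):
    # select * from (t1 left anti join t2 on t1.k = t2.k) a
    # join t1 b on a.k = b.k where b.v = 1
    stats = {"rows_emitted": 0}
    dyn = DynamicFilter()
    # Build side of the top-level hash join: b = t1 filtered by v = 1.
    build = {}
    for r in t1:
        if r["v"] == 1:
            build.setdefault(r["k"], []).append(r)
    if use_dynamic_filter:
        # Publish the build-side keys before the probe side starts scanning.
        dyn.update(build.keys())
    # Probe side: the anti join subquery; the same filter reaches both scans.
    a = left_anti_join(scan(t1, dyn, stats), scan(t2, dyn, stats))
    result = [(p, b) for p in a for b in build.get(p["k"], [])]
    return result, stats["rows_emitted"]


t1 = [{"k": 1, "v": 1}, {"k": 2, "v": 5}, {"k": 3, "v": 1}, {"k": 4, "v": 7}]
t2 = [{"k": 3}, {"k": 4}, {"k": 5}]

with_df, scanned_with = query(t1, t2, use_dynamic_filter=True)
without_df, scanned_without = query(t1, t2, use_dynamic_filter=False)
assert with_df == without_df  # same answer either way
print(with_df)                        # [({'k': 1, 'v': 1}, {'k': 1, 'v': 1})]
print(scanned_with, scanned_without)  # 3 7
```

Filtering both anti join inputs with the same keys does not change the result. A `t1` row whose key is in the filter still sees every `t2` row with that key, and a `t1` row whose key is not in the filter could never match the top-level join anyway. The toy ignores physical details of the real plan, such as the anti join running as a `RightAnti` hash join and the repartitioning steps.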
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]