comphead commented on code in PR #124:
URL: https://github.com/apache/datafusion-site/pull/124#discussion_r2547906647


##########
content/blog/2025-11-25-datafusion-51.0.0.md:
##########
@@ -0,0 +1,342 @@
+---
+layout: post
+title: Apache DataFusion 51.0.0 Released
+date: 2025-11-25
+author: pmc
+categories: [release]
+---
+
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+[TOC]
+
+## Introduction
+
+We are proud to announce the release of [DataFusion 51.0.0]. This post highlights
+some of the major improvements since [DataFusion 50.0.0]. The complete list of
+changes is available in the [changelog]. Thanks to the [128 contributors] for
+making this release possible.
+
+[DataFusion 51.0.0]: https://crates.io/crates/datafusion/51.0.0
+[DataFusion 50.0.0]: https://datafusion.apache.org/blog/2025/09/29/datafusion-50.0.0/
+[changelog]: https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md
+[128 contributors]: https://github.com/apache/datafusion/blob/branch-51/dev/changelog/51.0.0.md#credits
+
+## Performance Improvements 🚀
+We continue to make significant performance improvements in DataFusion, both in
+the core engine and in the Parquet reader.
+
+<img
+src="/blog/images/datafusion-51.0.0/performance_over_time_clickbench.png"
+width="100%"
+class="img-responsive"
+alt="Performance over time"
+/>
+
+**Figure 1**: Average and median normalized query execution times for ClickBench queries for DataFusion 51.0.0 compared to previous releases.
+Query times are normalized using the ClickBench definition. See the
+[DataFusion Benchmarking Page](https://alamb.github.io/datafusion-benchmarking/)
+for more details.
+
+### Faster `CASE` expression evaluation
+
+This release builds on the [CASE performance epic] with significant improvements.
+Expressions short‑circuit earlier, reuse partial results, and avoid unnecessary
+scattering, speeding up common ETL patterns. Thanks to [pepijnve], [chenkovsky],
+and [petern48] for leading this effort. We hope to share more details on our
+implementation in a future post.
+
+[pepijnve]: https://github.com/pepijnve
+[chenkovsky]: https://github.com/chenkovsky
+[petern48]: https://github.com/petern48
+
+### Fewer object store round-trips for Parquet by default
+
+DataFusion now sets a default `metadata_size_hint` for [Apache Parquet] scans
+([#18118]), avoiding the extra
+“last 8‑byte” request many clouds require to read file footers. Remote scans
+typically drop from five requests to four per file, cutting latency and transfer
+costs without any application changes. Thanks to [zhuqi-lucas] for leading this
+effort.
+
+[apache parquet]: https://parquet.apache.org/
+
+### Faster Parquet metadata parsing
+
+DataFusion 51 also includes the latest Parquet reader from
+[Arrow Rust 57.0.0], which parses Parquet metadata significantly faster. This is
+especially beneficial for workloads with many small Parquet files and scenarios
+where startup time or low latency is important. You can read more about the upstream work by
+[etseidl] and [jhorstmann] that enabled these improvements in the [Faster Apache Parquet Footer Metadata Using a Custom Thrift Parser] blog.
+
+<img 
+  src="/blog/images/datafusion-51.0.0/arrow-57-metadata-parsing.png"
+  width="100%" 
+  class="img-responsive" 
+  alt="Metadata Parsing Performance Improvements in Arrow/Parquet 57" 
+/>
+
+**Figure 2**: Metadata parsing performance improvements in Arrow/Parquet 57.0.0.
+
+[Arrow Rust 57.0.0]: https://arrow.apache.org/blog/2025/10/30/arrow-rs-57.0.0/
+[Faster Apache Parquet Footer Metadata Using a Custom Thrift Parser]: https://arrow.apache.org/blog/2025/10/23/rust-parquet-metadata/
+
+### Better Defaults for Remote Parquet Reads
+
+By default, DataFusion now fetches the last 512KB (configurable) of Parquet files
+so the first request usually includes the full footer ([#18118]). This will
+typically avoid two distinct I/O requests for each Parquet file. While this
+setting has existed in DataFusion for many years, it was not previously enabled
+by default. Users can tune the number of bytes fetched in the initial I/O
+request via the `datafusion.execution.parquet.metadata_size_hint` [config setting]. Thanks to
+[zhuqi-lucas] for leading this effort.
+
+[config setting]: https://datafusion.apache.org/user-guide/configs.html
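
The effect of a tail-size hint on request counts can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not DataFusion's reader: `make_fake_parquet`, `CountingStore`, and `read_footer` are hypothetical names invented for this example, and the "file" holds dummy bytes. The only fact taken from the Parquet format is the trailer layout: a 4-byte little-endian footer length followed by the `PAR1` magic.

```python
import struct

MAGIC = b"PAR1"
TRAILER_LEN = 8  # 4-byte little-endian footer length + 4-byte magic


def make_fake_parquet(footer_len: int, data_len: int = 1024) -> bytes:
    """Build bytes laid out like a Parquet file (hypothetical helper, dummy contents)."""
    return (
        MAGIC
        + b"d" * data_len                # stand-in for row group data
        + b"m" * footer_len              # stand-in for the footer metadata
        + struct.pack("<I", footer_len)  # footer length
        + MAGIC
    )


class CountingStore:
    """Hypothetical stand-in for an object store that counts range requests."""

    def __init__(self, blob: bytes):
        self.blob = blob
        self.requests = 0

    def get_range(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self.blob[start:end]


def read_footer(store: CountingStore, file_size: int, size_hint=None) -> bytes:
    """Return the raw footer bytes, fetching `size_hint` tail bytes in the first request."""
    # Without a hint, only the 8-byte trailer is fetched first.
    first_len = min(file_size, max(size_hint or TRAILER_LEN, TRAILER_LEN))
    tail = store.get_range(file_size - first_len, file_size)
    if tail[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic at end of file")
    (footer_len,) = struct.unpack("<I", tail[-8:-4])
    if footer_len + TRAILER_LEN <= len(tail):
        # The first response already contains the whole footer.
        end = len(tail) - TRAILER_LEN
        return tail[end - footer_len:end]
    # Otherwise a second range request is needed for the footer itself.
    footer_end = file_size - TRAILER_LEN
    return store.get_range(footer_end - footer_len, footer_end)


blob = make_fake_parquet(footer_len=2000)

no_hint = CountingStore(blob)
read_footer(no_hint, len(blob))
print(no_hint.requests)  # 2: trailer first, then the footer

with_hint = CountingStore(blob)
read_footer(with_hint, len(blob), size_hint=512 * 1024)
print(with_hint.requests)  # 1: the footer arrives with the first request
```

In this simulation the hint only saves a request when the footer plus trailer fits inside the hinted tail; a footer larger than the hint still costs a second request, which is why the value is tunable.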
+
+
+## New Features ✨
+
+### Decimal32/Decimal64 support
+
+The new Arrow types `Decimal32` and `Decimal64` are now supported in DataFusion

Review Comment:
   oh nice!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

