alamb commented on code in PR #84: URL: https://github.com/apache/datafusion-site/pull/84#discussion_r2210222605
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +--- +layout: post +title: Apache DataFusion 48.0.0 Released +date: 2025-07-16 +author: PMC +categories: [ release ] +--- + +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> + +<!-- see https://github.com/apache/datafusion/issues/16347 for details --> + +We’re excited to announce the release of **Apache DataFusion 48.0.0**! As always, this version packs in a wide range of +improvements and fixes. You can find the complete details in the full +[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md). We’ll highlight the most +important changes below and guide you through upgrading. + +## Breaking Changes + +DataFusion 48.0.0 brings a few **breaking changes** that may require adjustments to your code as described in +the [Upgrade Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0). 
Here are the most notable ones: + + +- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion 48.0.0, the default value of this [configuration setting] is now true, and DataFusion will collect and store statistics when a table is first created via `CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs. + +[configuration setting]: https://datafusion.apache.org/user-guide/configs.html + +- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now includes optional metadata, which allows + for carrying through Arrow field metadata to support extension types and other uses. This means code such as + +```rust +match expr { +... + Expr::Literal(scalar) => ... +... +} +``` + +Should be updated to: + +```rust +match expr { +... + Expr::Literal(scalar, _metadata) => ... +... +} +``` + +- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a `Box<WindowFunction>` instead of a `WindowFunction` + directly. This change was made to reduce the size of `Expr` and improve performance when planning queries + (see details on [#16207](https://github.com/apache/datafusion/pull/16207)). + +- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata handling and + prepare for extension types, UDF traits now use [FieldRef] rather than a `DataType` + and nullability. `FieldRef` contains the type and nullability, and additionally allows access to + metadata fields, which can be used for extension types. + +[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html + +- Physical Expression return `Field`: Similarly to UDFs, in order to prepare for extension type support the + [PhysicalExpr] trait has been changed to return [Field] rather than `DataType`. To upgrade structs which + implement `PhysicalExpr` you need to implement the `return_field` function. 
+ +[PhysicalExpr]: https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html +[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html + +- `FileFormat::supports_filters_pushdown` was replaced with `FileSource::try_pushdown_filters` to support upcoming work to push down dynamic filters and physical filter pushdown. + +- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`, `AvroExec`, `CsvExec`, and `JsonExec` + were deprecated in DataFusion 46 and are removed in DataFusion 48. + +## Performance Improvements + +DataFusion 48.0.0 comes with some noteworthy performance enhancements: + +- **Fewer unnecessary projections:** DataFusion now removes additional unnecessary `Projection`s in queries. (PRs [#15787](https://github.com/apache/datafusion/pull/15787), [#15761](https://github.com/apache/datafusion/pull/15761), + and [#15746](https://github.com/apache/datafusion/pull/15746) by [xudong963](https://github.com/xudong963)). + +- **Accelerated string functions**: The `ascii` function was optimized to significantly improve its performance + (PR [#16087](https://github.com/apache/datafusion/pull/16087) by [tlm365](https://github.com/tlm365)). The `character_length` function was optimized resulting in + [up to 3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984) performance improvement (PR [#15931](https://github.com/apache/datafusion/pull/15931) by [Dandandan](https://github.com/Dandandan)) Review Comment: fyi @Dandandan
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +- **Accelerated string functions**: The `ascii` function was optimized to significantly improve its performance + (PR [#16087](https://github.com/apache/datafusion/pull/16087) by [tlm365](https://github.com/tlm365)). The `character_length` function was optimized resulting in Review Comment: fyi @tlm365
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +- **Constant aggregate window expressions:** For unbounded aggregate window functions the result is the + same for all rows within a partition. 
DataFusion 48.0.0 avoids unnecessary computation for such queries, resulting in [improved performance by 5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865) + (PR [#16234](https://github.com/apache/datafusion/pull/16234) by [suibianwanwank](https://github.com/suibianwanwank)) + +## Highlighted New Features + +### New `datafusion-spark` crate + +The DataFusion community has requested [Apache Spark]-compatible functions for many years, but the current builtin function library is most similar to Postgresql, which leads to friction. Unfortunately, there are even functions with the same name but different signatures and/or return types in the two systems. + +One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)) +or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache Spark](https://spark.apache.org/). To +support the community requests and the use cases mentioned above, we have introduced a new +[datafusion-spark] crate for DataFusion with spark-compatible functions so the +community can collaborate to build this shared resource. There are several hundred functions to implement, and we are looking for help to [complete datafusion-spark Spark Compatible Functions]. + +[datafusion-spark]: https://crates.io/crates/datafusion-spark +[Apache Spark]: https://spark.apache.org + +To register all functions in `datafusion-spark` you can use: +```Rust + // Create a new session context + let mut ctx = SessionContext::new(); + // register all spark functions with the context + datafusion_spark::register_all(&mut ctx)?; + // run a query. Note the `sha2` function is now available which + // has Spark semantics + let df = ctx.sql("SELECT sha2('The input String', 256)").await?; + ... 
+} +``` +Or, to use an individual function, you can do: +```Rust +use datafusion_expr::{col, lit}; +use datafusion_spark::expr_fn::sha2; +// Create the expression `sha2(my_data, 256)` +let expr = sha2(col("my_data"), lit(256)); +... +``` +Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR [#15168](https://github.com/apache/datafusion/pull/15168) Review Comment: fyi @shehabgamin
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +To register all functions in `datafusion-spark` you can use: +```Rust + // Create a new session context + let mut ctx = SessionContext::new(); + // register all spark functions with the context + datafusion_spark::register_all(&mut ctx)?; + // run a query. Note the `sha2` function is now available which + // has Spark semantics + let df = ctx.sql("SELECT sha2('The input String', 256)").await?; + ... 
+} +``` +Or, to use an individual function, you can do: +```Rust +use datafusion_expr::{col, lit}; +use datafusion_spark::expr_fn::sha2; +// Create the expression `sha2(my_data, 256)` +let expr = sha2(col("my_data"), lit(256)); +... +``` +Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR [#15168](https://github.com/apache/datafusion/pull/15168) +and many others for their help adding additional functions. Please consider +helping [complete datafusion-spark Spark Compatible Functions]. + +[Complete datafusion-spark Spark Compatible Functions]: https://github.com/apache/datafusion/issues/15914 + +### `ORDER BY ALL sql` support + +Inspired by [DuckDB](https://duckdb.org/docs/stable/sql/query_syntax/orderby.html#order-by-all-examples), DataFusion 48.0.0 adds support for `ORDER BY ALL`. This allows for easy ordering of all columns in a query: + +```sql +> set datafusion.sql_parser.dialect = 'DuckDB'; +0 row(s) fetched. +> CREATE OR REPLACE TABLE addresses AS + SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111' + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111' + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001'; +0 row(s) fetched. +> SELECT * FROM addresses ORDER BY ALL; ++------------------------+-----------+------------+ +| address | city | zip | ++------------------------+-----------+------------+ +| 111 Duck Duck Goose Ln | Duck Town | 11111 | +| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 | +| 111 Duck Duck Goose Ln | DuckTown | 11111 | +| 123 Quack Blvd | DuckTown | 11111 | ++------------------------+-----------+------------+ +4 row(s) fetched. 
+``` +Thanks to [PokIsemaine](https://github.com/PokIsemaine) for PR [#15772](https://github.com/apache/datafusion/pull/15772) Review Comment: fyi @PokIsemaine
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +- **Fewer unnecessary projections:** DataFusion now removes additional unnecessary `Projection`s in queries. (PRs [#15787](https://github.com/apache/datafusion/pull/15787), [#15761](https://github.com/apache/datafusion/pull/15761), Review Comment: fyi @xudong963
########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +--- +layout: post +title: Apache DataFusion 48.0.0 Released +date: 2025-07-16 +author: PMC +categories: [ release ] +--- + +<!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> + +<!-- see https://github.com/apache/datafusion/issues/16347 for details --> + +We’re excited to announce the release of **Apache DataFusion 48.0.0**! As always, this version packs in a wide range of +improvements and fixes. You can find the complete details in the full +[changelog](https://github.com/apache/datafusion/blob/branch-48/dev/changelog/48.0.0.md). We’ll highlight the most +important changes below and guide you through upgrading. + +## Breaking Changes + +DataFusion 48.0.0 brings a few **breaking changes** that may require adjustments to your code as described in +the [Upgrade Guide](https://datafusion.apache.org/library-user-guide/upgrading.html#datafusion-48-0-0). Here are the most notable ones: + + +- `datafusion.execution.collect_statistics` defaults to `true`: In DataFusion 48.0.0, the default value of this [configuration setting] is now true, and DataFusion will collect and store statistics when a table is first created via `CREATE EXTERNAL TABLE` or one of the `DataFrame::register_*` APIs. + +[configuration setting]: https://datafusion.apache.org/user-guide/configs.html + +- `Expr::Literal` has optional metadata: The `Expr::Literal` variant now includes optional metadata, which allows + for carrying through Arrow field metadata to support extension types and other uses. This means code such as + +```rust +match expr { +... + Expr::Literal(scalar) => ... +... +} +``` + +Should be updated to: + +```rust +match expr { +... + Expr::Literal(scalar, _metadata) => ... +... +} +``` + +- `Expr::WindowFunction` is now Boxed: `Expr::WindowFunction` is now a `Box<WindowFunction>` instead of a `WindowFunction` + directly. This change was made to reduce the size of `Expr` and improve performance when planning queries + (see details on [#16207](https://github.com/apache/datafusion/pull/16207)). 
+ +- UDFs changed to use `FieldRef` instead of `DataType`: To support metadata handling and + prepare for extension types, UDF traits now use [FieldRef] rather than a `DataType` + and nullability. `FieldRef` contains the type and nullability, and additionally allows access to + metadata fields, which can be used for extension types. + +[FieldRef]: https://docs.rs/arrow/latest/arrow/datatypes/type.FieldRef.html + +- Physical Expression return `Field`: Similarly to UDFs, in order to prepare for extension type support the + [PhysicalExpr] trait has been changed to return [Field] rather than `DataType`. To upgrade structs which + implement `PhysicalExpr` you need to implement the `return_field` function. + +[PhysicalExpr]: https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.PhysicalExpr.html +[Field]: https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html + +- `FileFormat::supports_filters_pushdown` was replaced with `FileSource::try_pushdown_filters` to support upcoming work to push down dynamic filters and physical filter pushdown. + +- `ParquetExec`, `AvroExec`, `CsvExec`, `JsonExec` removed: `ParquetExec`, `AvroExec`, `CsvExec`, and `JsonExec` + were deprecated in DataFusion 46 and are removed in DataFusion 48. + +## Performance Improvements + +DataFusion 48.0.0 comes with some noteworthy performance enhancements: + +- **Fewer unnecessary projections:** DataFusion now removes additional unnecessary `Projection`s in queries. (PRs [#15787](https://github.com/apache/datafusion/pull/15787), [#15761](https://github.com/apache/datafusion/pull/15761), + and [#15746](https://github.com/apache/datafusion/pull/15746) by [xudong963](https://github.com/xudong963)). + +- **Accelerated string functions**: The `ascii` function was optimized to significantly improve its performance + (PR [#16087](https://github.com/apache/datafusion/pull/16087) by [tlm365](https://github.com/tlm365)). 
The `character_length` function was optimized, resulting in an + [up to 3x](https://github.com/apache/datafusion/pull/15931#issuecomment-2848561984) performance improvement (PR [#15931](https://github.com/apache/datafusion/pull/15931) by [Dandandan](https://github.com/Dandandan)). + +- **Constant aggregate window expressions:** For unbounded aggregate window functions the result is the + same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary computation for such queries, resulting in [improved performance by 5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865) + (PR [#16234](https://github.com/apache/datafusion/pull/16234) by [suibianwanwank](https://github.com/suibianwanwank)). + +## Highlighted New Features + +### New `datafusion-spark` crate + +The DataFusion community has requested [Apache Spark]-compatible functions for many years, but the current built-in function library is most similar to PostgreSQL, which leads to friction. Unfortunately, there are even functions with the same name but different signatures and/or return types in the two systems. + +One of the many uses of DataFusion is to enhance (e.g. [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)) +or replace (e.g. [Sail](https://github.com/lakehq/sail)) [Apache Spark](https://spark.apache.org/). To +support the community requests and the use cases mentioned above, we have introduced a new +[datafusion-spark] crate for DataFusion with Spark-compatible functions so the +community can collaborate to build this shared resource. There are several hundred functions to implement, and we are looking for help to [complete datafusion-spark Spark Compatible Functions]. 
+ +[datafusion-spark]: https://crates.io/crates/datafusion-spark +[Apache Spark]: https://spark.apache.org + +To register all functions in `datafusion-spark` you can use: +```rust + // Create a new session context + let mut ctx = SessionContext::new(); + // Register all Spark functions with the context + datafusion_spark::register_all(&mut ctx)?; + // Run a query. Note that the `sha2` function is now available and + // has Spark semantics + let df = ctx.sql("SELECT sha2('The input String', 256)").await?; + ... +``` +Or, to use an individual function, you can do: +```rust +use datafusion_expr::{col, lit}; +use datafusion_spark::expr_fn::sha2; +// Create the expression `sha2(my_data, 256)` +let expr = sha2(col("my_data"), lit(256)); +... +``` +Thanks to [shehabgamin](https://github.com/shehabgamin) for the initial PR [#15168](https://github.com/apache/datafusion/pull/15168) +and many others for their help adding additional functions. Please consider +helping [complete datafusion-spark Spark Compatible Functions]. + +[Complete datafusion-spark Spark Compatible Functions]: https://github.com/apache/datafusion/issues/15914 + +### `ORDER BY ALL` SQL support + +Inspired by [DuckDB](https://duckdb.org/docs/stable/sql/query_syntax/orderby.html#order-by-all-examples), DataFusion 48.0.0 adds support for `ORDER BY ALL`. This allows for easy ordering of all columns in a query: + +```sql +> set datafusion.sql_parser.dialect = 'DuckDB'; +0 row(s) fetched. +> CREATE OR REPLACE TABLE addresses AS + SELECT '123 Quack Blvd' AS address, 'DuckTown' AS city, '11111' AS zip + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'DuckTown', '11111' + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111' + UNION ALL + SELECT '111 Duck Duck Goose Ln', 'Duck Town', '11111-0001'; +0 row(s) fetched. 
+> SELECT * FROM addresses ORDER BY ALL; ++------------------------+-----------+------------+ +| address | city | zip | ++------------------------+-----------+------------+ +| 111 Duck Duck Goose Ln | Duck Town | 11111 | +| 111 Duck Duck Goose Ln | Duck Town | 11111-0001 | +| 111 Duck Duck Goose Ln | DuckTown | 11111 | +| 123 Quack Blvd | DuckTown | 11111 | ++------------------------+-----------+------------+ +4 row(s) fetched. +``` +Thanks to [PokIsemaine](https://github.com/PokIsemaine) for PR [#15772](https://github.com/apache/datafusion/pull/15772) + +### FFI Support for `AggregateUDF` and `WindowUDF` + +This improvement allows for using user defined aggregate and user defined window functions across FFI boundaries, which enables shared libraries to pass functions back and forth. This feature unlocks: + +- Modules to provide DataFusion based FFI aggregates that can be reused in projects such as [datafusion-python](https://github.com/apache/datafusion-python) + +- Using the same aggregate and window functions without recompiling with different DataFusion versions. + +This completes the work to add support for all UDF types to DataFusion's FFI bindings. Thanks to [timsaucer](https://github.com/timsaucer) Review Comment: fyi @timsaucer ########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +- **Constant aggregate window expressions:** For unbounded aggregate window functions the result is the + same for all rows within a partition. DataFusion 48.0.0 avoids unnecessary computation for such queries, resulting in [improved performance by 5.6x](https://github.com/apache/datafusion/pull/16234#issuecomment-2935960865) + (PR [#16234](https://github.com/apache/datafusion/pull/16234) by [suibianwanwank](https://github.com/suibianwanwank)) Review Comment: FYI @suibianwanwank ########## content/blog/2025-07-16-datafusion-48.0.0.md: ########## @@ -0,0 +1,209 @@ +This completes the work to add support for all UDF types to DataFusion's FFI bindings. Thanks to [timsaucer](https://github.com/timsaucer) +for PRs [#16261](https://github.com/apache/datafusion/pull/16261) and [#14775](https://github.com/apache/datafusion/pull/14775). + +### Reduced size of `Expr` struct + +The [Expr] struct is widely used across the DataFusion and downstream codebases. By `Box`ing `WindowFunction`s, we reduced the size of `Expr` by almost 50%, from `272` to `144` bytes. 
This reduction improved planning times between 10% and 20% and reduced memory usage. Thanks to [hendrikmakait](https://github.com/hendrikmakait) for Review Comment: fyi @hendrikmakait