alamb commented on code in PR #65:
URL: https://github.com/apache/datafusion-site/pull/65#discussion_r2030236857
##########
content/blog/2025-03-30-datafusion-python-46.0.0.md:
##########
@@ -0,0 +1,300 @@
+---
+layout: post
+title: Apache DataFusion Python 46.0.0 Released
+date: 2025-03-30
+author: timsaucer
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+
+We are happy to announce that [datafusion-python 46.0.0] has been released.
+This release brings in all of the new features of the core [DataFusion 46.0.0]
+library. Since the last blog post for [datafusion-python 43.1.0], a large
+number of improvements have been made that can be found in the [changelogs].
+
+We highly recommend reviewing the upstream [DataFusion 46.0.0] announcement.
+
+[DataFusion 46.0.0]: https://datafusion.apache.org/blog/2025/03/24/datafusion-46.0.0
+[datafusion-python 43.1.0]: https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0/
+[datafusion-python 46.0.0]: https://pypi.org/project/datafusion/46.0.0/
+[changelogs]: https://github.com/apache/datafusion-python/tree/main/dev/changelog
+
+## Easier file reading
+
+In these releases we have introduced two new ways to more easily read files
+into DataFrames.
+
+PR [#982] introduced a series of easier read functions for Parquet, JSON, CSV,
+and AVRO files. This introduces the concept of a global context that is
+available by default when using these functions. Now, instead of creating a
+default Session Context and then calling its read methods, you can simply
+import these alternative read functions and begin working with your
+DataFrames. Below is an example of how easy this new approach is to use.
+
+```python
+from datafusion.io import read_parquet
+df = read_parquet(path="./examples/tpch/data/customer.parquet")
+```
+
+PR [#980] adds a method for setting up a session context to use URL tables.
+With this enabled, you can use a path to a local file as a table name. An
+example of how to use this is demonstrated in the following snippet.
+
+```python
+import datafusion
+ctx = datafusion.SessionContext().enable_url_table()
+df = ctx.table("./examples/tpch/data/customer.parquet")
+```
+
+[#982]: https://github.com/apache/datafusion-python/pull/982
+[#980]: https://github.com/apache/datafusion-python/pull/980
+
+## Registering Table Views
+
+DataFusion supports registering a logical plan as a view with a session
+context. This allows workflows to create views in one part of the workflow and
+pass the session context around to other places where that logical plan can be
+reused. This is a useful feature for building up complex workflows and for
+code clarity. PR [#1016] enables this feature in `datafusion-python`.
Review Comment:
Here is a minor suggestion on wording:
```suggestion
DataFusion supports registering a logical plan as a view with a session
context. This allows creating views in one part of your workflow and passing
the session context to other places where that logical plan can be reused.
This is a useful feature for building up complex workflows and for code
clarity. PR [#1016] enables this feature in `datafusion-python`.
```
##########
content/blog/2025-03-30-datafusion-python-46.0.0.md:
##########
@@ -0,0 +1,300 @@
+
+For example, supposing you have a DataFrame called `df1`, you could use this
+code snippet to register the view and then use it in another place:
+
+```python
+ctx.register_view("view1", df1)
+```
+
+And then in another portion of your code which has access to the same session
+context you can retrieve the DataFrame with:
+
+```python
+df2 = ctx.table("view1")
+```
+
+[#1016]: https://github.com/apache/datafusion-python/pull/1016
+
+## Asynchronous Iteration of Record Batches
+
+Retrieving a `RecordBatch` from a `RecordBatchStream` was a synchronous call,
+which would require the end user's code to wait for the data retrieval. This
+is described in [Issue 974]. We continue to support this as a synchronous
+iterator, but we have also added the ability to retrieve the `RecordBatch`
+using the Python asynchronous `anext` function.
+
+[Issue 974]: https://github.com/apache/datafusion-python/issues/974
+
+## Default Compression for Parquet files
+
+With PR [#981], we change the saving of Parquet files to use zstd compression
+by default. Previously the default was uncompressed, which used excessive disk
+space. Zstd is an excellent compression scheme that balances speed and
+compression ratio. Users can still save their Parquet files uncompressed by
+passing the appropriate value to the `compression` argument when calling
+`DataFrame.write_parquet`.
+
+[#981]: https://github.com/apache/datafusion-python/pull/981
+
+## UDF Decorators
+
+In PRs [#1040] and [#1061] we add methods that make creating user defined
+functions easier by taking advantage of Python decorators. With these PRs you
+can skip the step of defining a function and then separately defining a udf of
+that function. Instead you can simply add the appropriate `udf` decorator.
+Similar methods exist for aggregate and window user defined functions.
+
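+```python
+import pyarrow as pa
+import pyarrow.compute as pc
+from datafusion import udf
+
+@udf([pa.int64(), pa.int64()], pa.bool_(), "stable")
+def my_custom_function(
+    age: pa.Array,
+    favorite_number: pa.Array,
+) -> pa.Array:
+    # Illustrative body (an assumption): true where the two columns match.
+    return pc.equal(age, favorite_number)
+```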
+
+[#1040]: https://github.com/apache/datafusion-python/pull/1040
+[#1061]: https://github.com/apache/datafusion-python/pull/1061
+
+
+## `uv` package management
+
+[uv] is an extremely fast Python package manager, written in Rust. In the
+previous version of `datafusion-python` we had a combination of settings for
+PyPI and Conda. Instead, we switch to using [uv] as our primary method for
+dependency management.
+
+For most users of DataFusion, this change will be transparent. You can still
+install via `pip` or `conda`. For developers, the instructions in the
+repository have been updated.
+
+[uv]: https://github.com/astral-sh/uv
+
+## Code cleanup
+
+In an effort to improve our code cleanliness and ensure we are following
+Python best practices, we use [ruff] to perform Python linting. Until now we
+enabled only a portion of the available linters. In PRs [#1055] and [#1062],
+we enabled many more of these linters and made code improvements to ensure we
+are following these recommendations.
+
+[ruff]: https://docs.astral.sh/ruff/
+[#1055]: https://github.com/apache/datafusion-python/pull/1055
+[#1062]: https://github.com/apache/datafusion-python/pull/1062
+
+## Improved Jupyter Notebook rendering
+
+Since PR [#839] in DataFusion 41.0.0 we have been able to render DataFrames
+using HTML in [jupyter] notebooks. This is a big improvement over the `show`
+command when we have the ability to render tables. In PR [#1036] we went a
+step further and added a variety of features.
+
+- Now HTML tables are scrollable, vertically and horizontally.
+- When data are truncated, we report this to the user.
+- Instead of showing a small number of rows, we collect up to 2 megabytes of
+data to display. Since we have scrollable tables, we are able to make more
+data available to the user without sacrificing notebook usability.
+- We report explicitly when the DataFrame is empty. Previously we would not
+output anything for an empty table. This indicator is helpful to users to
+ensure their plans are written correctly. Sometimes a non-output can be
+overlooked.
+- For long output of data, we generate a collapsed view of the data with an
+option for the user to click on it to expand the data.
+
+In the below view you can see an example of some of these features such as the
+expandable text and scroll bars.
+
+<figure style="text-align: center;">
+ <img
+ src="/blog/images/python-datafusion-46.0.0/html_rendering.png"
+ width="100%"
+ class="img-responsive"
+ alt="Fig 1: Example html rendering in a jupyter notebook."
+ >
+ <figcaption>
+ <b>Figure 1</b>: With the html rendering enhancements, tables are more
+ easily viewable in jupyter notebooks.
+</figcaption>
+</figure>
+
+[jupyter]: https://jupyter.org/
+[#839]: https://github.com/apache/datafusion-python/pull/839
+[#1036]: https://github.com/apache/datafusion-python/pull/1036
+
+## Extension Documentation
+
+We have recently added [Extension Documentation] to the DataFusion in Python
+website. We have received many requests about how to better understand how to
+integrate DataFusion in Python with other Rust libraries. To address these
+questions we wrote an article about some of the difficulties that we encounter
+when using Rust libraries in Python and our approach to addressing them.
+
+[Extension Documentation]: https://datafusion.apache.org/python/contributor-guide/ffi.html
+
+## Migration Guide
+
+During the upgrade from [DataFusion 43.0.0] to [DataFusion 44.0.0] as our
+upstream core dependency, we discovered a few changes were necessary within
+our repository and our unit tests. These notes serve to help guide users who
+may encounter similar issues when upgrading.
+
+- `RuntimeConfig` is now deprecated in favor of `RuntimeEnvBuilder`. The
+migration is fairly straightforward, and the corresponding classes have been
+marked as deprecated. For end users it should be simply a matter of changing
+the class name.
+- If you perform a `concat` of a `string_view` and `string`, it will now
+return a `string_view` instead of a `string`. This likely only impacts unit
+tests that are validating return types. In general, it is recommended to
+switch to using `string_view` whenever possible. You can see the blog articles
+[String View Pt 1] and [Pt 2] for more information on these performance
+improvements.
+- The function `date_part` now returns an `int32` instead of a `float64`. This
+is likely only impactful to unit tests.
+- We have upgraded the Python minimum version to 3.9 since 3.8 is no longer
+officially supported.
+
+[DataFusion 43.0.0]: https://github.com/apache/datafusion/blob/main/dev/changelog/43.0.0.md
+[DataFusion 44.0.0]: https://github.com/apache/datafusion/blob/main/dev/changelog/44.0.0.md
+[String View Pt 1]: https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-1/
+[Pt 2]: https://datafusion.apache.org/blog/2024/09/13/string-view-german-style-strings-part-2/
+
+## Coming Soon
+
+There is a lot of excitement around the upcoming work. This list is not
+comprehensive, but a glimpse into some of the upcoming work includes:
+
+- Reusable DataFusion UDFs: The way user defined functions are currently
+written in `datafusion-python` is slightly different from those written for
+the upstream Rust `datafusion`. The core ideas are usually the same, but it
+means it takes effort for users to re-implement functions already written for
+Rust projects to be usable in Python. Issue [#1017] addresses this topic. Work
+is well underway to make it easier to expose these user functions through the
+FFI boundary. This means that the work that already exists in repositories
+such as those found in the [datafusion-contrib] project can be easily re-used
+in Python. This will provide a low effort way to expose significant
+functionality to the DataFusion in Python community.
+- Additional table providers: We have work well underway to provide a host of
+table providers to `datafusion-python` including: sqlite, duckdb, postgres,
+odbc, and mysql! In [datafusion-contrib #279] we track the progress of this
+excellent work. Once complete, users will be able to `pip install` this
+library and get easy access to all of these table providers. This is another
+way we are leveraging the FFI work to greatly expand the usability of
+`datafusion-python` with relatively low effort.
+- External catalog and schema providers: For users who wish to go beyond table
+providers and have an entire custom catalog with schema, Issue [#1091] tracks
+the progress of exposing this in Python. With this work, if you have already
+written a Rust based table catalog you will be able to interface it in Python
+similar to the work described for the table providers above.
+
+This is only a sample of the great work that is being done. If there are
+features you would love to see, we encourage you to open an issue and join us
+as we build something wonderful.
+
+[#1017]: https://github.com/apache/datafusion-python/issues/1017
+[datafusion-contrib #279]: https://github.com/datafusion-contrib/datafusion-table-providers/issues/279
+[#1091]: https://github.com/apache/datafusion-python/issues/1091
+[datafusion-contrib]: https://github.com/datafusion-contrib
+
+## Appreciation
+
+We would like to thank everyone who has helped with these releases through
+their helpful conversations, code review, issue descriptions, and code
+authoring. We would especially like to thank the following authors of PRs who
+made these releases possible, listed in alphabetical order by username:
+[@chenkovsky], [@CrystalZhou0529], [@ion-elgreco],
Review Comment:
FYI @chenkovsky, @CrystalZhou0529, @ion-elgreco, @jsai28, @kevinjqliu,
@kylebarron, @kosiew, @nirnayroy, and @Spaarsh
##########
content/blog/2025-03-30-datafusion-python-46.0.0.md:
##########
@@ -0,0 +1,300 @@
+```python
+import datafusion
+ctx = datafusion.SessionContext().enable_url_table()
Review Comment:
FYI @goldmedal (this is exposing your great work via datafusion-python 🐱 🎉 )
##########
content/blog/2025-03-30-datafusion-python-46.0.0.md:
##########
@@ -0,0 +1,300 @@
+## Improved Jupyter Notebook rendering
Review Comment:
this is really cool
##########
content/blog/2025-03-30-datafusion-python-46.0.0.md:
##########
@@ -0,0 +1,300 @@
+## Default Compression for Parquet files
Review Comment:
```suggestion
## Default ZSTD Compression for Parquet files
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.