This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
     new 745a943  Commit build products
745a943 is described below

commit 745a943f077d0ecff770c0404aa9fa7bac24e128
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Fri Jan 17 16:15:41 2025 +0000

    Commit build products
---
 blog/2019/02/04/datafusion-donation/index.html | 2 +-
 blog/2021/04/12/ballista-donation/index.html | 2 +-
 blog/2021/08/18/ballista-0.5.0/index.html | 2 +-
 blog/2021/08/18/datafusion-5.0.0/index.html | 2 +-
 .../11/19/2021-11-8-datafusion-6.0.0.md/index.html | 2 +-
 blog/2022/02/28/datafusion-7.0.0/index.html | 2 +-
 blog/2022/03/21/datafusion-contrib/index.html | 2 +-
 blog/2022/05/16/datafusion-8.0.0/index.html | 2 +-
 blog/2022/10/25/datafusion-13.0.0/index.html | 2 +-
 blog/2022/10/28/ballista-0.9.0/index.html | 2 +-
 blog/2023/01/19/datafusion-16.0.0/index.html | 2 +-
 blog/2023/06/24/datafusion-25.0.0/index.html | 2 +-
 .../2023/08/05/datafusion_fast_grouping/index.html | 2 +-
 blog/2024/01/19/datafusion-34.0.0/index.html | 2 +-
 blog/2024/03/06/comet-donation/index.html | 2 +-
 blog/2024/05/07/datafusion-tlp/index.html | 2 +-
 blog/2024/07/20/datafusion-comet-0.1.0/index.html | 2 +-
 blog/2024/07/24/datafusion-40.0.0/index.html | 2 +-
 .../2024/08/20/python-datafusion-40.0.0/index.html | 2 +-
 blog/2024/08/28/datafusion-comet-0.2.0/index.html | 2 +-
 .../index.html | 2 +-
 .../index.html | 2 +-
 blog/2024/09/27/datafusion-comet-0.3.0/index.html | 2 +-
 .../index.html | 2 +-
 .../datafusion-python-udf-comparisons/index.html | 2 +-
 blog/2024/11/20/datafusion-comet-0.4.0/index.html | 2 +-
 .../2024/12/14/datafusion-python-43.1.0/index.html | 2 +-
 .../01/17/datafusion-comet-0.5.0}/index.html | 97 ++++++++++------------
 blog/about.html | 2 +-
 blog/author/agrove.html | 2 +-
 blog/author/alamb-dandandan-tustvold.html | 2 +-
 .../andrew-lamb-staff-engineer-at-influxdata.html | 2 +-
 blog/author/pmc.html | 42 +++++++++-
 blog/author/timsaucer.html | 2 +-
blog/author/xiangpeng-hao-andrew-lamb.html | 2 +- blog/category/blog.html | 42 +++++++++- blog/feed.xml | 23 ++++- blog/feeds/all-en.atom.xml | 95 ++++++++++++++++++++- blog/feeds/blog.atom.xml | 95 ++++++++++++++++++++- blog/feeds/pmc.atom.xml | 95 ++++++++++++++++++++- blog/feeds/pmc.rss.xml | 23 ++++- blog/index.html | 42 +++++++++- 42 files changed, 528 insertions(+), 92 deletions(-) diff --git a/blog/2019/02/04/datafusion-donation/index.html b/blog/2019/02/04/datafusion-donation/index.html index 6078d6b..0a04dd8 100644 --- a/blog/2019/02/04/datafusion-donation/index.html +++ b/blog/2019/02/04/datafusion-donation/index.html @@ -149,7 +149,7 @@ limitations under the License. <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2021/04/12/ballista-donation/index.html b/blog/2021/04/12/ballista-donation/index.html index fb99e53..e8ad1d9 100644 --- a/blog/2021/04/12/ballista-donation/index.html +++ b/blog/2021/04/12/ballista-donation/index.html @@ -100,7 +100,7 @@ maintainers.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2021/08/18/ballista-0.5.0/index.html b/blog/2021/08/18/ballista-0.5.0/index.html index b6861ad..3cfe738 100644 --- a/blog/2021/08/18/ballista-0.5.0/index.html +++ b/blog/2021/08/18/ballista-0.5.0/index.html @@ -114,7 +114,7 @@ and the full list is <a href="https://github.com/apache/arrow-datafusion/issues" <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2021/08/18/datafusion-5.0.0/index.html b/blog/2021/08/18/datafusion-5.0.0/index.html index 764bf79..79f34c3 100644 --- a/blog/2021/08/18/datafusion-5.0.0/index.html +++ b/blog/2021/08/18/datafusion-5.0.0/index.html @@ -145,7 +145,7 @@ and the full list is <a href="https://github.com/apache/arrow-datafusion/issues" <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2021/11/19/2021-11-8-datafusion-6.0.0.md/index.html b/blog/2021/11/19/2021-11-8-datafusion-6.0.0.md/index.html index c6c6d88..b3c94ce 100644 --- a/blog/2021/11/19/2021-11-8-datafusion-6.0.0.md/index.html +++ b/blog/2021/11/19/2021-11-8-datafusion-6.0.0.md/index.html @@ -183,7 +183,7 @@ ways to engage with the community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2022/02/28/datafusion-7.0.0/index.html b/blog/2022/02/28/datafusion-7.0.0/index.html index f6164d1..feb34f3 100644 --- a/blog/2022/02/28/datafusion-7.0.0/index.html +++ b/blog/2022/02/28/datafusion-7.0.0/index.html @@ -189,7 +189,7 @@ ways to engage with the community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2022/03/21/datafusion-contrib/index.html b/blog/2022/03/21/datafusion-contrib/index.html index 4b2eee0..5e97e01 100644 --- a/blog/2022/03/21/datafusion-contrib/index.html +++ b/blog/2022/03/21/datafusion-contrib/index.html @@ -185,7 +185,7 @@ can help by trying out DataFusion on some of your own data and projects and let <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2022/05/16/datafusion-8.0.0/index.html b/blog/2022/05/16/datafusion-8.0.0/index.html index 810abcb..f62fd5b 100644 --- a/blog/2022/05/16/datafusion-8.0.0/index.html +++ b/blog/2022/05/16/datafusion-8.0.0/index.html @@ -217,7 +217,7 @@ ways to engage with the community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2022/10/25/datafusion-13.0.0/index.html b/blog/2022/10/25/datafusion-13.0.0/index.html index 0de62b5..d9299c4 100644 --- a/blog/2022/10/25/datafusion-13.0.0/index.html +++ b/blog/2022/10/25/datafusion-13.0.0/index.html @@ -246,7 +246,7 @@ ways to engage with the community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2022/10/28/ballista-0.9.0/index.html b/blog/2022/10/28/ballista-0.9.0/index.html index 98925b9..5fb32d8 100644 --- a/blog/2022/10/28/ballista-0.9.0/index.html +++ b/blog/2022/10/28/ballista-0.9.0/index.html @@ -126,7 +126,7 @@ for any bugs or feature suggestions.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2023/01/19/datafusion-16.0.0/index.html b/blog/2023/01/19/datafusion-16.0.0/index.html index 37c0d06..b303041 100644 --- a/blog/2023/01/19/datafusion-16.0.0/index.html +++ b/blog/2023/01/19/datafusion-16.0.0/index.html @@ -267,7 +267,7 @@ ways to engage with the community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2023/06/24/datafusion-25.0.0/index.html b/blog/2023/06/24/datafusion-25.0.0/index.html index aff5d0f..384c70f 100644 --- a/blog/2023/06/24/datafusion-25.0.0/index.html +++ b/blog/2023/06/24/datafusion-25.0.0/index.html @@ -245,7 +245,7 @@ community.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2023/08/05/datafusion_fast_grouping/index.html b/blog/2023/08/05/datafusion_fast_grouping/index.html index 242166c..4760de1 100644 --- a/blog/2023/08/05/datafusion_fast_grouping/index.html +++ b/blog/2023/08/05/datafusion_fast_grouping/index.html @@ -347,7 +347,7 @@ Thus, to answer the query, DataFusion must map each of the 100M different input <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/01/19/datafusion-34.0.0/index.html b/blog/2024/01/19/datafusion-34.0.0/index.html index c0da7f2..2ea4b74 100644 --- a/blog/2024/01/19/datafusion-34.0.0/index.html +++ b/blog/2024/01/19/datafusion-34.0.0/index.html @@ -294,7 +294,7 @@ the methods listed in our <a href="https://arrow.apache.org/datafusion/contribut <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/03/06/comet-donation/index.html b/blog/2024/03/06/comet-donation/index.html index b7be6ea..6a8420a 100644 --- a/blog/2024/03/06/comet-donation/index.html +++ b/blog/2024/03/06/comet-donation/index.html @@ -110,7 +110,7 @@ expect to post another update with more details at that time.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/05/07/datafusion-tlp/index.html b/blog/2024/05/07/datafusion-tlp/index.html index 9d888aa..2c4a2b7 100644 --- a/blog/2024/05/07/datafusion-tlp/index.html +++ b/blog/2024/05/07/datafusion-tlp/index.html @@ -104,7 +104,7 @@ documentation, bug reports, or a PR with documentation, tests or code.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/07/20/datafusion-comet-0.1.0/index.html b/blog/2024/07/20/datafusion-comet-0.1.0/index.html index 7ef6dec..3deb707 100644 --- a/blog/2024/07/20/datafusion-comet-0.1.0/index.html +++ b/blog/2024/07/20/datafusion-comet-0.1.0/index.html @@ -144,7 +144,7 @@ Comet.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/07/24/datafusion-40.0.0/index.html b/blog/2024/07/24/datafusion-40.0.0/index.html index 7866e95..b2d4827 100644 --- a/blog/2024/07/24/datafusion-40.0.0/index.html +++ b/blog/2024/07/24/datafusion-40.0.0/index.html @@ -353,7 +353,7 @@ can find how to reach us on the <a href="https://datafusion.apache.org/contribut <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/08/20/python-datafusion-40.0.0/index.html b/blog/2024/08/20/python-datafusion-40.0.0/index.html index e77749b..3139ab3 100644 --- a/blog/2024/08/20/python-datafusion-40.0.0/index.html +++ b/blog/2024/08/20/python-datafusion-40.0.0/index.html @@ -200,7 +200,7 @@ page.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/08/28/datafusion-comet-0.2.0/index.html b/blog/2024/08/28/datafusion-comet-0.2.0/index.html index 5908b78..e0772da 100644 --- a/blog/2024/08/28/datafusion-comet-0.2.0/index.html +++ b/blog/2024/08/28/datafusion-comet-0.2.0/index.html @@ -117,7 +117,7 @@ Comet.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/09/13/string-view-german-style-strings-part-1/index.html b/blog/2024/09/13/string-view-german-style-strings-part-1/index.html index a4736d5..d980308 100644 --- a/blog/2024/09/13/string-view-german-style-strings-part-1/index.html +++ b/blog/2024/09/13/string-view-german-style-strings-part-1/index.html @@ -161,7 +161,7 @@ along with some of the pitfalls we encountered while implementing them.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/09/13/string-view-german-style-strings-part-2/index.html b/blog/2024/09/13/string-view-german-style-strings-part-2/index.html index cd6ff12..f0e3f0c 100644 --- a/blog/2024/09/13/string-view-german-style-strings-part-2/index.html +++ b/blog/2024/09/13/string-view-german-style-strings-part-2/index.html @@ -138,7 +138,7 @@ In certain cases, we found that multiple calls to <a href="https://docs.rs/arrow <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/09/27/datafusion-comet-0.3.0/index.html b/blog/2024/09/27/datafusion-comet-0.3.0/index.html index 313e622..66c15e6 100644 --- a/blog/2024/09/27/datafusion-comet-0.3.0/index.html +++ b/blog/2024/09/27/datafusion-comet-0.3.0/index.html @@ -114,7 +114,7 @@ Comet.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/index.html b/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/index.html index d87cd5e..f27c73f 100644 --- a/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/index.html +++ b/blog/2024/11/18/datafusion-fastest-single-node-parquet-clickbench/index.html @@ -253,7 +253,7 @@ online</a>.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/11/19/datafusion-python-udf-comparisons/index.html b/blog/2024/11/19/datafusion-python-udf-comparisons/index.html index 4ad6c1f..9cac39c 100644 --- a/blog/2024/11/19/datafusion-python-udf-comparisons/index.html +++ b/blog/2024/11/19/datafusion-python-udf-comparisons/index.html @@ -591,7 +591,7 @@ to make a great tool. 
If you want to get involved, please take a look at the <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/11/20/datafusion-comet-0.4.0/index.html b/blog/2024/11/20/datafusion-comet-0.4.0/index.html index 807d16b..4d7965e 100644 --- a/blog/2024/11/20/datafusion-comet-0.4.0/index.html +++ b/blog/2024/11/20/datafusion-comet-0.4.0/index.html @@ -128,7 +128,7 @@ Comet.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/2024/12/14/datafusion-python-43.1.0/index.html b/blog/2024/12/14/datafusion-python-43.1.0/index.html index 5f77796..240870d 100644 --- a/blog/2024/12/14/datafusion-python-43.1.0/index.html +++ b/blog/2024/12/14/datafusion-python-43.1.0/index.html @@ -181,7 +181,7 @@ page.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/2024/11/20/datafusion-comet-0.4.0/index.html b/blog/2025/01/17/datafusion-comet-0.5.0/index.html similarity index 52% copy from blog/2024/11/20/datafusion-comet-0.4.0/index.html copy to blog/2025/01/17/datafusion-comet-0.5.0/index.html index 807d16b..759d1f2 100644 --- a/blog/2024/11/20/datafusion-comet-0.4.0/index.html +++ b/blog/2025/01/17/datafusion-comet-0.5.0/index.html @@ -4,7 +4,7 @@ <meta charset="utf-8"> <meta http-equiv="x-ua-compatible" content="ie=edge"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> - <title>Apache DataFusion Comet 0.4.0 Release - Apache DataFusion Blog</title> + <title>Apache DataFusion Comet 0.5.0 Release - Apache DataFusion Blog</title> <link href="/blog/css/bootstrap.min.css" rel="stylesheet"> <link href="/blog/css/fontawesome.all.min.css" rel="stylesheet"> <link href="/blog/css/headerlink.css" rel="stylesheet"> @@ -40,9 +40,9 @@ <div class="bg-white p-5 rounded"> <div class="col-sm-8 mx-auto"> <h1> - Apache DataFusion Comet 0.4.0 Release + Apache DataFusion Comet 0.5.0 
Release </h1> - <p>Posted on: Wed 20 November 2024 by pmc</p> + <p>Posted on: Fri 17 January 2025 by pmc</p> <!-- {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more @@ -61,59 +61,54 @@ See the License for the specific language governing permissions and limitations under the License. {% endcomment %} --> -<p>The Apache DataFusion PMC is pleased to announce version 0.4.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> <p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for improved performance and efficiency without requiring any code changes.</p> <p>Comet runs on commodity hardware and aims to provide 100% compatibility with Apache Spark. Any operators or expressions that are not fully compatible will fall back to Spark unless explicitly enabled by the user. Refer to the <a href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility guide</a> for more information.</p> -<p>This release covers approximately six weeks of development work and is the result of merging 51 PRs from 10 -contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.4.0.md">change log</a> for more information.</p> +<p>This release covers approximately 8 weeks of development work and is the result of merging 69 PRs from 15 +contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.5.0.md">change log</a> for more information.</p> <h2>Release Highlights</h2> -<h3>Performance & Stability</h3> -<p>There are a number of performance and stability improvements in this release. Here is a summary of some of the -larger changes. 
Current benchmarking results can be found in the <a href="https://datafusion.apache.org/comet/contributor-guide/benchmarking.html">Comet Benchmarking Guide</a>.</p> -<h4>Unified Memory Management</h4> -<p>Comet now uses a unified memory management approach that shares an off-heap memory pool with Apache Spark, resulting -in a much simpler configuration. Comet now requires <code>spark.memory.offHeap.enabled=true</code>. This approach provides a -holistic view of memory usage in Spark and Comet and makes it easier to optimize system performance.</p> -<h4>Faster Joins</h4> -<p>Apache Spark supports sort-merge and hash joins, which have similar performance characteristics. Spark defaults to -using sort-merge joins because they are less likely to result in OutOfMemory exceptions. In vectorized query -engines such as DataFusion, hash joins outperform sort-merge joins. Comet now has an experimental feature to -replace Spark sort-merge joins with hash joins for improved performance. This feature is experimental because -there is currently no spill-to-disk support in the hash join implementation. This feature can be enabled by -setting <code>spark.comet.exec.replaceSortMergeJoin=true</code>.</p> -<h4>Bloom Filter Aggregates</h4> -<p>Spark’s optimizer can insert Bloom filter aggregations and filters to prune large result sets before a shuffle. However, -Comet would fall back to Spark for the aggregation. Comet now has native support for Bloom filter aggregations -after previously supporting Bloom filter testing. 
Users no longer need to set -<code>spark.sql.optimizer.runtime.bloomFilter.enabled=false</code> when using Comet.</p> -<h4>Complex Type support</h4> -<p>This release has the following improvements to complex type support:</p> -<ul> -<li>Implemented <code>ArrayAppend</code> and <code>GetArrayStructFields</code>.</li> -<li>Implemented native cast between structs</li> -<li>Implemented native cast from structs to string</li> -</ul> -<h2>Roadmap</h2> -<p>One of the highest priority items on the roadmap is to add support for reading complex types (maps, structs, and arrays) -from Parquet sources, both when reading Parquet directly and from Iceberg.</p> -<p>Comet currently has proprietary native code for decoding Parquet pages, native column readers for all of Spark’s -primitive types, and special handling for Spark-specific use cases such as timestamp rebasing and decimal type -promotion. This implementation does not yet support complex types. File IO, decryption, and decompression are handled -in JVM code, and Parquet pages are passed on to native code for decoding.</p> -<p>Rather than add complex type support to this existing code, we are exploring two main options to allow us to -leverage more of the upstream Arrow and DataFusion code.</p> -<h3>Use DataFusion’s ParquetExec</h3> -<p>For use cases where DataFusion can support reading a Parquet source, Comet could create a native plan that uses -DataFusion’s ParquetExec. We are investigating using DataFusion’s SchemaAdapter to handle some Spark-specific -handling of timestamps and decimals.</p> -<h3>Use Arrow’s Parquet Batch Reader</h3> -<p>For use cases not supported by DataFusion’s ParquetExec, such as integrating with Iceberg, we are exploring -replacing our current native Parquet decoding logic with the Arrow readers provided by the Parquet crate.</p> -<p>Iceberg already provides a vectorized Spark reader for Parquet. 
A <a href="https://github.com/apache/iceberg/pull/9841">PR</a> is open against Iceberg for adding a native -version based on Comet, and we hope to update this to leverage the improvements outlined above.</p> +<h3>Performance</h3> +<p>Comet 0.5.0 achieves a 1.9x speedup for single-node TPC-H @ 100 GB, an improvement from 1.7x in the previous release.</p> +<p><img alt="tpch-summary" src="https://datafusion.apache.org/comet/_images/tpch_allqueries.png"/></p> +<p><img alt="tpch-queries" src="https://datafusion.apache.org/comet/_images/tpch_queries_compare.png"/></p> +<p>More benchmarking results can be found in the <a href="https://datafusion.apache.org/comet/contributor-guide/benchmarking.html">Comet Benchmarking Guide</a>.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now supports multiple compression algorithms for compressing shuffle files. Previously, only ZSTD was supported +but Comet now also supports LZ4 and Snappy. The default is now LZ4, which matches the default in Spark. ZSTD may be +a better choice when the compression ratio is more important than CPU overhead.</p> +<p>Previously, Comet used Arrow IPC to encode record batches into shuffle files. Although Arrow IPC is a good +general-purpose framework for serializing Arrow record batches, we found that we could get better performance using +a custom serialization approach optimized for Comet. One optimization is that the schema is encoded once per shuffle +operation rather than once per batch. There are some planned performance improvements in the Rust implementation of +Arrow IPC and Comet may switch back to Arrow IPC in the future.</p> +<p>Comet provides two shuffle implementations. Comet native shuffle is the fastest and performs repartitioning in +native code. Comet columnar shuffle delegates to Spark to perform repartitioning and is used in cases where native +shuffle is not supported, such as with <code>RangePartitioning</code>. 
Comet generally tries to use native shuffle first, then +columnar shuffle, and finally falls back to Spark if neither is supported. There was a bug in previous releases +where Comet would sometimes fall back to Spark shuffle if native shuffle was not supported and missed opportunities +to use columnar shuffle. This bug was fixed in this release but currently requires the configuration setting +<code>spark.comet.exec.shuffle.fallbackToColumnar=true</code>. This will be enabled by default in the next release.</p> +<h3>Memory Management</h3> +<p>Comet 0.4.0 required Spark to be configured to use off-heap memory. In this release it is no longer required and +there are multiple options for configuring Comet to use on-heap memory instead. More details are available in the +<a href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet Tuning Guide</a>.</p> +<h3>Spark SQL Metrics</h3> +<p>Comet now provides detailed metrics for native shuffle, showing time for repartitioning, encoding and compressing, +and writing to disk.</p> +<h3>Crate Reorganization</h3> +<p>One of the goals of the Comet project is to make Spark-compatible functionality available to other projects that +are based on DataFusion. In this release, many implementations of Spark-compatible expressions were moved from the +unpublished <code>datafusion-comet</code> crate, which provides the native part of the Spark plugin, into the +<code>datafusion-comet-spark-expr</code> crate. There is also ongoing work to reorganize this crate to move expressions into +subfolders named after the group name that Spark uses to organize expressions. For example, there are now subfolders +named <code>agg_funcs</code>, <code>datetime_funcs</code>, <code>hash_funcs</code>, and so on.</p> +<h2>Update on Complex Type Support</h2> +<p>Good progress has been made with proof-of-concept work using DataFusion’s <code>ParquetExec</code>, which has the advantage of +supporting complex types. 
This work is available on the <code>comet-parquet-exec</code> branch, and the current focus is on +fixing test regressions, particularly regarding timestamp conversion issues.</p> <h2>Getting Involved</h2> <p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> @@ -128,7 +123,7 @@ Comet.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/about.html b/blog/about.html index aa3c0d3..df473ec 100644 --- a/blog/about.html +++ b/blog/about.html @@ -53,7 +53,7 @@ <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/author/agrove.html b/blog/author/agrove.html index ab8e0f0..6f99fc5 100644 --- a/blog/author/agrove.html +++ b/blog/author/agrove.html @@ -136,7 +136,7 @@ limitations under the License. <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/author/alamb-dandandan-tustvold.html b/blog/author/alamb-dandandan-tustvold.html index 253ab13..a46055b 100644 --- a/blog/author/alamb-dandandan-tustvold.html +++ b/blog/author/alamb-dandandan-tustvold.html @@ -100,7 +100,7 @@ limitations under the License. <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/author/andrew-lamb-staff-engineer-at-influxdata.html b/blog/author/andrew-lamb-staff-engineer-at-influxdata.html index d971f78..5a149a2 100644 --- a/blog/author/andrew-lamb-staff-engineer-at-influxdata.html +++ b/blog/author/andrew-lamb-staff-engineer-at-influxdata.html @@ -99,7 +99,7 @@ been …</p></p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/author/pmc.html b/blog/author/pmc.html index 5282a8c..35da552 100644 --- a/blog/author/pmc.html +++ b/blog/author/pmc.html @@ -47,6 +47,46 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/01/17/datafusion-comet-0.5.0">Apache DataFusion Comet 0.5.0 Release</a></h1> + <p>Posted on: Fri 17 January 2025 by pmc</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/01/17/datafusion-comet-0.5.0" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -799,7 +839,7 @@ and includes 211 commits from the following 31 distinct contributors.</p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/author/timsaucer.html b/blog/author/timsaucer.html index 1940550..1a905f0 100644 --- a/blog/author/timsaucer.html +++ b/blog/author/timsaucer.html @@ -179,7 +179,7 @@ contains <em>significant</em> updates to the user interface and documentation. W <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. </p> </div> diff --git a/blog/author/xiangpeng-hao-andrew-lamb.html b/blog/author/xiangpeng-hao-andrew-lamb.html index 7ebe093..93a18a2 100644 --- a/blog/author/xiangpeng-hao-andrew-lamb.html +++ b/blog/author/xiangpeng-hao-andrew-lamb.html @@ -135,7 +135,7 @@ In this second …</p></p> <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/category/blog.html b/blog/category/blog.html index 91a8264..33084fb 100644 --- a/blog/category/blog.html +++ b/blog/category/blog.html @@ -47,6 +47,46 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/01/17/datafusion-comet-0.5.0">Apache DataFusion Comet 0.5.0 Release</a></h1> + <p>Posted on: Fri 17 January 2025 by pmc</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/01/17/datafusion-comet-0.5.0" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -1158,7 +1198,7 @@ limitations under the License. <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
</p> </div> diff --git a/blog/feed.xml b/blog/feed.xml index 58c295d..097ceff 100644 --- a/blog/feed.xml +++ b/blog/feed.xml @@ -1,5 +1,26 @@ <?xml version="1.0" encoding="utf-8"?> -<rss version="2.0"><channel><title>Apache DataFusion Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sat, 14 Dec 2024 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Python 43.1.0 Released</title><link>https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0</link><description><!-- +<rss version="2.0"><channel><title>Apache DataFusion Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri, 17 Jan 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.5.0 Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</link><description><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Fri, 17 Jan 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-01-17:/blog/2025/01/17/datafusion-comet-0.5.0</guid><category>blog</category></item><item><title>Apache DataFusion Python 43.1.0 Released</title><link>https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0</link><descr [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml index 0fade6f..01b783a 100644 --- a/blog/feeds/all-en.atom.xml +++ b/blog/feeds/all-en.atom.xml @@ -1,5 +1,98 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2024-12-14T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Python 43.1.0 Released</title><link href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0 [...] 
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-01-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.5.0 Release</title><link href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0" rel [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. 
+The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to provide 100% compatibility with Apache Spark. Any operators or +expressions that are not fully compatible will fall back to Spark unless explicitly enabled by the user. Refer +to the <a href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility guide</a> for more information.</p> +<p>This release covers approximately 8 weeks of development work and is the result of merging 69 PRs from 15 +contributors. 
See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.5.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Performance</h3> +<p>Comet 0.5.0 achieves a 1.9x speedup for single-node TPC-H @ 100 GB, an improvement from 1.7x in the previous release.</p> +<p><img alt="tpch-summary" src="https://datafusion.apache.org/comet/_images/tpch_allqueries.png"/></p> +<p><img alt="tpch-queries" src="https://datafusion.apache.org/comet/_images/tpch_queries_compare.png"/></p> +<p>More benchmarking results can be found in the <a href="https://datafusion.apache.org/comet/contributor-guide/benchmarking.html">Comet Benchmarking Guide</a>.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now supports multiple compression algorithms for compressing shuffle files. Previously, only ZSTD was supported +but Comet now also supports LZ4 and Snappy. The default is now LZ4, which matches the default in Spark. ZSTD may be +a better choice when the compression ratio is more important than CPU overhead.</p> +<p>Previously, Comet used Arrow IPC to encode record batches into shuffle files. Although Arrow IPC is a good +general-purpose framework for serializing Arrow record batches, we found that we could get better performance using +a custom serialization approach optimized for Comet. One optimization is that the schema is encoded once per shuffle +operation rather than once per batch. There are some planned performance improvements in the Rust implementation of +Arrow IPC and Comet may switch back to Arrow IPC in the future.</p> +<p>Comet provides two shuffle implementations. Comet native shuffle is the fastest and performs repartitioning in +native code. Comet columnar shuffle delegates to Spark to perform repartitioning and is used in cases where native +shuffle is not supported, such as with <code>RangePartitioning</code>. 
Comet generally tries to use native shuffle first, then +columnar shuffle, and finally falls back to Spark if neither is supported. There was a bug in previous releases +where Comet would sometimes fall back to Spark shuffle if native shuffle was not supported and missed opportunities +to use columnar shuffle. This bug was fixed in this release but currently requires the configuration setting +<code>spark.comet.exec.shuffle.fallbackToColumnar=true</code>. This will be enabled by default in the next release.</p> +<h3>Memory Management</h3> +<p>Comet 0.4.0 required Spark to be configured to use off-heap memory. In this release it is no longer required and +there are multiple options for configuring Comet to use on-heap memory instead. More details are available in the +<a href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet Tuning Guide</a>.</p> +<h3>Spark SQL Metrics</h3> +<p>Comet now provides detailed metrics for native shuffle, showing time for repartitioning, encoding and compressing, +and writing to disk.</p> +<h3>Crate Reorganization</h3> +<p>One of the goals of the Comet project is to make Spark-compatible functionality available to other projects that +are based on DataFusion. In this release, many implementations of Spark-compatible expressions were moved from the +unpublished <code>datafusion-comet</code> crate, which provides the native part of the Spark plugin, into the +<code>datafusion-comet-spark-expr</code> crate. There is also ongoing work to reorganize this crate to move expressions into +subfolders named after the group name that Spark uses to organize expressions. For example, there are now subfolders +named <code>agg_funcs</code>, <code>datetime_funcs</code>, <code>hash_funcs</code>, and so on.</p> +<h2>Update on Complex Type Support</h2> +<p>Good progress has been made with proof-of-concept work using DataFusion&rsquo;s <code>ParquetExec</code>, which has the advantage of +supporting complex types. 
This work is available on the <code>comet-parquet-exec</code> branch, and the current focus is on +fixing test regressions, particularly regarding timestamp conversion issues.</p> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Python 43.1.0 Released</title><link href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0" rel="alternate"></link><published>2024-12-14T00:00:00+00:00</published><updated>2024-12-14T00:00:00+00:00</updated><author><name>tim [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml index 90b9802..bc813c4 100644 --- a/blog/feeds/blog.atom.xml +++ b/blog/feeds/blog.atom.xml @@ -1,5 +1,98 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2024-12-14T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Python 43.1.0 Released</title><link href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-4 [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - blog</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/blog.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-01-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.5.0 Release</title><link href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0 [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to provide 100% compatibility with Apache Spark. Any operators or +expressions that are not fully compatible will fall back to Spark unless explicitly enabled by the user. 
Refer +to the <a href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility guide</a> for more information.</p> +<p>This release covers approximately 8 weeks of development work and is the result of merging 69 PRs from 15 +contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.5.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Performance</h3> +<p>Comet 0.5.0 achieves a 1.9x speedup for single-node TPC-H @ 100 GB, an improvement from 1.7x in the previous release.</p> +<p><img alt="tpch-summary" src="https://datafusion.apache.org/comet/_images/tpch_allqueries.png"/></p> +<p><img alt="tpch-queries" src="https://datafusion.apache.org/comet/_images/tpch_queries_compare.png"/></p> +<p>More benchmarking results can be found in the <a href="https://datafusion.apache.org/comet/contributor-guide/benchmarking.html">Comet Benchmarking Guide</a>.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now supports multiple compression algorithms for compressing shuffle files. Previously, only ZSTD was supported +but Comet now also supports LZ4 and Snappy. The default is now LZ4, which matches the default in Spark. ZSTD may be +a better choice when the compression ratio is more important than CPU overhead.</p> +<p>Previously, Comet used Arrow IPC to encode record batches into shuffle files. Although Arrow IPC is a good +general-purpose framework for serializing Arrow record batches, we found that we could get better performance using +a custom serialization approach optimized for Comet. One optimization is that the schema is encoded once per shuffle +operation rather than once per batch. There are some planned performance improvements in the Rust implementation of +Arrow IPC and Comet may switch back to Arrow IPC in the future.</p> +<p>Comet provides two shuffle implementations. Comet native shuffle is the fastest and performs repartitioning in +native code. 
Comet columnar shuffle delegates to Spark to perform repartitioning and is used in cases where native +shuffle is not supported, such as with <code>RangePartitioning</code>. Comet generally tries to use native shuffle first, then +columnar shuffle, and finally falls back to Spark if neither is supported. There was a bug in previous releases +where Comet would sometimes fall back to Spark shuffle if native shuffle was not supported and missed opportunities +to use columnar shuffle. This bug was fixed in this release but currently requires the configuration setting +<code>spark.comet.exec.shuffle.fallbackToColumnar=true</code>. This will be enabled by default in the next release.</p> +<h3>Memory Management</h3> +<p>Comet 0.4.0 required Spark to be configured to use off-heap memory. In this release it is no longer required and +there are multiple options for configuring Comet to use on-heap memory instead. More details are available in the +<a href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet Tuning Guide</a>.</p> +<h3>Spark SQL Metrics</h3> +<p>Comet now provides detailed metrics for native shuffle, showing time for repartitioning, encoding and compressing, +and writing to disk.</p> +<h3>Crate Reorganization</h3> +<p>One of the goals of the Comet project is to make Spark-compatible functionality available to other projects that +are based on DataFusion. In this release, many implementations of Spark-compatible expressions were moved from the +unpublished <code>datafusion-comet</code> crate, which provides the native part of the Spark plugin, into the +<code>datafusion-comet-spark-expr</code> crate. There is also ongoing work to reorganize this crate to move expressions into +subfolders named after the group name that Spark uses to organize expressions. 
For example, there are now subfolders +named <code>agg_funcs</code>, <code>datetime_funcs</code>, <code>hash_funcs</code>, and so on.</p> +<h2>Update on Complex Type Support</h2> +<p>Good progress has been made with proof-of-concept work using DataFusion&rsquo;s <code>ParquetExec</code>, which has the advantage of +supporting complex types. This work is available on the <code>comet-parquet-exec</code> branch, and the current focus is on +fixing test regressions, particularly regarding timestamp conversion issues.</p> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Python 43.1.0 Released</title><link href="https://datafusion.apache.org/blog/2024/12/14/datafusion-python-43.1.0" rel="alternate"></link><published>2024-12-14T00:00:00+00:00</published><updated>2024-12-14T00:00:00+00:00</updated><author><name>tim [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/feeds/pmc.atom.xml b/blog/feeds/pmc.atom.xml index c2d862f..1ca8f95 100644 --- a/blog/feeds/pmc.atom.xml +++ b/blog/feeds/pmc.atom.xml @@ -1,5 +1,98 @@ <?xml version="1.0" encoding="utf-8"?> -<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - pmc</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2024-11-20T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.4.0 Release</title><link href="https://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0" [...] +<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog - pmc</title><link href="https://datafusion.apache.org/blog/" rel="alternate"></link><link href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml" rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-01-17T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache DataFusion Comet 0.5.0 Release</title><link href="https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0" [...] +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></summary><content type="html"><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to provide 100% compatibility with Apache Spark. Any operators or +expressions that are not fully compatible will fall back to Spark unless explicitly enabled by the user. 
Refer +to the <a href="https://datafusion.apache.org/comet/user-guide/compatibility.html">compatibility guide</a> for more information.</p> +<p>This release covers approximately 8 weeks of development work and is the result of merging 69 PRs from 15 +contributors. See the <a href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.5.0.md">change log</a> for more information.</p> +<h2>Release Highlights</h2> +<h3>Performance</h3> +<p>Comet 0.5.0 achieves a 1.9x speedup for single-node TPC-H @ 100 GB, an improvement from 1.7x in the previous release.</p> +<p><img alt="tpch-summary" src="https://datafusion.apache.org/comet/_images/tpch_allqueries.png"/></p> +<p><img alt="tpch-queries" src="https://datafusion.apache.org/comet/_images/tpch_queries_compare.png"/></p> +<p>More benchmarking results can be found in the <a href="https://datafusion.apache.org/comet/contributor-guide/benchmarking.html">Comet Benchmarking Guide</a>.</p> +<h3>Shuffle Improvements</h3> +<p>Comet now supports multiple compression algorithms for compressing shuffle files. Previously, only ZSTD was supported +but Comet now also supports LZ4 and Snappy. The default is now LZ4, which matches the default in Spark. ZSTD may be +a better choice when the compression ratio is more important than CPU overhead.</p> +<p>Previously, Comet used Arrow IPC to encode record batches into shuffle files. Although Arrow IPC is a good +general-purpose framework for serializing Arrow record batches, we found that we could get better performance using +a custom serialization approach optimized for Comet. One optimization is that the schema is encoded once per shuffle +operation rather than once per batch. There are some planned performance improvements in the Rust implementation of +Arrow IPC and Comet may switch back to Arrow IPC in the future.</p> +<p>Comet provides two shuffle implementations. Comet native shuffle is the fastest and performs repartitioning in +native code. 
Comet columnar shuffle delegates to Spark to perform repartitioning and is used in cases where native +shuffle is not supported, such as with <code>RangePartitioning</code>. Comet generally tries to use native shuffle first, then +columnar shuffle, and finally falls back to Spark if neither is supported. There was a bug in previous releases +where Comet would sometimes fall back to Spark shuffle if native shuffle was not supported and missed opportunities +to use columnar shuffle. This bug was fixed in this release but currently requires the configuration setting +<code>spark.comet.exec.shuffle.fallbackToColumnar=true</code>. This will be enabled by default in the next release.</p> +<h3>Memory Management</h3> +<p>Comet 0.4.0 required Spark to be configured to use off-heap memory. In this release it is no longer required and +there are multiple options for configuring Comet to use on-heap memory instead. More details are available in the +<a href="https://datafusion.apache.org/comet/user-guide/tuning.html">Comet Tuning Guide</a>.</p> +<h3>Spark SQL Metrics</h3> +<p>Comet now provides detailed metrics for native shuffle, showing time for repartitioning, encoding and compressing, +and writing to disk.</p> +<h3>Crate Reorganization</h3> +<p>One of the goals of the Comet project is to make Spark-compatible functionality available to other projects that +are based on DataFusion. In this release, many implementations of Spark-compatible expressions were moved from the +unpublished <code>datafusion-comet</code> crate, which provides the native part of the Spark plugin, into the +<code>datafusion-comet-spark-expr</code> crate. There is also ongoing work to reorganize this crate to move expressions into +subfolders named after the group name that Spark uses to organize expressions. 
For example, there are now subfolders +named <code>agg_funcs</code>, <code>datetime_funcs</code>, <code>hash_funcs</code>, and so on.</p> +<h2>Update on Complex Type Support</h2> +<p>Good progress has been made with proof-of-concept work using DataFusion&rsquo;s <code>ParquetExec</code>, which has the advantage of +supporting complex types. This work is available on the <code>comet-parquet-exec</code> branch, and the current focus is on +fixing test regressions, particularly regarding timestamp conversion issues.</p> +<h2>Getting Involved</h2> +<p>The Comet project welcomes new contributors. We use the same <a href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack and Discord</a> channels as the main DataFusion +project and have a weekly <a href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion video call</a>.</p> +<p>The easiest way to get involved is to test Comet with your current Spark jobs and file issues for any bugs or +performance regressions that you find. See the <a href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting Started</a> guide for instructions on downloading and installing +Comet.</p> +<p>There are also many <a href="https://github.com/apache/datafusion-comet/contribute">good first issues</a> waiting for contributions.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.4.0 Release</title><link href="https://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0" rel="alternate"></link><published>2024-11-20T00:00:00+00:00</published><updated>2024-11-20T00:00:00+00:00</updated><author><name>pmc</nam [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. 
See the NOTICE file distributed with diff --git a/blog/feeds/pmc.rss.xml b/blog/feeds/pmc.rss.xml index 3c3e4b7..974782d 100644 --- a/blog/feeds/pmc.rss.xml +++ b/blog/feeds/pmc.rss.xml @@ -1,5 +1,26 @@ <?xml version="1.0" encoding="utf-8"?> -<rss version="2.0"><channel><title>Apache DataFusion Blog - pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Wed, 20 Nov 2024 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.4.0 Release</title><link>https://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0</link><description><!-- +<rss version="2.0"><channel><title>Apache DataFusion Blog - pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Fri, 17 Jan 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet 0.5.0 Release</title><link>https://datafusion.apache.org/blog/2025/01/17/datafusion-comet-0.5.0</link><description><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Fri, 17 Jan 2025 00:00:00 +0000</pubDate><guid isPermaLink="false">tag:datafusion.apache.org,2025-01-17:/blog/2025/01/17/datafusion-comet-0.5.0</guid><category>blog</category></item><item><title>Apache DataFusion Comet 0.4.0 Release</title><link>https://datafusion.apache.org/blog/2024/11/20/datafusion-comet-0.4.0</link><descriptio [...] {% comment %} Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with diff --git a/blog/index.html b/blog/index.html index d3771d4..edc59b1 100644 --- a/blog/index.html +++ b/blog/index.html @@ -44,6 +44,46 @@ <p><i>Here you can find the latest updates from DataFusion and related projects.</i></p> + <!-- Post --> + <div class="row"> + <div class="callout"> + <article class="post"> + <header> + <div class="title"> + <h1><a href="/blog/2025/01/17/datafusion-comet-0.5.0">Apache DataFusion Comet 0.5.0 Release</a></h1> + <p>Posted on: Fri 17 January 2025 by pmc</p> + <p><!-- +{% comment %} +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to you under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +{% endcomment %} +--> +<p>The Apache DataFusion PMC is pleased to announce version 0.5.0 of the <a href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p> +<p>Comet is an accelerator for Apache Spark that translates Spark physical plans to DataFusion physical plans for +improved performance and efficiency without requiring any code changes.</p> +<p>Comet runs on commodity hardware and aims to …</p></p> + <footer> + <ul class="actions"> + <div style="text-align: right"><a href="/blog/2025/01/17/datafusion-comet-0.5.0" class="button medium">Continue Reading</a></div> + </ul> + <ul class="stats"> + </ul> + </footer> + </article> + </div> + </div> <!-- Post --> <div class="row"> <div class="callout"> @@ -1152,7 +1192,7 @@ limitations under the License. <div class="row"> <div class="large-12 medium-12 columns"> <p style="font-style: italic; font-size: 0.8rem; text-align: center;"> - Copyright 2024, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> + Copyright 2025, <a href="https://www.apache.org/">The Apache Software Foundation</a>, Licensed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version 2.0</a>.<br/> Apache® and the Apache feather logo are trademarks of The Apache Software Foundation. 
 </p> </div>