This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 4d98057 Commit build products
4d98057 is described below
commit 4d980575ac2c1e22688d6ab95d1b59a4bb0242d9
Author: Build Pelican (action) <[email protected]>
AuthorDate: Tue May 6 17:22:27 2025 +0000
Commit build products
---
.../2025/05/06/datafusion-comet-0.8.0/index.html | 125 +++++++++++++++++++++
output/author/pmc.html | 40 +++++++
output/category/blog.html | 40 +++++++
output/feed.xml | 23 +++-
output/feeds/all-en.atom.xml | 87 +++++++++++++-
output/feeds/blog.atom.xml | 87 +++++++++++++-
output/feeds/pmc.atom.xml | 87 +++++++++++++-
output/feeds/pmc.rss.xml | 23 +++-
output/index.html | 40 +++++++
9 files changed, 547 insertions(+), 5 deletions(-)
diff --git a/output/2025/05/06/datafusion-comet-0.8.0/index.html
b/output/2025/05/06/datafusion-comet-0.8.0/index.html
new file mode 100644
index 0000000..adf2bc8
--- /dev/null
+++ b/output/2025/05/06/datafusion-comet-0.8.0/index.html
@@ -0,0 +1,125 @@
+<!doctype html>
+<html class="no-js" lang="en" dir="ltr">
+ <head>
+ <meta charset="utf-8">
+ <meta http-equiv="x-ua-compatible" content="ie=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <title>Apache DataFusion Comet 0.8.0 Release - Apache DataFusion
Blog</title>
+<link href="/blog/css/bootstrap.min.css" rel="stylesheet">
+<link href="/blog/css/fontawesome.all.min.css" rel="stylesheet">
+<link href="/blog/css/headerlink.css" rel="stylesheet">
+<link href="/blog/highlight/default.min.css" rel="stylesheet">
+<script src="/blog/highlight/highlight.js"></script>
+<script>hljs.highlightAll();</script> </head>
+ <body class="d-flex flex-column h-100">
+ <main class="flex-shrink-0">
+<!-- nav bar -->
+<nav class="navbar navbar-expand-lg navbar-dark bg-dark" aria-label="Fifth
navbar example">
+ <div class="container-fluid">
+ <a class="navbar-brand" href="/blog"><img
src="/blog/images/logo_original4x.png" style="height: 32px;"/> Apache
DataFusion Blog</a>
+ <button class="navbar-toggler" type="button" data-bs-toggle="collapse"
data-bs-target="#navbarADP" aria-controls="navbarADP" aria-expanded="false"
aria-label="Toggle navigation">
+ <span class="navbar-toggler-icon"></span>
+ </button>
+
+ <div class="collapse navbar-collapse" id="navbarADP">
+ <ul class="navbar-nav me-auto mb-2 mb-lg-0">
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/about.html">About</a>
+ </li>
+ <li class="nav-item">
+ <a class="nav-link" href="/blog/feed.xml">RSS</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+</nav>
+
+
+<!-- page contents -->
+<div id="contents">
+ <div class="bg-white p-5 rounded">
+ <div class="col-sm-8 mx-auto">
+ <h1>
+ Apache DataFusion Comet 0.8.0 Release
+ </h1>
+ <p>Posted on: Tue 06 May 2025 by pmc</p>
+ <!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>This release covers approximately six weeks of development work and is the
result of merging 81 PRs from 11
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change
log</a> for more information.</p>
+<h2>Release Highlights</h2>
+<h3>Performance & Stability</h3>
+<ul>
+<li>Up to 4x speedup in jobs using <code>dropDuplicates</code>, thanks to
optimizations in the <code>first_value</code> and <code>last_value</code>
+ aggregate functions in DataFusion 47.0.0.</li>
+<li>Introduction of a global Tokio runtime, which resolves potential deadlocks
in certain multi-task scenarios.</li>
+</ul>
+<h2>Native Shuffle Improvements</h2>
+<p>Significant enhancements to the native shuffle mechanism include:</p>
+<ul>
+<li>Lower memory usage by using <code>interleave_record_batches</code>
instead of array builders.</li>
+<li>Support for complex types in shuffle data (note: hash partition
expressions still require primitive types).</li>
+<li>Reclaimable shuffle files, reducing disk pressure.</li>
+<li>Shuffle now respects <code>spark.local.dir</code> for temporary storage.</li>
+<li>Per-task shuffle metrics are now available, providing better visibility
into execution behavior.</li>
+</ul>
+<h2>Experimental Support for DataFusion’s Parquet Scan</h2>
+<p>It is now possible to configure Comet to use DataFusion’s Parquet
reader instead of Comet’s current Parquet reader. This
+has the advantage of supporting complex types, and also has performance
optimizations that are not present in Comet's
+existing reader.</p>
+<p>This release continues the ongoing improvements and bug fixes and
supports more use cases, but there are still
+some known issues:</p>
+<ul>
+<li>There are schema coercion bugs for nested types containing INT96 columns,
which can cause incorrect results.</li>
+<li>There are compatibility issues when reading integer values that exceed
the range of their type annotation, such as the
+ value 1024 being stored in a field annotated as int(8).</li>
+<li>A small number of Spark SQL tests remain unsupported (<a
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li>
+</ul>
+<p>To enable DataFusion’s Parquet reader, either set
<code>spark.comet.scan.impl=native_datafusion</code> or set the environment
+variable <code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p>
+<h2>Updates to Supported Spark Versions</h2>
+<ul>
+<li>Added support for Spark 3.5.5</li>
+<li>Dropped support for Spark 3.3.x</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current Spark
jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p>
+ </div>
+ </div>
+ </div>
+ <!-- footer -->
+ <div class="row">
+ <div class="large-12 medium-12 columns">
+ <p style="font-style: italic; font-size: 0.8rem; text-align: center;">
+ Copyright 2025, <a href="https://www.apache.org/">The Apache
Software Foundation</a>, Licensed under the <a
href="https://www.apache.org/licenses/LICENSE-2.0">Apache License, Version
2.0</a>.<br/>
+ Apache® and the Apache feather logo are trademarks of The Apache
Software Foundation.
+ </p>
+ </div>
+ </div>
+ <script src="/blog/js/bootstrap.bundle.min.js"></script> </main>
+ </body>
+</html>
diff --git a/output/author/pmc.html b/output/author/pmc.html
index 9314685..e8c80cb 100644
--- a/output/author/pmc.html
+++ b/output/author/pmc.html
@@ -47,6 +47,46 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/05/06/datafusion-comet-0.8.0">Apache DataFusion Comet 0.8.0
Release</a></h1>
+ <p>Posted on: Tue 06 May 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>This release covers approximately six weeks of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/output/category/blog.html b/output/category/blog.html
index 6fd5396..e52d46d 100644
--- a/output/category/blog.html
+++ b/output/category/blog.html
@@ -47,6 +47,46 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/05/06/datafusion-comet-0.8.0">Apache DataFusion Comet 0.8.0
Release</a></h1>
+ <p>Posted on: Tue 06 May 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>This release covers approximately six weeks of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">
diff --git a/output/feed.xml b/output/feed.xml
index 2a7a3e9..339530b 100644
--- a/output/feed.xml
+++ b/output/feed.xml
@@ -1,5 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Sat,
19 Apr 2025 00:00:00 +0000</lastBuildDate><item><title>User defined Window
Functions in
DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion
Blog</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue,
06 May 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.8.0
Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development
…</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06
May 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>User
defined Window Functions in
DataFusion</title><link>https://datafusion.apache.org/blog/2025/04/19/user-defined-window-
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index 74296bc..22ed6b8 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -1,5 +1,90 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-04-19T00:00:00+00:00</updated><subtitle></subtitle><entry><title>User
defined Window Functions in DataFusion</title><link
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-f [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion
Blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/all-en.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.8.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0" rel
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development
…</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development work and
is the result of merging 81 PRs from 11
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change
log</a> for more information.</p>
+<h2>Release Highlights</h2>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Up to 4x speedup in jobs using
<code>dropDuplicates</code>, thanks to optimizations in the
<code>first_value</code> and <code>last_value</code>
+ aggregate functions in DataFusion 47.0.0.</li>
+<li>Introduction of a global Tokio runtime, which resolves potential
deadlocks in certain multi-task scenarios.</li>
+</ul>
+<h2>Native Shuffle Improvements</h2>
+<p>Significant enhancements to the native shuffle mechanism
include:</p>
+<ul>
+<li>Lower memory usage by using
<code>interleave_record_batches</code> instead of array
builders.</li>
+<li>Support for complex types in shuffle data (note: hash partition
expressions still require primitive types).</li>
+<li>Reclaimable shuffle files, reducing disk pressure.</li>
+<li>Shuffle now respects <code>spark.local.dir</code> for temporary
storage.</li>
+<li>Per-task shuffle metrics are now available, providing better
visibility into execution behavior.</li>
+</ul>
+<h2>Experimental Support for DataFusion&rsquo;s Parquet
Scan</h2>
+<p>It is now possible to configure Comet to use DataFusion&rsquo;s
Parquet reader instead of Comet&rsquo;s current Parquet reader. This
+has the advantage of supporting complex types, and also has performance
optimizations that are not present in Comet's
+existing reader.</p>
+<p>This release continues the ongoing improvements and bug fixes
and supports more use cases, but there are still
+some known issues:</p>
+<ul>
+<li>There are schema coercion bugs for nested types containing INT96
columns, which can cause incorrect results.</li>
+<li>There are compatibility issues when reading integer values that exceed
the range of their type annotation, such as the
+ value 1024 being stored in a field annotated as int(8).</li>
+<li>A small number of Spark SQL tests remain unsupported (<a
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li>
+</ul>
+<p>To enable DataFusion&rsquo;s Parquet reader, either set
<code>spark.comet.scan.impl=native_datafusion</code> or set the
environment
+variable
<code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p>
+<h2>Updates to Supported Spark Versions</h2>
+<ul>
+<li>Added support for Spark 3.5.5</li>
+<li>Dropped support for Spark 3.3.x</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>User defined Window Functions in
DataFusion</title><link
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions"
rel="alternate"></link><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><author><
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index 4ac55e3..612af25 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -1,5 +1,90 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-04-19T00:00:00+00:00</updated><subtitle></subtitle><entry><title>User
defined Window Functions in DataFusion</title><link
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-win [...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
blog</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/blog.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.8.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0 [...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development
…</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development work and
is the result of merging 81 PRs from 11
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change
log</a> for more information.</p>
+<h2>Release Highlights</h2>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Up to 4x speedup in jobs using
<code>dropDuplicates</code>, thanks to optimizations in the
<code>first_value</code> and <code>last_value</code>
+ aggregate functions in DataFusion 47.0.0.</li>
+<li>Introduction of a global Tokio runtime, which resolves potential
deadlocks in certain multi-task scenarios.</li>
+</ul>
+<h2>Native Shuffle Improvements</h2>
+<p>Significant enhancements to the native shuffle mechanism
include:</p>
+<ul>
+<li>Lower memory usage by using
<code>interleave_record_batches</code> instead of array
builders.</li>
+<li>Support for complex types in shuffle data (note: hash partition
expressions still require primitive types).</li>
+<li>Reclaimable shuffle files, reducing disk pressure.</li>
+<li>Shuffle now respects <code>spark.local.dir</code> for temporary
storage.</li>
+<li>Per-task shuffle metrics are now available, providing better
visibility into execution behavior.</li>
+</ul>
+<h2>Experimental Support for DataFusion&rsquo;s Parquet
Scan</h2>
+<p>It is now possible to configure Comet to use DataFusion&rsquo;s
Parquet reader instead of Comet&rsquo;s current Parquet reader. This
+has the advantage of supporting complex types, and also has performance
optimizations that are not present in Comet's
+existing reader.</p>
+<p>This release continues the ongoing improvements and bug fixes
and supports more use cases, but there are still
+some known issues:</p>
+<ul>
+<li>There are schema coercion bugs for nested types containing INT96
columns, which can cause incorrect results.</li>
+<li>There are compatibility issues when reading integer values that exceed
the range of their type annotation, such as the
+ value 1024 being stored in a field annotated as int(8).</li>
+<li>A small number of Spark SQL tests remain unsupported (<a
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li>
+</ul>
+<p>To enable DataFusion&rsquo;s Parquet reader, either set
<code>spark.comet.scan.impl=native_datafusion</code> or set the
environment
+variable
<code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p>
+<h2>Updates to Supported Spark Versions</h2>
+<ul>
+<li>Added support for Spark 3.5.5</li>
+<li>Dropped support for Spark 3.3.x</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>User defined Window Functions in
DataFusion</title><link
href="https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions"
rel="alternate"></link><published>2025-04-19T00:00:00+00:00</published><updated>2025-04-19T00:00:00+00:00</updated><author><
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/pmc.atom.xml b/output/feeds/pmc.atom.xml
index a056f83..3a42c96 100644
--- a/output/feeds/pmc.atom.xml
+++ b/output/feeds/pmc.atom.xml
@@ -1,5 +1,90 @@
<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-03-20T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.7.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0"
[...]
+<feed xmlns="http://www.w3.org/2005/Atom"><title>Apache DataFusion Blog -
pmc</title><link href="https://datafusion.apache.org/blog/"
rel="alternate"></link><link
href="https://datafusion.apache.org/blog/feeds/pmc.atom.xml"
rel="self"></link><id>https://datafusion.apache.org/blog/</id><updated>2025-05-06T00:00:00+00:00</updated><subtitle></subtitle><entry><title>Apache
DataFusion Comet 0.8.0 Release</title><link
href="https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0"
[...]
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development
…</p></summary><content type="html"><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development work and
is the result of merging 81 PRs from 11
+contributors. See the <a
href="https://github.com/apache/datafusion-comet/blob/main/dev/changelog/0.8.0.md">change
log</a> for more information.</p>
+<h2>Release Highlights</h2>
+<h3>Performance &amp; Stability</h3>
+<ul>
+<li>Up to 4x speedup in jobs using
<code>dropDuplicates</code>, thanks to optimizations in the
<code>first_value</code> and <code>last_value</code>
+ aggregate functions in DataFusion 47.0.0.</li>
+<li>Introduction of a global Tokio runtime, which resolves potential
deadlocks in certain multi-task scenarios.</li>
+</ul>
+<h2>Native Shuffle Improvements</h2>
+<p>Significant enhancements to the native shuffle mechanism
include:</p>
+<ul>
+<li>Lower memory usage by using
<code>interleave_record_batches</code> instead of array
builders.</li>
+<li>Support for complex types in shuffle data (note: hash partition
expressions still require primitive types).</li>
+<li>Reclaimable shuffle files, reducing disk pressure.</li>
+<li>Native shuffle now respects <code>spark.local.dir</code> for temporary
storage.</li>
+<li>Per-task shuffle metrics are now available, providing better
visibility into execution behavior.</li>
+</ul>
+<h2>Experimental Support for DataFusion&rsquo;s Parquet
Scan</h2>
+<p>It is now possible to configure Comet to use DataFusion&rsquo;s
Parquet reader instead of Comet&rsquo;s current Parquet reader. This
+has the advantage of supporting complex types, and also has performance
optimizations that are not present in Comet&rsquo;s
+existing reader.</p>
+<p>This release continues the ongoing improvements and bug fixes to this
reader and supports more use cases, but
+some known issues remain:</p>
+<ul>
+<li>There are schema coercion bugs for nested types containing INT96
columns, which can cause incorrect results.</li>
+<li>There are compatibility issues when reading integer values that
exceed the range of their type annotation, such as the
+ value 1024 being stored in a field annotated as int(8).</li>
+<li>A small number of Spark SQL tests remain unsupported (<a
href="https://github.com/apache/datafusion-comet/issues/1545">#1545</a>).</li>
+</ul>
+<p>To enable DataFusion&rsquo;s Parquet reader, either set
<code>spark.comet.scan.impl=native_datafusion</code> or set the
environment
+variable
<code>COMET_PARQUET_SCAN_IMPL=native_datafusion</code>.</p>
+<h2>Updates to Supported Spark Versions</h2>
+<ul>
+<li>Added support for Spark 3.5.5</li>
+<li>Dropped support for Spark 3.3.x</li>
+</ul>
+<h2>Getting Involved</h2>
+<p>The Comet project welcomes new contributors. We use the same <a
href="https://datafusion.apache.org/contributor-guide/communication.html#slack-and-discord">Slack
and Discord</a> channels as the main DataFusion
+project and have a weekly <a
href="https://docs.google.com/document/d/1NBpkIAuU7O9h8Br5CbFksDhX-L9TyO9wmGLPMe0Plc8/edit?usp=sharing">DataFusion
video call</a>.</p>
+<p>The easiest way to get involved is to test Comet with your current
Spark jobs and file issues for any bugs or
+performance regressions that you find. See the <a
href="https://datafusion.apache.org/comet/user-guide/installation.html">Getting
Started</a> guide for instructions on downloading and installing
+Comet.</p>
+<p>There are also many <a
href="https://github.com/apache/datafusion-comet/contribute">good first
issues</a> waiting for contributions.</p></content><category
term="blog"></category></entry><entry><title>Apache DataFusion Comet 0.7.0
Release</title><link
href="https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0"
rel="alternate"></link><published>2025-03-20T00:00:00+00:00</published><updated>2025-03-20T00:00:00+00:00</updated><author><name>pmc</nam
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/pmc.rss.xml b/output/feeds/pmc.rss.xml
index be7d552..ea12593 100644
--- a/output/feeds/pmc.rss.xml
+++ b/output/feeds/pmc.rss.xml
@@ -1,5 +1,26 @@
<?xml version="1.0" encoding="utf-8"?>
-<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Thu,
20 Mar 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.7.0
Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</link><description><!--
+<rss version="2.0"><channel><title>Apache DataFusion Blog -
pmc</title><link>https://datafusion.apache.org/blog/</link><description></description><lastBuildDate>Tue,
06 May 2025 00:00:00 +0000</lastBuildDate><item><title>Apache DataFusion Comet
0.8.0
Release</title><link>https://datafusion.apache.org/blog/2025/05/06/datafusion-comet-0.8.0</link><description><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the
<a href="https://datafusion.apache.org/comet/">Comet</a>
subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark
physical plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code
changes.</p>
+<p>This release covers approximately six weeks of development
…</p></description><dc:creator
xmlns:dc="http://purl.org/dc/elements/1.1/">pmc</dc:creator><pubDate>Tue, 06
May 2025 00:00:00 +0000</pubDate><guid
isPermaLink="false">tag:datafusion.apache.org,2025-05-06:/blog/2025/05/06/datafusion-comet-0.8.0</guid><category>blog</category></item><item><title>Apache
DataFusion Comet 0.7.0
Release</title><link>https://datafusion.apache.org/blog/2025/03/20/datafusion-comet-0.7.0</li
[...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/index.html b/output/index.html
index 764c26b..a85fc5d 100644
--- a/output/index.html
+++ b/output/index.html
@@ -44,6 +44,46 @@
<p><i>Here you can find the latest updates from DataFusion and
related projects.</i></p>
+ <!-- Post -->
+ <div class="row">
+ <div class="callout">
+ <article class="post">
+ <header>
+ <div class="title">
+ <h1><a
href="/blog/2025/05/06/datafusion-comet-0.8.0">Apache DataFusion Comet 0.8.0
Release</a></h1>
+ <p>Posted on: Tue 06 May 2025 by pmc</p>
+ <p><!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+<p>The Apache DataFusion PMC is pleased to announce version 0.8.0 of the <a
href="https://datafusion.apache.org/comet/">Comet</a> subproject.</p>
+<p>Comet is an accelerator for Apache Spark that translates Spark physical
plans to DataFusion physical plans for
+improved performance and efficiency without requiring any code changes.</p>
+<p>This release covers approximately six weeks of development …</p></p>
+ <footer>
+ <ul class="actions">
+ <div style="text-align: right"><a
href="/blog/2025/05/06/datafusion-comet-0.8.0" class="button medium">Continue
Reading</a></div>
+ </ul>
+ <ul class="stats">
+ </ul>
+ </footer>
+ </article>
+ </div>
+ </div>
<!-- Post -->
<div class="row">
<div class="callout">