This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-staging by this push:
    new 4cf2fe2 Commit build products
4cf2fe2 is described below

commit 4cf2fe218be26373b4310d6db5c1ee4f55ecd29d
Author: Build Pelican (action) <priv...@infra.apache.org>
AuthorDate: Tue Aug 12 14:19:52 2025 +0000

    Commit build products
---
 blog/2025/08/15/external-parquet-indexes/index.html | 5 ++++-
 blog/feeds/all-en.atom.xml | 5 ++++-
 blog/feeds/andrew-lamb-influxdata.atom.xml | 5 ++++-
 blog/feeds/blog.atom.xml | 5 ++++-
 4 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/blog/2025/08/15/external-parquet-indexes/index.html b/blog/2025/08/15/external-parquet-indexes/index.html
index 285c245..6cd5bbe 100644
--- a/blog/2025/08/15/external-parquet-indexes/index.html
+++ b/blog/2025/08/15/external-parquet-indexes/index.html
@@ -587,6 +587,9 @@ components, rather than as a single tightly integrated system.</p>
 improve the project. If you are interested in learning more about how query execution works, help document or improve the DataFusion codebase, or just try it out, we would love for you to join us.</p>
+<h3>Acknowledgements</h3>
+<p>Thank you to <a href="https://github.com/zhuqi-lucas">Qi Zhu</a>, <a href="https://github.com/adamreeve">Adam Reeve</a>, <a href="https://github.com/JigaoLuo">Jigao Luo</a>, <a href="https://github.com/comphead">Oleks V</a>, <a href="https://github.com/shehabgamin">Shehab Amin</a>, <a href="https://nuno-faria.github.io/">Nuno Faria</a>
+and <a href="https://github.com/Omega359">Bruce Ritchie</a> for their insightful feedback on this blog post.</p>
 <h3>Footnotes</h3>
 <p><a id="footnote1"></a><code>1</code>: This trend is described in more detail in the <a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/">FDAP Stack</a> blog</p>
 <p><a id="footnote2"></a><code>2</code>: This layout is referred to as <a href="https://www.vldb.org/conf/2001/P169.pdf">PAX in the
@@ -601,7 +604,7 @@ with additional engineering effort (see <a href="https://xiangpeng.systems/">Xia
 topic</a>). <a href="https://github.com/etseidl">Ed Seidl</a> is beginning this effort. See the <a href="https://github.com/apache/arrow-rs/issues/5854">ticket</a> for details.</p>
 <p><a id="footnote6"></a><code>6</code>: ClickBench includes a wide variety of query patterns such as point lookups, filters of different selectivity, and aggregations.</p>
-<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Zhu Qi</a> was able to speed up reads by over 2x
+<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Qi Zhu</a> was able to speed up reads by over 2x
 simply by rewriting the Parquet files with Offset Indexes and no compression (see <a href="https://github.com/apache/datafusion/issues/16149#issuecomment-2918761743">issue #16149 comment</a> for details). There is likely significant additional performance available by using Bloom Filters and resorting the data to be clustered in a more optimal way for the queries.</p>
diff --git a/blog/feeds/all-en.atom.xml b/blog/feeds/all-en.atom.xml
index 1261772..8f643f6 100644
--- a/blog/feeds/all-en.atom.xml
+++ b/blog/feeds/all-en.atom.xml
@@ -567,6 +567,9 @@ components, rather than as a single tightly integrated system.</p>
 improve the project. If you are interested in learning more about how query execution works, help document or improve the DataFusion codebase, or just try it out, we would love for you to join us.</p>
+<h3>Acknowledgements</h3>
+<p>Thank you to <a href="https://github.com/zhuqi-lucas">Qi Zhu</a>, <a href="https://github.com/adamreeve">Adam Reeve</a>, <a href="https://github.com/JigaoLuo">Jigao Luo</a>, <a href="https://github.com/comphead">Oleks V</a>, <a href="https://github.com/shehabgamin">Shehab Amin</a>, <a href="https://nuno-faria.github.io/">Nuno Faria</a>
+and <a href="https://github.com/Omega359">Bruce Ritchie</a> for their insightful feedback on this blog post.</p>
 <h3>Footnotes</h3>
 <p><a id="footnote1"></a><code>1</code>: This trend is described in more detail in the <a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/">FDAP Stack</a> blog</p>
 <p><a id="footnote2"></a><code>2</code>: This layout is referred to as <a href="https://www.vldb.org/conf/2001/P169.pdf">PAX in the
@@ -581,7 +584,7 @@ with additional engineering effort (see <a href="https://xiangpeng.systems/"&
 topic</a>). <a href="https://github.com/etseidl">Ed Seidl</a> is beginning this effort. See the <a href="https://github.com/apache/arrow-rs/issues/5854">ticket</a> for details.</p>
 <p><a id="footnote6"></a><code>6</code>: ClickBench includes a wide variety of query patterns such as point lookups, filters of different selectivity, and aggregations.</p>
-<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Zhu Qi</a> was able to speed up reads by over 2x
+<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Qi Zhu</a> was able to speed up reads by over 2x
 simply by rewriting the Parquet files with Offset Indexes and no compression (see <a href="https://github.com/apache/datafusion/issues/16149#issuecomment-2918761743">issue #16149 comment</a> for details). There is likely significant additional performance available by using Bloom Filters and resorting the data to be clustered in a more optimal way for the queries.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion 49.0.0 Released</title><link href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" rel="alternate"></link><published>2025-07-28T00:00:00+00:00</published><updated>2025-07-28T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-07-28:/blog/2025/07/28/datafusion-49.0.0</id><summary type="ht [...]
diff --git a/blog/feeds/andrew-lamb-influxdata.atom.xml b/blog/feeds/andrew-lamb-influxdata.atom.xml
index 77d036c..b1b2365 100644
--- a/blog/feeds/andrew-lamb-influxdata.atom.xml
+++ b/blog/feeds/andrew-lamb-influxdata.atom.xml
@@ -567,6 +567,9 @@ components, rather than as a single tightly integrated system.</p>
 improve the project. If you are interested in learning more about how query execution works, help document or improve the DataFusion codebase, or just try it out, we would love for you to join us.</p>
+<h3>Acknowledgements</h3>
+<p>Thank you to <a href="https://github.com/zhuqi-lucas">Qi Zhu</a>, <a href="https://github.com/adamreeve">Adam Reeve</a>, <a href="https://github.com/JigaoLuo">Jigao Luo</a>, <a href="https://github.com/comphead">Oleks V</a>, <a href="https://github.com/shehabgamin">Shehab Amin</a>, <a href="https://nuno-faria.github.io/">Nuno Faria</a>
+and <a href="https://github.com/Omega359">Bruce Ritchie</a> for their insightful feedback on this blog post.</p>
 <h3>Footnotes</h3>
 <p><a id="footnote1"></a><code>1</code>: This trend is described in more detail in the <a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/">FDAP Stack</a> blog</p>
 <p><a id="footnote2"></a><code>2</code>: This layout is referred to as <a href="https://www.vldb.org/conf/2001/P169.pdf">PAX in the
@@ -581,7 +584,7 @@ with additional engineering effort (see <a href="https://xiangpeng.systems/"&
 topic</a>). <a href="https://github.com/etseidl">Ed Seidl</a> is beginning this effort. See the <a href="https://github.com/apache/arrow-rs/issues/5854">ticket</a> for details.</p>
 <p><a id="footnote6"></a><code>6</code>: ClickBench includes a wide variety of query patterns such as point lookups, filters of different selectivity, and aggregations.</p>
-<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Zhu Qi</a> was able to speed up reads by over 2x
+<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Qi Zhu</a> was able to speed up reads by over 2x
 simply by rewriting the Parquet files with Offset Indexes and no compression (see <a href="https://github.com/apache/datafusion/issues/16149#issuecomment-2918761743">issue #16149 comment</a> for details). There is likely significant additional performance available by using Bloom Filters and resorting the data to be clustered in a more optimal way for the queries.</p></content><category term="blog"></category></entry></feed>
\ No newline at end of file
diff --git a/blog/feeds/blog.atom.xml b/blog/feeds/blog.atom.xml
index 69d53ea..86d5433 100644
--- a/blog/feeds/blog.atom.xml
+++ b/blog/feeds/blog.atom.xml
@@ -567,6 +567,9 @@ components, rather than as a single tightly integrated system.</p>
 improve the project. If you are interested in learning more about how query execution works, help document or improve the DataFusion codebase, or just try it out, we would love for you to join us.</p>
+<h3>Acknowledgements</h3>
+<p>Thank you to <a href="https://github.com/zhuqi-lucas">Qi Zhu</a>, <a href="https://github.com/adamreeve">Adam Reeve</a>, <a href="https://github.com/JigaoLuo">Jigao Luo</a>, <a href="https://github.com/comphead">Oleks V</a>, <a href="https://github.com/shehabgamin">Shehab Amin</a>, <a href="https://nuno-faria.github.io/">Nuno Faria</a>
+and <a href="https://github.com/Omega359">Bruce Ritchie</a> for their insightful feedback on this blog post.</p>
 <h3>Footnotes</h3>
 <p><a id="footnote1"></a><code>1</code>: This trend is described in more detail in the <a href="https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/">FDAP Stack</a> blog</p>
 <p><a id="footnote2"></a><code>2</code>: This layout is referred to as <a href="https://www.vldb.org/conf/2001/P169.pdf">PAX in the
@@ -581,7 +584,7 @@ with additional engineering effort (see <a href="https://xiangpeng.systems/"&
 topic</a>). <a href="https://github.com/etseidl">Ed Seidl</a> is beginning this effort. See the <a href="https://github.com/apache/arrow-rs/issues/5854">ticket</a> for details.</p>
 <p><a id="footnote6"></a><code>6</code>: ClickBench includes a wide variety of query patterns such as point lookups, filters of different selectivity, and aggregations.</p>
-<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Zhu Qi</a> was able to speed up reads by over 2x
+<p><a id="footnote7"></a><code>7</code>: For example, <a href="https://github.com/zhuqi-lucas">Qi Zhu</a> was able to speed up reads by over 2x
 simply by rewriting the Parquet files with Offset Indexes and no compression (see <a href="https://github.com/apache/datafusion/issues/16149#issuecomment-2918761743">issue #16149 comment</a> for details).
There is likely significant additional performance available by using Bloom Filters and resorting the data to be clustered in a more optimal way for the queries.</p></content><category term="blog"></category></entry><entry><title>Apache DataFusion 49.0.0 Released</title><link href="https://datafusion.apache.org/blog/2025/07/28/datafusion-49.0.0" rel="alternate"></link><published>2025-07-28T00:00:00+00:00</published><updated>2025-07-28T00:00:00+00:00</updated><author><name>pmc</name></author><id>tag:datafusion.apache.org,2025-07-28:/blog/2025/07/28/datafusion-49.0.0</id><summary type="ht [...]