This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 331b655 Commit build products
331b655 is described below
commit 331b655d7ce74e587895cf28791f105b69f51128
Author: Build Pelican (action) <[email protected]>
AuthorDate: Thu Jul 17 19:23:26 2025 +0000
Commit build products
---
output/2025/07/14/user-defined-parquet-indexes/index.html | 3 ++-
output/feeds/all-en.atom.xml | 5 +++--
output/feeds/blog.atom.xml | 5 +++--
...systems-group-at-tu-darmstadt-and-andrew-lamb-influxdata.atom.xml | 5 +++--
4 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/output/2025/07/14/user-defined-parquet-indexes/index.html
b/output/2025/07/14/user-defined-parquet-indexes/index.html
index 9b74b62..266ce01 100644
--- a/output/2025/07/14/user-defined-parquet-indexes/index.html
+++ b/output/2025/07/14/user-defined-parquet-indexes/index.html
@@ -103,7 +103,7 @@ limitations under the License.
<p>Modern Parquet writers create these indexes automatically and provide APIs
to control their generation and placement. For example, the <a
href="https://docs.rs/parquet/latest/parquet/">Rust Parquet Library</a>
provides <a
href="https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html">Parquet
WriterProperties</a>, <a
href="https://docs.rs/parquet/latest/parquet/file/properties/enum.EnabledStatistics.html">EnabledStatistics</a>,
and <a href="https://docs.rs/p [...]
<h2>Embedding User Defined Indexes in Parquet Files</h2>
<hr/>
-<p>Embedding user-defined indexes in Parquet files is straightforward and
follows the same principles as standard index structures:</p>
+<p>Embedding user-defined indexes in Parquet files is straightforward and
follows the same principles as standard index structures<sup><a
href="#footnote6">6</a></sup>:</p>
<ol>
<li>
<p>Serialize the index into a binary format and write it into the file body
before the Thrift-encoded footer metadata.</p>
@@ -513,6 +513,7 @@ it out, we would love for you to join us.</p>
<p><a id="footnote3"></a><code>3</code>: <a
href="https://dl.gi.de/items/2a8571f8-0ef2-481c-8ee9-05f82ee258c8">Seamless
Integration of Parquet Files into Data Processing. / Rey, Alice; Freitag,
Michael; Neumann, Thomas. / BTW 2023</a></p>
<p><a id="footnote4"></a><code>4</code>: For more information about external
indexes, see <a href="https://www.youtube.com/watch?v=74YsJT1-Rdk">this
talk</a> and the <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs">parquet_index.rs</a>
and <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_parquet_index.rs">advanced_parquet_index.rs</a>
examples in the DataFusion repository.</p>
<p><a id="footnote5"></a><code>5</code>: For information about rewriting files
to optimize for specific queries, such as resorting, repartitioning, and tuning
data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao</a> for details. We hope to
make a future post about this t [...]
+<p><a id="footnote6"></a><code>6</code>: An index can also be stored inline in
the key-value metadata. This approach is simple to implement and ensures the
index is available once the footer is read, without additional I/O. However, it
requires the index to be serialized as a UTF-8 string, which may be less
efficient and increases the size of the footer metadata, impacting all Parquet
readers, even those that ignore the index.</p>
</div>
</div>
</div>
diff --git a/output/feeds/all-en.atom.xml b/output/feeds/all-en.atom.xml
index f74d67e..3d9c139 100644
--- a/output/feeds/all-en.atom.xml
+++ b/output/feeds/all-en.atom.xml
@@ -271,7 +271,7 @@ limitations under the License.
<p>Modern Parquet writers create these indexes automatically and provide
APIs to control their generation and placement. For example, the <a
href="https://docs.rs/parquet/latest/parquet/">Rust Parquet
Library</a> provides <a
href="https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html">Parquet
WriterProperties</a>, <a
href="https://docs.rs/parquet/latest/parquet/file/properties/enum.EnabledStatistics.html">EnabledStatistics
[...]
<h2>Embedding User Defined Indexes in Parquet Files</h2>
<hr/>
-<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures:</p>
+<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures<sup><a
href="#footnote6">6</a></sup>:</p>
<ol>
<li>
<p>Serialize the index into a binary format and write it into the file
body before the Thrift-encoded footer metadata.</p>
@@ -680,7 +680,8 @@ it out, we would love for you to join us.</p>
<p><a id="footnote2"></a><code>2</code>: There
are other index structures, but they are either 1) not widely supported (such
as statistics in the page headers) or 2) not yet widely used in practice at the
time of this writing (such as <a
href="https://github.com/apache/parquet-format/blob/819adce0ec6aa848e56c56f20b9347f4ab50857f/src/main/thrift/parquet.thrift#L256">GeospatialStatistics</a>
and <a href="https://github.com/apache/parquet-format/ [...]
<p><a id="footnote3"></a><code>3</code>: <a
href="https://dl.gi.de/items/2a8571f8-0ef2-481c-8ee9-05f82ee258c8">Seamless
Integration of Parquet Files into Data Processing. / Rey, Alice; Freitag,
Michael; Neumann, Thomas. / BTW 2023</a></p>
<p><a id="footnote4"></a><code>4</code>: For
more information about external indexes, see <a
href="https://www.youtube.com/watch?v=74YsJT1-Rdk">this talk</a> and
the <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs">parquet_index.rs</a>
and <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_parquet_index.rs">advanced_parquet_index.rs&
[...]
-<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
+<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
+<p><a id="footnote6"></a><code>6</code>: An
index can also be stored inline in the key-value metadata. This approach is
simple to implement and ensures the index is available once the footer is read,
without additional I/O. However, it requires the index to be serialized as a
UTF-8 string, which may be less efficient and increases the size of the footer
metadata, impacting all Parquet readers, even those that ignore the
index.</p></content><category te [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git a/output/feeds/blog.atom.xml b/output/feeds/blog.atom.xml
index 432770f..bb30323 100644
--- a/output/feeds/blog.atom.xml
+++ b/output/feeds/blog.atom.xml
@@ -271,7 +271,7 @@ limitations under the License.
<p>Modern Parquet writers create these indexes automatically and provide
APIs to control their generation and placement. For example, the <a
href="https://docs.rs/parquet/latest/parquet/">Rust Parquet
Library</a> provides <a
href="https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html">Parquet
WriterProperties</a>, <a
href="https://docs.rs/parquet/latest/parquet/file/properties/enum.EnabledStatistics.html">EnabledStatistics
[...]
<h2>Embedding User Defined Indexes in Parquet Files</h2>
<hr/>
-<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures:</p>
+<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures<sup><a
href="#footnote6">6</a></sup>:</p>
<ol>
<li>
<p>Serialize the index into a binary format and write it into the file
body before the Thrift-encoded footer metadata.</p>
@@ -680,7 +680,8 @@ it out, we would love for you to join us.</p>
<p><a id="footnote2"></a><code>2</code>: There
are other index structures, but they are either 1) not widely supported (such
as statistics in the page headers) or 2) not yet widely used in practice at the
time of this writing (such as <a
href="https://github.com/apache/parquet-format/blob/819adce0ec6aa848e56c56f20b9347f4ab50857f/src/main/thrift/parquet.thrift#L256">GeospatialStatistics</a>
and <a href="https://github.com/apache/parquet-format/ [...]
<p><a id="footnote3"></a><code>3</code>: <a
href="https://dl.gi.de/items/2a8571f8-0ef2-481c-8ee9-05f82ee258c8">Seamless
Integration of Parquet Files into Data Processing. / Rey, Alice; Freitag,
Michael; Neumann, Thomas. / BTW 2023</a></p>
<p><a id="footnote4"></a><code>4</code>: For
more information about external indexes, see <a
href="https://www.youtube.com/watch?v=74YsJT1-Rdk">this talk</a> and
the <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs">parquet_index.rs</a>
and <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_parquet_index.rs">advanced_parquet_index.rs&
[...]
-<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
+<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
+<p><a id="footnote6"></a><code>6</code>: An
index can also be stored inline in the key-value metadata. This approach is
simple to implement and ensures the index is available once the footer is read,
without additional I/O. However, it requires the index to be serialized as a
UTF-8 string, which may be less efficient and increases the size of the footer
metadata, impacting all Parquet readers, even those that ignore the
index.</p></content><category te [...]
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
diff --git
a/output/feeds/qi-zhu-cloudera-jigao-luo-systems-group-at-tu-darmstadt-and-andrew-lamb-influxdata.atom.xml
b/output/feeds/qi-zhu-cloudera-jigao-luo-systems-group-at-tu-darmstadt-and-andrew-lamb-influxdata.atom.xml
index 911bfd1..278b9d1 100644
---
a/output/feeds/qi-zhu-cloudera-jigao-luo-systems-group-at-tu-darmstadt-and-andrew-lamb-influxdata.atom.xml
+++
b/output/feeds/qi-zhu-cloudera-jigao-luo-systems-group-at-tu-darmstadt-and-andrew-lamb-influxdata.atom.xml
@@ -77,7 +77,7 @@ limitations under the License.
<p>Modern Parquet writers create these indexes automatically and provide
APIs to control their generation and placement. For example, the <a
href="https://docs.rs/parquet/latest/parquet/">Rust Parquet
Library</a> provides <a
href="https://docs.rs/parquet/latest/parquet/file/properties/struct.WriterProperties.html">Parquet
WriterProperties</a>, <a
href="https://docs.rs/parquet/latest/parquet/file/properties/enum.EnabledStatistics.html">EnabledStatistics
[...]
<h2>Embedding User Defined Indexes in Parquet Files</h2>
<hr/>
-<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures:</p>
+<p>Embedding user-defined indexes in Parquet files is straightforward
and follows the same principles as standard index structures<sup><a
href="#footnote6">6</a></sup>:</p>
<ol>
<li>
<p>Serialize the index into a binary format and write it into the file
body before the Thrift-encoded footer metadata.</p>
@@ -486,4 +486,5 @@ it out, we would love for you to join us.</p>
<p><a id="footnote2"></a><code>2</code>: There
are other index structures, but they are either 1) not widely supported (such
as statistics in the page headers) or 2) not yet widely used in practice at the
time of this writing (such as <a
href="https://github.com/apache/parquet-format/blob/819adce0ec6aa848e56c56f20b9347f4ab50857f/src/main/thrift/parquet.thrift#L256">GeospatialStatistics</a>
and <a href="https://github.com/apache/parquet-format/ [...]
<p><a id="footnote3"></a><code>3</code>: <a
href="https://dl.gi.de/items/2a8571f8-0ef2-481c-8ee9-05f82ee258c8">Seamless
Integration of Parquet Files into Data Processing. / Rey, Alice; Freitag,
Michael; Neumann, Thomas. / BTW 2023</a></p>
<p><a id="footnote4"></a><code>4</code>: For
more information about external indexes, see <a
href="https://www.youtube.com/watch?v=74YsJT1-Rdk">this talk</a> and
the <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/parquet_index.rs">parquet_index.rs</a>
and <a
href="https://github.com/apache/datafusion/blob/main/datafusion-examples/examples/advanced_parquet_index.rs">advanced_parquet_index.rs&
[...]
-<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
\ No newline at end of file
+<p><a id="footnote5"></a><code>5</code>: For
information about rewriting files to optimize for specific queries, such as
resorting, repartitioning, and tuning data page and row group sizes, see <a
href="https://github.com/XiangpengHao/liquid-cache/issues/227">XiangpengHao/liquid‑cache#227</a>
and the conversation between <a
href="https://github.com/JigaoLuo">JigaoLuo</a> and <a
href="https://github.com/XiangpengHao">XiangpengHao [...]
+<p><a id="footnote6"></a><code>6</code>: An
index can also be stored inline in the key-value metadata. This approach is
simple to implement and ensures the index is available once the footer is read,
without additional I/O. However, it requires the index to be serialized as a
UTF-8 string, which may be less efficient and increases the size of the footer
metadata, impacting all Parquet readers, even those that ignore the
index.</p></content><category te [...]
\ No newline at end of file
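
Note on footnote 6 added by this commit: the size cost of storing an index inline in the key-value metadata can be illustrated with a minimal sketch. It assumes base64 as the binary-to-text step and uses a made-up index layout (a count followed by 64-bit values); neither choice comes from the blog post or the Parquet format, and the helper names are hypothetical.

```python
import base64
import struct


def serialize_index(distinct_values):
    """Hypothetical binary index: a u32 count, then little-endian u64 values."""
    return struct.pack(f"<I{len(distinct_values)}Q", len(distinct_values), *distinct_values)


def to_metadata_string(index_bytes):
    """Encode the binary index as a UTF-8 safe string (base64 is an assumption)."""
    return base64.b64encode(index_bytes).decode("utf-8")


def from_metadata_string(text):
    """Recover the binary index from the string stored in the metadata."""
    return base64.b64decode(text.encode("utf-8"))


values = list(range(1000))
raw = serialize_index(values)
encoded = to_metadata_string(raw)

# The string form is larger than the raw bytes, and it would sit in the
# footer, so every reader that parses the footer pays for it.
print(len(raw), len(encoded), len(encoded) / len(raw))
```

Under these assumptions the string form is about one third larger than the binary form, which is the overhead the footnote refers to; writing the raw bytes into the file body avoids both the encoding step and the larger footer.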