This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-site.git

The following commit(s) were added to refs/heads/main by this push:
     new 3be95e3  Add alternate index strategy footnote to parquet indexing blog (#90)
3be95e3 is described below

commit 3be95e3d14a72e5e57775042e9f82edfd15c1e30
Author: Andrew Lamb <and...@nerdnetworks.org>
AuthorDate: Thu Jul 17 15:23:04 2025 -0400

    Add alternate index strategy footnote to parquet indexing blog (#90)
---
 content/blog/2025-07-14-user-defined-parquet-indexes.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/content/blog/2025-07-14-user-defined-parquet-indexes.md b/content/blog/2025-07-14-user-defined-parquet-indexes.md
index 323a9dd..aefc3da 100644
--- a/content/blog/2025-07-14-user-defined-parquet-indexes.md
+++ b/content/blog/2025-07-14-user-defined-parquet-indexes.md
@@ -104,7 +104,7 @@ Modern Parquet writers create these indexes automatically and provide APIs to co
 
 ---
 
-Embedding user-defined indexes in Parquet files is straightforward and follows the same principles as standard index structures:
+Embedding user-defined indexes in Parquet files is straightforward and follows the same principles as standard index structures<sup>[6](#footnote6)</sup>:
 
 1. Serialize the index into a binary format and write it into the file body before the Thrift-encoded footer metadata.
 
@@ -592,3 +592,5 @@ it out, we would love for you to join us.
 <a id="footnote4"></a>`4`: For more information about external indexes, see [this talk](https://www.youtube.com/watch?v=74YsJT1-Rdk) and the [parquet_index.rs] and [advanced_parquet_index.rs] examples in the DataFusion repository.
 <a id="footnote5"></a>`5`: For information about rewriting files to optimize for specific queries, such as resorting, repartitioning, and tuning data page and row group sizes, see [XiangpengHao/liquid‑cache#227](https://github.com/XiangpengHao/liquid-cache/issues/227) and the conversation between [JigaoLuo](https://github.com/JigaoLuo) and [XiangpengHao](https://github.com/XiangpengHao) for details.
 We hope to make a future post about this topic.
+
+<a id="footnote6"></a>`6`: An index can also be stored inline in the key-value metadata. This approach is simple to implement and ensures the index is available once the footer is read, without additional I/O. However, it requires the index to be serialized as a UTF-8 string, which may be less efficient and increases the size of the footer metadata, impacting all Parquet readers, even those that ignore the index.
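
The trade-off described in footnote 6 can be illustrated with a minimal sketch. It is not taken from the blog post or from DataFusion: the key name `my_distinct_index`, the helper functions, and the choice of Base64 as the binary-to-text encoding are all assumptions made for illustration, and a plain dictionary stands in for the footer's key-value metadata.

```python
import base64
import struct

# Hypothetical metadata key; not defined by Parquet or DataFusion.
INDEX_KEY = "my_distinct_index"


def encode_index(values):
    """Pack integers as little-endian u64, then Base64-encode to ASCII text.

    Base64 is an assumed encoding, chosen only because the metadata
    value has to be a valid UTF-8 string.
    """
    raw = struct.pack(f"<{len(values)}Q", *values)
    return base64.b64encode(raw).decode("ascii")


def decode_index(text):
    """Reverse encode_index: Base64 text back to a list of integers."""
    raw = base64.b64decode(text)
    return list(struct.unpack(f"<{len(raw) // 8}Q", raw))


# Stand-in for the footer's key-value metadata (string keys and values).
key_value_metadata = {INDEX_KEY: encode_index([3, 17, 42])}

restored = decode_index(key_value_metadata[INDEX_KEY])
raw_size = 3 * 8  # bytes the index would occupy in binary form
text_size = len(key_value_metadata[INDEX_KEY])  # bytes stored in the footer
```

In this sketch the text form is about a third larger than the binary form, and every reader that parses the footer pays for that string whether or not it uses the index.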