zhuqi-lucas commented on code in PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#discussion_r2186508081


##########
content/blog/datafusion-custom-parquet-index.md:
##########
@@ -0,0 +1,232 @@
+## Accelerating Query Processing in DataFusion with Embedded Parquet Indexes
+
+It’s a common misconception that Parquet can only deliver basic Min/Max 
pruning and Bloom filters—and that adding anything “smarter” requires inventing 
a whole new file format. In fact, Parquet’s design already lets you embed 
custom indexing data *inside* the file (via unused footer metadata and byte 
regions) without breaking compatibility. In this post, we’ll show how 
DataFusion can leverage a **compact distinct‑value index** written directly 
into Parquet files—preserving complete interchangeability with other 
tools—while enabling ultra‑fast file‑level pruning.

Review Comment:
   Good suggestion, @alamb!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

