zhuqi-lucas commented on PR #79: URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3040695911
> I just pushed a commit that reworked the intro a bit and started filling out the background
>
> [Image: Screenshot 2025-07-05 at 5 06 45 PM]
> [Image: Screenshot 2025-07-05 at 5 06 41 PM]
>
> @JigaoLuo the outlook section you describe sounds great. I envision it right after the
>
> ```
> ## 1. Parquet 101: File Anatomy & Standard Index Structures
> ```
>
> section, perhaps like
>
> ```
> ## 2. Extending Parquet with Special Indexes
> ```
>
> (this is where figure 2 goes and where we will explain how to embed a custom index). So it makes a lot of sense to mention the potential use cases here (and that the index can be written after each row group or at the end of the file, and that it can hold information for each row group, individual row groups, columns, or whatever you want).
>
> I would also be interested to hear what @zhuqi-lucas thinks

Amazing work, thank you @alamb.

> Regarding my impression during reading: **"the Embedded Index is just a hashset to speed up scans, which adds overhead to Parquet."** as mentioned as a follow-up here: [#79 (comment)](https://github.com/apache/datafusion-site/pull/79#discussion_r2186572247)
>
> If other readers also have the same impression, it might unintentionally limit how they perceive the potential of the Embedded Index. To address this, we could consider adding **a short Outlook section** (either at the beginning or the end of the blog) to explicitly highlight what the Embedded Index is capable of. It’s not just a hashset for pruning; in principle, it could support a wide range of use cases. Use cases are also discussed here: [apache/datafusion#16374 (comment)](https://github.com/apache/datafusion/issues/16374#issuecomment-3039796047)
>
> I’d be happy to help draft such an Outlook section, pending confirmation from your side.

Looks great @JigaoLuo! Feel free to add it, thank you!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org