zhuqi-lucas commented on PR #79:
URL: https://github.com/apache/datafusion-site/pull/79#issuecomment-3040695911

   > I just pushed a commit that reworked the intro a bit and started filling out the background.
   > 
   > [Image: Screenshot 2025-07-05 at 5 06 45 PM]
   > 
   > [Image: Screenshot 2025-07-05 at 5 06 41 PM]
   > 
   > @JigaoLuo, the Outlook section you describe sounds great. I envision it right after the
   > 
   > ```
   > ## 1. Parquet 101: File Anatomy & Standard Index Structures
   > ```
   > 
   > section.
   > 
   > Perhaps something like
   > 
   > ```
   > ## 2. Extending Parquet with Special Indexes
   > ```
   > 
   > (This is where figure 2 goes and where we will explain how to embed a custom index.) So it makes a lot of sense to mention the potential use cases here, and to note that the index can be written after each row group or at the end of the file, and that it can hold information for each row group, individual row groups, columns, etc., whatever you want.
   > 
   > I would also be interested to hear what @zhuqi-lucas thinks
   
   Amazing work, thank you @alamb.
   
   
   
   > Regarding my impression while reading: **"the Embedded Index is just a hashset to speed up scans, which adds overhead to Parquet,"** as mentioned in a follow-up here: [#79 (comment)](https://github.com/apache/datafusion-site/pull/79#discussion_r2186572247)
   > 
   > If other readers also have the same impression, it might unintentionally limit how they perceive the potential of the Embedded Index. To address this, we could consider adding **a short Outlook section** (either at the beginning or the end of the blog) to explicitly highlight what the Embedded Index is capable of. It’s not just a hashset for pruning; in principle, it could support a wide range of use cases. Use cases are also discussed here: [apache/datafusion#16374 (comment)](https://github.com/apache/datafusion/issues/16374#issuecomment-3039796047)
   > 
   > I’d be happy to help draft such an Outlook section, pending confirmation from your side.
   
   Looks great, @JigaoLuo! Feel free to add it, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

