Re: [I] Implement (optional) distinct count population in Parquet statistics [arrow-rs]

via GitHub Mon, 20 Apr 2026 17:27:18 -0700


JanKaul commented on issue #8608:
URL: https://github.com/apache/arrow-rs/issues/8608#issuecomment-4285115654


   Sorry, I haven't had time lately to look into this. But I would be very
interested in providing distinct count estimates for Parquet files,
mostly for Int64 columns.
   
   If we were to store distinct count estimates in the key=value metadata, we
would first need to agree on a standard representation of these estimates.
Ideally this would be the same across different language implementations.
That's why I thought it would be easier to use the "distinct_count" field. But
I get that it's not the cleanest approach.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
