mapleFU commented on PR #37016:
URL: https://github.com/apache/arrow/pull/37016#issuecomment-1664963461

   Personally, I think `distinct_count` might only be usable when using 
**dictionary encoding**. And using it would be bug-prone.
   
   Before my patch https://github.com/apache/arrow/pull/35989/files , the 
`distinct_count` was directly added to another's. So I think `distinct_count` 
has always been misused.
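
To illustrate why plain addition is a problem, here is a minimal sketch. It is not Arrow code: `merge_by_addition` and `exact_distinct` are hypothetical helpers that only mimic the idea. Summing two distinct counts overcounts whenever the two inputs share values.

```python
# Hypothetical illustration only; none of these names exist in Arrow/Parquet.

def merge_by_addition(count_a, count_b):
    # Mimics the behavior described above: the two counts are simply summed.
    return count_a + count_b


def exact_distinct(*pages):
    # The true distinct count requires looking at the values themselves.
    return len(set().union(*pages))


page_a = ["x", "y", "z"]
page_b = ["y", "z", "w"]

# Each page has 3 distinct values, but "y" and "z" appear in both.
summed = merge_by_addition(len(set(page_a)), len(set(page_b)))
exact = exact_distinct(page_a, page_b)

print(summed, exact)  # prints "6 4"
```

The summed value is only an upper bound; it equals the real distinct count only when the two inputs have no values in common, which the statistics alone cannot tell you.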
   
   I think your commit is reasonable, but I also think using `distinct_count` 
is a bit dangerous. Maybe adding a `SetDistinctCount` and maintaining the count 
outside would be better? Personally, I use `SetDistinctCount` when building the 
statistics of a dictionary-encoded ColumnChunk that has not fallen back.
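
A rough sketch of that idea, under stated assumptions: `ChunkStats`, `set_distinct_count`, `write_chunk`, and `max_dict_size` are all hypothetical names made up for this example, not the Arrow/Parquet API, and nulls are ignored. The point is only that the dictionary size gives the distinct count for free, as long as the writer never fell back to another encoding.

```python
# Hypothetical sketch; these classes and functions are not part of Arrow/Parquet.

class ChunkStats:
    def __init__(self):
        # None means "unknown"; the count is never guessed or summed.
        self.distinct_count = None

    def set_distinct_count(self, n):
        self.distinct_count = n


def write_chunk(values, max_dict_size):
    """Simulate dictionary-encoding one column chunk and build its stats."""
    dictionary = {}
    fell_back = False
    for v in values:
        if v not in dictionary:
            if len(dictionary) >= max_dict_size:
                # Dictionary is full: a real writer would switch encodings here,
                # so the dictionary no longer covers every value in the chunk.
                fell_back = True
                break
            dictionary[v] = len(dictionary)

    stats = ChunkStats()
    if not fell_back:
        # The count is maintained outside the statistics and set explicitly.
        stats.set_distinct_count(len(dictionary))
    return stats


small = write_chunk(["a", "b", "a"], max_dict_size=10)
overflow = write_chunk(["a", "b", "c"], max_dict_size=2)
print(small.distinct_count, overflow.distinct_count)  # prints "2 None"
```

Leaving the count unset after a fallback is the conservative choice here: an unknown value cannot be misused, while a wrong one can.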
   
   Also cc @pitrou @wgtmac 

