Re: [I] [Bug] ColStatsData.isValid() falsely rejects sampled column statistics when a column is (almost) all NULL [doris]

via GitHub Tue, 16 Jun 2026 05:33:09 -0700


MilanTyagi2004 commented on issue #64122:
URL: https://github.com/apache/doris/issues/64122#issuecomment-4718766782


   Thanks for the clarification.
   
   My current PR normalizes ndv (0 -> 1), but based on the feedback 
from @englefly and @morrySnow, I understand that this is not the preferred 
direction because it may distort cardinality estimation and hurt plan quality.
   
   Before revising the implementation, could you please clarify the expected 
fix path?
   
   My understanding is:
   
   1. Statistics collection should not fail for this pattern (a sampled column that is almost entirely NULL).
   2. The statistics can still be written into the statistics table.
   3. When these statistics are consumed later, they should be treated as 
UNKNOWN rather than being used directly.
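
   To make the three points above concrete, here is a minimal sketch of the 
flow as I understand it. Every name in it (SampledStatsSketch, StoredColStats, 
isConsistent, loadForOptimizer) is hypothetical and is not an existing Doris 
class or method. The consistency rule itself (ndv == 0 is only plausible when 
every row is NULL) is also my assumption, not the confirmed design. 
Optional.empty() stands in for UNKNOWN.

```java
import java.util.Optional;

// Hypothetical stand-ins for illustration; none of these are actual Doris classes.
final class SampledStatsSketch {

    /** One column's row as it might be stored in the statistics table. */
    static final class StoredColStats {
        final long rowCount;
        final long ndv;
        final long nullCount;

        StoredColStats(long rowCount, long ndv, long nullCount) {
            this.rowCount = rowCount;
            this.ndv = ndv;
            this.nullCount = nullCount;
        }
    }

    /** Assumed rule: ndv == 0 is only plausible when every row is NULL. */
    static boolean isConsistent(StoredColStats s) {
        long nonNullRows = s.rowCount - s.nullCount;
        return s.ndv > 0 || nonNullRows <= 0;
    }

    /**
     * Consumption side (point 3): the stored row is never rejected at write
     * time; it is only downgraded to "unknown" when it is read for planning.
     */
    static Optional<StoredColStats> loadForOptimizer(StoredColStats s) {
        return isConsistent(s) ? Optional.of(s) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sampling saw only NULLs, but 10 non-null rows exist in the table.
        StoredColStats almostAllNull = new StoredColStats(1000, 0, 990);
        // Every row is NULL, so ndv == 0 is consistent.
        StoredColStats allNull = new StoredColStats(1000, 0, 1000);

        System.out.println(loadForOptimizer(almostAllNull).isPresent()); // false
        System.out.println(loadForOptimizer(allNull).isPresent());       // true
    }
}
```

   In this sketch the check runs only on the read path, so collection and 
persistence (points 1 and 2) never fail on such a column.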
   
   Could you point me to the preferred location for handling this conversion to 
UNKNOWN (for example, during ColumnStatistic construction or loading, in 
StatisticsUtil, or in another layer)?
   
   I would like to align the implementation with the intended design before 
updating the PR.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

