Hi Dev,

I am not sure whether this topic warrants a discussion thread, but
according to the contribution guidelines, the dev mailing list is the
right place to reach consensus.

ColumnStats currently uses the name ndv, which stands for "number of
distinct values". First, the abbreviation is hard to understand on its
own. Second, a more descriptive, professional name might serve better.



Suggestion:

replace ndv with granularityNumber:



The good news, afaik, is that the method getNdv() is not used anywhere
within Flink, which means the renaming will have very limited impact.



ColumnStats {

    /** number of distinct values. */
    @Deprecated
    private final Long ndv;

    /**
     * Granularity refers to the level of detail used to sort and separate
     * data at column level. Highly granular data is categorized or separated
     * very precisely. For example, the granularity number of a gender column
     * should normally be 2. The granularity number of a month column will be
     * 12. In the SQL world, it means the number of distinct values.
     */
    private final Long granularityNumber;

    @Deprecated
    public Long getNdv() {
        return ndv;
    }

    public Long getGranularityNumber() {
        return granularityNumber;
    }
}
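
One possible variation on the transition, as a minimal sketch only: the value could be stored once and the deprecated getter could delegate to the new one, so the two names can never disagree. The class name ColumnStatsSketch and its constructor are hypothetical and are not the actual Flink ColumnStats API. The sketch also assumes the value may be null when the statistic is unknown.

```java
/** Hypothetical sketch for discussion; not the actual Flink ColumnStats class. */
final class ColumnStatsSketch {

    /** Number of distinct values in the column; assumed to be null when unknown. */
    private final Long granularityNumber;

    ColumnStatsSketch(Long granularityNumber) {
        this.granularityNumber = granularityNumber;
    }

    /** Returns the number of distinct values under the proposed new name. */
    public Long getGranularityNumber() {
        return granularityNumber;
    }

    /**
     * Old accessor kept for backward compatibility.
     *
     * @deprecated use {@link #getGranularityNumber()} instead.
     */
    @Deprecated
    public Long getNdv() {
        // Delegate so that both accessors always return the same value.
        return getGranularityNumber();
    }
}
```

With this shape, existing callers of getNdv() would keep compiling (with a deprecation warning) and the old getter could be removed in a later release.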

Best regards,
-- 

Jing
