paul-rogers commented on a change in pull request #11549:
URL: https://github.com/apache/druid/pull/11549#discussion_r682983402



##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.
 
 ### minmax
 
-* Estimated min/max values for each column. Only relevant for dimension columns.
+* Estimated min/max values for each column. Only reported for string columns.
 
 ### size
 
-* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format
+* `size` in the result will contain the estimated total segment byte size as if the data were stored in text format. This is _not_ the actual storage size of the column in Druid.

Review comment:
       Could you add a pointer to where I might find the actual storage size? I want to know how much space the column takes so I can decide whether it is worth the cost to store.

##########
File path: docs/querying/segmentmetadataquery.md
##########
@@ -144,16 +144,16 @@ Types of column analyses are described below:
 
 ### cardinality
 
-* `cardinality` in the result will return the size of the bitmap index or dictionary encoding for string dimensions, or null for other dimension types.
- If `merge` was set, the result will be the max of this value across segments. Only relevant for dimension columns.
+* `cardinality` in the result will return the number of unique values present in a string column. It is null for other column types.
+ If `merge` is set, the result will be the max of this value across segments. Only relevant for string columns.

Review comment:
       This is not clear to us newbies. Does "max" mean the largest value for any single segment, or the aggregated total across all segments? Both are useful: if I have 1M rows and see a cardinality of 1K, that could mean 1K distinct values in total, or 1K per segment, which tells me something quite different if each segment has only 1K rows...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.