[ 
https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200256#comment-17200256
 ] 

Julian Hyde commented on CALCITE-4223:
--------------------------------------

[~Chunwei Lei], we don't need to change {{interface RelOptTable}} at all, and 
we don't need a new {{interface ColumnStatistics}}. But we should change all 
of the metadata methods that deal with table scans so that they check whether 
the table has statistics and, if it does, return a better result.

For example:
{noformat}
diff --git a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
index 458df6b34..d50e32a51 100644
--- a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
+++ b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
@@ -172,6 +172,11 @@ public Double averageRowSize(RelNode rel, RelMetadataQuery mq) {
 
   public List<Double> averageColumnSizes(TableScan rel, RelMetadataQuery mq) {
     final List<RelDataTypeField> fields = rel.getRowType().getFieldList();
+    final BuiltInMetadata.Size size =
+        rel.getTable().unwrap(BuiltInMetadata.Size.class);
+    if (size != null && size.averageColumnSizes() != null) {
+      return size.averageColumnSizes();
+    }
     final ImmutableList.Builder<Double> list = ImmutableList.builder();
     for (RelDataTypeField field : fields) {
       list.add(averageTypeValueSize(field.getType()));
{noformat}
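
The other half of this pattern is the table itself, which has to answer the {{unwrap}} call. Below is a minimal, self-contained sketch of both sides. It uses stand-in types rather than Calcite's classes: {{SketchTable}}, {{SizeStats}}, {{PlainTable}}, {{StatsTable}}, {{SizeHandler}} and the {{DEFAULT_SIZE}} constant are all hypothetical names invented for illustration. {{SizeStats}} plays the role of {{BuiltInMetadata.Size}}, and the fixed default stands in for the per-type estimate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a size-metadata interface. */
interface SizeStats {
  /** Average size of each column, or null if unknown. */
  List<Double> averageColumnSizes();
}

/** Hypothetical stand-in for a table that supports unwrap. */
interface SketchTable {
  int columnCount();

  /** Returns this table as an instance of clazz, or null if unsupported. */
  <T> T unwrap(Class<T> clazz);
}

/** A table that has no statistics to offer. */
final class PlainTable implements SketchTable {
  private final int columnCount;

  PlainTable(int columnCount) {
    this.columnCount = columnCount;
  }

  public int columnCount() {
    return columnCount;
  }

  public <T> T unwrap(Class<T> clazz) {
    return clazz.isInstance(this) ? clazz.cast(this) : null;
  }
}

/** A table that knows its column sizes and exposes them via unwrap. */
final class StatsTable implements SketchTable, SizeStats {
  private final List<Double> sizes;

  StatsTable(List<Double> sizes) {
    this.sizes = sizes;
  }

  public int columnCount() {
    return sizes.size();
  }

  public List<Double> averageColumnSizes() {
    return sizes;
  }

  public <T> T unwrap(Class<T> clazz) {
    return clazz.isInstance(this) ? clazz.cast(this) : null;
  }
}

/** Mirrors the shape of the handler in the diff: ask the table first. */
final class SizeHandler {
  /** Arbitrary placeholder for the per-type estimate (an assumption). */
  static final double DEFAULT_SIZE = 8d;

  static List<Double> averageColumnSizes(SketchTable table) {
    final SizeStats stats = table.unwrap(SizeStats.class);
    if (stats != null && stats.averageColumnSizes() != null) {
      return stats.averageColumnSizes();
    }
    // No statistics available: fall back to a default per column.
    final List<Double> list = new ArrayList<>();
    for (int i = 0; i < table.columnCount(); i++) {
      list.add(DEFAULT_SIZE);
    }
    return list;
  }
}
```

With this shape, a table that has no statistics needs no changes, and the handler falls back to its existing estimate whenever {{unwrap}} returns null.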


> Introducing column statistics to RelOptTable
> --------------------------------------------
>
>                 Key: CALCITE-4223
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4223
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Chunwei Lei
>            Assignee: Chunwei Lei
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such 
> as NDV, average column size, and so on. It would be nice if Calcite could 
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of 
> nulls, number of trues, number of falses and so on. 
> What do you think?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
