[ https://issues.apache.org/jira/browse/CALCITE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17200256#comment-17200256 ]

Julian Hyde commented on CALCITE-4223:
--------------------------------------

[~Chunwei Lei], We don't need to change {{interface RelOptTable}} at all. We don't need a new {{interface ColumnStatistics}}. But we should change all of the metadata methods that deal with table scans to see whether the table has the statistics so that we can return a better result. For example:

{noformat}
diff --git a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
index 458df6b34..d50e32a51 100644
--- a/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
+++ b/core/src/main/java/org/apache/calcite/rel/metadata/RelMdSize.java
@@ -172,6 +172,11 @@ public Double averageRowSize(RelNode rel, RelMetadataQuery mq) {
   public List<Double> averageColumnSizes(TableScan rel, RelMetadataQuery mq) {
     final List<RelDataTypeField> fields = rel.getRowType().getFieldList();
+    final BuiltInMetadata.Size size =
+        rel.getTable().unwrap(BuiltInMetadata.Size.class);
+    if (size != null && size.averageColumnSizes() != null) {
+      return size.averageColumnSizes();
+    }
     final ImmutableList.Builder<Double> list = ImmutableList.builder();
     for (RelDataTypeField field : fields) {
       list.add(averageTypeValueSize(field.getType()));
{noformat}

> Introducing column statistics to RelOptTable
> --------------------------------------------
>
>                 Key: CALCITE-4223
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4223
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Chunwei Lei
>            Assignee: Chunwei Lei
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Many systems depend on column statistics to compute more accurate stats, such
> as NDV, average column size, and so on. It would be nice if Calcite can
> provide such an interface.
> Column statistics might include NDV, average/max column length, number of
> nulls, number of trues, number of falses and so on.
> What do you think?
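
For readers who want to try the pattern from the comment above without Calcite on the classpath, here is a minimal, self-contained sketch. It is not Calcite code: {{Unwrappable}}, {{SizeStats}}, {{SketchTable}} and {{ColumnSizeSketch}} are hypothetical stand-ins for the {{unwrap}} method on {{RelOptTable}}, for {{BuiltInMetadata.Size}}, for a table, and for {{RelMdSize}}, and the per-type byte sizes are arbitrary placeholders. The only point it illustrates is the control flow: ask the table whether it carries statistics, and fall back to the type-based estimate when it does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the unwrap contract: returns null if unsupported. */
interface Unwrappable {
  <T> T unwrap(Class<T> clazz);
}

/** Hypothetical stand-in for BuiltInMetadata.Size, reduced to one method. */
interface SizeStats {
  List<Double> averageColumnSizes();
}

/** Hypothetical table: column type names plus optional statistics. */
class SketchTable implements Unwrappable {
  final List<String> columnTypes;
  private final SizeStats stats; // null when the table has no statistics

  SketchTable(List<String> columnTypes, SizeStats stats) {
    this.columnTypes = columnTypes;
    this.stats = stats;
  }

  @Override public <T> T unwrap(Class<T> clazz) {
    // Hand out the statistics object only if it exists and matches the request.
    if (stats != null && clazz.isInstance(stats)) {
      return clazz.cast(stats);
    }
    return null;
  }
}

final class ColumnSizeSketch {
  /** Type-based fallback; the byte counts are placeholders, not Calcite's. */
  static double averageTypeValueSize(String typeName) {
    switch (typeName) {
      case "BOOLEAN": return 1d;
      case "INTEGER": return 4d;
      case "BIGINT":
      case "DOUBLE": return 8d;
      default: return 100d;
    }
  }

  /** Prefers statistics supplied by the table, else estimates from types. */
  static List<Double> averageColumnSizes(SketchTable table) {
    final SizeStats size = table.unwrap(SizeStats.class);
    if (size != null && size.averageColumnSizes() != null) {
      return size.averageColumnSizes();
    }
    final List<Double> list = new ArrayList<>();
    for (String type : table.columnTypes) {
      list.add(averageTypeValueSize(type));
    }
    return list;
  }

  public static void main(String[] args) {
    final List<String> types = Arrays.asList("INTEGER", "VARCHAR");
    final SketchTable plain = new SketchTable(types, null);
    final SketchTable withStats =
        new SketchTable(types, () -> Arrays.asList(4d, 12.5d));
    System.out.println(averageColumnSizes(plain));     // prints [4.0, 100.0]
    System.out.println(averageColumnSizes(withStats)); // prints [4.0, 12.5]
  }
}
```

The design choice this shows is that callers of {{averageColumnSizes}} never learn where the numbers came from, so a table can start supplying statistics without any interface change on the planner side.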

--
This message was sent by Atlassian Jira
(v8.3.4#803005)