Github user jianqiao commented on a diff in the pull request:
https://github.com/apache/incubator-quickstep/pull/332#discussion_r170369358
--- Diff: query_optimizer/cost_model/StarSchemaSimpleCostModel.cpp ---
@@ -493,7 +493,7 @@ std::size_t
StarSchemaSimpleCostModel::getNumDistinctValues(
return stat.getNumDistinctValues(rel_attr_id);
}
}
- return estimateCardinalityForTableReference(table_reference);
+ return estimateCardinalityForTableReference(table_reference) * 0.1;
--- End diff --
This estimation ratio can be any decimal number that is not close to `1`.
With a ratio close to `1`, the column would appear to have "unique" values,
and the optimizer would choose bad plans in some situations.
`0.1` tends to be a reasonable choice. Other values such as `0.05` or `0.2`
would also work, and the ratio can be adjusted later when there is actual demand.
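
To make the fallback above concrete, here is a minimal standalone sketch. The names `kFallbackDistinctRatio` and `EstimateDistinctValuesWithoutStats` are hypothetical and not part of Quickstep, and clamping the result to at least `1` for a non-empty input is an extra assumption that the diff itself does not make.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical constant: the ratio applied when no column statistics exist.
// 0.1 matches the value in the diff; 0.05 or 0.2 would be used the same way.
constexpr double kFallbackDistinctRatio = 0.1;

// Hypothetical helper (not Quickstep API): estimates the number of distinct
// values of a column as a fixed fraction of the estimated table cardinality.
std::size_t EstimateDistinctValuesWithoutStats(
    std::size_t estimated_cardinality,
    double ratio = kFallbackDistinctRatio) {
  // The ratio is a fraction of the cardinality, so it must lie in (0, 1].
  assert(ratio > 0.0 && ratio <= 1.0);
  if (estimated_cardinality == 0) {
    return 0;
  }
  // Scale in floating point, then truncate back to an integral count.
  const std::size_t scaled = static_cast<std::size_t>(
      static_cast<double>(estimated_cardinality) * ratio);
  // Assumption: a non-empty table has at least one distinct value.
  return std::max<std::size_t>(scaled, 1);
}
```

With a ratio of `1.0` the estimate equals the table cardinality, which is the "unique column" case the comment warns about; any ratio well below `1` avoids it.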
---