Github user jianqiao commented on a diff in the pull request:

    https://github.com/apache/incubator-quickstep/pull/332#discussion_r170369358

    --- Diff: query_optimizer/cost_model/StarSchemaSimpleCostModel.cpp ---
    @@ -493,7 +493,7 @@ std::size_t StarSchemaSimpleCostModel::getNumDistinctValues(
             return stat.getNumDistinctValues(rel_attr_id);
           }
         }
    -  return estimateCardinalityForTableReference(table_reference);
    +  return estimateCardinalityForTableReference(table_reference) * 0.1;
    --- End diff --

This estimation ratio can be any decimal number that is not close to `1`. With a ratio close to `1`, the column would appear to have "unique" values, and the optimizer would choose bad plans in some situations. `0.1` tends to be a reasonable choice; values such as `0.05` or `0.2` would also work, and the ratio can be adjusted later when there is an actual need.
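To make the fallback concrete, here is a minimal standalone sketch of the idea, not the actual Quickstep code. The names `kDefaultDistinctRatio` and `EstimateNumDistinctValues` are hypothetical, the catalog statistic is modeled as a `std::optional` (C++17), and the clamp to at least one distinct value for a non-empty table is an added assumption that the diff above does not include.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>

// Hypothetical default ratio; the comment above suggests 0.1 and notes that
// values such as 0.05 or 0.2 would also be acceptable.
constexpr double kDefaultDistinctRatio = 0.1;

// Returns the exact distinct-value count when the catalog has one; otherwise
// falls back to a fixed fraction of the estimated table cardinality so that
// the column does not look like a unique key to the optimizer.
std::size_t EstimateNumDistinctValues(
    const std::optional<std::size_t> &exact_num_distinct,
    const std::size_t table_cardinality,
    const double ratio = kDefaultDistinctRatio) {
  if (exact_num_distinct.has_value()) {
    return *exact_num_distinct;
  }
  if (table_cardinality == 0) {
    return 0;
  }
  // Truncate toward zero, as the implicit conversion in the diff would.
  const std::size_t estimate =
      static_cast<std::size_t>(table_cardinality * ratio);
  // Assumption (not in the diff): a non-empty table has at least one
  // distinct value, so tiny tables do not round down to zero.
  return std::max<std::size_t>(estimate, 1);
}
```

Keeping the ratio as a parameter with a default would make it easy to adjust later without touching the call sites.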
---