Hi, Vladimir ~

In ReflectiveSchema, Statistics of FieldTable is given as UNKNOWN[1][2].
When reading a table's row count, if no statistics given, a default value
of 100 will be returned [3] -- this is relatively a bigger value compared
with the fields defined in HRFKUKSchema.
When a materialized view gets matched, view-sql is executed and the values
are wrapped in an ArrayTable and accurate row count is given [4].
So I'm not sure when a materialized view containing JOIN gets matched but
cannot help reduce cost of the plan.

HRFKUKSchema is only used in MaterializationTest. There's no existing test
checking content of the query result. Most of them checks whether same
results are returned no matter if materialized view is used or not. If we
add rows to existing emps table, how can tests be invalidated ?

Best,
Jin

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/java/ReflectiveSchema.java#L369
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/Statistics.java#L40
[3]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L239
[4]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/clone/ArrayTable.java#L82

Vladimir Sitnikov <sitnikov.vladi...@gmail.com> 于2020年1月4日周六 上午4:21写道:

> Hi,
>
> It looks like MaterializationTest heavily relies on inaccurate statistics
> for hr.emps and hr.depts tables.
>
> I was trying to improve statistic estimation for better join planning (see
> https://github.com/apache/calcite/pull/1712 ),
> and it looks like better estimates open the eyes of the optimizer, and now
> it realizes it does not really need to use materialized view
> for 4-row long table.
>
> In other words, the cost of the table access is more-or-less the same as
> the cost of the materialized view access.
>
> It looks like the way to go here is to add hr_with_extra_rows scheme so it
> contains the same emps and depts tables, but it should
> have bigger tables.
> Adding rows to the existing emps table is not an option because it would
> invalidate lots of tests.
>
> Does anybody have better ideas?
>
> Vladimir
>

Reply via email to