date:20200104

[jira] [Created] (CALCITE-3706) IMPLEMENT POWEROFTWOF FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3706:
---

 Summary: IMPLEMENT POWEROFTWOF FUNCTION 
 Key: CALCITE-3706
 URL: https://issues.apache.org/jira/browse/CALCITE-3706
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT POWEROFTWOF FUNCTION



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (CALCITE-3705) IMPLEMENT POWEROFTWOD FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3705:
---

 Summary: IMPLEMENT POWEROFTWOD FUNCTION
 Key: CALCITE-3705
 URL: https://issues.apache.org/jira/browse/CALCITE-3705
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT POWEROFTWOD FUNCTION




[jira] [Created] (CALCITE-3704) IMPLEMENT TWOTOTHEDOUBLESCALEDOWN FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3704:
---

 Summary: IMPLEMENT TWOTOTHEDOUBLESCALEDOWN FUNCTION
 Key: CALCITE-3704
 URL: https://issues.apache.org/jira/browse/CALCITE-3704
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT TWOTOTHEDOUBLESCALEDOWN FUNCTION




[jira] [Created] (CALCITE-3703) IMPLEMENT TWOTOTHEDOUBLESCALEUP FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3703:
---

 Summary: IMPLEMENT TWOTOTHEDOUBLESCALEUP FUNCTION
 Key: CALCITE-3703
 URL: https://issues.apache.org/jira/browse/CALCITE-3703
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT TWOTOTHEDOUBLESCALEUP FUNCTION




[jira] [Created] (CALCITE-3701) IMPLEMENT NEXTDOWN FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3701:
---

 Summary: IMPLEMENT NEXTDOWN FUNCTION
 Key: CALCITE-3701
 URL: https://issues.apache.org/jira/browse/CALCITE-3701
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT NEXTDOWN FUNCTION




[jira] [Created] (CALCITE-3702) IMPLEMENT SCALB FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3702:
---

 Summary: IMPLEMENT SCALB FUNCTION
 Key: CALCITE-3702
 URL: https://issues.apache.org/jira/browse/CALCITE-3702
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT SCALB FUNCTION




[jira] [Created] (CALCITE-3700) IMPLEMENT NEXTUP FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3700:
---

 Summary: IMPLEMENT NEXTUP FUNCTION
 Key: CALCITE-3700
 URL: https://issues.apache.org/jira/browse/CALCITE-3700
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT NEXTUP FUNCTION




[jira] [Created] (CALCITE-3699) IMPLEMENT NEXTAFTER FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3699:
---

 Summary: IMPLEMENT NEXTAFTER FUNCTION
 Key: CALCITE-3699
 URL: https://issues.apache.org/jira/browse/CALCITE-3699
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT NEXTAFTER FUNCTION




[jira] [Created] (CALCITE-3698) IMPLEMENT GETEXPONENT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3698:
---

 Summary: IMPLEMENT GETEXPONENT FUNCTION
 Key: CALCITE-3698
 URL: https://issues.apache.org/jira/browse/CALCITE-3698
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT GETEXPONENT FUNCTION




[jira] [Created] (CALCITE-3696) IMPLEMENT HYPOT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3696:
---

 Summary: IMPLEMENT HYPOT FUNCTION
 Key: CALCITE-3696
 URL: https://issues.apache.org/jira/browse/CALCITE-3696
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT HYPOT FUNCTION




[jira] [Created] (CALCITE-3697) IMPLEMENT COPYSIGN FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3697:
---

 Summary: IMPLEMENT COPYSIGN FUNCTION
 Key: CALCITE-3697
 URL: https://issues.apache.org/jira/browse/CALCITE-3697
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT COPYSIGN FUNCTION




[jira] [Created] (CALCITE-3695) IMPLEMENT TANH FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3695:
---

 Summary: IMPLEMENT TANH FUNCTION
 Key: CALCITE-3695
 URL: https://issues.apache.org/jira/browse/CALCITE-3695
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT TANH FUNCTION




[jira] [Created] (CALCITE-3694) IMPLEMENT SINH FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3694:
---

 Summary: IMPLEMENT SINH FUNCTION
 Key: CALCITE-3694
 URL: https://issues.apache.org/jira/browse/CALCITE-3694
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT SINH FUNCTION




[jira] [Created] (CALCITE-3693) IMPLEMENT ULP FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3693:
---

 Summary: IMPLEMENT ULP FUNCTION
 Key: CALCITE-3693
 URL: https://issues.apache.org/jira/browse/CALCITE-3693
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT ULP FUNCTION




[jira] [Created] (CALCITE-3691) IMPLEMENT NEGATEEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3691:
---

 Summary: IMPLEMENT NEGATEEXACT FUNCTION
 Key: CALCITE-3691
 URL: https://issues.apache.org/jira/browse/CALCITE-3691
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT NEGATEEXACT FUNCTION




[jira] [Created] (CALCITE-3692) IMPLEMENT TOINTEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3692:
---

 Summary: IMPLEMENT TOINTEXACT FUNCTION
 Key: CALCITE-3692
 URL: https://issues.apache.org/jira/browse/CALCITE-3692
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT TOINTEXACT FUNCTION




[jira] [Created] (CALCITE-3690) IMPLEMENT DECREMENTEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3690:
---

 Summary: IMPLEMENT DECREMENTEXACT FUNCTION
 Key: CALCITE-3690
 URL: https://issues.apache.org/jira/browse/CALCITE-3690
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT DECREMENTEXACT FUNCTION




[jira] [Created] (CALCITE-3689) IMPLEMENT INCREMENTEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3689:
---

 Summary: IMPLEMENT INCREMENTEXACT FUNCTION
 Key: CALCITE-3689
 URL: https://issues.apache.org/jira/browse/CALCITE-3689
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT INCREMENTEXACT FUNCTION




[jira] [Created] (CALCITE-3688) IMPLEMENT MULTIPLYEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3688:
---

 Summary: IMPLEMENT MULTIPLYEXACT FUNCTION
 Key: CALCITE-3688
 URL: https://issues.apache.org/jira/browse/CALCITE-3688
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT MULTIPLYEXACT FUNCTION




[jira] [Created] (CALCITE-3687) IMPLEMENT SUBTRACTEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3687:
---

 Summary: IMPLEMENT SUBTRACTEXACT FUNCTION
 Key: CALCITE-3687
 URL: https://issues.apache.org/jira/browse/CALCITE-3687
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT SUBTRACTEXACT FUNCTION




[jira] [Created] (CALCITE-3686) IMPLEMENT ADDEXACT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3686:
---

 Summary: IMPLEMENT ADDEXACT FUNCTION
 Key: CALCITE-3686
 URL: https://issues.apache.org/jira/browse/CALCITE-3686
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT ADDEXACT FUNCTION




[jira] [Created] (CALCITE-3685) IMPLEMENT RINT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3685:
---

 Summary: IMPLEMENT RINT FUNCTION
 Key: CALCITE-3685
 URL: https://issues.apache.org/jira/browse/CALCITE-3685
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT RINT FUNCTION




[jira] [Created] (CALCITE-3684) IMPLEMENT CBRT FUNCTION

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3684:
---

 Summary: IMPLEMENT CBRT FUNCTION
 Key: CALCITE-3684
 URL: https://issues.apache.org/jira/browse/CALCITE-3684
 Project: Calcite
  Issue Type: Improvement
Reporter: Forward Xu
Assignee: Forward Xu


IMPLEMENT CBRT FUNCTION.




[jira] [Created] (CALCITE-3683) Enhanced MATH Function

2020-01-04 Thread Forward Xu (Jira)

Forward Xu created CALCITE-3683:
---

 Summary: Enhanced MATH Function
 Key: CALCITE-3683
 URL: https://issues.apache.org/jira/browse/CALCITE-3683
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Forward Xu







Re: [DISCUSS] Revert [CALCITE-1842] Sort.computeSelfCost() calls makeCost() with arguments in wrong order

2020-01-04 Thread Vladimir Sitnikov

>I think we should try to make our cost
estimations more realistic in terms of cpu and io and don't try to put
everything in rows as it is the case for various operators.
This of course requires the VolcanoCost to be adapted.

Well. The more I revise costs, the more I incline to that opinion as well.

It looks like the comparison should be based on cpu+io*CPU_PER_IO. Then 'rows'
becomes obsolete :-/

It seems rows duplicates rowCount metadata, thus I am inclined to hide
'rows' from toString output. Probably, toString should show
cpu+io*CPU_PER_IO result.
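
A minimal sketch of the comparison described above. The class name
ScalarCost and the CPU_PER_IO value of 100 are assumptions made for
illustration only; this is not Calcite's VolcanoCost.

```java
/** Hypothetical cost holder used for illustration; not Calcite's VolcanoCost. */
final class ScalarCost implements Comparable<ScalarCost> {
  /** Assumed weight: how many cpu units one io unit is worth. */
  static final double CPU_PER_IO = 100.0;

  final double rows; // kept for reference only, ignored by compareTo
  final double cpu;
  final double io;

  ScalarCost(double rows, double cpu, double io) {
    this.rows = rows;
    this.cpu = cpu;
    this.io = io;
  }

  /** Collapses cpu and io into one number: cpu + io * CPU_PER_IO. */
  double value() {
    return cpu + io * CPU_PER_IO;
  }

  @Override public int compareTo(ScalarCost other) {
    return Double.compare(value(), other.value());
  }

  /** Shows the combined value and hides rows. */
  @Override public String toString() {
    return "{" + value() + " cost}";
  }

  public static void main(String[] args) {
    ScalarCost sort = new ScalarCost(1000, 9966, 0);
    ScalarCost scan = new ScalarCost(1000, 1000, 10);
    System.out.println(sort + " vs " + scan + " -> " + sort.compareTo(scan));
  }
}
```

With such a comparison the rows field no longer influences plan choice, so
anything that is currently folded into rows would have to move to cpu or io.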

Vladimir

Calcite-Master - Build # 1540 - Still Failing

2020-01-04 Thread Apache Jenkins Server

The Apache Jenkins build system has built Calcite-Master (build #1540)

Status: Still Failing

Check console output at https://builds.apache.org/job/Calcite-Master/1540/ to 
view the results.

Re: [DISCUSS] Revert [CALCITE-1842] Sort.computeSelfCost() calls makeCost() with arguments in wrong order

2020-01-04 Thread Stamatis Zampetakis

Hi Vladimir,

I think we should leave it as it was.

The fact that VolcanoCost does not exploit the cpu and io information is a
problem of that class and not of the Sort#computeSelfCost method. Note that
the VolcanoPlanner can be configured with a RelOptCostFactory which means
that people who use the planner may use another implementation of
RelOptCost which does take into account cpu and io metrics.

Regarding the more general question, I think we should try to make our cost
estimations more realistic in terms of cpu and io and don't try to put
everything in rows as it is the case for various operators.
This of course requires the VolcanoCost to be adapted.

Best,
Stamatis

On Sun, Dec 29, 2019 at 7:33 PM Vladimir Sitnikov <
sitnikov.vladi...@gmail.com> wrote:

> Hi,
>
> I'm inclined to revert
>
> https://github.com/apache/calcite/commit/48a20668647b5a5e86073ef0e9ce206669ad6867
> Motivation can be found in
>
> https://issues.apache.org/jira/browse/CALCITE-1842?focusedCommentId=17004696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17004696
>
> WDYT?
>
> The question there is Sort#computeSelfCost
> We have (rows, cpu, io) cost fields, however, most of the time we use just
> **rows** to represent the costing.
> For instance, EnumerableHashJoin computes the cost and returns (rows, 0,
> 0).
>
> CALCITE-1842 adjusted Sort costing so it moved NLogN to cpu field, and it
> makes the sorting virtually free
> because the current Volcano is using rows field only when comparing the
> costs.
>
> Unfortunately, CALCITE-1842 has no tests, so I don't really see what was
> the problem.
>
> Vladimir
>

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov

Technically speaking, single-block read time for HDDs is pretty much
stable, so the use of seconds might not be that bad.
However, with seconds it might be complicated to measure CPU-like activity (e.g.
different machines might execute EnumerableJoin at different rates :( )


What if we benchmark a trivial EnumerableCalc(EnumerableTableScan) for a
table of 100 rows and 10 columns
and call it a single cost unit?

In other words, we could have an etalon benchmark that takes X seconds and
we could call it a single cost unit.

For instance, org.apache.calcite.rel.core.Sort#computeSelfCost returns a
cost.
Of course, it has NLogN assumption, but which multiplier should it use?

One could measure the wallclock time for the sort, and divide it by the
time it takes to execute the etalon cost benchmark.
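
A rough sketch of the idea above: time the etalon, time the sort, and
express the sort in etalon units. EtalonCost and all of its methods are
hypothetical names, the etalon is only a stand-in loop over a 100x10 array
(not a real EnumerableCalc over EnumerableTableScan), and a real
measurement would need a proper benchmark harness.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper for expressing measured times in "cost units". */
final class EtalonCost {
  /** Converts a measured time into multiples of the etalon benchmark time. */
  static double toCostUnits(double measuredNanos, double etalonNanos) {
    return measuredNanos / etalonNanos;
  }

  /** Derives the multiplier k in cost = k * n * ln(n) from one measurement. */
  static double sortMultiplier(double costUnits, long n) {
    return costUnits / (n * Math.log(n));
  }

  /** Crude wall-clock timing of a single run. */
  static long timeNanos(Runnable task) {
    long start = System.nanoTime();
    task.run();
    return System.nanoTime() - start;
  }

  public static void main(String[] args) {
    // Stand-in for the etalon: walk a 100-row, 10-column table once.
    int[][] table = new int[100][10];
    long etalon = Math.max(1, timeNanos(() -> {
      long sum = 0;
      for (int[] row : table) {
        for (int v : row) {
          sum += v;
        }
      }
      if (sum != 0) {
        throw new IllegalStateException("table is expected to hold zeros");
      }
    }));

    int n = 100_000;
    int[] data = new Random(42).ints(n).toArray();
    long sortTime = timeNanos(() -> Arrays.sort(data));

    double units = toCostUnits(sortTime, etalon);
    System.out.println("sort cost units: " + units
        + ", multiplier: " + sortMultiplier(units, n));
  }
}
```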

WDYT?

Vladimir

[DISCUSS] MaterializationTest#testAggregateMaterializationOnCountDistinctQuery1 is very fragile

2020-01-04 Thread Vladimir Sitnikov

Hi,

It looks like testAggregateMaterializationOnCountDistinctQuery1 is invalid.

The test creates materialization for
select deptno, empid, salary from emps group by deptno, empid, salary

Then it issues the SQL:

select deptno, count(distinct empid) as c from (
select deptno, empid
from emps
group by deptno, empid)
group by deptno


The expected plan is
EnumerableAggregate(group=[{0}], C=[COUNT($1)])
  EnumerableTableScan(table=[[hr, m0]])

However, that does not work if the optimizer knows emps.empid is a unique
key for the table.
The materialized view is created as "select deptno, empid, salary from
emps" (because grouping is not needed),
and the materialized view loses uniqueness information, thus it can't
effectively use the materialized view later (see
https://issues.apache.org/jira/browse/CALCITE-3682 ).

I'm inclined to either disable the test or remove empid from the grouping
columns.
However, if I remove empid, then distinct should probably be removed as
well.

Any thoughts?

Vladimir

[jira] [Created] (CALCITE-3682) MaterializationService#defineMaterialization loses information on unique keys

2020-01-04 Thread Vladimir Sitnikov (Jira)

Vladimir Sitnikov created CALCITE-3682:
--

 Summary: MaterializationService#defineMaterialization loses 
information on unique keys
 Key: CALCITE-3682
 URL: https://issues.apache.org/jira/browse/CALCITE-3682
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov


The impacted test is testAggregateMaterializationOnCountDistinctQuery1

The test defines materialized view for the following SQL:
{code:sql}select deptno, empid, salary from emps group by deptno, empid, 
salary{code}

In practice, the optimizer might be able to tell that empid is a unique key, 
thus it could understand the grouping is not needed.
However, when it defines a materialized view, it loses uniqueness information, 
so it declares the view as

{code:sql}select deptno, empid, salary from emps{code}
and the uniqueness is not there.

org.apache.calcite.materialize.MaterializationService.DefaultTableFactory 
should probably compute metadata (e.g. unique keys, something else?) and 
propagate it to the materialized view.




Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Michael Mior

I understand the cost doesn't have to match actual execution duration
and it doesn't really matter if it does as long as we can get the
relative ordering of plans roughly similar. That's why I'm suggesting
not calling the cost seconds, even if we are trying to roughly
approximate them. But I don't feel that strongly about this.
--
Michael Mior
mm...@apache.org

On Sat, Jan 4, 2020 at 14:18, Vladimir Sitnikov wrote:
>
> Michael>although I would be hesitant to refer to "seconds"
>
> Do you have better ideas?
> If my memory serves me well, PostgreSQL uses seconds as well for cost units.
> OracleDB is using "singleblock read" for the cost unit.
>
> Michael>how long execution will take on any particular system
>
> The idea for the cost is to be able to compare different plans.
> The cost does not have to match the actual execution duration.
>
> Then there might be tuning knobs like "how many seconds a single io lasts".
>
> Vladimir

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov

Michael>although I would be hesitant to refer to "seconds"

Do you have better ideas?
If my memory serves me well, PostgreSQL uses seconds as well for cost units.
OracleDB is using "singleblock read" for the cost unit.

Michael>how long execution will take on any particular system

The idea for the cost is to be able to compare different plans.
The cost does not have to match the actual execution duration.

Then there might be tuning knobs like "how many seconds a single io lasts".

Vladimir

Re: [DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Michael Mior

A cost unit sounds fine to me, although I would be hesitant to refer
to "seconds" or other concrete measurements since there's no easy way
to guess how long execution will take on any particular system.
--
Michael Mior
mm...@apache.org

On Sat, Jan 4, 2020 at 10:56, Vladimir Sitnikov wrote:
>
> Hi,
>
> I've spent some time on stabilizing the costs (see
> https://github.com/apache/calcite/pull/1702/commits ), and it looks like we
> might want to have some notion of "cost unit".
>
> For instance, we want to express that sorting table with 2 int columns is
> cheaper than sorting table with 22 int columns.
> Unfortunately, VolcanoCost is compared by rows field only, so, for now, we
> express the number of fields into the cost#rows field by adding something
> like "cost += fieldCount * 0.1" :(
>
> Of course, you might say that cost is pluggable, however, I would like to
> make the default implementation sane.
> At least it should be good enough for inspirational purposes.
>
> What do you think about adding a way to convert Cost to double?
> For instance, we can add measurer.measure(cost) that would weight the cost
> or we can add method like `double cost#toSeconds()`.
>
> I guess, if we add a unit (e.g. microsecond), then we could even
> micro-benchmark different join implementations, and use the appropriate
> cost values
> for extra columns and so on.
>
> I fully understand that the cost does not have to be precise, however, it
> is sad to guesstimate the multipliers for an extra field in projection.
>
>
>
> Just to recap:
> 1) I've started with making tests parallel <-- this was the main goal
> 2) Then I ran into EnumerableJoinTest#testSortMergeJoinWithEquiCondition
> which was using static variables
> 3) As I fixed EnumerableMergeJoinRule, it turned out the optimizer started
> to use merge join all over the place
> 4) It was caused by inappropriate costing of Sort, which I fixed
> 5) Then I updated the cost functions of EnumerableHashJoin and
> EnumerableNestedLoopJoin, and it was not enough because ReflectiveSchema
> was not providing the proper statistics
> 6) Then I updated ReflectiveSchema and Calc to propagate uniqueness and
> rowcount metadata.
>
> All of the above seems to be more-or-less stable (the plans improved!), and
> the failing tests, for now, are MaterializationTest.
>
> The problem with those tests is that the cost differences between NestedLoop
> and HashJoin are tiny.
> For instance:
> testJoinMaterialization8
>
> EnumerableProject(empid=[$2]): rowcount = 6.6, cumulative cost =
> {105.19 rows, 82.6 cpu, 0.0 io}, id = 780
>   EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner]): rowcount =
> 6.6, cumulative cost = {98.6 rows, 76.0 cpu, 0.0 io}, id = 779
> EnumerableProject(name=[$0], name0=[CAST($0):VARCHAR]): rowcount =
> 22.0, cumulative cost = {44.0 rows, 67.0 cpu, 0.0 io}, id = 777
>   EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative
> cost = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152
> EnumerableProject(empid=[$0], name=[$1], name0=[CAST($1):VARCHAR]):
> rowcount = 2.0, cumulative cost = {4.0 rows, 9.0 cpu, 0.0 io}, id = 778
>   EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0,
> cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125
>
> vs
>
> EnumerableProject(empid=[$0]): rowcount = 6.6, cumulative cost =
> {81.19 rows, 55.6 cpu, 0.0 io}, id = 778
>   EnumerableNestedLoopJoin(condition=[=(CAST($1):VARCHAR,
> CAST($2):VARCHAR)], joinType=[inner]): rowcount = 6.6, cumulative cost =
> {74.6 rows, 49.0 cpu, 0.0 io}, id = 777
> EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0,
> cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125
> EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative cost
> = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152
>
> The second plan looks cheaper to the optimizer, however, the key difference
> comes from three projects in the first plan (the projects account for
> 6.6+22+2=30.6 cost).
> If I increase hr.dependents table to 3 rows, then hash-based plan becomes
> cheaper.
>
> To me, both plans look acceptable, however, it is sad to analyze/debug
> those differences without being able to tell if that is a plan degradation
> or if it is acceptable.
>
> Vladimir

Re: [DISCUSS] CALCITE-3661, CALCITE-3665, MaterializationTest vs HR schema statistics

2020-01-04 Thread Vladimir Sitnikov

Jin>In ReflectiveSchema, Statistics of FieldTable is given as UNKNOWN[1][2].

Please check the [CALCITE-3661] Derive rowCount statistics for tables in
ReflectiveSchema that are based on arrays/collections
and [CALCITE-3680] Add ability to express unique constraints in
ReflectiveSchema
commits in https://github.com/apache/calcite/pull/1702/commits

The commits enable the optimizer to see the proper row count for tables in
ReflectiveSchema.

Vladimir

Re: [DISCUSS] CALCITE-3661, CALCITE-3665, MaterializationTest vs HR schema statistics

2020-01-04 Thread XING JIN

Hi, Vladimir ~

In ReflectiveSchema, Statistics of FieldTable is given as UNKNOWN[1][2].
When reading a table's row count, if no statistics are given, a default value
of 100 is returned [3] -- a relatively large value compared with the sizes
of the fields defined in HRFKUKSchema.
When a materialized view gets matched, view-sql is executed and the values
are wrapped in an ArrayTable and accurate row count is given [4].
So I'm not sure when a materialized view containing JOIN gets matched but
cannot help reduce cost of the plan.
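
A tiny sketch of the fallback behaviour described above. RowCountFallback
and its members are hypothetical names used only for illustration; see [3]
for the actual code. Only the default of 100 comes from the text.

```java
/** Hypothetical illustration of "use the statistic if present, else 100". */
final class RowCountFallback {
  /** Default used when the table provides no row count statistic. */
  static final double DEFAULT_ROW_COUNT = 100d;

  /** Returns the known row count, or the default when it is unknown (null). */
  static double rowCount(Double statisticRowCount) {
    return statisticRowCount != null ? statisticRowCount : DEFAULT_ROW_COUNT;
  }

  public static void main(String[] args) {
    // A 4-row table without statistics looks 25 times bigger than it is.
    System.out.println("unknown statistics: " + rowCount(null));
    System.out.println("accurate statistics: " + rowCount(4.0));
  }
}
```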

HRFKUKSchema is only used in MaterializationTest. There's no existing test
checking the content of the query result. Most of them check whether the same
results are returned whether or not the materialized view is used. If we
add rows to the existing emps table, how can tests be invalidated?

Best,
Jin

[1]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/java/ReflectiveSchema.java#L369
[2]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/schema/Statistics.java#L40
[3]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/prepare/RelOptTableImpl.java#L239
[4]
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/adapter/clone/ArrayTable.java#L82

On Sat, Jan 4, 2020 at 4:21 AM, Vladimir Sitnikov wrote:

> Hi,
>
> It looks like MaterializationTest heavily relies on inaccurate statistics
> for hr.emps and hr.depts tables.
>
> I was trying to improve statistic estimation for better join planning (see
> https://github.com/apache/calcite/pull/1712 ),
> and it looks like better estimates open the eyes of the optimizer, and now
> it realizes it does not really need to use a materialized view
> for a 4-row table.
>
> In other words, the cost of the table access is more-or-less the same as
> the cost of the materialized view access.
>
> It looks like the way to go here is to add hr_with_extra_rows schema so it
> contains the same emps and depts tables, but it should
> have bigger tables.
> Adding rows to the existing emps table is not an option because it would
> invalidate lots of tests.
>
> Does anybody have better ideas?
>
> Vladimir
>

[jira] [Created] (CALCITE-3681) Refine RelMdColumnUniqueness and RelMdRowCount for Aggregate

2020-01-04 Thread Vladimir Sitnikov (Jira)

Vladimir Sitnikov created CALCITE-3681:
--

 Summary: Refine RelMdColumnUniqueness and RelMdRowCount for 
Aggregate
 Key: CALCITE-3681
 URL: https://issues.apache.org/jira/browse/CALCITE-3681
 Project: Calcite
  Issue Type: Improvement
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov
Assignee: Vladimir Sitnikov


Aggregate might have grouping sets, so 




[jira] [Created] (CALCITE-3680) Add ability to express unique constraints in ReflectiveSchema

2020-01-04 Thread Vladimir Sitnikov (Jira)

Vladimir Sitnikov created CALCITE-3680:
--

 Summary: Add ability to express unique constraints in 
ReflectiveSchema
 Key: CALCITE-3680
 URL: https://issues.apache.org/jira/browse/CALCITE-3680
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
Reporter: Vladimir Sitnikov
Assignee: Vladimir Sitnikov


RelReferentialConstraint is there to express foreign keys, so it makes sense to 
add RelUniqueKeyConstraint to express unique keys in reflective schema.




[DISCUSS] CALCITE-3656, 3657, 1842: cost improvements, cost units

2020-01-04 Thread Vladimir Sitnikov

Hi,

I've spent some time on stabilizing the costs (see
https://github.com/apache/calcite/pull/1702/commits ), and it looks like we
might want to have some notion of "cost unit".

For instance, we want to express that sorting table with 2 int columns is
cheaper than sorting table with 22 int columns.
Unfortunately, VolcanoCost is compared by rows field only, so, for now, we
express the number of fields into the cost#rows field by adding something
like "cost += fieldCount * 0.1" :(

Of course, you might say that cost is pluggable, however, I would like to
make the default implementation sane.
At least it should be good enough for inspirational purposes.

What do you think about adding a way to convert Cost to double?
For instance, we can add measurer.measure(cost) that would weight the cost
or we can add method like `double cost#toSeconds()`.
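
A minimal sketch of what such a measurer could look like. SketchCost,
CostMeasurer, sortCost and the weights are all hypothetical and chosen for
illustration; none of them is an existing Calcite API, and the sort formula
is only an assumed way of making wider rows more expensive.

```java
/** Hypothetical (rows, cpu, io) triple; not Calcite's RelOptCost. */
final class SketchCost {
  final double rows;
  final double cpu;
  final double io;

  SketchCost(double rows, double cpu, double io) {
    this.rows = rows;
    this.cpu = cpu;
    this.io = io;
  }
}

/** Hypothetical measurer that weighs a cost into a single double. */
interface CostMeasurer {
  double measure(SketchCost cost);
}

final class CostMeasurerDemo {
  /** Assumed sort cost: n * ln(n) comparisons, each scaled by the field count. */
  static SketchCost sortCost(double rowCount, int fieldCount) {
    double cpu = rowCount * Math.log(Math.max(rowCount, 2)) * fieldCount;
    return new SketchCost(rowCount, cpu, 0);
  }

  public static void main(String[] args) {
    // Assumed weights: 1 unit per cpu, 100 units per io; rows is ignored.
    CostMeasurer measurer = c -> c.cpu + c.io * 100.0;
    System.out.println("sort, 2 columns:  " + measurer.measure(sortCost(1000, 2)));
    System.out.println("sort, 22 columns: " + measurer.measure(sortCost(1000, 22)));
  }
}
```

Keeping the weights in the measurer rather than in each operator would let
the field count live in cpu instead of being added to the rows field.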

I guess, if we add a unit (e.g. microsecond), then we could even
micro-benchmark different join implementations, and use the appropriate
cost values
for extra columns and so on.

I fully understand that the cost does not have to be precise, however, it
is sad to guesstimate the multipliers for an extra field in projection.



Just to recap:
1) I've started with making tests parallel <-- this was the main goal
2) Then I ran into EnumerableJoinTest#testSortMergeJoinWithEquiCondition
which was using static variables
3) As I fixed EnumerableMergeJoinRule, it turned out the optimizer started
to use merge join all over the place
4) It was caused by inappropriate costing of Sort, which I fixed
5) Then I updated the cost functions of EnumerableHashJoin and
EnumerableNestedLoopJoin, and it was not enough because ReflectiveSchema
was not providing the proper statistics
6) Then I updated ReflectiveSchema and Calc to propagate uniqueness and
rowcount metadata.

All of the above seems to be more-or-less stable (the plans improved!), and
the failing tests, for now, are MaterializationTest.

The problem with those tests is that the cost differences between NestedLoop
and HashJoin are tiny.
For instance:
testJoinMaterialization8

EnumerableProject(empid=[$2]): rowcount = 6.6, cumulative cost = {105.19 rows, 82.6 cpu, 0.0 io}, id = 780
  EnumerableHashJoin(condition=[=($1, $4)], joinType=[inner]): rowcount = 6.6, cumulative cost = {98.6 rows, 76.0 cpu, 0.0 io}, id = 779
    EnumerableProject(name=[$0], name0=[CAST($0):VARCHAR]): rowcount = 22.0, cumulative cost = {44.0 rows, 67.0 cpu, 0.0 io}, id = 777
      EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative cost = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152
    EnumerableProject(empid=[$0], name=[$1], name0=[CAST($1):VARCHAR]): rowcount = 2.0, cumulative cost = {4.0 rows, 9.0 cpu, 0.0 io}, id = 778
      EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125

vs

EnumerableProject(empid=[$0]): rowcount = 6.6, cumulative cost = {81.19 rows, 55.6 cpu, 0.0 io}, id = 778
  EnumerableNestedLoopJoin(condition=[=(CAST($1):VARCHAR, CAST($2):VARCHAR)], joinType=[inner]): rowcount = 6.6, cumulative cost = {74.6 rows, 49.0 cpu, 0.0 io}, id = 777
    EnumerableTableScan(table=[[hr, dependents]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 125
    EnumerableTableScan(table=[[hr, m0]]): rowcount = 22.0, cumulative cost = {22.0 rows, 23.0 cpu, 0.0 io}, id = 152

The second plan looks cheaper to the optimizer, however, the key difference
comes from three projects in the first plan (the projects account for
6.6+22+2=30.6 cost).
If I increase hr.dependents table to 3 rows, then hash-based plan becomes
cheaper.
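
Reading the "rows" figures of the two plans above, the two join operators
appear to have the same self cost (about 50.6), and the 24.0 gap between
the plans matches the two projects under the hash join (22 + 2); the
top-level project, about 6.6, is present in both plans. A small sketch of
that arithmetic, where selfCost is a hypothetical helper rather than a
Calcite method:

```java
/** Recomputes per-node costs from the cumulative "rows" figures of both plans. */
final class PlanCostGap {
  /** Self cost of a node: its cumulative cost minus its inputs' cumulative costs. */
  static double selfCost(double cumulative, double... inputs) {
    double self = cumulative;
    for (double input : inputs) {
      self -= input;
    }
    return self;
  }

  public static void main(String[] args) {
    // All numbers are the "rows" components copied from the two plans.
    double hashJoin = selfCost(98.6, 44.0, 4.0);
    double nestedLoopJoin = selfCost(74.6, 2.0, 22.0);
    double innerProjects = selfCost(44.0, 22.0) + selfCost(4.0, 2.0);
    double planGap = 105.19 - 81.19;

    System.out.printf("hash join self cost:        %.2f%n", hashJoin);
    System.out.printf("nested loop join self cost: %.2f%n", nestedLoopJoin);
    System.out.printf("projects under hash join:   %.2f%n", innerProjects);
    System.out.printf("gap between the plans:      %.2f%n", planGap);
  }
}
```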

To me, both plans look acceptable, however, it is sad to analyze/debug
those differences without being able to tell if that is a plan degradation
or if it is acceptable.

Vladimir

[jira] [Created] (CALCITE-3679) Calcite to support Lambda Expressions

2020-01-04 Thread Ritesh (Jira)

Ritesh created CALCITE-3679:
---

 Summary: Calcite to support Lambda Expressions
 Key: CALCITE-3679
 URL: https://issues.apache.org/jira/browse/CALCITE-3679
 Project: Calcite
  Issue Type: New Feature
Reporter: Ritesh
Assignee: Ritesh







[jira] [Created] (CALCITE-3678) Calcite to support map_filter function

2020-01-04 Thread Ritesh (Jira)

Ritesh created CALCITE-3678:
---

 Summary: Calcite to support map_filter function
 Key: CALCITE-3678
 URL: https://issues.apache.org/jira/browse/CALCITE-3678
 Project: Calcite
  Issue Type: New Feature
Reporter: Ritesh
Assignee: Ritesh


[https://prestodb.io/docs/current/functions/map.html]



