[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2019-01-09 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738613#comment-16738613
 ] 

Julian Hyde commented on CALCITE-2635:
--

I have a $350 [Intel NUC|https://www.amazon.com/gp/product/B01N2UMKZ5/ref=ppx_od_dt_b_detailpages00?ie=UTF8=1]
in my home office. It is 99% idle. You're welcome to have ssh access.

> getMonotonocity is slow on wide tables
> --
>
> Key: CALCITE-2635
> URL: https://issues.apache.org/jira/browse/CALCITE-2635
> Project: Calcite
>  Issue Type: Improvement
>  Components: core
>Reporter: Gian Merlino
>Assignee: Gian Merlino
>Priority: Major
>  Labels: performance
> Fix For: 1.19.0
>
>
> RelOptTableImpl's getMonotonicity does an indexOf on 
> {{rowType.getFieldNames()}}, which is O(N) in the number of fields. 
> IdentifierNamespace calls getMonotonicity once for every field in the table 
> namespace, so it becomes O(N^2) in the number of fields. We observed 2-4 
> second query planning times with a table that had 18,000 columns, reduced to 
> about 150ms after patching getMonotonicity to be O(1) in the number of fields.
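
To make the O(N^2) claim in the description above concrete, here is a minimal
sketch that counts the equals() calls made when every field of a table is
looked up by a linear scan, the way List.indexOf would do it. This is not
Calcite code; ScanCostSketch and its methods are hypothetical names used only
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not part of Calcite. */
final class ScanCostSketch {
  private ScanCostSketch() {}

  /** Linear scan with the same result as List.indexOf; counts equals() calls in counter[0]. */
  static int countingIndexOf(List<String> names, String target, long[] counter) {
    for (int i = 0; i < names.size(); i++) {
      counter[0]++;
      if (names.get(i).equals(target)) {
        return i;
      }
    }
    return -1;
  }

  /** Looks up every field once by scanning and returns the total number of comparisons. */
  static long comparisonsForAllLookups(List<String> names) {
    long[] counter = {0};
    for (String name : names) {
      countingIndexOf(names, name, counter);
    }
    return counter[0];
  }

  /** Builds n distinct field names: c0, c1, ... */
  static List<String> distinctNames(int n) {
    List<String> names = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      names.add("c" + i);
    }
    return names;
  }

  public static void main(String[] args) {
    int n = 18_000;
    long scans = comparisonsForAllLookups(distinctNames(n));
    // With distinct names, finding field i costs i + 1 comparisons,
    // so the total is n * (n + 1) / 2.
    System.out.println(n + " fields: " + scans + " comparisons by scanning, versus "
        + n + " lookups if each lookup were O(1)");
  }
}
```

For 18,000 distinct names the total is 18,000 * 18,001 / 2 = 162,009,000
comparisons, against 18,000 lookups when each one is O(1).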



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2019-01-09 Thread Vladimir Sitnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738603#comment-16738603
 ] 

Vladimir Sitnikov commented on CALCITE-2635:


I guess one of the hardest things to get is hardware that is not 
shared among 100500 projects.



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2019-01-09 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738592#comment-16738592
 ] 

Julian Hyde commented on CALCITE-2635:
--

Sure, we can merge this with no extra tests.

But I would be grateful for help getting to a performance testing framework.



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2019-01-09 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738586#comment-16738586
 ] 

Julian Hyde commented on CALCITE-2635:
--

I agree, https://arewefastyet.com would be awesome. But we've been talking for 
years about performance regression tests and no one has done anything. Whatever 
approach we take, we will want to annotate in the code which tests are 
considered performance-critical. Then we can gather performance of those tests 
over time, on the same hardware, and look for trends/variance.

The variance of a single-threaded test between the slowest and fastest hardware 
is no more than 5x, whereas the variance between a good and bad algorithm can 
be several orders of magnitude. So it's not too difficult to write a useful 
test.



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2019-01-09 Thread Vladimir Sitnikov (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738551#comment-16738551
 ] 

Vladimir Sitnikov commented on CALCITE-2635:


{quote}@PerformanceTest(expectedDuration = "2s", variance = "5%"){quote}

Expected duration depends on the hardware. For instance, a notebook, virtual 
machine, desktop, or VPS could each have very different raw performance.

I think it is much better to invest time in building something like 
https://arewefastyet.com
In other words, we could have a set of "standard" benchmarks + a consistent 
machine for execution + scheduled executions so we can track regressions.

I'm inclined to merge this fix with no extra tests.


Note: the change is a clear win.
An alternative is to use a HashMap to speed up 
{{org.apache.calcite.rel.type.RelDataType#getField(String fieldName, boolean 
caseSensitive, boolean elideRecord)}}. We do have 
{{org.apache.calcite.rel.type.RelDataTypeFactoryImpl#canonize(org.apache.calcite.rel.type.RelDataType)}},
 so a lazily initialized cache of field positions might help.
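
A lazily initialized cache of field positions, as suggested above, could look
roughly like the following. This is only a sketch under assumptions: the class
FieldPositionCache is hypothetical and is not Calcite's RelDataType
implementation, the field list is assumed to be immutable, and
case-insensitive matching is approximated by lower-casing with Locale.ROOT.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch; not Calcite's RelDataType implementation. */
final class FieldPositionCache {
  private final List<String> fieldNames;
  // Built on first use. Volatile so that a fully built map is safely published;
  // two racing threads may both build a map, which is harmless here.
  private volatile Map<String, Integer> exact;
  private volatile Map<String, Integer> folded;

  /** Assumes fieldNames is never modified after construction. */
  FieldPositionCache(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }

  /** Returns the ordinal of the first field with the given name, or -1 if there is none. */
  int indexOf(String name, boolean caseSensitive) {
    if (caseSensitive) {
      Map<String, Integer> m = exact;
      if (m == null) {
        exact = m = build(false);
      }
      return m.getOrDefault(name, -1);
    }
    Map<String, Integer> m = folded;
    if (m == null) {
      folded = m = build(true);
    }
    return m.getOrDefault(name.toLowerCase(Locale.ROOT), -1);
  }

  /** One O(N) pass over the field names; later lookups are O(1) expected. */
  private Map<String, Integer> build(boolean fold) {
    Map<String, Integer> m = new HashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      String key = fieldNames.get(i);
      if (fold) {
        key = key.toLowerCase(Locale.ROOT);
      }
      // Keep the first occurrence, as a linear scan would.
      m.putIfAbsent(key, i);
    }
    return m;
  }
}
```

The maps are only built for callers that actually look fields up by name, so
narrow tables and code paths that never call indexOf pay nothing extra.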


However, we don't really expect a single table to have lots of collations, so we 
could just go with PR#891.
On top of that, we might add a hard limit like "try no more than the first 50 
collations of the table", so even a table with an extreme number of collations 
won't create a problem for {{getMonotonicity}}.
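
The "first 50 collations" limit could be sketched as below. This is a guess at
the general shape, not the contents of PR#891. It assumes each collation can
be reduced to the ordinal of its leading field, and CollationScanSketch,
findCollation and MAX_COLLATIONS are hypothetical names.

```java
import java.util.List;

/** Hypothetical sketch; names and the limit are illustrative only. */
final class CollationScanSketch {
  /** Assumed cap, taken from the "first 50 collations" suggestion. */
  static final int MAX_COLLATIONS = 50;

  private CollationScanSketch() {}

  /**
   * Returns the position in the collation list of the first collation whose
   * leading field is named columnName, or -1 if none of the first
   * MAX_COLLATIONS collations matches.
   *
   * @param fieldNames field names of the row type, by ordinal
   * @param leadingFieldOrdinals for each collation, the ordinal of its first field
   */
  static int findCollation(List<String> fieldNames,
      List<Integer> leadingFieldOrdinals, String columnName) {
    int limit = Math.min(leadingFieldOrdinals.size(), MAX_COLLATIONS);
    for (int i = 0; i < limit; i++) {
      int ordinal = leadingFieldOrdinals.get(i);
      // Reading a name by ordinal avoids scanning the field list (given a
      // random-access list), so the cost grows with the number of collations
      // examined, not with the number of fields.
      if (ordinal >= 0 && ordinal < fieldNames.size()
          && fieldNames.get(ordinal).equals(columnName)) {
        return i;
      }
    }
    return -1;
  }
}
```

With the cap in place, one call examines at most 50 collations regardless of
how wide the table is.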



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2018-10-22 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659771#comment-16659771
 ] 

Julian Hyde commented on CALCITE-2635:
--

bq. Any suggestions on what file is appropriate for that test?

If you meant a database file, I wouldn't use a database file, just a mock.

If you meant a Java file, SqlToRelConverterTest.java will probably work. I notice 
that MockRelOptSchema.getTableForMember calls deduceMonotonicity which calls 
SqlValidatorTable.getMonotonicity, so the bug will surely show up.



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2018-10-22 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659635#comment-16659635
 ] 

Julian Hyde commented on CALCITE-2635:
--

We never found a good solution to performance tests. You can't set a hard 
deadline, because there's inevitably noise (e.g. GC pauses) that may make a 
test take 100% longer than usual.

The solution would probably be to run the test in a consistent environment over 
time, discard outliers (the GC pauses) and check that the moving average of 
running times stays within a range. But this is hard to orchestrate.
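
The check described above (discard outliers, then test that a moving average
stays within a range) might be sketched as follows. The class
RunTimeTrendSketch, the median-based outlier rule and all thresholds are
assumptions made for illustration; the hard part mentioned above, gathering
the timings in a consistent environment, is not addressed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a trend check over recorded running times. */
final class RunTimeTrendSketch {
  private RunTimeTrendSketch() {}

  /** Drops samples larger than outlierFactor times the (upper) median, e.g. GC pauses. */
  static List<Double> discardOutliers(List<Double> millis, double outlierFactor) {
    if (millis.isEmpty()) {
      throw new IllegalArgumentException("no samples");
    }
    List<Double> sorted = new ArrayList<>(millis);
    Collections.sort(sorted);
    double median = sorted.get(sorted.size() / 2);
    List<Double> kept = new ArrayList<>();
    for (double m : millis) {
      if (m <= median * outlierFactor) {
        kept.add(m);
      }
    }
    return kept;
  }

  /** Mean of the last window samples, or of all samples if there are fewer. */
  static double movingAverage(List<Double> millis, int window) {
    if (millis.isEmpty() || window <= 0) {
      throw new IllegalArgumentException("need samples and a positive window");
    }
    int from = Math.max(0, millis.size() - window);
    double sum = 0;
    for (int i = from; i < millis.size(); i++) {
      sum += millis.get(i);
    }
    return sum / (millis.size() - from);
  }

  /** True if the moving average of the outlier-free samples lies in [low, high]. */
  static boolean withinRange(List<Double> millis, int window,
      double outlierFactor, double low, double high) {
    double avg = movingAverage(discardOutliers(millis, outlierFactor), window);
    return avg >= low && avg <= high;
  }
}
```

A single slow run is dropped as an outlier, while a sustained slowdown moves
the median itself and so shows up in the moving average.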

I therefore think we should write a JUnit test with the expected running time 
described in a comment above the test. Developers can run it manually, and it 
will be run as part of the suite. If the test starts running 100x slower 
overnight maybe we will notice, and maybe we won't.

Consider creating an annotation so that we can find all performance tests:
{code}
@PerformanceTest(expectedDuration = "2s", variance = "5%")
{code}
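
One possible definition of such an annotation, plus a reflective helper that
finds the annotated methods, is sketched below. Nothing here exists in
Calcite: PerformanceTestSketch, WideTableTests and find are hypothetical
names, and the two string-valued elements simply mirror the usage shown above.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch; not part of Calcite. */
final class PerformanceTestSketch {
  private PerformanceTestSketch() {}

  /** Marks a test method as performance-critical. */
  @Retention(RetentionPolicy.RUNTIME) // must be retained for reflection to see it
  @Target(ElementType.METHOD)
  @interface PerformanceTest {
    String expectedDuration();
    String variance();
  }

  /** Hypothetical example of a test class that uses the annotation. */
  static final class WideTableTests {
    @PerformanceTest(expectedDuration = "2s", variance = "5%")
    void planQueryOnWideTable() {
      // Test body omitted in this sketch.
    }

    void ordinaryTest() {
    }
  }

  /** Returns the methods of testClass that carry the annotation, sorted by name. */
  static List<Method> find(Class<?> testClass) {
    List<Method> found = new ArrayList<>();
    for (Method m : testClass.getDeclaredMethods()) {
      if (m.isAnnotationPresent(PerformanceTest.class)) {
        found.add(m);
      }
    }
    // getDeclaredMethods returns methods in no particular order.
    found.sort(Comparator.comparing(Method::getName));
    return found;
  }

  public static void main(String[] args) {
    for (Method m : find(WideTableTests.class)) {
      PerformanceTest p = m.getAnnotation(PerformanceTest.class);
      System.out.println(m.getName() + ": expected " + p.expectedDuration()
          + ", variance " + p.variance());
    }
  }
}
```

Keeping the annotation at runtime retention lets a separate tool list the
performance tests and record their timings without changing the tests.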



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2018-10-20 Thread Gian Merlino (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658057#comment-16658057
 ] 

Gian Merlino commented on CALCITE-2635:
---

Any suggestions on what file is appropriate for that test?

I'm a bit concerned about the ability of a test like that to verify 
performance: the added overhead is only about 2 seconds, which is probably 
within normal variation across test environments. Even trying both ways (a big 
table and small table) and comparing in the same environment seems liable to 
false failures.



[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

2018-10-20 Thread Julian Hyde (JIRA)


[ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658055#comment-16658055
 ] 

Julian Hyde commented on CALCITE-2635:
--

I love cases like this. Perfect opportunity to add a test - mock a table with 
2 columns, prepare a query on it, and make sure performance is ok. 
