[jira] [Commented] (CALCITE-2635) getMonotonocity is slow on wide tables

Vladimir Sitnikov (JIRA) Wed, 09 Jan 2019 11:09:31 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-2635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16738551#comment-16738551
 ]


Vladimir Sitnikov commented on CALCITE-2635:
--------------------------------------------

{quote}@PerformanceTest(expectedDuration = "2s", variance = "5%"){quote}

The expected duration depends on the hardware. For instance, a notebook, a
virtual machine, a desktop, and a VPS could all have very different raw
performance.

I think it is much better to invest time in having something like
https://arewefastyet.com.
In other words, we could have a set of "standard" benchmarks, a consistent
machine to run them on, and scheduled executions, so we can track regressions.

I'm inclined to merge this fix with no extra tests.


Note: the change is a clear win.
An alternative option is to use a HashMap to speed up
{{org.apache.calcite.rel.type.RelDataType#getField(String fieldName, boolean 
caseSensitive, boolean elideRecord)}}. We do have 
{{org.apache.calcite.rel.type.RelDataTypeFactoryImpl#canonize(org.apache.calcite.rel.type.RelDataType)}},
so a lazily initialized cache of field positions might help.
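
A minimal sketch of what such a lazily initialized cache of field positions
could look like. The class {{FieldIndexCacheSketch}} is a hypothetical
stand-in for a row type, not Calcite's {{RelDataType}}, and it covers only
case-sensitive lookup; the real {{getField}} also handles {{caseSensitive}}
and {{elideRecord}}, which are left out here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a row type; not Calcite's RelDataType. */
final class FieldIndexCacheSketch {
  private final List<String> fieldNames;

  // Built on the first lookup; stays null until then.
  private Map<String, Integer> positions;

  FieldIndexCacheSketch(List<String> fieldNames) {
    this.fieldNames = fieldNames;
  }

  /** Returns the ordinal of the named field, or -1 if absent (case-sensitive). */
  synchronized int indexOf(String name) {
    if (positions == null) {
      Map<String, Integer> map = new HashMap<>();
      for (int i = 0; i < fieldNames.size(); i++) {
        // putIfAbsent keeps the first occurrence, matching List.indexOf.
        map.putIfAbsent(fieldNames.get(i), i);
      }
      positions = map;
    }
    Integer ordinal = positions.get(name);
    return ordinal == null ? -1 : ordinal;
  }

  public static void main(String[] args) {
    FieldIndexCacheSketch rowType =
        new FieldIndexCacheSketch(Arrays.asList("id", "name", "ts"));
    System.out.println(rowType.indexOf("ts"));      // prints 2
    System.out.println(rowType.indexOf("missing")); // prints -1
  }
}
```

The map is built once in O(N) and every later lookup is O(1) on average, so
looking up all N fields costs O(N) overall instead of O(N^2).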


However, we don't really expect a single table to have lots of collations, so
we could just go with PR#891.
On top of that, we might add a hard limit like "try no more than the first 50
collations of the table", so even a table with an extreme number of collations
won't create a problem for {{getMonotonicity}}.
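
A rough sketch of how such a hard limit might look. Everything here is a
hypothetical simplification for illustration: {{Collation}} models only the
leading field ordinal and direction, {{MAX_COLLATIONS}} is the "50" from the
suggestion above, and none of it is Calcite's actual code.

```java
import java.util.Arrays;
import java.util.List;

final class CollationLimitSketch {
  /** Hypothetical cap, taken from the "first 50 collations" suggestion. */
  static final int MAX_COLLATIONS = 50;

  enum Monotonicity { INCREASING, DECREASING, NOT_MONOTONIC }

  /** Hypothetical simplified collation: leading field ordinal and direction. */
  static final class Collation {
    final int leadingField;
    final boolean descending;

    Collation(int leadingField, boolean descending) {
      this.leadingField = leadingField;
      this.descending = descending;
    }
  }

  /** Inspects at most MAX_COLLATIONS collations, whatever the list size. */
  static Monotonicity monotonicity(List<Collation> collations, int fieldOrdinal) {
    int limit = Math.min(collations.size(), MAX_COLLATIONS);
    for (int i = 0; i < limit; i++) {
      Collation c = collations.get(i);
      if (c.leadingField == fieldOrdinal) {
        return c.descending ? Monotonicity.DECREASING : Monotonicity.INCREASING;
      }
    }
    // Either no collation leads with this field, or it lies beyond the cap.
    return Monotonicity.NOT_MONOTONIC;
  }

  public static void main(String[] args) {
    List<Collation> collations =
        Arrays.asList(new Collation(0, false), new Collation(2, true));
    System.out.println(monotonicity(collations, 2)); // prints DECREASING
    System.out.println(monotonicity(collations, 1)); // prints NOT_MONOTONIC
  }
}
```

The trade-off is that a field whose collation sits beyond the cap is reported
as not monotonic, which is conservative: a possible optimization is missed,
but no incorrect ordering is assumed.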

> getMonotonocity is slow on wide tables
> --------------------------------------
>
>                 Key: CALCITE-2635
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2635
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>            Reporter: Gian Merlino
>            Assignee: Gian Merlino
>            Priority: Major
>              Labels: performance
>
> RelOptTableImpl's getMonotonicity does an indexOf on 
> {{rowType.getFieldNames()}}, which is O(N) in the number of fields. 
> IdentifierNamespace calls getMonotonicity once for every field in the table 
> namespace, so it becomes O(N^2) in the number of fields. We observed 2-4 
> second query planning times with a table that had 18,000 columns, reduced to 
> about 150ms after patching getMonotonicity to be O(1) in the number of fields.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
