Here are a couple of cases that give you some of the background:
* https://issues.apache.org/jira/browse/CALCITE-794
* https://issues.apache.org/jira/browse/CALCITE-604

There are still some outstanding issues:
* https://issues.apache.org/jira/browse/CALCITE-790
* https://issues.apache.org/jira/browse/CALCITE-1048

On Thu, May 25, 2017 at 11:57 AM, Julian Hyde <[email protected]> wrote:
> This is a hard problem. We need to make a lot of metadata calls
> (directly and indirectly), we want it to be efficient (hence we want
> some caching) but we want the metadata to change when we make
> improvements. The RelMetadataQuery system, including generated code,
> is a carefully thought out solution to that problem.
>
> I don't recommend adding thread-level caching. The metadata goes out
> of date when you apply transformation rules.
>
> The idea is to create a RelMetadataQuery instance when you want a
> "transaction" giving you read-consistent view of the metadata.
> Probably inside an onMatch method, and used for all metadata calls
> from that method.
>
> If RelMetadataQuery.instance() is being called from within a metadata
> handler that is bad news. Eliminate all such calls and see whether the
> instance count goes down.
>
> On Thu, May 25, 2017 at 11:43 AM, Remus Rusanu <[email protected]> 
> wrote:
>> Hi all,
>>
>> Investigating performance of Hive query compile I found some problems around 
>> memorization in GeneratedMetadataHandler_xxx classes. These classes are 
>> generated from JaninonRelMetadataProvider and the generated code does 
>> memorization, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’ 
>> parameter. My measurements show that these calls explode into deep recursive 
>> stacks. I was measuring a complex query (49 joins, plenty of expressions) 
>> and some of the numbers look staggering. Take for instance 
>> GeneratedMetadataHandler_RowCount.getRowCount:
>> -          It is called 9303 times as top call
>> -          It generates 165754 total recursive calls, up to a nest level 18
>> -          The memo cache is hit 18065 (successful found key)
>> -          The memo is populated 147689 times (missed key)
>> -          The function gets no less than 22186 distinct RelMetadataQuery 
>> `mq` instances (!!).
>>
>> This situation repeats for each GeneratedMetadataHandler_XXX code:
>> Class
>>
>>  Top Calls
>>
>>  Total Calls
>>
>>  Memo Cache Hits
>>
>>  Memo Cache Miss
>>
>>  Distinct RelMetadataQuery instances
>>
>> getMaxRowCount
>>
>> 489
>>
>> 105262
>>
>> 6615
>>
>> 98647
>>
>> 489
>>
>> getDistinctRowCount
>>
>> 6828
>>
>> 74286
>>
>> 6179
>>
>> 68107
>>
>> 5828
>>
>> areColumnsUnique
>>
>> 19197
>>
>> 149708
>>
>> 13367
>>
>> 136341
>>
>> 2139
>>
>> getColumnOrigins
>>
>> 250
>>
>> 3021
>>
>> 39
>>
>> 2982
>>
>> 24
>>
>> getCumulativeCost
>>
>> 2249
>>
>> 9267
>>
>> 4660
>>
>> 4607
>>
>> 1
>>
>> getNonCumulativeCost
>>
>> 3559
>>
>> 3559
>>
>> 1285
>>
>> 3559
>>
>> 18
>>
>> getSelectivity
>>
>> 15636
>>
>> 35727
>>
>> 1114
>>
>> 34613
>>
>> 4984
>>
>> getUniqueKeys
>>
>> 26311
>>
>> 111715
>>
>> 5212
>>
>> 106503
>>
>> 3266
>>
>>
>> Looking at this, the root problems seems to be the fact that the code uses 
>> often the construct `RelMetadataQuery.instance()` to obtain a reference to a 
>> needed object. But each call to `instance()` returns a new object, and this 
>> new object has a new, clean, memorization `map` field. So we have a very 
>> poor memo cache hit ratio, but far worse is the effect of repeating 
>> ad-nauseam recursive calls on deep trees. I made an experiment where I 
>> modified the code in `RelMetadataQuery.instance()` to reference a 
>> threadLocal `map` field and the difference is like night vs. day:
>>
>> Class
>>
>>  Top Calls
>>
>>  Total Calls
>>
>>  Memo Cache Hits
>>
>>  Memo Cache Miss
>>
>> getRowCount
>>
>> 8179
>>
>> 14426
>>
>> 12920
>>
>> 1506
>>
>> getMaxRowCount
>>
>> 489
>>
>> 2535
>>
>> 1327
>>
>> 1208
>>
>> getDistinctRowCount
>>
>> 3138
>>
>> 7947
>>
>> 4939
>>
>> 3008
>>
>> areColumnsUnique
>>
>> 1103
>>
>> 3191
>>
>> 1495
>>
>> 1696
>>
>> getColumnOrigins
>>
>> 250
>>
>> 1273
>>
>> 205
>>
>> 1068
>>
>> getCumulativeCost
>>
>> 2249
>>
>> 6755
>>
>> 4288
>>
>> 2467
>>
>> getNonCumulativeCost
>>
>> 2274
>>
>> 2274
>>
>> 0
>>
>> 2274
>>
>> getSelectivity
>>
>> 539
>>
>> 562
>>
>> 27
>>
>> 535
>>
>> getUniqueKeys
>>
>> 2635
>>
>> 5248
>>
>> 2437
>>
>> 2811
>>
>>
>> Gone are the deep recursive calls and explosion into +100k calls. I’m seeing 
>> 2x - 10x compile time improvements.
>>
>> I’m asking here if there is some reason behind the frequent replacement of 
>> the RelMetadataQuery object being used (and hence a clean mem cache), or is 
>> just some unintended consequence? I am making now changes on Hive side to 
>> address this (HIVE-16757), if the cache reset effect is accidental we should 
>> address this as well in Calcite.
>>
>> BTW I think `RelMetadataQuery.instance()` should be named 
>> `RelMetadataQuery.createInstance()` to be clear what the effect is.
>>
>> Thanks,
>> ~Remus

Reply via email to