Re: On RelMetadataQuery.instance and memoization of GeneratedMetadataHandler_xxx calls

Remus Rusanu Tue, 30 May 2017 14:11:32 -0700

That was a low hanging fruit :) It cut 50% of compile time.

My next finding is that onMatch seems overaggressive for creating a new 
RelMetadataQuery. I’m now looking at instrumentation results and on this 
problem query onMatch is called 98989 times, but transformTo only 717 times. Am 
I wrong in assuming that the RelMetadataQuery should normally change only when 
transformTo is called?
I’m experimenting with passing a ‘current’ mq from planner to the rules (via 
the RuleCall), use it in onMatch, and have this ‘current’ be invalidated only 
when a transformTo is called. The results are spectacular (basically, the other 
50% of compile time…)


Thanks,
~Remus

On 5/30/17, 1:30 PM, "Julian Hyde" <[email protected]> wrote:

    Glad you found it.
    
    Yeah, I strongly recommend that you check for and remove calls to 
deprecated methods each time you upgrade Calcite. Even if the code builds, all 
may not be well. Deprecating the method was the strongest hint we could give 
when we made that change.
    
    Julian
    
    
    > On May 28, 2017, at 2:01 PM, Remus Rusanu <[email protected]> wrote:
    > 
    > Thanks Julian,
    > 
    > After despairing for a day over the countless places where we invoke in 
code `instance()` and having to go over them one-by one, I made a break through 
today when I added instrumentation to count where we call instance() from at 
runtime. Turns out the culprit is use of one deprecated API: 
AbstractRelNode.getRows():
    > 
    >  @SuppressWarnings("deprecation")
    >  public final double getRows() {
    >    return estimateRowCount(RelMetadataQuery.instance());
    >  }
    > 
    > This alone was called directly 2995340 times, and causing indirectly 
about ~1M more calls. These are Hive code calls, not Calcite issues. I made 
changes to use the estimateRowCount(RelMetadataQuery) call instead, and the 
biggest offenders all had handy mq instances to pass in. I’m waiting on tests, 
but it looks very promising.
    > 
    > Thanks,
    > ~Remus
    > 
    > On 5/25/17, 11:57 AM, "Julian Hyde" <[email protected]> wrote:
    > 
    >    This is a hard problem. We need to make a lot of metadata calls
    >    (directly and indirectly), we want it to be efficient (hence we want
    >    some caching) but we want the metadata to change when we make
    >    improvements. The RelMetadataQuery system, including generated code,
    >    is a carefully thought out solution to that problem.
    > 
    >    I don't recommend adding thread-level caching. The metadata goes out
    >    of date when you apply transformation rules.
    > 
    >    The idea is to create a RelMetadataQuery instance when you want a
    >    "transaction" giving you read-consistent view of the metadata.
    >    Probably inside an onMatch method, and used for all metadata calls
    >    from that method.
    > 
    >    If RelMetadataQuery.instance() is being called from within a metadata
    >    handler that is bad news. Eliminate all such calls and see whether the
    >    instance count goes down.
    > 
    >    On Thu, May 25, 2017 at 11:43 AM, Remus Rusanu 
<[email protected]> wrote:
    >> Hi all,
    >> 
    >> Investigating performance of Hive query compile I found some problems 
around memorization in GeneratedMetadataHandler_xxx classes. These classes are 
generated from JaninonRelMetadataProvider and the generated code does 
memorization, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’ 
parameter. My measurements show that these calls explode into deep recursive 
stacks. I was measuring a complex query (49 joins, plenty of expressions) and 
some of the numbers look staggering. Take for instance 
GeneratedMetadataHandler_RowCount.getRowCount:
    >> -          It is called 9303 times as top call
    >> -          It generates 165754 total recursive calls, up to a nest level 
18
    >> -          The memo cache is hit 18065 (successful found key)
    >> -          The memo is populated 147689 times (missed key)
    >> -          The function gets no less than 22186 distinct 
RelMetadataQuery `mq` instances (!!).
    >> 
    >> This situation repeats for each GeneratedMetadataHandler_XXX code:
    >> Class
    >> 
    >> Top Calls
    >> 
    >> Total Calls
    >> 
    >> Memo Cache Hits
    >> 
    >> Memo Cache Miss
    >> 
    >> Distinct RelMetadataQuery instances
    >> 
    >> getMaxRowCount
    >> 
    >> 489
    >> 
    >> 105262
    >> 
    >> 6615
    >> 
    >> 98647
    >> 
    >> 489
    >> 
    >> getDistinctRowCount
    >> 
    >> 6828
    >> 
    >> 74286
    >> 
    >> 6179
    >> 
    >> 68107
    >> 
    >> 5828
    >> 
    >> areColumnsUnique
    >> 
    >> 19197
    >> 
    >> 149708
    >> 
    >> 13367
    >> 
    >> 136341
    >> 
    >> 2139
    >> 
    >> getColumnOrigins
    >> 
    >> 250
    >> 
    >> 3021
    >> 
    >> 39
    >> 
    >> 2982
    >> 
    >> 24
    >> 
    >> getCumulativeCost
    >> 
    >> 2249
    >> 
    >> 9267
    >> 
    >> 4660
    >> 
    >> 4607
    >> 
    >> 1
    >> 
    >> getNonCumulativeCost
    >> 
    >> 3559
    >> 
    >> 3559
    >> 
    >> 1285
    >> 
    >> 3559
    >> 
    >> 18
    >> 
    >> getSelectivity
    >> 
    >> 15636
    >> 
    >> 35727
    >> 
    >> 1114
    >> 
    >> 34613
    >> 
    >> 4984
    >> 
    >> getUniqueKeys
    >> 
    >> 26311
    >> 
    >> 111715
    >> 
    >> 5212
    >> 
    >> 106503
    >> 
    >> 3266
    >> 
    >> 
    >> Looking at this, the root problems seems to be the fact that the code 
uses often the construct `RelMetadataQuery.instance()` to obtain a reference to 
a needed object. But each call to `instance()` returns a new object, and this 
new object has a new, clean, memorization `map` field. So we have a very poor 
memo cache hit ratio, but far worse is the effect of repeating ad-nauseam 
recursive calls on deep trees. I made an experiment where I modified the code 
in `RelMetadataQuery.instance()` to reference a threadLocal `map` field and the 
difference is like night vs. day:
    >> 
    >> Class
    >> 
    >> Top Calls
    >> 
    >> Total Calls
    >> 
    >> Memo Cache Hits
    >> 
    >> Memo Cache Miss
    >> 
    >> getRowCount
    >> 
    >> 8179
    >> 
    >> 14426
    >> 
    >> 12920
    >> 
    >> 1506
    >> 
    >> getMaxRowCount
    >> 
    >> 489
    >> 
    >> 2535
    >> 
    >> 1327
    >> 
    >> 1208
    >> 
    >> getDistinctRowCount
    >> 
    >> 3138
    >> 
    >> 7947
    >> 
    >> 4939
    >> 
    >> 3008
    >> 
    >> areColumnsUnique
    >> 
    >> 1103
    >> 
    >> 3191
    >> 
    >> 1495
    >> 
    >> 1696
    >> 
    >> getColumnOrigins
    >> 
    >> 250
    >> 
    >> 1273
    >> 
    >> 205
    >> 
    >> 1068
    >> 
    >> getCumulativeCost
    >> 
    >> 2249
    >> 
    >> 6755
    >> 
    >> 4288
    >> 
    >> 2467
    >> 
    >> getNonCumulativeCost
    >> 
    >> 2274
    >> 
    >> 2274
    >> 
    >> 0
    >> 
    >> 2274
    >> 
    >> getSelectivity
    >> 
    >> 539
    >> 
    >> 562
    >> 
    >> 27
    >> 
    >> 535
    >> 
    >> getUniqueKeys
    >> 
    >> 2635
    >> 
    >> 5248
    >> 
    >> 2437
    >> 
    >> 2811
    >> 
    >> 
    >> Gone are the deep recursive calls and explosion into +100k calls. I’m 
seeing 2x - 10x compile time improvements.
    >> 
    >> I’m asking here if there is some reason behind the frequent replacement 
of the RelMetadataQuery object being used (and hence a clean mem cache), or is 
just some unintended consequence? I am making now changes on Hive side to 
address this (HIVE-16757), if the cache reset effect is accidental we should 
address this as well in Calcite.
    >> 
    >> BTW I think `RelMetadataQuery.instance()` should be named 
`RelMetadataQuery.createInstance()` to be clear what the effect is.
    >> 
    >> Thanks,
    >> ~Remus
    > 
    > 
    >

Re: On RelMetadataQuery.instance and memoization of GeneratedMetadataHandler_xxx calls

Reply via email to