Re: On RelMetadataQuery.instance and memoization of GeneratedMetadataHandler_xxx calls

Remus Rusanu Tue, 30 May 2017 14:27:51 -0700

➢     I guess you'd add a method to RelOptRuleCall to acquire a
➢     RelMetadataQuery, so that each rule doesn't have to know about the
➢     caching policy?


That was exactly my plan.
I’ll create a JIRA and start with the easy stuff. 

On 5/30/17, 2:22 PM, "Julian Hyde" <[email protected]> wrote:

    That approach seems reasonable.
    
    Maybe you could go even further: there are occasions where transformTo
    produces a RelNode but it is equivalent to something already known.
    (But be careful of the case where it is equivalent but in a different
    RelSet than expected; that causes a merge of two equivalence sets,
    which will likely cause significant cost changes.)
    
    I guess you'd add a method to RelOptRuleCall to acquire a
    RelMetadataQuery, so that each rule doesn't have to know about the
    caching policy?
    
    Also see if you can reduce the number of times onMatch is called, by
    making RelOptRule.matches() and RelOptRuleOperand.matches() more
    selective.
    
    Julian
    
    
    On Tue, May 30, 2017 at 2:10 PM, Remus Rusanu <[email protected]> 
wrote:
    > That was a low hanging fruit :) It cut 50% of compile time.
    >
    > My next finding is that onMatch seems overaggressive for creating a new 
RelMetadataQuery. I’m now looking at instrumentation results and on this 
problem query onMatch is called 98989 times, but transformTo only 717 times. Am 
I wrong in assuming that the RelMetadataQuery should normally change only when 
transformTo is called?
    > I’m experimenting with passing a ‘current’ mq from planner to the rules 
(via the RuleCall), use it in onMatch, and have this ‘current’ be invalidated 
only when a transformTo is called. The results are spectacular (basically, the 
other 50% of compile time…)
    >
    > Thanks,
    > ~Remus
    >
    > On 5/30/17, 1:30 PM, "Julian Hyde" <[email protected]> wrote:
    >
    >     Glad you found it.
    >
    >     Yeah, I strongly recommend that you check for and remove calls to 
deprecated methods each time you upgrade Calcite. Even if the code builds, all 
may not be well. Deprecating the method was the strongest hint we could give 
when we made that change.
    >
    >     Julian
    >
    >
    >     > On May 28, 2017, at 2:01 PM, Remus Rusanu <[email protected]> 
wrote:
    >     >
    >     > Thanks Julian,
    >     >
    >     > After despairing for a day over the countless places where we 
invoke in code `instance()` and having to go over them one-by one, I made a 
break through today when I added instrumentation to count where we call 
instance() from at runtime. Turns out the culprit is use of one deprecated API: 
AbstractRelNode.getRows():
    >     >
    >     >  @SuppressWarnings("deprecation")
    >     >  public final double getRows() {
    >     >    return estimateRowCount(RelMetadataQuery.instance());
    >     >  }
    >     >
    >     > This alone was called directly 2995340 times, and causing 
indirectly about ~1M more calls. These are Hive code calls, not Calcite issues. 
I made changes to use the estimateRowCount(RelMetadataQuery) call instead, and 
the biggest offenders all had handy mq instances to pass in. I’m waiting on 
tests, but it looks very promising.
    >     >
    >     > Thanks,
    >     > ~Remus
    >     >
    >     > On 5/25/17, 11:57 AM, "Julian Hyde" <[email protected]> wrote:
    >     >
    >     >    This is a hard problem. We need to make a lot of metadata calls
    >     >    (directly and indirectly), we want it to be efficient (hence we 
want
    >     >    some caching) but we want the metadata to change when we make
    >     >    improvements. The RelMetadataQuery system, including generated 
code,
    >     >    is a carefully thought out solution to that problem.
    >     >
    >     >    I don't recommend adding thread-level caching. The metadata goes 
out
    >     >    of date when you apply transformation rules.
    >     >
    >     >    The idea is to create a RelMetadataQuery instance when you want a
    >     >    "transaction" giving you read-consistent view of the metadata.
    >     >    Probably inside an onMatch method, and used for all metadata 
calls
    >     >    from that method.
    >     >
    >     >    If RelMetadataQuery.instance() is being called from within a 
metadata
    >     >    handler that is bad news. Eliminate all such calls and see 
whether the
    >     >    instance count goes down.
    >     >
    >     >    On Thu, May 25, 2017 at 11:43 AM, Remus Rusanu 
<[email protected]> wrote:
    >     >> Hi all,
    >     >>
    >     >> Investigating performance of Hive query compile I found some 
problems around memorization in GeneratedMetadataHandler_xxx classes. These 
classes are generated from JaninonRelMetadataProvider and the generated code 
does memorization, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’ 
parameter. My measurements show that these calls explode into deep recursive 
stacks. I was measuring a complex query (49 joins, plenty of expressions) and 
some of the numbers look staggering. Take for instance 
GeneratedMetadataHandler_RowCount.getRowCount:
    >     >> -          It is called 9303 times as top call
    >     >> -          It generates 165754 total recursive calls, up to a nest 
level 18
    >     >> -          The memo cache is hit 18065 (successful found key)
    >     >> -          The memo is populated 147689 times (missed key)
    >     >> -          The function gets no less than 22186 distinct 
RelMetadataQuery `mq` instances (!!).
    >     >>
    >     >> This situation repeats for each GeneratedMetadataHandler_XXX code:
    >     >> Class
    >     >>
    >     >> Top Calls
    >     >>
    >     >> Total Calls
    >     >>
    >     >> Memo Cache Hits
    >     >>
    >     >> Memo Cache Miss
    >     >>
    >     >> Distinct RelMetadataQuery instances
    >     >>
    >     >> getMaxRowCount
    >     >>
    >     >> 489
    >     >>
    >     >> 105262
    >     >>
    >     >> 6615
    >     >>
    >     >> 98647
    >     >>
    >     >> 489
    >     >>
    >     >> getDistinctRowCount
    >     >>
    >     >> 6828
    >     >>
    >     >> 74286
    >     >>
    >     >> 6179
    >     >>
    >     >> 68107
    >     >>
    >     >> 5828
    >     >>
    >     >> areColumnsUnique
    >     >>
    >     >> 19197
    >     >>
    >     >> 149708
    >     >>
    >     >> 13367
    >     >>
    >     >> 136341
    >     >>
    >     >> 2139
    >     >>
    >     >> getColumnOrigins
    >     >>
    >     >> 250
    >     >>
    >     >> 3021
    >     >>
    >     >> 39
    >     >>
    >     >> 2982
    >     >>
    >     >> 24
    >     >>
    >     >> getCumulativeCost
    >     >>
    >     >> 2249
    >     >>
    >     >> 9267
    >     >>
    >     >> 4660
    >     >>
    >     >> 4607
    >     >>
    >     >> 1
    >     >>
    >     >> getNonCumulativeCost
    >     >>
    >     >> 3559
    >     >>
    >     >> 3559
    >     >>
    >     >> 1285
    >     >>
    >     >> 3559
    >     >>
    >     >> 18
    >     >>
    >     >> getSelectivity
    >     >>
    >     >> 15636
    >     >>
    >     >> 35727
    >     >>
    >     >> 1114
    >     >>
    >     >> 34613
    >     >>
    >     >> 4984
    >     >>
    >     >> getUniqueKeys
    >     >>
    >     >> 26311
    >     >>
    >     >> 111715
    >     >>
    >     >> 5212
    >     >>
    >     >> 106503
    >     >>
    >     >> 3266
    >     >>
    >     >>
    >     >> Looking at this, the root problems seems to be the fact that the 
code uses often the construct `RelMetadataQuery.instance()` to obtain a 
reference to a needed object. But each call to `instance()` returns a new 
object, and this new object has a new, clean, memorization `map` field. So we 
have a very poor memo cache hit ratio, but far worse is the effect of repeating 
ad-nauseam recursive calls on deep trees. I made an experiment where I modified 
the code in `RelMetadataQuery.instance()` to reference a threadLocal `map` 
field and the difference is like night vs. day:
    >     >>
    >     >> Class
    >     >>
    >     >> Top Calls
    >     >>
    >     >> Total Calls
    >     >>
    >     >> Memo Cache Hits
    >     >>
    >     >> Memo Cache Miss
    >     >>
    >     >> getRowCount
    >     >>
    >     >> 8179
    >     >>
    >     >> 14426
    >     >>
    >     >> 12920
    >     >>
    >     >> 1506
    >     >>
    >     >> getMaxRowCount
    >     >>
    >     >> 489
    >     >>
    >     >> 2535
    >     >>
    >     >> 1327
    >     >>
    >     >> 1208
    >     >>
    >     >> getDistinctRowCount
    >     >>
    >     >> 3138
    >     >>
    >     >> 7947
    >     >>
    >     >> 4939
    >     >>
    >     >> 3008
    >     >>
    >     >> areColumnsUnique
    >     >>
    >     >> 1103
    >     >>
    >     >> 3191
    >     >>
    >     >> 1495
    >     >>
    >     >> 1696
    >     >>
    >     >> getColumnOrigins
    >     >>
    >     >> 250
    >     >>
    >     >> 1273
    >     >>
    >     >> 205
    >     >>
    >     >> 1068
    >     >>
    >     >> getCumulativeCost
    >     >>
    >     >> 2249
    >     >>
    >     >> 6755
    >     >>
    >     >> 4288
    >     >>
    >     >> 2467
    >     >>
    >     >> getNonCumulativeCost
    >     >>
    >     >> 2274
    >     >>
    >     >> 2274
    >     >>
    >     >> 0
    >     >>
    >     >> 2274
    >     >>
    >     >> getSelectivity
    >     >>
    >     >> 539
    >     >>
    >     >> 562
    >     >>
    >     >> 27
    >     >>
    >     >> 535
    >     >>
    >     >> getUniqueKeys
    >     >>
    >     >> 2635
    >     >>
    >     >> 5248
    >     >>
    >     >> 2437
    >     >>
    >     >> 2811
    >     >>
    >     >>
    >     >> Gone are the deep recursive calls and explosion into +100k calls. 
I’m seeing 2x - 10x compile time improvements.
    >     >>
    >     >> I’m asking here if there is some reason behind the frequent 
replacement of the RelMetadataQuery object being used (and hence a clean mem 
cache), or is just some unintended consequence? I am making now changes on Hive 
side to address this (HIVE-16757), if the cache reset effect is accidental we 
should address this as well in Calcite.
    >     >>
    >     >> BTW I think `RelMetadataQuery.instance()` should be named 
`RelMetadataQuery.createInstance()` to be clear what the effect is.
    >     >>
    >     >> Thanks,
    >     >> ~Remus
    >     >
    >     >
    >     >
    >
    >
    >
    >

Re: On RelMetadataQuery.instance and memoization of GeneratedMetadataHandler_xxx calls

Reply via email to