That was a low hanging fruit :) It cut 50% of compile time. My next finding is that onMatch seems overaggressive for creating a new RelMetadataQuery. I’m now looking at instrumentation results and on this problem query onMatch is called 98989 times, but transformTo only 717 times. Am I wrong in assuming that the RelMetadataQuery should normally change only when transformTo is called? I’m experimenting with passing a ‘current’ mq from planner to the rules (via the RuleCall), use it in onMatch, and have this ‘current’ be invalidated only when a transformTo is called. The results are spectacular (basically, the other 50% of compile time…)
Thanks, ~Remus On 5/30/17, 1:30 PM, "Julian Hyde" <[email protected]> wrote: Glad you found it. Yeah, I strongly recommend that you check for and remove calls to deprecated methods each time you upgrade Calcite. Even if the code builds, all may not be well. Deprecating the method was the strongest hint we could give when we made that change. Julian > On May 28, 2017, at 2:01 PM, Remus Rusanu <[email protected]> wrote: > > Thanks Julian, > > After despairing for a day over the countless places where we invoke in code `instance()` and having to go over them one-by one, I made a break through today when I added instrumentation to count where we call instance() from at runtime. Turns out the culprit is use of one deprecated API: AbstractRelNode.getRows(): > > @SuppressWarnings("deprecation") > public final double getRows() { > return estimateRowCount(RelMetadataQuery.instance()); > } > > This alone was called directly 2995340 times, and causing indirectly about ~1M more calls. These are Hive code calls, not Calcite issues. I made changes to use the estimateRowCount(RelMetadataQuery) call instead, and the biggest offenders all had handy mq instances to pass in. I’m waiting on tests, but it looks very promising. > > Thanks, > ~Remus > > On 5/25/17, 11:57 AM, "Julian Hyde" <[email protected]> wrote: > > This is a hard problem. We need to make a lot of metadata calls > (directly and indirectly), we want it to be efficient (hence we want > some caching) but we want the metadata to change when we make > improvements. The RelMetadataQuery system, including generated code, > is a carefully thought out solution to that problem. > > I don't recommend adding thread-level caching. The metadata goes out > of date when you apply transformation rules. > > The idea is to create a RelMetadataQuery instance when you want a > "transaction" giving you read-consistent view of the metadata. > Probably inside an onMatch method, and used for all metadata calls > from that method. > > If RelMetadataQuery.instance() is being called from within a metadata > handler that is bad news. Eliminate all such calls and see whether the > instance count goes down. > > On Thu, May 25, 2017 at 11:43 AM, Remus Rusanu <[email protected]> wrote: >> Hi all, >> >> Investigating performance of Hive query compile I found some problems around memorization in GeneratedMetadataHandler_xxx classes. These classes are generated from JaninonRelMetadataProvider and the generated code does memorization, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’ parameter. My measurements show that these calls explode into deep recursive stacks. I was measuring a complex query (49 joins, plenty of expressions) and some of the numbers look staggering. Take for instance GeneratedMetadataHandler_RowCount.getRowCount: >> - It is called 9303 times as top call >> - It generates 165754 total recursive calls, up to a nest level 18 >> - The memo cache is hit 18065 (successful found key) >> - The memo is populated 147689 times (missed key) >> - The function gets no less than 22186 distinct RelMetadataQuery `mq` instances (!!). >> >> This situation repeats for each GeneratedMetadataHandler_XXX code: >> Class >> >> Top Calls >> >> Total Calls >> >> Memo Cache Hits >> >> Memo Cache Miss >> >> Distinct RelMetadataQuery instances >> >> getMaxRowCount >> >> 489 >> >> 105262 >> >> 6615 >> >> 98647 >> >> 489 >> >> getDistinctRowCount >> >> 6828 >> >> 74286 >> >> 6179 >> >> 68107 >> >> 5828 >> >> areColumnsUnique >> >> 19197 >> >> 149708 >> >> 13367 >> >> 136341 >> >> 2139 >> >> getColumnOrigins >> >> 250 >> >> 3021 >> >> 39 >> >> 2982 >> >> 24 >> >> getCumulativeCost >> >> 2249 >> >> 9267 >> >> 4660 >> >> 4607 >> >> 1 >> >> getNonCumulativeCost >> >> 3559 >> >> 3559 >> >> 1285 >> >> 3559 >> >> 18 >> >> getSelectivity >> >> 15636 >> >> 35727 >> >> 1114 >> >> 34613 >> >> 4984 >> >> getUniqueKeys >> >> 26311 >> >> 111715 >> >> 5212 >> >> 106503 >> >> 3266 >> >> >> Looking at this, the root problems seems to be the fact that the code uses often the construct `RelMetadataQuery.instance()` to obtain a reference to a needed object. But each call to `instance()` returns a new object, and this new object has a new, clean, memorization `map` field. So we have a very poor memo cache hit ratio, but far worse is the effect of repeating ad-nauseam recursive calls on deep trees. I made an experiment where I modified the code in `RelMetadataQuery.instance()` to reference a threadLocal `map` field and the difference is like night vs. day: >> >> Class >> >> Top Calls >> >> Total Calls >> >> Memo Cache Hits >> >> Memo Cache Miss >> >> getRowCount >> >> 8179 >> >> 14426 >> >> 12920 >> >> 1506 >> >> getMaxRowCount >> >> 489 >> >> 2535 >> >> 1327 >> >> 1208 >> >> getDistinctRowCount >> >> 3138 >> >> 7947 >> >> 4939 >> >> 3008 >> >> areColumnsUnique >> >> 1103 >> >> 3191 >> >> 1495 >> >> 1696 >> >> getColumnOrigins >> >> 250 >> >> 1273 >> >> 205 >> >> 1068 >> >> getCumulativeCost >> >> 2249 >> >> 6755 >> >> 4288 >> >> 2467 >> >> getNonCumulativeCost >> >> 2274 >> >> 2274 >> >> 0 >> >> 2274 >> >> getSelectivity >> >> 539 >> >> 562 >> >> 27 >> >> 535 >> >> getUniqueKeys >> >> 2635 >> >> 5248 >> >> 2437 >> >> 2811 >> >> >> Gone are the deep recursive calls and explosion into +100k calls. I’m seeing 2x - 10x compile time improvements. >> >> I’m asking here if there is some reason behind the frequent replacement of the RelMetadataQuery object being used (and hence a clean mem cache), or is just some unintended consequence? I am making now changes on Hive side to address this (HIVE-16757), if the cache reset effect is accidental we should address this as well in Calcite. >> >> BTW I think `RelMetadataQuery.instance()` should be named `RelMetadataQuery.createInstance()` to be clear what the effect is. >> >> Thanks, >> ~Remus > > >
