Here are a couple of cases that give you some of the background:

* https://issues.apache.org/jira/browse/CALCITE-794
* https://issues.apache.org/jira/browse/CALCITE-604

There are still some outstanding issues:

* https://issues.apache.org/jira/browse/CALCITE-790
* https://issues.apache.org/jira/browse/CALCITE-1048

On Thu, May 25, 2017 at 11:57 AM, Julian Hyde <[email protected]> wrote:
> This is a hard problem. We need to make a lot of metadata calls
> (directly and indirectly), we want it to be efficient (hence we want
> some caching) but we want the metadata to change when we make
> improvements. The RelMetadataQuery system, including generated code,
> is a carefully thought out solution to that problem.
>
> I don't recommend adding thread-level caching. The metadata goes out
> of date when you apply transformation rules.
>
> The idea is to create a RelMetadataQuery instance when you want a
> "transaction" giving you a read-consistent view of the metadata.
> Probably inside an onMatch method, and used for all metadata calls
> from that method.
>
> If RelMetadataQuery.instance() is being called from within a metadata
> handler that is bad news. Eliminate all such calls and see whether the
> instance count goes down.
>
> On Thu, May 25, 2017 at 11:43 AM, Remus Rusanu <[email protected]>
> wrote:
>> Hi all,
>>
>> Investigating performance of Hive query compile I found some problems around
>> memoization in GeneratedMetadataHandler_xxx classes. These classes are
>> generated from JaninoRelMetadataProvider and the generated code does
>> memoization, anchored on the ‘map’ field of the RelMetadataQuery ‘mq’
>> parameter. My measurements show that these calls explode into deep recursive
>> stacks. I was measuring a complex query (49 joins, plenty of expressions)
>> and some of the numbers look staggering.
>> Take for instance
>> GeneratedMetadataHandler_RowCount.getRowCount:
>> - It is called 9303 times as top call
>> - It generates 165754 total recursive calls, up to nesting level 18
>> - The memo cache is hit 18065 times (key found)
>> - The memo is populated 147689 times (key missed)
>> - The function gets no fewer than 22186 distinct RelMetadataQuery
>>   `mq` instances (!!).
>>
>> This situation repeats for each GeneratedMetadataHandler_XXX code:
>>
>> Class                 Top Calls  Total Calls  Memo Cache Hits  Memo Cache Miss  Distinct RelMetadataQuery instances
>> getMaxRowCount              489       105262             6615            98647  489
>> getDistinctRowCount        6828        74286             6179            68107  5828
>> areColumnsUnique          19197       149708            13367           136341  2139
>> getColumnOrigins            250         3021               39             2982  24
>> getCumulativeCost          2249         9267             4660             4607  1
>> getNonCumulativeCost       3559         3559             1285             3559  18
>> getSelectivity            15636        35727             1114            34613  4984
>> getUniqueKeys             26311       111715             5212           106503  3266
>>
>> Looking at this, the root problem seems to be that the code often uses
>> the construct `RelMetadataQuery.instance()` to obtain a reference to a
>> needed object. But each call to `instance()` returns a new object, and this
>> new object has a new, clean, memoization `map` field. So we have a very
>> poor memo cache hit ratio, but far worse is the effect of repeating
>> recursive calls ad nauseam on deep trees. I made an experiment where I
>> modified the code in `RelMetadataQuery.instance()` to reference a
>> threadLocal `map` field and the difference is like night vs.
>> day:
>>
>> Class                 Top Calls  Total Calls  Memo Cache Hits  Memo Cache Miss
>> getRowCount                8179        14426            12920             1506
>> getMaxRowCount              489         2535             1327             1208
>> getDistinctRowCount        3138         7947             4939             3008
>> areColumnsUnique           1103         3191             1495             1696
>> getColumnOrigins            250         1273              205             1068
>> getCumulativeCost          2249         6755             4288             2467
>> getNonCumulativeCost       2274         2274                0             2274
>> getSelectivity              539          562               27              535
>> getUniqueKeys              2635         5248             2437             2811
>>
>> Gone are the deep recursive calls and the explosion into 100k+ calls. I’m seeing
>> 2x - 10x compile time improvements.
>>
>> I’m asking here if there is some reason behind the frequent replacement of
>> the RelMetadataQuery object being used (and hence a clean memo cache), or is
>> it just some unintended consequence? I am now making changes on the Hive side to
>> address this (HIVE-16757); if the cache reset effect is accidental we should
>> address this in Calcite as well.
>>
>> BTW I think `RelMetadataQuery.instance()` should be named
>> `RelMetadataQuery.createInstance()` to be clear what the effect is.
>>
>> Thanks,
>> ~Remus
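
For anyone skimming the thread, here is a minimal sketch of the effect Remus describes. It is a toy model, not Calcite code, and `MemoDemo`, `Node`, `Query` and `Stats` are made-up names: `Query` stands in for RelMetadataQuery and owns only the memo map, and `rowCount` stands in for a generated handler. It assumes a simple chain of nodes with arbitrary row counts, and it does not model metadata going stale after a transformation rule fires, which is the reason Julian gives against a thread-level cache.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-query memoization; not Calcite code, all names are hypothetical. */
final class MemoDemo {

    /** Stand-in for a plan node: a chain where each node has at most one input. */
    static final class Node {
        final Node input;
        Node(Node input) { this.input = input; }
    }

    /** Stand-in for RelMetadataQuery: it owns nothing but the memo map. */
    static final class Query {
        final Map<Node, Double> map = new HashMap<>();
    }

    /** Counters gathered over one run. */
    static final class Stats {
        int hits;
        int misses;
        int queries;
        double rootRowCount;
    }

    static Query newQuery(Stats stats) {
        stats.queries++;
        return new Query();
    }

    /** Stand-in for a generated handler: consult mq's memo, else compute from the input. */
    static double rowCount(Node node, Query mq, boolean freshQueryPerCall, Stats stats) {
        Double cached = mq.map.get(node);
        if (cached != null) {
            stats.hits++;
            return cached;
        }
        stats.misses++;
        double result;
        if (node.input == null) {
            result = 100.0; // arbitrary leaf row count
        } else {
            // The pattern under discussion: obtaining a new query object inside a
            // handler means the recursive call starts with an empty memo.
            Query childMq = freshQueryPerCall ? newQuery(stats) : mq;
            // 0.5 is an arbitrary selectivity for the toy model.
            result = 0.5 * rowCount(node.input, childMq, freshQueryPerCall, stats);
        }
        mq.map.put(node, result);
        return result;
    }

    /** Asks for the row count of every node in a chain, leaf first. */
    static Stats run(int depth, boolean freshQueryPerCall) {
        List<Node> chain = new ArrayList<>();
        Node node = null;
        for (int i = 0; i < depth; i++) {
            node = new Node(node);
            chain.add(node);
        }
        Stats stats = new Stats();
        Query shared = freshQueryPerCall ? null : newQuery(stats);
        for (Node n : chain) {
            Query mq = freshQueryPerCall ? newQuery(stats) : shared;
            stats.rootRowCount = rowCount(n, mq, freshQueryPerCall, stats);
        }
        return stats;
    }

    public static void main(String[] args) {
        for (boolean fresh : new boolean[] {true, false}) {
            Stats s = run(18, fresh);
            System.out.printf("freshQueryPerCall=%b hits=%d misses=%d queries=%d%n",
                fresh, s.hits, s.misses, s.queries);
        }
    }
}
```

On an 18-node chain the fresh-query variant runs the handler body 171 times with no cache hits, while the shared-query variant runs it 18 times with 17 hits, and both return the same row count. The shared variant is closer to what Julian suggests (one query object per "transaction") than to the threadLocal experiment, since the caller decides how long the memo lives.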
