Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/11388 )
Change subject: WIP: IMPALA-7527: add fetch-from-catalogd cache info to profile ...................................................................... Patch Set 3: (6 comments) http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG@32 PS3, Line 32: breaking down by category? > Yep, just looking at the code, I have a feeling it could be chatty and most Here's a sample of 'explain select * from lineitem; profile;' on a cold cache: Frontend: - CatalogFetch.ColumnStats.Misses: 16 (16) - CatalogFetch.ColumnStats.Requests: 16 (16) - CatalogFetch.ColumnStats.Time: 21ms - CatalogFetch.Config.Misses: 1 (1) - CatalogFetch.Config.Requests: 1 (1) - CatalogFetch.Config.Time: 37ms - CatalogFetch.DatabaseList.Hits: 1 (1) - CatalogFetch.DatabaseList.Requests: 1 (1) - CatalogFetch.DatabaseList.Time: 0 - CatalogFetch.PartitionLists.Misses: 1 (1) - CatalogFetch.PartitionLists.Requests: 1 (1) - CatalogFetch.PartitionLists.Time: 5ms - CatalogFetch.Partitions.Hits: 2 (2) - CatalogFetch.Partitions.Misses: 1 (1) - CatalogFetch.Partitions.Requests: 3 (3) - CatalogFetch.Partitions.Time: 13ms - CatalogFetch.TableNames.Hits: 1 (1) - CatalogFetch.TableNames.Requests: 1 (1) - CatalogFetch.TableNames.Time: 0 - CatalogFetch.Tables.Misses: 1 (1) - CatalogFetch.Tables.Requests: 1 (1) - CatalogFetch.Tables.Time: 345ms on a warm cache: Frontend: - CatalogFetch.ColumnStats.Hits: 16 (16) - CatalogFetch.ColumnStats.Requests: 16 (16) - CatalogFetch.ColumnStats.Time: 0 - CatalogFetch.Config.Hits: 1 (1) - CatalogFetch.Config.Requests: 1 (1) - CatalogFetch.Config.Time: 0 - CatalogFetch.DatabaseList.Hits: 1 (1) - CatalogFetch.DatabaseList.Requests: 1 (1) - CatalogFetch.DatabaseList.Time: 0 - CatalogFetch.PartitionLists.Hits: 1 (1) - CatalogFetch.PartitionLists.Requests: 1 (1) - CatalogFetch.PartitionLists.Time: 0 - CatalogFetch.Partitions.Hits: 2 (2) - CatalogFetch.Partitions.Requests: 2 (2) - CatalogFetch.Partitions.Time: 0 - CatalogFetch.TableNames.Hits: 1 (1) - CatalogFetch.TableNames.Requests: 1 (1) - CatalogFetch.TableNames.Time: 0 - CatalogFetch.Tables.Misses: 1 (1) - CatalogFetch.Tables.Requests: 1 (1) - CatalogFetch.Tables.Time: 1ms (the number of lines is the same). If we think this is too long, we could simply count misses only, or we could aggregate all of the counters into a single "CatalogFetch" instead of breaking out by object type. We could also add a more specific thrift struct to profiles with its own stringification, but that seems like it might turn into quite a bit of work. http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG@33 PS3, Line 33: - worth trying to include the byte sizes of metadata fetched? > Could be useful from incremental stats / large partitioned tables. k i'll see if it's easy to do http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java: http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@282 PS3, Line 282: CacheStats > I see these are used only for tests at the moment. Is there anything worthw yea, I have a separate JIRA filed to add daemon-level stats as a metric http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@369 PS3, Line 369: addStatsToProfile > these stats are per query... would it be useful to aggregate these for the Yea, CacheStats will give top-level stats, but not per-object-type stats, etc. I think providing super actionable metrics on whether a larger cache would help (and how much larger it needs to be) is a bit tricky. Two approaches I've seen: - track the actual last access time/age of items that are evicted from the cache, and expose that as a metric. If you're evicting stuff that was used in the last five minutes, maybe you'd be better off with a larger cache to fit more recent stuff. CON: since we're using version-number based invalidation for granular table metadata instead of active eviction, it is totally fine to be evicting such items if they're stale, because we know we would never hit them again. So, this might not be an accurate measure. We could probably add some background thread scanning through the cache to more aggressively evict items which are stale-version-numbered to solve this. - add a shadow/victim cache which remembers the keys of evicted items, as well as a counter of how many bytes had been evicted from the cache at the time they were evicted. If we get a miss on the main cache which then hits the shadow cache, we can affirmatively calculate "if you had a cache that were N MB larger, this would have still been cache-resident", and then give pretty good "what-if" estimates for cache hit rates and performance gains based on a larger cache capacity. CON: more code complexity I'd like to defer these for later work though. http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java File fe/src/main/java/org/apache/impala/service/FrontendProfile.java: http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@43 PS3, Line 43: * This class is thread-safe. > thx. I don't want to derail the discussion in this patch with incremental s I haven't looked much at those code paths but perhaps the "merge stats" code path could add a TRuntimeProfileNode to its result thrift type and reuse this same code? http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@151 PS3, Line 151: oldThreadLocalValue_ = THREAD_LOCAL.get(); > I see now. I think maybe all that is needed is more comments, perhaps in cr k I think I'll just prevent nested scopes and document it as such -- To view, visit http://gerrit.cloudera.org:8080/11388 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I808d55a225338912ebaebd0cf71c36db944b7276 Gerrit-Change-Number: 11388 Gerrit-PatchSet: 3 Gerrit-Owner: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Fri, 07 Sep 2018 18:56:18 +0000 Gerrit-HasComments: Yes