Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11388 )

Change subject: WIP: IMPALA-7527: add fetch-from-catalogd cache info to profile
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG@32
PS3, Line 32:   breaking down by category?
> Yep, just looking at the code, I have a feeling it could be chatty and most
Here's a sample of 'explain select * from lineitem; profile;' on a cold cache:

  Frontend:
     - CatalogFetch.ColumnStats.Misses: 16 (16)
     - CatalogFetch.ColumnStats.Requests: 16 (16)
     - CatalogFetch.ColumnStats.Time: 21ms
     - CatalogFetch.Config.Misses: 1 (1)
     - CatalogFetch.Config.Requests: 1 (1)
     - CatalogFetch.Config.Time: 37ms
     - CatalogFetch.DatabaseList.Hits: 1 (1)
     - CatalogFetch.DatabaseList.Requests: 1 (1)
     - CatalogFetch.DatabaseList.Time: 0
     - CatalogFetch.PartitionLists.Misses: 1 (1)
     - CatalogFetch.PartitionLists.Requests: 1 (1)
     - CatalogFetch.PartitionLists.Time: 5ms
     - CatalogFetch.Partitions.Hits: 2 (2)
     - CatalogFetch.Partitions.Misses: 1 (1)
     - CatalogFetch.Partitions.Requests: 3 (3)
     - CatalogFetch.Partitions.Time: 13ms
     - CatalogFetch.TableNames.Hits: 1 (1)
     - CatalogFetch.TableNames.Requests: 1 (1)
     - CatalogFetch.TableNames.Time: 0
     - CatalogFetch.Tables.Misses: 1 (1)
     - CatalogFetch.Tables.Requests: 1 (1)
     - CatalogFetch.Tables.Time: 345ms

on a warm cache:
  Frontend:
     - CatalogFetch.ColumnStats.Hits: 16 (16)
     - CatalogFetch.ColumnStats.Requests: 16 (16)
     - CatalogFetch.ColumnStats.Time: 0
     - CatalogFetch.Config.Hits: 1 (1)
     - CatalogFetch.Config.Requests: 1 (1)
     - CatalogFetch.Config.Time: 0
     - CatalogFetch.DatabaseList.Hits: 1 (1)
     - CatalogFetch.DatabaseList.Requests: 1 (1)
     - CatalogFetch.DatabaseList.Time: 0
     - CatalogFetch.PartitionLists.Hits: 1 (1)
     - CatalogFetch.PartitionLists.Requests: 1 (1)
     - CatalogFetch.PartitionLists.Time: 0
     - CatalogFetch.Partitions.Hits: 2 (2)
     - CatalogFetch.Partitions.Requests: 2 (2)
     - CatalogFetch.Partitions.Time: 0
     - CatalogFetch.TableNames.Hits: 1 (1)
     - CatalogFetch.TableNames.Requests: 1 (1)
     - CatalogFetch.TableNames.Time: 0
     - CatalogFetch.Tables.Misses: 1 (1)
     - CatalogFetch.Tables.Requests: 1 (1)
     - CatalogFetch.Tables.Time: 1ms


(the number of lines is the same). If we think this is too long, we could 
simply count misses only, or we could aggregate all of the counters into a 
single "CatalogFetch" instead of breaking out by object type. We could also add 
a more specific thrift struct to profiles with its own stringification, but 
that seems like it might turn into quite a bit of work.


http://gerrit.cloudera.org:8080/#/c/11388/3//COMMIT_MSG@33
PS3, Line 33: - worth trying to include the byte sizes of metadata fetched?
> Could be useful from incremental stats / large partitioned tables.
k i'll see if it's easy to do


http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@282
PS3, Line 282: CacheStats
> I see these are used only for tests at the moment. Is there anything worthw
yea, I have a separate JIRA filed to add daemon-level stats as a metric


http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java@369
PS3, Line 369: addStatsToProfile
> these stats are per query... would it be useful to aggregate these for the
Yea, CacheStats will give top-level stats, but not per-object-type stats, etc.

I think providing super actionable metrics on whether a larger cache would help 
(and how much larger it needs to be) is a bit tricky. Two approaches I've seen:

- track the actual last access time/age of items that are evicted from the 
cache, and expose that as a metric. If you're evicting stuff that was used in 
the last five minutes, maybe you'd be better off with a larger cache to fit 
more recent stuff.
CON: since we're using version-number based invalidation for granular table 
metadata instead of active eviction, it is totally fine to be evicting such 
items if they're stale, because we know we would never hit them again. So, this 
might not be an accurate measure. We could probably add some background thread 
scanning through the cache to more aggressively evict items which are 
stale-version-numbered to solve this.

- add a shadow/victim cache which remembers the keys of evicted items, as well 
as a counter of how many bytes had been evicted from the cache at the time they 
were evicted. If we get a miss on the main cache which then hits the shadow 
cache, we can affirmatively calculate "if you had a cache that were N MB 
larger, this would have still been cache-resident", and then give pretty good 
"what-if" estimates for cache hit rates and performance gains based on a larger 
cache capacity.
CON: more code complexity

I'd like to defer these for later work though.


http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java
File fe/src/main/java/org/apache/impala/service/FrontendProfile.java:

http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@43
PS3, Line 43:  * This class is thread-safe.
> thx. I don't want to derail the discussion in this patch with incremental s
I haven't looked much at those code paths but perhaps the "merge stats" code 
path could add a TRuntimeProfileNode to its result thrift type and reuse this 
same code?


http://gerrit.cloudera.org:8080/#/c/11388/3/fe/src/main/java/org/apache/impala/service/FrontendProfile.java@151
PS3, Line 151:       oldThreadLocalValue_ = THREAD_LOCAL.get();
> I see now. I think maybe all that is needed is more comments, perhaps in cr
k I think I'll just prevent nested scopes and document it as such



--
To view, visit http://gerrit.cloudera.org:8080/11388
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I808d55a225338912ebaebd0cf71c36db944b7276
Gerrit-Change-Number: 11388
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Sep 2018 18:56:18 +0000
Gerrit-HasComments: Yes

Reply via email to