Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8529 )
Change subject: [PREVIEW] IMPALA-4886: Expose table metrics in the catalog web UI. ...................................................................... Patch Set 2: (28 comments) Thanks for the review Bharath. Many nice suggestions. There will be a sequence of patches to improve our ability to monitor/debug the catalog. I just want to avoid cramming too much stuff in a single patch which is already kind of big. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h File be/src/catalog/catalog-server.h: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@201 PS2, Line 201: used > Clarify what is 'used'? Done http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@217 PS2, Line 217: Update > nit: May be 'GetTopNTableUsageMetrics()' or something (I know it sounds hor Done http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.h@224 PS2, Line 224: object_name > nit: probably just 'name'? Its implicit (from the URL) that its a table. Done http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc File be/src/catalog/catalog-server.cc: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@359 PS2, Line 359: UpdateCatalogUsage(document); > Whats your take on moving top-x stuff to the /metrics page and populating t In principal, that's a good idea. I plan on doing it in the future for more generic catalog metrics. It will require some thinking into how to best integrate the JVM metrics with the metrics management system we use in the backend, so I thought of postponing it for another patch. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@390 PS2, Line 390: has_metrics.SetBool(true); > Just curious, whats the use of has_metrics=true/has_large_tables=true? We c That's for preventing the impalad web ui from trying to print the catalog usage metrics. I had issue doing this from mustache, so I resorted to using this boolean flag. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog-server.cc@492 PS2, Line 492: Webserver::ArgumentMap::const_iterator object_name_arg = args.find("object_name"); > should we enable json view for this? (?json) Good idea. Left a TODO. http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h File be/src/catalog/catalog.h: http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@90 PS2, Line 90: in > nit: into? Done http://gerrit.cloudera.org:8080/#/c/8529/2/be/src/catalog/catalog.h@96 PS2, Line 96: Status GetCatalogUsage(TGetCatalogUsageResponse* response); > Same comment as in the other header, may be rename to convey that it is Get I renamed the other one to GetCatalogUsage. I thought of keeping it a bit more generic in case we want to add other stuff as well. Thoughts? http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/Frontend.thrift@90 PS2, Line 90: // Arguments to getTableMetrics, which returns the metrics of a specific table. : struct TGetTableMetricsParams { : 1: required string db : 2: required string tbl : } : : // Response to a getTableMetrics request. The response contains all the collected metrics : // pretty-printed into a string. : struct TGetTableMetricsResponse { : 1: required string metrics : } > Can't we use TTableName and String directly? (or) probably use typedefs Good point about TTableName (used that). I kept the TGetTableMetrics* structs for consistency with the other API calls. http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@604 PS2, Line 604: Currently, it keeps track of the : // estimated memory usage and the number of metadata operations that were performed on : // that table since it was loaded. > Remove here and describe below may be, can likely become stale soon (Same f Done http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@607 PS2, Line 607: Stats > Looks like we are using Stats/Metrics interchangeably. Maybe use Metrics ev Good point. Done http://gerrit.cloudera.org:8080/#/c/8529/2/common/thrift/JniCatalog.thrift@608 PS2, Line 608: required string table_name > TTableName? Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java File fe/src/main/java/org/apache/impala/catalog/Catalog.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Catalog.java@154 PS2, Line 154: Table tbl = db.getTable(tableName); : if (tbl != null) { : tbl.incrementMetadataOpsCount(); : CatalogUsageMonitor.INSTANCE.updateFrequentlyAccessedTables(tbl); : } > Looks subtle and flaky to me. This seems to work on the premise that we cal Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1007 PS2, Line 1007: final Timer.Context context > Should the scope include tryLockTable()? Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1372 PS2, Line 1372: public TGetCatalogUsageResponse getCatalogUsage() { > getTopN...Metrics? See previous comment. Thought best to keep it more generic. Let me know what you think. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1391 PS2, Line 1391: public String getTableMetrics(String dbName, String tblName) throws CatalogException { > Doc? Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1392 PS2, Line 1392: Table tbl = getTable(dbName, tblName); > This is the problem I was talking about. This call increments the metadatOp Yeap, good point. Fixed it. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1398 PS2, Line 1398: No metrics available for table " + dbName + "." + tblName > Also may be add "Table not yet loaded..."? Its unclear why metrics are not Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java File fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java@47 PS2, Line 47: true > It is unclear why evict should be true (or) whats the usecase for that. I wanted to avoid the scenario where 25 tables that were accessed in the past prevent any other table from showing in the top n. In your example, if the 25 tables are still relevant (accessed frequently), they will keep showing up in the topN. Essentially, all the infrequent ones will be competing for the last spot in the list. Obviously, there are other better ways to implement something like that but that was just one simple approach. What I really want is an LFU cache with dynamic aging. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@221 PS2, Line 221: StorageStats > Maybe call FileMetadataStats to be ni sync with rest of the class? Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1201 PS2, Line 1201: public void load(boolean reuseMetadata, IMetaStoreClient client, > Make rest of the load()'s private so that all loads happen via this call an Unfortunately, they both need to be public (sigh). Since the former calls the latter, I moved the timer there to make sure we capture all calls. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/KuduTable.java@234 PS2, Line 234: kuduTableName_, e); > formatting (below too) Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/catalog/Table.java@135 PS2, Line 135: public void setEstimatedMetadataSize(long estimatedMetadataSize) { : estimatedMetadataSize_.set(estimatedMetadataSize); : } : : public void incrementMetadataOpsCount() { metadataOpsCount_.incrementAndGet(); } > Can these make a call to CatalogUsageMonitor.increment*(this) too? We can a Good point. Done http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java File fe/src/main/java/org/apache/impala/common/Metrics.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@143 PS2, Line 143: > %le ? Not sure I follow. Why le? I just want to print the % to indicate percentile. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/common/Metrics.java@147 PS2, Line 147: } > Same as my comment in the backend, can we expose json versions of these? to Added a TODO. http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@259 PS2, Line 259: > Can we add gauge metrics around this switch-case block to count DDLs by typ Good point. That will be added in a follow up patch where we will add these types of metrics (not tied to specific objects). We also want to add a a view of in-flight catalog operations (similar to what impalads have in queries/ tab). http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java File fe/src/main/java/org/apache/impala/util/TopNCache.java: http://gerrit.cloudera.org:8080/#/c/8529/2/fe/src/main/java/org/apache/impala/util/TopNCache.java@41 PS2, Line 41: R extends Long> > Do we need this? It is almost always Long, no? Yes, it's kind of annoying to have to specify it here. I need it for the function_ declaration though. http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl File www/catalog.tmpl: http://gerrit.cloudera.org:8080/#/c/8529/2/www/catalog.tmpl@23 PS2, Line 23: {{?has_large_tables}} > Like I mentioned elsewhere, my feeling is that these belong to metrics and You can get ?json of that page and get the metrics (I just tried it). We need to think about the catalog metrics a bit more. The metrics exposed in this patch are tied to specific objects (tables that come and go). To my mind there is another class of "generic" metrics to capture the catalog's behavior and I think this should definitely be in the /metrics page. -- To view, visit http://gerrit.cloudera.org:8080/8529 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e Gerrit-Change-Number: 8529 Gerrit-PatchSet: 2 Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Bharath Vissapragada <bhara...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Mon, 11 Dec 2017 19:43:56 +0000 Gerrit-HasComments: Yes