Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/20377 )
Change subject: IMPALA-12385: Enable Periodic metrics by default
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/20377/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20377/1//COMMIT_MSG@12
PS1, Line 12: resource_trace_ratio to 1
AFAIK, there is a pretty significant overhead in always sampling these metrics. Parsing /proc/stat, /proc/net/dev, and /proc/diskstats does not seem to come cheap. I'm not sure this should be enabled by default.


http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/runtime/query-state.cc@221
PS1, Line 221: AddSamplingTimeSeriesCounter
Will this cause an interpretation problem if different hosts happen to resize their sampling periods differently? In contrast, ChunkedTimeSeriesCounter does not resize its sampling period, right?


http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/periodic-counter-updater.cc
File be/src/util/periodic-counter-updater.cc:

http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/periodic-counter-updater.cc@30
PS1, Line 30: periodic_counter_update_period_ms, 50
I'm a bit concerned about lowering this by 10x. Can the code in PeriodicCounterUpdater::UpdateLoop() keep up with such a short sampling period when many queries run concurrently? It looks like PeriodicCounterUpdater is a singleton per impalad.


http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/runtime-profile-counters.h@807
PS1, Line 807: typedef StreamingSampler<int64_t, 64> StreamingCounterSampler;
If initial_period = 50ms and MAX_SAMPLES = 64, it will take 64 * 50ms = 3200ms before the sampling period doubles to 100ms. Will this hurt the performance of short-latency queries?

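To make the arithmetic above concrete, here is a minimal sketch. It assumes the sampler averages adjacent pairs and doubles its period once its buffer is full; that is my reading of the behavior, not a copy of StreamingSampler. ToySampler and MsUntilBufferFull are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical toy sampler for illustration only; NOT Impala's StreamingSampler.
// Assumption: when the buffer is full, adjacent samples are averaged pairwise
// and the sampling period doubles.
template <typename T, int MAX_SAMPLES>
class ToySampler {
 public:
  explicit ToySampler(int initial_period_ms) : period_ms_(initial_period_ms) {
    assert(initial_period_ms > 0);
  }

  // Records one sample. If the buffer is already full, it is first compacted
  // to half its size and the period is doubled.
  void AddSample(T value) {
    if (count_ == MAX_SAMPLES) {
      for (int i = 0; i < MAX_SAMPLES / 2; ++i) {
        samples_[i] = (samples_[2 * i] + samples_[2 * i + 1]) / 2;
      }
      count_ = MAX_SAMPLES / 2;
      period_ms_ *= 2;
    }
    samples_[count_++] = value;
  }

  int period_ms() const { return period_ms_; }
  int count() const { return count_; }

 private:
  std::array<T, MAX_SAMPLES> samples_{};
  int count_ = 0;
  int period_ms_;
};

constexpr int kMaxSamples = 64;  // Matches MAX_SAMPLES in the typedef above.

// Returns the simulated time (ms) needed to fill the buffer at the initial
// period, i.e. how long a query must run before the first period doubling.
int64_t MsUntilBufferFull(int initial_period_ms) {
  ToySampler<int64_t, kMaxSamples> sampler(initial_period_ms);
  int64_t elapsed_ms = 0;
  while (sampler.count() < kMaxSamples) {
    elapsed_ms += sampler.period_ms();
    sampler.AddSample(0);
  }
  return elapsed_ms;
}
```

Under these assumptions, MsUntilBufferFull(50) gives 3200 and MsUntilBufferFull(500) gives 32000, so any query shorter than about 3.2s would be sampled at the full 50ms rate for its whole lifetime.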

http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/streaming-sampler.h
File be/src/util/streaming-sampler.h:

http://gerrit.cloudera.org:8080/#/c/20377/1/be/src/util/streaming-sampler.h@40
PS1, Line 40: int initial_period
I'd rather keep this default at 500 and add a new parameter to AddSamplingTimeSeriesCounter for a customized initial_period. I see this kind of counter being used in other places, such as the following:

  be/src/runtime/fragment-instance-state.cc:  mem_usage_sampled_counter_ = profile()->AddSamplingTimeSeriesCounter("MemoryUsage",
  be/src/runtime/fragment-instance-state.cc:  thread_usage_sampled_counter_ = profile()->AddSamplingTimeSeriesCounter("ThreadUsage",
  be/src/runtime/krpc-data-stream-recvr.cc:  enqueue_profile_->AddSamplingTimeSeriesCounter("DeferredQueueSize", TUnit::UNIT,

Their sampling period should probably stay at 500, while the sampling counters from host_profile_ start at a lower initial_period.



-- 
To view, visit http://gerrit.cloudera.org:8080/20377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic8e5cbfd4b324081158574ceb8f4b3a062a69fd1
Gerrit-Change-Number: 20377
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <riza.sumi...@cloudera.com>
Gerrit-Reviewer: Surya Hebbar <sheb...@cloudera.com>
Gerrit-Comment-Date: Fri, 18 Aug 2023 16:23:59 +0000
Gerrit-HasComments: Yes