chewbranca commented on code in PR #5491: URL: https://github.com/apache/couchdb/pull/5491#discussion_r2059415798
########## src/couch_stats/src/csrt_logger.erl: ########## @@ -0,0 +1,401 @@ +% Licensed under the Apache License, Version 2.0 (the "License"); you may not +% use this file except in compliance with the License. You may obtain a copy of +% the License at +% +% http://www.apache.org/licenses/LICENSE-2.0 +% +% Unless required by applicable law or agreed to in writing, software +% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT +% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the +% License for the specific language governing permissions and limitations under +% the License. + +-module(csrt_logger). + +%% Process lifetime logging api +-export([ + get_tracker/0, + log_process_lifetime_report/1, + put_tracker/1, + stop_tracker/0, + stop_tracker/1, + track/1, + tracker/1 +]). + +%% Raw API that bypasses is_enabled checks +-export([ + do_lifetime_report/1, + do_status_report/1, + do_report/2, + maybe_report/2 +]). + +-export([ + start_link/0, + init/1, + handle_call/3, + handle_cast/2, + handle_info/2 +]). + +%% Config update subscription API +-export([ + subscribe_changes/0, + handle_config_change/5, + handle_config_terminate/3 +]). + +%% Matchers +-export([ + get_matcher/1, + get_matchers/0, + is_match/1, + is_match/2, + matcher_on_dbname/1, + matcher_on_docs_read/1, + matcher_on_docs_written/1, + matcher_on_rows_read/1, + matcher_on_worker_changes_processed/1, + matcher_on_ioq_calls/1, + matcher_on_nonce/1, + reload_matchers/0 +]). + +-include_lib("stdlib/include/ms_transform.hrl"). +-include_lib("couch_stats_resource_tracker.hrl"). + +-define(MATCHERS_KEY, {?MODULE, all_csrt_matchers}). +-define(CONF_MATCHERS_ENABLED, "csrt_logger.matchers_enabled"). +-define(CONF_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold"). +-define(CONF_MATCHERS_DBNAMES, "csrt_logger.dbnames_io"). + +-record(st, { + matchers = #{} +}). + +-spec track(Rctx :: rctx()) -> pid(). +track(#rctx{pid_ref=PidRef}) -> + case get_tracker() of + undefined -> + Pid = spawn(?MODULE, tracker, [PidRef]), + put_tracker(Pid), + Pid; + Pid when is_pid(Pid) -> + Pid + end. + +-spec tracker(PidRef :: pid_ref()) -> ok. +tracker({Pid, _Ref}=PidRef) -> Review Comment: The idea with naming it `PidRef` is intentionally to _not_ be totally opaque and give a natural logic to what these ID values represent. For example, a coordinator process creates a new `PidRef` by way of `{self(), make_ref()}` and now we've got a logical handle to a particular process inducing work associated with a particular reference. So I wanted this to be a unique ID value that makes sense as to what it represents because we have all kinds of different process hierarchies within CouchDB with different requirements. For example, the `chttpd` processes that actually call `csrt:create_coordinator_context` are reused processes so the need a new ref for every incoming http request and I wanted to be _really_ sure we're not overlapping counting between requests. So the idea is that a `PidRef` is very much a particular process with a `ref()` that represents a subset of the work that process performs, where it could be a full or partial subset. Then you can pull a `PidRef` out of a JSON blob returned from querying the `_active_resources` and go find all the associated workers or look directly at the pid and see what all it's doing. This becomes even more important for extending beyond the RPC worker process and tracking induced work in other areas of the system. For example, passing along a `PidRef` when making a request to `couch_file` is essential for tracking work induced within `couch_file` itself by the particular worker process, or compactor process, or indexer, etc. I do agree on minimizing the defined structural definitions within the codebase, and to that effect I migrated most all actual instances of pattern matching on `PidRef` into `csrt_server` itself, but I missed this one and `csrt:destroy_context`: ``` (! 6187)-> grep -r '{[^,]\+,[^}]\+} \?= \?PidRef' src/couch_stats/src/*.erl src/couch_stats/src/csrt.erl:destroy_context({_, _} = PidRef) -> src/couch_stats/src/csrt_logger.erl:tracker({Pid, _Ref} = PidRef) -> src/couch_stats/src/csrt_server.erl:destroy_resource({_, _} = PidRef) -> src/couch_stats/src/csrt_server.erl:update_counter({_Pid, _Ref} = PidRef, Field, Count) when Count >= 0 -> src/couch_stats/src/csrt_server.erl:inc({_Pid, _Ref} = PidRef, Field, N) when is_integer(N) andalso N > 0 -> src/couch_stats/src/csrt_server.erl:update_element({_Pid, _Ref} = PidRef, Update) -> ``` I'll drop the structural pattern matching on `PidRef` out of `csrt.erl` and `csrt_logger.erl`, but I feel that the notion of a `PidRef` makes sense logically and helps mentally model the flow of this data through the system, at least for me. On an amusing note, I had a similar thought process as you when naming this, and one of the reasons I went with `PidRef` is we already have precedent for referencing the notion of "PidRef"'s here https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v3-rebase/src/mem3/src/mem3.erl#L457-L458 and in fact it's even utilized in the OTP codebase as well: https://github.com/search?q=repo%3Aerlang%2Fotp%20PidRef&type=code -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
