chewbranca commented on code in PR #5491:
URL: https://github.com/apache/couchdb/pull/5491#discussion_r2059415798


##########
src/couch_stats/src/csrt_logger.erl:
##########
@@ -0,0 +1,401 @@
+% Licensed under the Apache License, Version 2.0 (the "License"); you may not
+% use this file except in compliance with the License. You may obtain a copy of
+% the License at
+%
+%   http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+% WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+% License for the specific language governing permissions and limitations under
+% the License.
+
+-module(csrt_logger).
+
+%% Process lifetime logging api
+-export([
+    get_tracker/0,
+    log_process_lifetime_report/1,
+    put_tracker/1,
+    stop_tracker/0,
+    stop_tracker/1,
+    track/1,
+    tracker/1
+]).
+
+%% Raw API that bypasses is_enabled checks
+-export([
+    do_lifetime_report/1,
+    do_status_report/1,
+    do_report/2,
+    maybe_report/2
+]).
+
+-export([
+    start_link/0,
+    init/1,
+    handle_call/3,
+    handle_cast/2,
+    handle_info/2
+]).
+
+%% Config update subscription API
+-export([
+    subscribe_changes/0,
+    handle_config_change/5,
+    handle_config_terminate/3
+]).
+
+%% Matchers
+-export([
+    get_matcher/1,
+    get_matchers/0,
+    is_match/1,
+    is_match/2,
+    matcher_on_dbname/1,
+    matcher_on_docs_read/1,
+    matcher_on_docs_written/1,
+    matcher_on_rows_read/1,
+    matcher_on_worker_changes_processed/1,
+    matcher_on_ioq_calls/1,
+    matcher_on_nonce/1,
+    reload_matchers/0
+]).
+
+-include_lib("stdlib/include/ms_transform.hrl").
+-include_lib("couch_stats_resource_tracker.hrl").
+
+-define(MATCHERS_KEY, {?MODULE, all_csrt_matchers}).
+-define(CONF_MATCHERS_ENABLED, "csrt_logger.matchers_enabled").
+-define(CONF_MATCHERS_THRESHOLD, "csrt_logger.matchers_threshold").
+-define(CONF_MATCHERS_DBNAMES, "csrt_logger.dbnames_io").
+
+-record(st, {
+    matchers = #{}
+}).
+
+-spec track(Rctx :: rctx()) -> pid().
+track(#rctx{pid_ref=PidRef}) ->
+    case get_tracker() of
+        undefined ->
+            Pid = spawn(?MODULE, tracker, [PidRef]),
+            put_tracker(Pid),
+            Pid;
+        Pid when is_pid(Pid) ->
+            Pid
+    end.
+
+-spec tracker(PidRef :: pid_ref()) -> ok.
+tracker({Pid, _Ref}=PidRef) ->

Review Comment:
   The idea with naming it `PidRef` is intentionally to _not_ be totally opaque 
and give a natural logic to what these ID values represent. For example, a 
coordinator process creates a new `PidRef` by way of `{self(), make_ref()}` and 
now we've got a logical handle to a particular process inducing work associated 
with a particular reference.
   
   So I wanted this to be a unique ID value that makes sense as to what it 
represents because we have all kinds of different process hierarchies within 
CouchDB with different requirements. For example, the `chttpd` processes that 
actually call `csrt:create_coordinator_context` are reused processes so the 
need a new ref for every incoming http request and I wanted to be _really_ sure 
we're not overlapping counting between requests.
   
   So the idea is that a `PidRef` is very much a particular process with a 
`ref()` that represents a subset of the work that process performs, where it 
could be a full or partial subset. Then you can pull a `PidRef` out of a JSON 
blob returned from querying the `_active_resources` and go find all the 
associated workers or look directly at the pid and see what all it's doing.
   
   This becomes even more important for extending beyond the RPC worker process 
and tracking induced work in other areas of the system. For example, passing 
along a `PidRef` when making a request to `couch_file` is essential for 
tracking work induced within `couch_file` itself by the particular worker 
process, or compactor process, or indexer, etc.
   
   I do agree on minimizing the defined structural definitions within the 
codebase, and to that effect I migrated most all actual instances of pattern 
matching on `PidRef` into `csrt_server` itself, but I missed this one and 
`csrt:destroy_context`:
   
   ```
   (! 6187)-> grep -r '{[^,]\+,[^}]\+} \?= \?PidRef' src/couch_stats/src/*.erl
   src/couch_stats/src/csrt.erl:destroy_context({_, _} = PidRef) ->
   src/couch_stats/src/csrt_logger.erl:tracker({Pid, _Ref} = PidRef) ->
   src/couch_stats/src/csrt_server.erl:destroy_resource({_, _} = PidRef) ->
   src/couch_stats/src/csrt_server.erl:update_counter({_Pid, _Ref} = PidRef, 
Field, Count) when Count >= 0 ->
   src/couch_stats/src/csrt_server.erl:inc({_Pid, _Ref} = PidRef, Field, N) 
when is_integer(N) andalso N > 0 ->
   src/couch_stats/src/csrt_server.erl:update_element({_Pid, _Ref} = PidRef, 
Update) ->
   ```
   
   I'll drop the structural pattern matching on `PidRef` out of `csrt.erl` and 
`csrt_logger.erl`, but I feel that the notion of a `PidRef` makes sense 
logically and helps mentally model the flow of this data through the system, at 
least for me. On an amusing note, I had a similar thought process as you when 
naming this, and one of the reasons I went with `PidRef` is we already have 
precedent for referencing the notion of "PidRef"'s here 
https://github.com/apache/couchdb/blob/couch-stats-resource-tracker-v3-rebase/src/mem3/src/mem3.erl#L457-L458
 and in fact it's even utilized in the OTP codebase as well: 
https://github.com/search?q=repo%3Aerlang%2Fotp%20PidRef&type=code



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to