> Previously you were talking about the trace filter configuration settings > being opaque blobs, but they don’t look opaque in the RFC. Did that change?
I recently discovered that there are multiple flavors of otter. The most developed one is https://github.com/project-fifo/otters. This version has a configuration DSL and generates beam module for fast filtering. Best regards, ILYA (iilyak) On 2019/09/16 20:16:26, Adam Kocoloski <kocol...@apache.org> wrote: > Looks good, I left a few comments. > > Previously you were talking about the trace filter configuration settings > being opaque blobs, but they don’t look opaque in the RFC. Did that change? > > Adam > > > On Sep 16, 2019, at 3:38 PM, Ilya Khlopotov <iil...@apache.org> wrote: > > > > Hi, > > > > The RFC is ready for review > > https://github.com/apache/couchdb-documentation/pull/440 > > > > Best regards, > > ILYA (aka iilyak) > > > > On 2019/09/10 18:32:03, Ilya Khlopotov <iil...@apache.org> wrote: > >> Hi, > >> > >> I wanted to run this idea by the ML to see if there is any interest before > >> investing time into preparing formal RFC. > >> > >> # Problem statement > >> > >> Collecting profiling data is very tricky at the moment. Developers have to > >> run generic profiling tools which are not aware of CouchDB specifics. > >> This makes it hard to do the performance optimization work. We need a tool > >> which would allow us to get profiling data from specific points in the > >> codebase. This means code instrumentation. > >> > >> # Proposed solution > >> > >> There is an https://opentracing.io/ project, which is a vendor-neutral > >> APIs and instrumentation for distributed tracing. In Erlang it is > >> implemented by https://github.com/Bluehouse-Technology/otter library. The > >> library provides a nice abstraction to start/finish tracing spans as well > >> as adding tags and log entries to a given span. In the context of CouchDB > >> this means that we can do something like the following: > >> - start tracing span on every HTTP request > >> - add tags to capture additional information such as "database name"/"name > >> of endpoint"/"nonce" > >> - add otter logs in critical parts of the codebase to get profiling data > >> for these points. > >> > >> The otter is the most useful in combination with > >> [zipkin](https://zipkin.io/) compatible server such as > >> [jaeger](https://github.com/jaegertracing/jaeger). However it can be used > >> even without zipkin. It has a configurable set of counters, which makes it > >> possible to get answers on questions like: > >> - what kind of requests are slow > >> - if we get a slow request (taking longer then configured threshold) what > >> was the trace (annotated with time spent between trace points) > >> - which function in the trace taking the most time > >> > >> # Otter API > >> > >> The otter has multiple APIs which we would choose on a case by case basis: > >> - functional API - the span structure need to be passed around (we could > >> extend `#http{}`/`#user_ctx{}`/`#db{}`) > >> - simple process dictionary API - the span data are stored in the process > >> dictionary > >> - Multiple span process dictionary API - supports multiple spans per > >> process > >> - Span id API - starts a process per span > >> > >> # Roadblocks > >> > >> One of the problems we would need to solve is to how to do multiple nodes > >> spans. We would need this functionality to trace the request from the HTTP > >> endpoint handler running on coordinator to the shard updater process > >> running on the storage nodes. > >> > >> We could use either: > >> - extend rexi or fabric to pass and aggregate span information > >> - pass span info explicitly in every fabric function. > >> > >> # Quick demo (warning very technical content) > >> > >> The goal of this demo is to demonstrate the value of otter without zipkin > >> server. > >> > >> ``` > >> diff --git a/rebar.config.script b/rebar.config.script > >> index c38b6e235..c2b162751 100644 > >> --- a/rebar.config.script > >> +++ b/rebar.config.script > >> @@ -129,6 +129,11 @@ OptionalDeps = case WithProper of > >> [] > >> end, > >> > >> +ManualDeps = [ > >> + {otter, {url, "https://github.com/Bluehouse-Technology/otter"}, > >> {branch, "master"}}, > >> + {otter_lib, {url, "https://github.com/Bluehouse-Technology/otter_lib"}, > >> {branch, "master"}} > >> +], > >> + > >> BaseUrl = "https://github.com/apache/", > >> > >> MakeDep = fun > >> @@ -152,7 +157,7 @@ end, > >> AddConfig = [ > >> {require_otp_vsn, "19|20|21|22"}, > >> {deps_dir, "src"}, > >> - {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps)}, > >> + {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps ++ ManualDeps)}, > >> {sub_dirs, SubDirs}, > >> {lib_dirs, ["src"]}, > >> {erl_opts, [{i, "../"} | ErlOpts]}, > >> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl > >> index 1e1d638be..a7aad5010 100644 > >> --- a/src/chttpd/src/chttpd.erl > >> +++ b/src/chttpd/src/chttpd.erl > >> @@ -158,6 +158,7 @@ handle_request(MochiReq0) -> > >> > >> handle_request_int(MochiReq) -> > >> Begin = os:timestamp(), > >> + otter_span_pdict_api:start_with_tags("http request", [{"begin_ts", > >> Begin}]), > >> case config:get("chttpd", "socket_options") of > >> undefined -> > >> ok; > >> @@ -233,6 +234,7 @@ handle_request_int(MochiReq) -> > >> > >> % put small token on heap to keep requests synced to backend calls > >> erlang:put(nonce, Nonce), > >> + otter_span_pdict_api:tag("nonce", Nonce), > >> > >> % suppress duplicate log > >> erlang:put(dont_log_request, true), > >> @@ -282,6 +284,8 @@ after_request(HttpReq, HttpResp0) -> > >> end, > >> HttpResp2 = update_stats(HttpReq, HttpResp1), > >> chttpd_stats:report(HttpReq, HttpResp2), > >> + otter_span_pdict_api:tag("status", HttpResp2#httpd_resp.status), > >> + otter_span_pdict_api:log("completed"), > >> + otter_span_pdict_api:finish(), > >> maybe_log(HttpReq, HttpResp2), > >> HttpResp2. > >> > >> diff --git a/src/fabric/src/fabric.erl b/src/fabric/src/fabric.erl > >> index 27fa8c045..a1972b445 100644 > >> --- a/src/fabric/src/fabric.erl > >> +++ b/src/fabric/src/fabric.erl > >> @@ -56,6 +56,7 @@ all_dbs() -> > >> %% @doc returns a list of all database names > >> -spec all_dbs(Prefix::iodata()) -> {ok, [binary()]}. > >> all_dbs(Prefix) when is_binary(Prefix) -> > >> + otter_span_pdict_api:tag("request_type", "_all_dbs"), > >> Length = byte_size(Prefix), > >> MatchingDbs = mem3:fold_shards(fun(#shard{dbname=DbName}, Acc) -> > >> case DbName of > >> ``` > >> > >> ``` > >> application:start(otter_lib). > >> application:start(otter). > >> f(Rules), Rules = [ > >> {[ > >> {greater, otter_span_duration, 10} > >> ],[ > >> {snapshot_count, [long_span], [otter_span_name]}, > >> send_to_zipkin > >> ]} > >> ]. > >> otter_config:write(filter_rules, Rules). > >> otter:counter_snapshot([long_span,"http request"]). > >> [{[long_span,"http request"], > >> [{snap_timestamp,{2019,9,10,13,46,43,368208}}, > >> {data,{span,1568123203366286,17299637839902614236, > >> "http request",782788946072648712,undefined, > >> [{"begin_ts",{1568,123203,366255}}, > >> {"nonce","a0a1d7c58e"}, > >> {"status",ok}], > >> [],1911}}]}] > >> ``` > >> > >> Unfortunately `counter_snapshot` API doesn't return log events. However > >> this problem is very easy to overcome. Here is the example of how to get > >> matching spans logged into a log file. > >> > >> ``` > >> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl > >> index 1e1d638be..5f2d60690 100644 > >> --- a/src/chttpd/src/chttpd.erl > >> +++ b/src/chttpd/src/chttpd.erl > >> @@ -33,7 +33,7 @@ > >> send_chunked_error/2, send_json/2,send_json/3,send_json/4, > >> validate_ctype/2]). > >> > >> --export([authenticate_request/3]). > >> +-export([authenticate_request/3, zipkin_log/2]). > >> > >> @@ -1207,6 +1216,10 @@ get_user(#httpd{user_ctx = #user_ctx{name = User}}) > >> -> > >> get_user(#httpd{user_ctx = undefined}) -> > >> "undefined". > >> > >> +zipkin_log(_URL, EncodedSpans) -> > >> + Spans = otter_lib_zipkin_thrift:decode_spans(EncodedSpans), > >> + couch_log:error("ZIPKIN :: ~p~n", [Spans]). > >> + > >> ``` > >> > >> ``` > >> otter_config:write(http_client, {chttpd, zipkin_log}). > >> ``` > >> > >> This would produce the log events looking as the following: > >> ``` > >> [error] 2019-09-10T18:01:38.758631Z node1@127.0.0.1 <0.4737.0> -------- > >> ZIPKIN :: [{span,1568138498710679,3212384889493927141,<<"http > >> request">>,345610195655038913,undefined,[{<<"lc">>,<<>>,{<<"otter_test">>,{127,0,0,1},0}},{<<"begin_ts">>,<<"{1568,138498,710639}">>},{<<"path">>,<<"_all_dbs">>},{<<"method">>,<<"GET">>},{<<"nonce">>,<<"c225c6aef1">>},{<<"status">>,<<"ok">>}],[{1568138498712456,<<"completed">>}],1784}] > >> ``` > >> > >> As you can see the timestamp of the `completed` event is included in the > >> entry. Also, since it is a function call we can format the event in any > >> way we like. > >> > >> # Conclusion > >> > >> - The otter configuration is simple and presence of `http_client` > >> parameter allows us to use otter without zipkin server. > >> - The API is simple which makes it possible to replace otter with > >> something else if we wouldn't be happy with the implementation. > >> - The codebase of otter is concise and easy to re-implement in case we > >> would need to. > >> > >> Overall I don't think it will be too complicated to introduce basic > >> functionality and extend it latter when we need to. > >> > >> In case you agree with direction I would appreciate any feedback which > >> would help to form requirements for the RFC and PR in the future. > >> > >> Best regards, > >> ILYA (aka iilyak) > >> > >