Re: Introduction of open tracing (otter)

Ilya Khlopotov Tue, 17 Sep 2019 05:34:11 -0700

> Previously you were talking about the trace filter configuration settings 
> being opaque blobs, but they don’t look opaque in the RFC. Did that change?


I recently discovered that there are multiple flavors of otter. The most 
developed one is https://github.com/project-fifo/otters. This version has a 
configuration DSL and generates beam module for fast filtering. 

Best regards,
ILYA (iilyak)

On 2019/09/16 20:16:26, Adam Kocoloski <[email protected]> wrote: 
> Looks good, I left a few comments.
> 
> Previously you were talking about the trace filter configuration settings 
> being opaque blobs, but they don’t look opaque in the RFC. Did that change?
> 
> Adam
> 
> > On Sep 16, 2019, at 3:38 PM, Ilya Khlopotov <[email protected]> wrote:
> > 
> > Hi, 
> > 
> > The RFC is ready for review 
> > https://github.com/apache/couchdb-documentation/pull/440
> > 
> > Best regards,
> > ILYA (aka iilyak)
> > 
> > On 2019/09/10 18:32:03, Ilya Khlopotov <[email protected]> wrote: 
> >> Hi,
> >> 
> >> I wanted to run this idea by the ML to see if there is any interest before 
> >> investing time into preparing formal RFC.
> >> 
> >> # Problem statement
> >> 
> >> Collecting profiling data is very tricky at the moment. Developers have to 
> >> run generic profiling tools which are not aware of CouchDB specifics. 
> >> This makes it hard to do the performance optimization work. We need a tool 
> >> which would allow us to get profiling data from specific points in the 
> >> codebase. This means code instrumentation. 
> >> 
> >> # Proposed solution
> >> 
> >> There is an https://opentracing.io/ project, which is a vendor-neutral 
> >> APIs and instrumentation for distributed tracing. In Erlang it is 
> >> implemented by https://github.com/Bluehouse-Technology/otter library. The 
> >> library provides a nice abstraction to start/finish tracing spans as well 
> >> as adding tags and log entries to a given span. In the context of CouchDB 
> >> this means that we can do something like the following:
> >> - start tracing span on every HTTP request
> >> - add tags to capture additional information such as "database name"/"name 
> >> of endpoint"/"nonce"
> >> - add otter logs in critical parts of the codebase to get profiling data 
> >> for these points.
> >> 
> >> The otter is the most useful in combination with 
> >> [zipkin](https://zipkin.io/) compatible server such as 
> >> [jaeger](https://github.com/jaegertracing/jaeger). However it can be used 
> >> even without zipkin. It has a configurable set of counters, which makes it 
> >> possible to get answers on questions like:
> >> - what kind of requests are slow
> >> - if we get a slow request (taking longer then configured threshold) what 
> >> was the trace (annotated with time spent between trace points)
> >> - which function in the trace taking the most time 
> >> 
> >> # Otter API
> >> 
> >> The otter has multiple APIs which we would choose on a case by case basis:
> >> - functional API - the span structure need to be passed around (we could 
> >> extend `#http{}`/`#user_ctx{}`/`#db{}`)
> >> - simple process dictionary API - the span data are stored in the process 
> >> dictionary
> >> - Multiple span process dictionary API - supports multiple spans per 
> >> process
> >> - Span id API - starts a process per span
> >> 
> >> # Roadblocks
> >> 
> >> One of the problems we would need to solve is to how to do multiple nodes 
> >> spans. We would need this functionality to trace the request from the HTTP 
> >> endpoint handler running on coordinator to the shard updater process 
> >> running on the storage nodes. 
> >> 
> >> We could use either:
> >> - extend rexi or fabric to pass and aggregate span information
> >> - pass span info explicitly in every fabric function.
> >> 
> >> # Quick demo (warning very technical content)
> >> 
> >> The goal of this demo is to demonstrate the value of otter without zipkin 
> >> server. 
> >> 
> >> ```
> >> diff --git a/rebar.config.script b/rebar.config.script
> >> index c38b6e235..c2b162751 100644
> >> --- a/rebar.config.script
> >> +++ b/rebar.config.script
> >> @@ -129,6 +129,11 @@ OptionalDeps = case WithProper of
> >>         []
> >> end,
> >> 
> >> +ManualDeps = [
> >> +  {otter, {url, "https://github.com/Bluehouse-Technology/otter"}, 
> >> {branch, "master"}},
> >> +  {otter_lib, {url, "https://github.com/Bluehouse-Technology/otter_lib"}, 
> >> {branch, "master"}}
> >> +],
> >> +
> >> BaseUrl = "https://github.com/apache/";,
> >> 
> >> MakeDep = fun
> >> @@ -152,7 +157,7 @@ end,
> >> AddConfig = [
> >>     {require_otp_vsn, "19|20|21|22"},
> >>     {deps_dir, "src"},
> >> -    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps)},
> >> +    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps ++ ManualDeps)},
> >>     {sub_dirs, SubDirs},
> >>     {lib_dirs, ["src"]},
> >>     {erl_opts, [{i, "../"} | ErlOpts]},
> >> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> >> index 1e1d638be..a7aad5010 100644
> >> --- a/src/chttpd/src/chttpd.erl
> >> +++ b/src/chttpd/src/chttpd.erl
> >> @@ -158,6 +158,7 @@ handle_request(MochiReq0) ->
> >> 
> >> handle_request_int(MochiReq) ->
> >>     Begin = os:timestamp(),
> >> +    otter_span_pdict_api:start_with_tags("http request", [{"begin_ts", 
> >> Begin}]),
> >>     case config:get("chttpd", "socket_options") of
> >>     undefined ->
> >>         ok;
> >> @@ -233,6 +234,7 @@ handle_request_int(MochiReq) ->
> >> 
> >>     % put small token on heap to keep requests synced to backend calls
> >>     erlang:put(nonce, Nonce),
> >> +    otter_span_pdict_api:tag("nonce", Nonce),
> >> 
> >>     % suppress duplicate log
> >>     erlang:put(dont_log_request, true),
> >> @@ -282,6 +284,8 @@ after_request(HttpReq, HttpResp0) ->
> >>         end,
> >>     HttpResp2 = update_stats(HttpReq, HttpResp1),
> >>     chttpd_stats:report(HttpReq, HttpResp2),
> >> +    otter_span_pdict_api:tag("status", HttpResp2#httpd_resp.status),
> >> +    otter_span_pdict_api:log("completed"),
> >> +    otter_span_pdict_api:finish(),
> >>     maybe_log(HttpReq, HttpResp2),
> >>     HttpResp2.
> >> 
> >> diff --git a/src/fabric/src/fabric.erl b/src/fabric/src/fabric.erl
> >> index 27fa8c045..a1972b445 100644
> >> --- a/src/fabric/src/fabric.erl
> >> +++ b/src/fabric/src/fabric.erl
> >> @@ -56,6 +56,7 @@ all_dbs() ->
> >> %% @doc returns a list of all database names
> >> -spec all_dbs(Prefix::iodata()) -> {ok, [binary()]}.
> >> all_dbs(Prefix) when is_binary(Prefix) ->
> >> +    otter_span_pdict_api:tag("request_type", "_all_dbs"),
> >>     Length = byte_size(Prefix),
> >>     MatchingDbs = mem3:fold_shards(fun(#shard{dbname=DbName}, Acc) ->
> >>         case DbName of
> >> ```
> >> 
> >> ```
> >> application:start(otter_lib).
> >> application:start(otter).
> >> f(Rules), Rules = [
> >>  {[
> >>        {greater, otter_span_duration, 10}
> >>    ],[
> >>        {snapshot_count, [long_span], [otter_span_name]},
> >>        send_to_zipkin
> >>   ]}
> >> ].
> >> otter_config:write(filter_rules, Rules).
> >> otter:counter_snapshot([long_span,"http request"]).
> >> [{[long_span,"http request"],
> >>  [{snap_timestamp,{2019,9,10,13,46,43,368208}},
> >>   {data,{span,1568123203366286,17299637839902614236,
> >>               "http request",782788946072648712,undefined,
> >>               [{"begin_ts",{1568,123203,366255}},
> >>                {"nonce","a0a1d7c58e"},
> >>                {"status",ok}],
> >>               [],1911}}]}]
> >> ```
> >> 
> >> Unfortunately `counter_snapshot` API doesn't return log events. However 
> >> this problem is very easy to overcome. Here is the example of how to get 
> >> matching spans logged into a log file.
> >> 
> >> ```
> >> diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> >> index 1e1d638be..5f2d60690 100644
> >> --- a/src/chttpd/src/chttpd.erl
> >> +++ b/src/chttpd/src/chttpd.erl
> >> @@ -33,7 +33,7 @@
> >>     send_chunked_error/2, send_json/2,send_json/3,send_json/4,
> >>     validate_ctype/2]).
> >> 
> >> --export([authenticate_request/3]).
> >> +-export([authenticate_request/3, zipkin_log/2]).
> >> 
> >> @@ -1207,6 +1216,10 @@ get_user(#httpd{user_ctx = #user_ctx{name = User}}) 
> >> ->
> >> get_user(#httpd{user_ctx = undefined}) ->
> >>     "undefined".
> >> 
> >> +zipkin_log(_URL, EncodedSpans) ->
> >> +    Spans = otter_lib_zipkin_thrift:decode_spans(EncodedSpans),
> >> +    couch_log:error("ZIPKIN :: ~p~n", [Spans]).
> >> +
> >> ```
> >> 
> >> ```
> >> otter_config:write(http_client, {chttpd, zipkin_log}).
> >> ```
> >> 
> >> This would produce the log events looking as the following:
> >> ```
> >> [error] 2019-09-10T18:01:38.758631Z [email protected] <0.4737.0> -------- 
> >> ZIPKIN :: [{span,1568138498710679,3212384889493927141,<<"http 
> >> request">>,345610195655038913,undefined,[{<<"lc">>,<<>>,{<<"otter_test">>,{127,0,0,1},0}},{<<"begin_ts">>,<<"{1568,138498,710639}">>},{<<"path">>,<<"_all_dbs">>},{<<"method">>,<<"GET">>},{<<"nonce">>,<<"c225c6aef1">>},{<<"status">>,<<"ok">>}],[{1568138498712456,<<"completed">>}],1784}]
> >> ```
> >> 
> >> As you can see the timestamp of the `completed` event is included in the 
> >> entry. Also, since it is a function call we can format the event in any 
> >> way we like.
> >> 
> >> # Conclusion
> >> 
> >> - The otter configuration is simple and presence of `http_client` 
> >> parameter allows us to use otter without zipkin server. 
> >> - The API is simple which makes it possible to replace otter with 
> >> something else if we wouldn't be happy with the implementation.
> >> - The codebase of otter is concise and easy to re-implement in case we 
> >> would need to.
> >> 
> >> Overall I don't think it will be too complicated to introduce basic 
> >> functionality and extend it latter when we need to.
> >> 
> >> In case you agree with direction I would appreciate any feedback which 
> >> would help to form requirements for the RFC and PR in the future.
> >> 
> >> Best regards,
> >> ILYA (aka iilyak)
> >> 
> 
>

Re: Introduction of open tracing (otter)

Reply via email to