> For me the main question is how much overhead is associated with tracing.

Below is a breakdown of the cost of the various operations.

> Can an admin safely configure it to run in production?

I believe it can be used in production if we are careful with filters. We need to configure prefilter_rules to discard most of the events and explicitly allow only the things we are interested in. This means it cannot be a user-facing feature at the moment, at least until we do a real benchmark.
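
To make that concrete, here is a minimal sketch of a restrictive set of rules. It reuses the filter_rules format from the demo quoted further down this thread; the 50000 threshold (assuming otter_span_duration is in microseconds) and the slow_span counter name are made-up examples, and whether prefilter_rules accepts the same rule grammar as filter_rules is something I still need to confirm against otter's documentation:

```
%% Only spans slower than ~50 ms trigger any action; everything else
%% matches no rule, so no action is taken and nothing is counted or
%% forwarded.
Rules = [
    {[
        {greater, otter_span_duration, 50000}
    ], [
        {snapshot_count, [slow_span], [otter_span_name]},
        send_to_zipkin
    ]}
],
otter_config:write(filter_rules, Rules).
```
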
> Is it possible to sample just a small percentage of events?

It is possible to configure restrictive filters. However, there is no safety net in place that would shield an operator from mistakes.

> Does the overhead change if no OpenTracing tracer is configured?

The cost when OpenTracing is disabled is:
- lists:reverse of a list whose length is proportional to the number of otter_span_pdict_api:log calls
- calculation of a timestamp for every otter_span_pdict_api:log/start/finish call
- puts and gets from the process dictionary
- one ets:lookup
- a bunch of cheap operations

# cost analysis

## disabled

- otter_span_pdict_api:start
  - a few calls to construct the span tuple (an Erlang record)
  - put the span tuple into the process dictionary
- otter_span_pdict_api:tag
  - get the value from the process dictionary
  - put the value into the process dictionary
- otter_span_pdict_api:log
  - get the value from the process dictionary
  - calculate a timestamp
  - add a new entry to the logs field of the span record
  - put the new span into the process dictionary
- otter_span_pdict_api:finish
  - get the value from the process dictionary
  - calculate a timestamp
  - reverse the list containing log events
  - application:get_env (which is an ets:lookup)

## enabled

In addition to the cost of the disabled case we also do:
- otter_span_pdict_api:finish
  - run the filter to process the actions configured for a span (a few operations on lists)
  - if the last configured action is send_to_zipkin: ets:lookup + ets:insert
- every zipkin_batch_interval_ms (100 ms by default) we do
  - ets:lookup
  - ets:tab2list
  - ets:delete_all_objects
  - 3 calls to application:get_env (which is an ets:lookup)
  - encode the spans to zipkin_thrift
  - an HTTP POST request to the zipkin server

(The instrumentation calls these costs refer to are sketched at the end of this message.)

# implementation details

There are some architectural problems in otter_conn_zipkin.erl (it can lose events):
- it uses two ets pages (active and shadow); there are races on the swap
- it deletes all objects from ets before doing the HTTP request; if the HTTP request fails, all spans are gone
- all spans are sent in one request, which can be problematic for a big number of events

The good news is that otter_conn_zipkin.erl is about 100 lines. We could contribute a better sender or use a fork. In my opinion it is not a blocker.
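
For reference, and to be explicit about what the cost analysis above is measuring, this is roughly the per-request call pattern; it simply mirrors the chttpd diff in the quoted demo below. The module and function names here are made-up, and `{ok, Req}` stands in for the real request handling:

```
-module(otter_cost_sketch).
-export([instrumented_request/2]).

%% One span per request. These are exactly the otter_span_pdict_api calls
%% whose costs are itemized above; finish/0 is where the filter rules run
%% and, when an action says so, the span is queued for the zipkin sender.
instrumented_request(Req, Nonce) ->
    otter_span_pdict_api:start_with_tags("http request",
                                         [{"begin_ts", os:timestamp()}]),
    otter_span_pdict_api:tag("nonce", Nonce),
    Resp = {ok, Req},   %% real request handling would happen here
    otter_span_pdict_api:log("completed"),
    otter_span_pdict_api:finish(),
    Resp.
```
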
Best regards,
ILYA

On 2019/09/10 20:46:40, Adam Kocoloski <kocol...@apache.org> wrote:
> I think this is a great idea overall, particularly given the number of significant changes that are happening in the codebase between 3.0 and 4.0.
> 
> For me the main question is how much overhead is associated with tracing. Can an admin safely configure it to run in production? Is it possible to sample just a small percentage of events? Does the overhead change if no OpenTracing tracer is configured?
> 
> I also think a full picture here might include the ability to inherit client-provided spans, so an app developer could drill down from her own code into the database internals and figure out why some DB request was unexpectedly slow.
> 
> Thanks for starting this discussion. Cheers,
> 
> Adam
> 
> > On Sep 10, 2019, at 2:32 PM, Ilya Khlopotov <iil...@apache.org> wrote:
> > 
> > Hi,
> > 
> > I wanted to run this idea by the ML to see if there is any interest before investing time into preparing a formal RFC.
> > 
> > # Problem statement
> > 
> > Collecting profiling data is very tricky at the moment. Developers have to run generic profiling tools which are not aware of CouchDB specifics. This makes it hard to do performance optimization work. We need a tool which would allow us to get profiling data from specific points in the codebase. This means code instrumentation.
> > 
> > # Proposed solution
> > 
> > There is the https://opentracing.io/ project, which provides vendor-neutral APIs and instrumentation for distributed tracing. In Erlang it is implemented by the https://github.com/Bluehouse-Technology/otter library. The library provides a nice abstraction to start/finish tracing spans as well as to add tags and log entries to a given span. In the context of CouchDB this means that we can do something like the following:
> > - start a tracing span on every HTTP request
> > - add tags to capture additional information such as "database name"/"name of endpoint"/"nonce"
> > - add otter logs in critical parts of the codebase to get profiling data for these points.
> > 
> > Otter is most useful in combination with a [zipkin](https://zipkin.io/) compatible server such as [jaeger](https://github.com/jaegertracing/jaeger). However it can be used even without zipkin. It has a configurable set of counters, which makes it possible to get answers to questions like:
> > - what kind of requests are slow
> > - if we get a slow request (taking longer than the configured threshold), what was the trace (annotated with the time spent between trace points)
> > - which function in the trace takes the most time
> > 
> > # Otter API
> > 
> > Otter has multiple APIs, which we would choose from on a case by case basis:
> > - functional API - the span structure needs to be passed around (we could extend `#http{}`/`#user_ctx{}`/`#db{}`)
> > - simple process dictionary API - the span data are stored in the process dictionary
> > - multiple span process dictionary API - supports multiple spans per process
> > - span id API - starts a process per span
> > 
> > # Roadblocks
> > 
> > One of the problems we would need to solve is how to do spans across multiple nodes. We would need this functionality to trace a request from the HTTP endpoint handler running on the coordinator to the shard updater process running on the storage nodes.
> > 
> > We could either:
> > - extend rexi or fabric to pass and aggregate span information
> > - pass span info explicitly in every fabric function.
> > 
> > # Quick demo (warning very technical content)
> > 
> > The goal of this demo is to demonstrate the value of otter without a zipkin server.
> > 
> > ```
> > diff --git a/rebar.config.script b/rebar.config.script
> > index c38b6e235..c2b162751 100644
> > --- a/rebar.config.script
> > +++ b/rebar.config.script
> > @@ -129,6 +129,11 @@ OptionalDeps = case WithProper of
> >          []
> >  end,
> >  
> > +ManualDeps = [
> > +    {otter, {url, "https://github.com/Bluehouse-Technology/otter"}, {branch, "master"}},
> > +    {otter_lib, {url, "https://github.com/Bluehouse-Technology/otter_lib"}, {branch, "master"}}
> > +],
> > +
> >  BaseUrl = "https://github.com/apache/",
> >  
> >  MakeDep = fun
> > @@ -152,7 +157,7 @@ end,
> >  AddConfig = [
> >      {require_otp_vsn, "19|20|21|22"},
> >      {deps_dir, "src"},
> > -    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps)},
> > +    {deps, lists:map(MakeDep, DepDescs ++ OptionalDeps ++ ManualDeps)},
> >      {sub_dirs, SubDirs},
> >      {lib_dirs, ["src"]},
> >      {erl_opts, [{i, "../"} | ErlOpts]},
> > diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> > index 1e1d638be..a7aad5010 100644
> > --- a/src/chttpd/src/chttpd.erl
> > +++ b/src/chttpd/src/chttpd.erl
> > @@ -158,6 +158,7 @@ handle_request(MochiReq0) ->
> >  
> >  handle_request_int(MochiReq) ->
> >      Begin = os:timestamp(),
> > +    otter_span_pdict_api:start_with_tags("http request", [{"begin_ts", Begin}]),
> >      case config:get("chttpd", "socket_options") of
> >          undefined ->
> >              ok;
> > @@ -233,6 +234,7 @@ handle_request_int(MochiReq) ->
> >  
> >      % put small token on heap to keep requests synced to backend calls
> >      erlang:put(nonce, Nonce),
> > +    otter_span_pdict_api:tag("nonce", Nonce),
> >  
> >      % suppress duplicate log
> >      erlang:put(dont_log_request, true),
> > @@ -282,6 +284,8 @@ after_request(HttpReq, HttpResp0) ->
> >      end,
> >      HttpResp2 = update_stats(HttpReq, HttpResp1),
> >      chttpd_stats:report(HttpReq, HttpResp2),
> > +    otter_span_pdict_api:tag("status", HttpResp2#httpd_resp.status),
> > +    otter_span_pdict_api:log("completed"),
> > +    otter_span_pdict_api:finish(),
> >      maybe_log(HttpReq, HttpResp2),
> >      HttpResp2.
> >  
> > diff --git a/src/fabric/src/fabric.erl b/src/fabric/src/fabric.erl
> > index 27fa8c045..a1972b445 100644
> > --- a/src/fabric/src/fabric.erl
> > +++ b/src/fabric/src/fabric.erl
> > @@ -56,6 +56,7 @@ all_dbs() ->
> >  %% @doc returns a list of all database names
> >  -spec all_dbs(Prefix::iodata()) -> {ok, [binary()]}.
> >  all_dbs(Prefix) when is_binary(Prefix) ->
> > +    otter_span_pdict_api:tag("request_type", "_all_dbs"),
> >      Length = byte_size(Prefix),
> >      MatchingDbs = mem3:fold_shards(fun(#shard{dbname=DbName}, Acc) ->
> >          case DbName of
> > ```
> > 
> > ```
> > application:start(otter_lib).
> > application:start(otter).
> > f(Rules), Rules = [
> >     {[
> >         {greater, otter_span_duration, 10}
> >     ],[
> >         {snapshot_count, [long_span], [otter_span_name]},
> >         send_to_zipkin
> >     ]}
> > ].
> > otter_config:write(filter_rules, Rules).
> > otter:counter_snapshot([long_span,"http request"]).
> > [{[long_span,"http request"],
> >   [{snap_timestamp,{2019,9,10,13,46,43,368208}},
> >    {data,{span,1568123203366286,17299637839902614236,
> >                "http request",782788946072648712,undefined,
> >                [{"begin_ts",{1568,123203,366255}},
> >                 {"nonce","a0a1d7c58e"},
> >                 {"status",ok}],
> >                [],1911}}]}]
> > ```
> > 
> > Unfortunately the `counter_snapshot` API doesn't return log events. However this problem is very easy to overcome. Here is an example of how to get matching spans logged into a log file.
> > 
> > ```
> > diff --git a/src/chttpd/src/chttpd.erl b/src/chttpd/src/chttpd.erl
> > index 1e1d638be..5f2d60690 100644
> > --- a/src/chttpd/src/chttpd.erl
> > +++ b/src/chttpd/src/chttpd.erl
> > @@ -33,7 +33,7 @@
> >      send_chunked_error/2, send_json/2,send_json/3,send_json/4,
> >      validate_ctype/2]).
> >  
> > --export([authenticate_request/3]).
> > +-export([authenticate_request/3, zipkin_log/2]).
> >  
> > @@ -1207,6 +1216,10 @@ get_user(#httpd{user_ctx = #user_ctx{name = User}}) ->
> >  get_user(#httpd{user_ctx = undefined}) ->
> >      "undefined".
> >  
> > +zipkin_log(_URL, EncodedSpans) ->
> > +    Spans = otter_lib_zipkin_thrift:decode_spans(EncodedSpans),
> > +    couch_log:error("ZIPKIN :: ~p~n", [Spans]).
> > +
> > ```
> > 
> > ```
> > otter_config:write(http_client, {chttpd, zipkin_log}).
> > ```
> > 
> > This would produce log events like the following:
> > ```
> > [error] 2019-09-10T18:01:38.758631Z node1@127.0.0.1 <0.4737.0> -------- ZIPKIN :: [{span,1568138498710679,3212384889493927141,<<"http request">>,345610195655038913,undefined,[{<<"lc">>,<<>>,{<<"otter_test">>,{127,0,0,1},0}},{<<"begin_ts">>,<<"{1568,138498,710639}">>},{<<"path">>,<<"_all_dbs">>},{<<"method">>,<<"GET">>},{<<"nonce">>,<<"c225c6aef1">>},{<<"status">>,<<"ok">>}],[{1568138498712456,<<"completed">>}],1784}]
> > ```
> > 
> > As you can see, the timestamp of the `completed` event is included in the entry. Also, since it is a function call, we can format the event in any way we like.
> > 
> > # Conclusion
> > 
> > - The otter configuration is simple, and the presence of the `http_client` parameter allows us to use otter without a zipkin server.
> > - The API is simple, which makes it possible to replace otter with something else if we are not happy with the implementation.
> > - The otter codebase is concise and would be easy to re-implement in case we ever needed to.
> > 
> > Overall I don't think it will be too complicated to introduce the basic functionality and extend it later when we need to.
> > 
> > If you agree with the direction, I would appreciate any feedback that would help form the requirements for the RFC and PR in the future.
> > 
> > Best regards,
> > ILYA (aka iilyak)
> > 
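
PS: pulling the configuration bits of the quoted demo together, the "no zipkin server" setup boils down to a handful of otter_config:write calls. The filter_rules and http_client values below are the demo's own, and 100 ms is the default batch interval from my cost breakdown; whether zipkin_batch_interval_ms can be set through otter_config:write in the same way is my assumption and should be checked:

```
application:start(otter_lib).
application:start(otter).
%% demo rule: act only on spans slower than the threshold
otter_config:write(filter_rules, [
    {[{greater, otter_span_duration, 10}],
     [{snapshot_count, [long_span], [otter_span_name]}, send_to_zipkin]}
]).
%% route the "send to zipkin" step to our own logger instead of a real server
otter_config:write(http_client, {chttpd, zipkin_log}).
%% flush interval for the span buffer (100 ms is the default)
otter_config:write(zipkin_batch_interval_ms, 100).
```
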