[ https://issues.apache.org/jira/browse/HTRACE-20?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14248786#comment-14248786 ]
Colin Patrick McCabe commented on HTRACE-20:
--------------------------------------------
Elliott wrote:
bq. Lots of traces are really useful if things are anomalous. However it's
sometimes the case that they are only anomalous once or very infrequently. So
the probabilistic sampler can miss all the interesting traces. We should
provide the ability to keep a trace 100% of the time if something interesting
happened.
Interesting idea. Could we just use {{AlwaysSampler}} and aggressively "age
out" old trace spans in {{htraced}}? Or were you thinking of having the client
hold on to the trace spans and send them only if some API call is made later?
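A minimal sketch of the first option, under stated assumptions: every span is kept (as with {{AlwaysSampler}}), and the store drops anything older than a retention window. {{StoredSpan}} and {{AgingSpanStore}} are hypothetical names for illustration, not classes in HTrace or {{htraced}}, and the code assumes spans arrive in timestamp order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Demo entry point; all class names here are hypothetical, not HTrace APIs.
class AgingSpanStoreDemo {
    public static void main(String[] args) {
        AgingSpanStore store = new AgingSpanStore(1000); // keep spans for 1 second
        store.add(new StoredSpan(1, 0));
        store.add(new StoredSpan(2, 500));
        store.add(new StoredSpan(3, 1200)); // cutoff is 200 ms, so span 1 is aged out
        System.out.println(store.size());   // prints 2
    }
}

// Simplified stand-in for a span: just an id and the time it was received.
class StoredSpan {
    final long spanId;
    final long receivedMillis;

    StoredSpan(long spanId, long receivedMillis) {
        this.spanId = spanId;
        this.receivedMillis = receivedMillis;
    }
}

// Keeps every span, but evicts spans older than the retention window.
// Assumes spans are added in non-decreasing receivedMillis order, so the
// oldest span is always at the head of the deque.
class AgingSpanStore {
    private final long retentionMillis;
    private final Deque<StoredSpan> spans = new ArrayDeque<>();

    AgingSpanStore(long retentionMillis) {
        this.retentionMillis = retentionMillis;
    }

    void add(StoredSpan span) {
        spans.addLast(span);
        evictOlderThan(span.receivedMillis - retentionMillis);
    }

    // Removes spans received strictly before the cutoff.
    void evictOlderThan(long cutoffMillis) {
        while (!spans.isEmpty() && spans.peekFirst().receivedMillis < cutoffMillis) {
            spans.removeFirst();
        }
    }

    int size() {
        return spans.size();
    }
}
```

The trade-off is storage: with 100% sampling, how long a rare trace survives depends entirely on the retention window.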
In the second case, I wouldn't call this a "sampler" since sampling, by
definition, implies taking a subset of the traces and discarding the rest. It
could be a decorator on the span receiver, perhaps?
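To make the decorator idea concrete, here is a minimal sketch. It assumes a simplified, hypothetical receiver interface ({{SimpleSpanReceiver}}) rather than HTrace's actual span receiver API, a hypothetical {{completeTrace}} hook that something calls once a trace is finished, and an example "interesting" rule (any span taking 1000 ms or more). Spans are held per trace and forwarded only if the trace turns out to be interesting; otherwise they are discarded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Demo entry point; all names here are hypothetical, not HTrace APIs.
class AfterTheFactDemo {
    public static void main(String[] args) {
        List<TraceSpan> sent = new ArrayList<>();
        // Example rule: a trace is interesting if any span took >= 1000 ms.
        InterestingTraceReceiver receiver =
            new InterestingTraceReceiver(sent::add, span -> span.durationMillis >= 1000);

        receiver.receiveSpan(new TraceSpan(1, 5));    // trace 1: fast
        receiver.receiveSpan(new TraceSpan(2, 5));    // trace 2: a fast span...
        receiver.receiveSpan(new TraceSpan(2, 2500)); // ...and a slow one
        receiver.completeTrace(1); // nothing interesting: spans discarded
        receiver.completeTrace(2); // interesting: both spans forwarded
        System.out.println(sent.size()); // prints 2
    }
}

// Simplified stand-in for a span.
class TraceSpan {
    final long traceId;
    final long durationMillis;

    TraceSpan(long traceId, long durationMillis) {
        this.traceId = traceId;
        this.durationMillis = durationMillis;
    }
}

// Simplified stand-in for a span receiver.
interface SimpleSpanReceiver {
    void receiveSpan(TraceSpan span);
}

// Decorator: buffers spans per trace and only passes a trace on to the
// wrapped receiver if at least one of its spans matches the predicate.
class InterestingTraceReceiver implements SimpleSpanReceiver {
    private final SimpleSpanReceiver delegate;
    private final Predicate<TraceSpan> interesting;
    private final Map<Long, List<TraceSpan>> pending = new HashMap<>();

    InterestingTraceReceiver(SimpleSpanReceiver delegate, Predicate<TraceSpan> interesting) {
        this.delegate = delegate;
        this.interesting = interesting;
    }

    @Override
    public void receiveSpan(TraceSpan span) {
        pending.computeIfAbsent(span.traceId, id -> new ArrayList<>()).add(span);
    }

    // Hypothetical hook, called once a trace is known to be complete.
    void completeTrace(long traceId) {
        List<TraceSpan> spans = pending.remove(traceId);
        if (spans != null && spans.stream().anyMatch(interesting)) {
            spans.forEach(delegate::receiveSpan);
        }
    }
}
```

Note that nothing here samples: every span is seen, and the decision is only whether to forward it. A real version would also need to bound {{pending}}, for example by evicting traces that never complete.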
Note that Google's Dapper paper describes probabilistic sampling (see
http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36356.pdf).
They seem to have found sampling adequate. Quote:
bq. In practice, we have found that there is still an adequate amount of trace
data for high-volume services when using a sampling rate as low as 1/1024.
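For reference, a probabilistic sampler at the rate quoted above makes an independent coin flip per trace. The sketch below is a standalone illustration with a hypothetical {{RateSampler}} class, not HTrace's own sampler implementation; it uses a fixed seed only so the run is repeatable.

```java
import java.util.Random;

// Demo entry point; RateSampler is a hypothetical class, not an HTrace API.
class SamplingRateDemo {
    public static void main(String[] args) {
        RateSampler sampler = new RateSampler(1.0 / 1024, new Random(42));
        int requests = 1_024_000;
        int traced = 0;
        for (int i = 0; i < requests; i++) {
            if (sampler.shouldTrace()) {
                traced++;
            }
        }
        // On average requests / 1024 = 1000 traces are kept; the exact count varies.
        System.out.println(traced + " of " + requests + " requests traced");
    }
}

// Keeps each trace independently with the given probability.
class RateSampler {
    private final double probability;
    private final Random random;

    RateSampler(double probability, Random random) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        this.random = random;
    }

    // nextDouble() is uniform in [0, 1), so this is true with the given probability.
    boolean shouldTrace() {
        return random.nextDouble() < probability;
    }
}
```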
So maybe we'll find the same thing... on a big enough cluster, even infrequent
things happen "often enough." My instinct would be to get the web GUI,
htraced, and so on finished and deployed on a big cluster or two and see
whether this is a problem in practice.
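The "often enough" intuition can be quantified. If each trace is sampled independently with probability p, an anomaly that occurs k times is captured at least once with probability 1 - (1 - p)^k. The sketch below ({{CaptureOdds}} is a hypothetical helper) evaluates that for p = 1/1024 and inverts it to find how many occurrences give a target capture probability.

```java
// Hypothetical helper; the formulas assume each occurrence is sampled independently.
class CaptureOdds {
    // Probability that at least one of k occurrences is sampled at rate p.
    static double captureProbability(double p, long k) {
        return 1.0 - Math.pow(1.0 - p, k);
    }

    // Smallest k with captureProbability(p, k) >= target, for 0 < p < 1 and 0 <= target < 1.
    // Solves (1 - p)^k <= 1 - target for k; log1p keeps precision for small p.
    static long occurrencesNeeded(double p, double target) {
        return (long) Math.ceil(Math.log1p(-target) / Math.log1p(-p));
    }

    public static void main(String[] args) {
        double p = 1.0 / 1024;
        System.out.printf("k=1:    %.4f%n", captureProbability(p, 1));    // ~0.0010
        System.out.printf("k=1024: %.4f%n", captureProbability(p, 1024)); // ~0.632
        System.out.println("occurrences for 99%: " + occurrencesNeeded(p, 0.99)); // 4714
    }
}
```

So an anomaly that shows up about a thousand times is caught at least once roughly 63% of the time, and a few thousand occurrences make capture very likely. Whether real anomalies recur that often is what running on a big cluster would show.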
> Add after the fact sampler
> --------------------------
>
> Key: HTRACE-20
> URL: https://issues.apache.org/jira/browse/HTRACE-20
> Project: HTrace
> Issue Type: Bug
> Reporter: Elliott Clark
>
> Lots of traces are really useful if things are anomalous. However it's
> sometimes the case that they are only anomalous once or very infrequently. So
> the probabilistic sampler can miss all the interesting traces. We should
> provide the ability to keep a trace 100% of the time if something interesting
> happened.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)