Andrew Olson created HTRACE-200:
-----------------------------------
Summary: Reduce rate of logged errors if Zipkin Collector service is down
Key: HTRACE-200
URL: https://issues.apache.org/jira/browse/HTRACE-200
Project: HTrace
Issue Type: Improvement
Reporter: Andrew Olson
Priority: Minor
When our Zipkin Collector service is not running, the ZipkinSpanReceiver floods
the logs with errors: about one every second or two from each of our processes
that is instrumented with HTrace and configured to send traces to Zipkin. To
make matters worse for us, with commons-logging every line of the exception
stack trace carries a prefix like
"2015-06-29 09:03:25 zipkinSpanReceiver-0 STDIO [ERROR]", so Splunk parses each
line as a separate error message. Here [1] is an example log file.

It would be nice if this error logging could be rate-limited, for example to no
more than one message per minute, or if only the initial failure were logged,
with a successful send resetting the state so that the next failure is logged
again.
[1] http://pastebin.com/AieewfhF
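
To make the request concrete, here is a rough sketch of the kind of gate I have
in mind. It is not taken from the HTrace code base: the class name
SendFailureLogGate and its methods are hypothetical, and how it would be wired
into ZipkinSpanReceiver is left open. It covers both ideas above: log at most
once per interval while sends keep failing, and let a successful send reset the
state so the next failure is logged immediately.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of HTrace): decides whether a send failure
 * should be written to the log or silently counted.
 */
final class SendFailureLogGate {
    private final long minIntervalNanos;
    private boolean failing = false;   // true once a failure has been logged
    private long lastLoggedNanos = 0L; // time of the last logged failure
    private long suppressed = 0L;      // failures skipped since the last drain

    SendFailureLogGate(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
    }

    /**
     * Call on every failed send. Returns true if the caller should log it:
     * either it is the first failure since the last success, or at least
     * the minimum interval has passed since the last logged failure.
     */
    synchronized boolean onFailure(long nowNanos) {
        if (!failing || nowNanos - lastLoggedNanos >= minIntervalNanos) {
            failing = true;
            lastLoggedNanos = nowNanos;
            return true;
        }
        suppressed++;
        return false;
    }

    /** Call on every successful send; the next failure will be logged. */
    synchronized void onSuccess() {
        failing = false;
    }

    /** Returns the number of suppressed failures and resets the counter. */
    synchronized long drainSuppressed() {
        long n = suppressed;
        suppressed = 0L;
        return n;
    }
}
```

The receiver would call onFailure(System.nanoTime()) in whatever code path
currently logs the exception, and only log when it returns true, possibly
appending the drainSuppressed() count to the message. Constructing the gate with
(1, TimeUnit.MINUTES) gives the "one per minute" behavior; a very large interval
gives the "initial occurrence only" behavior, since only onSuccess() would then
re-enable logging.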