Andrew Olson created HTRACE-200:
-----------------------------------

             Summary: Reduce rate of logged errors if Zipkin Collector service is down
                 Key: HTRACE-200
                 URL: https://issues.apache.org/jira/browse/HTRACE-200
             Project: HTrace
          Issue Type: Improvement
            Reporter: Andrew Olson
            Priority: Minor


We see a flood of errors logged by the ZipkinSpanReceiver when our Zipkin 
Collector service is not running: about one error every second or two from each 
of our processes that is instrumented with HTrace and configured to send 
traces to Zipkin. Exacerbating the problem for us, it seems that with 
commons-logging every line of the exception stack trace carries a prefix like 
"2015-06-29 09:03:25 zipkinSpanReceiver-0 STDIO [ERROR]", so Splunk parses each 
line as a separate error message. Here [1] is an example log file. It would be 
nice if this error logging could be rate-limited, for example to no more than 
one message per minute, or to only the initial occurrence, with a successful 
send resetting the state so that the next failure is logged again.

[1] http://pastebin.com/AieewfhF
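
To make the request concrete, here is a minimal sketch of the kind of throttle 
I have in mind. It is not taken from the HTrace code base: the class name 
ErrorLogThrottle, its methods, and the injected clock are all hypothetical, and 
how it would be wired into ZipkinSpanReceiver is left open. The first failure 
is logged immediately, later failures are logged at most once per interval, and 
a successful send resets the state.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of HTrace): decides whether a send failure
 * should be logged, so that repeated failures do not flood the log.
 */
final class ErrorLogThrottle {
    private final long minIntervalMillis;
    private final LongSupplier clock;   // injected so the logic can be tested
    private boolean failing = false;    // true while sends keep failing
    private long lastLoggedMillis = 0L; // time of the last logged failure

    ErrorLogThrottle(long minIntervalMillis, LongSupplier clock) {
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /** Call on every failed send; returns true if this failure should be logged. */
    synchronized boolean onFailure() {
        long now = clock.getAsLong();
        // Log the first failure after a success, then at most once per interval.
        if (!failing || now - lastLoggedMillis >= minIntervalMillis) {
            failing = true;
            lastLoggedMillis = now;
            return true;
        }
        return false;
    }

    /** Call on every successful send; the next failure will be logged again. */
    synchronized void onSuccess() {
        failing = false;
    }
}
```

A caller would create one instance per receiver, for example 
new ErrorLogThrottle(60000L, System::currentTimeMillis), log the exception only 
when onFailure() returns true, and call onSuccess() after each successful send. 
A very large interval approximates the "initial occurrence only" variant.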



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
