Andras Piros created OOZIE-3132:
-----------------------------------

             Summary: Instrument SLAService and SLACalculatorMemory
                 Key: OOZIE-3132
                 URL: https://issues.apache.org/jira/browse/OOZIE-3132
             Project: Oozie
          Issue Type: Improvement
          Components: core
    Affects Versions: 4.3.0
            Reporter: Andras Piros
            Assignee: Andras Piros
             Fix For: 5.0.0b1


When there are lots of {{WorkflowJobBean}} and {{CoordinatorJobBean}} instances 
that have to be followed up by creating {{SLASummaryBean}} instances, the 
following can occur:
* we set {{oozie.sla.service.SLAService.capacity}} to a sane value like 
{{10000}} to limit heap consumption
* {{SLACalculatorMemory#addRegistration()}} and 
{{SLACalculatorMemory#updateRegistration()}} would:
** either emit {{TRACE}} level logs like {{SLA Registration Event - Job:}} 
showing the add / update of {{SLARegistrationBean}} was successful
** or emit {{ERROR}} level logs like {{SLACalculator memory capacity reached. 
Cannot add or update new SLA Registration entry for job}} showing the add / 
update of {{SLARegistrationBean}} was not successful
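
To illustrate the behaviour described above, here is a minimal sketch of a capacity-bounded registration. It is not the Oozie implementation: {{BoundedSlaRegistry}}, {{register()}} and the {{String}} payload are hypothetical names, and the sketch assumes that once capacity is reached both adds and updates are rejected, as the {{ERROR}} message suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the capacity check; not Oozie code.
final class BoundedSlaRegistry {
    private final int capacity;
    private final Map<String, String> slaMap = new HashMap<>();

    BoundedSlaRegistry(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Adds or updates a registration.
     * Returns true on success (where the TRACE log would be emitted) and
     * false when capacity is reached (where the ERROR log would be emitted).
     */
    synchronized boolean register(String jobId, String registration) {
        if (slaMap.size() >= capacity) {
            return false; // assumed: any add or update is rejected at capacity
        }
        slaMap.put(jobId, registration);
        return true;
    }

    synchronized int size() {
        return slaMap.size();
    }
}
```

From the outside, a caller only sees success or failure per call, which is why the current fill level is hard to infer from the logs alone.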

Since stale or already processed {{SLAEvent}} entries are sometimes removed 
from {{SLACalculatorMemory#slaMap}}, it's pretty hard to say what its actual 
size is - that is, whether the next add or update command will succeed.

We need an {{Instrumentation.Counter}} instance that gets incremented when 
{{SLACalculatorMemory#slaMap#put()}} adds a new entry, and gets decremented 
when {{SLACalculatorMemory#slaMap#remove()}} removes an existing entry. This 
counter will be automatically available through the REST interface and the 
Oozie client.
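
A rough sketch of the counting rule, under stated assumptions: {{CountedSlaMap}} is a hypothetical wrapper, and a plain {{AtomicLong}} stands in for the proposed {{Instrumentation.Counter}}, whose actual API is not shown here. The point is only that the counter moves when an entry is really added or really removed, not on every call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around a map like slaMap; not Oozie code.
final class CountedSlaMap<V> {
    private final ConcurrentMap<String, V> slaMap = new ConcurrentHashMap<>();
    // Stands in for the proposed Instrumentation.Counter (assumption).
    private final AtomicLong entryCount = new AtomicLong();

    /** Adds or updates an entry; the counter is incremented only for a new key. */
    V put(String jobId, V value) {
        V previous = slaMap.put(jobId, value);
        if (previous == null) {
            entryCount.incrementAndGet();
        }
        return previous;
    }

    /** Removes an entry; the counter is decremented only if the key existed. */
    V remove(String jobId) {
        V removed = slaMap.remove(jobId);
        if (removed != null) {
            entryCount.decrementAndGet();
        }
        return removed;
    }

    long count() {
        return entryCount.get();
    }
}
```

With this rule, updating an existing job or removing an already removed one leaves the counter unchanged, so its value tracks the number of live entries and can be compared against the configured capacity.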



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
