[ https://issues.apache.org/jira/browse/OOZIE-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mona Chitnis updated OOZIE-1911:
--------------------------------

    Attachment: OOZIE-1911-4.patch

Final patch attached, as reviewed on Reviewboard.

> SLA calculation in HA mode does wrong bit comparison for 'start' and 'duration'
> -------------------------------------------------------------------------------
>
>                 Key: OOZIE-1911
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1911
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Mona Chitnis
>             Fix For: trunk
>
>         Attachments: OOZIE-1911-4.patch
>
>
> In chronological order:
> Server 1:
> The job's SLA eventProcessed is set to 0101 => 'start' and 'end' SLAs processed.
> Server 2:
> Receives the above job's status event and processes the remaining 'duration' SLA.
> eventProcessed is now 0111, but it is then incremented to 1000 by
> {code}
> // SLACalculatorMemory.addJobStatus(), line 762
> if (slaCalc.getEventProcessed() == 7) {
>     slaInfo.setEventProcessed(8);
>     slaMap.remove(jobId);
> }
> {code}
> Back to Server 1 (doing periodic SLA checks):
> {code}
> // SLACalculatorMemory.updateJobSla(), line 483
> if ((eventProc & 1) == 0) { // first bit (start-processed) unset
>     if (reg.getExpectedStart() != null) {
>         if (reg.getExpectedStart().getTime() + jobEventLatency < System.currentTimeMillis()) {
>             // goes ahead and enqueues another START_MISS event and DURATION_MET event
>             ...
>         }
>     }
> }
> {code}
> Because 1000 has both the 'start' (0001) and 'duration' (0010) bits cleared, Server 1 treats those SLAs as unprocessed.
> Conclusion: the checks on the least significant bit ('start') and the bit next to it ('duration') need to be fixed to avoid duplicate events.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
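To make the failure concrete, here is a minimal Java sketch of the eventProcessed flags. It assumes the bit layout implied by the report (1 = start, 2 = duration, 4 = end, and 8 as the terminal value written by addJobStatus() once all three are done). The class, constant, and method names are hypothetical, not Oozie's, and the guarded check is one possible fix, not necessarily what OOZIE-1911-4.patch does.

```java
/**
 * Hypothetical model of the eventProcessed bit flags described in OOZIE-1911.
 * Bit values are inferred from the report: 0101 = start + end processed,
 * 0111 = all three processed, 1000 = terminal value set by addJobStatus().
 */
class SlaBits {
    // Assumed bit assignments; names are illustrative only.
    static final int START_PROCESSED = 1;    // 0001
    static final int DURATION_PROCESSED = 2; // 0010
    static final int END_PROCESSED = 4;      // 0100
    static final int ALL_PROCESSED = 8;      // 1000, replaces 0111 when everything is done

    /** Unguarded check, as in the quoted updateJobSla(): only looks at the start bit. */
    static boolean startUnprocessedBuggy(int eventProc) {
        return (eventProc & START_PROCESSED) == 0;
    }

    /** Unguarded check on the duration bit. */
    static boolean durationUnprocessedBuggy(int eventProc) {
        return (eventProc & DURATION_PROCESSED) == 0;
    }

    /** Guarded check: the terminal bit means every SLA was already handled. */
    static boolean startUnprocessedFixed(int eventProc) {
        return (eventProc & ALL_PROCESSED) == 0 && (eventProc & START_PROCESSED) == 0;
    }

    /** Guarded check on the duration bit. */
    static boolean durationUnprocessedFixed(int eventProc) {
        return (eventProc & ALL_PROCESSED) == 0 && (eventProc & DURATION_PROCESSED) == 0;
    }

    /** Mimics the quoted addJobStatus() step: 0111 collapses to 1000. */
    static int finish(int eventProc) {
        return eventProc == 7 ? ALL_PROCESSED : eventProc;
    }

    public static void main(String[] args) {
        int server1View = START_PROCESSED | END_PROCESSED;           // 0101
        int afterServer2 = finish(server1View | DURATION_PROCESSED); // 0111 -> 1000
        System.out.println("eventProcessed = " + Integer.toBinaryString(afterServer2));
        System.out.println("unguarded start check fires:    " + startUnprocessedBuggy(afterServer2));
        System.out.println("unguarded duration check fires: " + durationUnprocessedBuggy(afterServer2));
        System.out.println("guarded start check fires:      " + startUnprocessedFixed(afterServer2));
        System.out.println("guarded duration check fires:   " + durationUnprocessedFixed(afterServer2));
    }
}
```

Running `main` prints `eventProcessed = 1000`, shows both unguarded checks returning `true` (the duplicate START_MISS and DURATION_MET path) and both guarded checks returning `false`. Another option would be to OR the terminal bit in (7 | 8 = 15) so the lower bits survive, though any code that compares against 8 exactly would then need updating.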