[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mona Chitnis updated OOZIE-1984:
--------------------------------
    Attachment: OOZIE-1984.patch

> SLACalculator in HA mode performs duplicate operations on records with completed jobs
> -------------------------------------------------------------------------------------
>
>                 Key: OOZIE-1984
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1984
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk, 4.1.0
>
>         Attachments: OOZIE-1984.patch
>
>
> Scenario:
> The SLA periodic run has already processed start, duration and end for a job's SLA
> entry, but the job notification for that job arrives afterwards and triggers the
> SLA listener.
> Buggy part:
> {code}
> SLACalculatorMemory.java
>     else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
>         // jobid might not exist in slaMap in HA Setting
>         SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
>                 SLARegQuery.GET_SLA_REG_ALL, jobId);
>         if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
>                                   // but not actually configured for SLA
>             SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
>                     SLASummaryQuery.GET_SLA_SUMMARY, jobId);
>             slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
>             if (slaCalc.getEventProcessed() < 7) {
>                 slaMap.put(jobId, slaCalc);
>             }
>         }
>     }
> }
> if (slaCalc != null) {
>     ..
>     Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
>             .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
>     byte eventProc = ((Byte) eventProcObj).byteValue();
>     ..
>     processJobEndSuccessSLA(slaCalc, startTime, endTime);
> {code}
> Method processJobEndSuccessSLA goes ahead, checks the second LSB of eventProc,
> and sends the duration event _again_.
> So the bug here is two-fold:
> * even if all events are already processed, the function is still invoked
> * eventProc is 8 (binary 1000), so the second LSB is unset and the duration
>   event is therefore processed again.
> Fix: do not invoke the function when eventProc = 1000 (binary).
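
The bit arithmetic behind the report can be illustrated with a small standalone sketch. It is not the attached patch: the class name, constants, and helper methods below are hypothetical, and the encoding (second LSB marks the duration event, value 8 marks a fully processed entry) is assumed only from the description above.

```java
// Hypothetical illustration of the eventProc check; not taken from OOZIE-1984.patch.
class EventProcDemo {
    // Assumed from the description: the second LSB marks "duration event processed".
    static final int DURATION_BIT = 0b0010;
    // Assumed from the description: 8 (binary 1000) means all events are processed.
    static final int ALL_PROCESSED = 0b1000;

    // Mirrors the reported behaviour: only the duration bit is inspected,
    // so a fully processed entry (1000) looks like "duration not yet sent".
    static boolean buggyShouldSendDuration(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    // Guarded variant: bail out first when the entry is already fully processed.
    static boolean guardedShouldSendDuration(byte eventProc) {
        if (eventProc == ALL_PROCESSED) {
            return false;
        }
        return (eventProc & DURATION_BIT) == 0;
    }

    public static void main(String[] args) {
        byte eventProc = 8; // binary 1000
        System.out.println("buggy:   " + buggyShouldSendDuration(eventProc));   // prints "buggy:   true"
        System.out.println("guarded: " + guardedShouldSendDuration(eventProc)); // prints "guarded: false"
    }
}
```

With eventProc = 8 the unguarded check reports that the duration event still needs sending, which matches the duplicate described in the report; the guarded variant skips it.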