[ https://issues.apache.org/jira/browse/OOZIE-1984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mona Chitnis updated OOZIE-1984:
--------------------------------

    Attachment: OOZIE-1984.patch

> SLACalculator in HA mode performs duplicate operations on records with completed jobs
> -------------------------------------------------------------------------------------
>
>                 Key: OOZIE-1984
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1984
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk, 4.1.0
>
>         Attachments: OOZIE-1984.patch
>
>
> Scenario:
> The SLA periodic run has already processed the start, duration and end events
> for a job's SLA entry. A job notification for that job arrives afterwards and
> triggers the SLA listener.
> Buggy part:
> {code}
> // SLACalculatorMemory.java
>     else if (Services.get().get(JobsConcurrencyService.class).isHighlyAvailableMode()) {
>         // jobid might not exist in slaMap in HA Setting
>         SLARegistrationBean slaRegBean = SLARegistrationQueryExecutor.getInstance().get(
>                 SLARegQuery.GET_SLA_REG_ALL, jobId);
>         if (slaRegBean != null) { // filter out jobs picked by SLA job event listener
>                                   // but not actually configured for SLA
>             SLASummaryBean slaSummaryBean = SLASummaryQueryExecutor.getInstance().get(
>                     SLASummaryQuery.GET_SLA_SUMMARY, jobId);
>             slaCalc = new SLACalcStatus(slaSummaryBean, slaRegBean);
>             if (slaCalc.getEventProcessed() < 7) {
>                 slaMap.put(jobId, slaCalc);
>             }
>         }
>     }
> }
> if (slaCalc != null) {
> ..
>         Object eventProcObj = ((SLASummaryQueryExecutor) SLASummaryQueryExecutor.getInstance())
>                 .getSingleValue(SLASummaryQuery.GET_SLA_SUMMARY_EVENTPROCESSED, jobId);
>         byte eventProc = ((Byte) eventProcObj).byteValue();
> ..
>         processJobEndSuccessSLA(slaCalc, startTime, endTime);
> {code}
> Method processJobEndSuccessSLA then checks the second least-significant bit of
> eventProc and sends the duration event _again_. So the bug is two-fold:
>  * the function is still invoked even when all events have already been processed;
>  * in that case eventProcessed is 8 (binary 1000), so the second least-significant
>    bit is unset and the duration event is processed again.
> Fix: do not invoke the function when eventProc is 8 (binary 1000).
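A minimal Java sketch of the bit check described above, showing why a completed record (eventProc = 8) passes a duration-bit test and how guarding on that value avoids the repeat. It assumes the bit layout the report implies: bit 1 (value 2) = duration, as stated; bit 0 = start and bit 2 = end are assumptions consistent with 7 meaning "all three processed". The class and method names are hypothetical and are not Oozie's actual code.

```java
// Hypothetical illustration only: SlaEventBits and its methods are NOT part of Oozie.
public final class SlaEventBits {
    // Assumed layout of the eventProcessed byte (only the duration bit is stated in the report).
    public static final int START_BIT = 1;     // 0001 (assumed)
    public static final int DURATION_BIT = 2;  // 0010 ("second LSB bit" in the report)
    public static final int END_BIT = 4;       // 0100 (assumed)
    public static final int COMPLETED = 8;     // 1000, record already fully processed

    private SlaEventBits() {}

    // The test processJobEndSuccessSLA is described as doing: duration bit unset => send duration.
    public static boolean durationBitUnset(byte eventProc) {
        return (eventProc & DURATION_BIT) == 0;
    }

    // Buggy path: the function is always invoked, so 8 (1000) looks like "duration not sent yet".
    public static boolean buggyWillResendDuration(byte eventProc) {
        return durationBitUnset(eventProc);
    }

    // Fixed path: skip the call entirely when the record is already marked completed.
    public static boolean fixedWillResendDuration(byte eventProc) {
        return eventProc != COMPLETED && durationBitUnset(eventProc);
    }

    public static void main(String[] args) {
        byte completed = (byte) COMPLETED;
        System.out.println(buggyWillResendDuration(completed)); // true: duplicate duration event
        System.out.println(fixedWillResendDuration(completed)); // false: no duplicate
    }
}
```

With the guard in place, a record where only the start event was processed (eventProc = 1) still gets its duration event, while a completed record (8) is left alone.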



--
This message was sent by Atlassian JIRA
(v6.2#6252)
