phet commented on code in PR #3896:
URL: https://github.com/apache/gobblin/pull/3896#discussion_r1550017946
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/MostlyMySqlDagManagementStateStore.java:
##########
@@ -211,12 +211,13 @@ public boolean containsDag(DagManager.DagId dagId) throws
IOException {
return this.dagStateStore.existsDag(dagId);
}
- public Optional<Pair<Dag.DagNode<JobExecutionPlan>, JobStatus>>
getDagNodeWithJobStatus(DagNodeId dagNodeId) {
- Optional<JobStatus> jobStatus = getJobStatus(dagNodeId);
- if (this.dagNodes.containsKey(dagNodeId) && jobStatus.isPresent()) {
- return Optional.of(ImmutablePair.of(this.dagNodes.get(dagNodeId),
jobStatus.get()));
+ public Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
getDagNodeWithJobStatus(DagNodeId dagNodeId) {
+ if (this.dagNodes.containsKey(dagNodeId)) {
+ // no point of searching for status if the node itself is absent.
Review Comment:
good comment... but let's move it to the `else` block
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/MostlyMySqlDagManagementStateStore.java:
##########
@@ -211,12 +211,13 @@ public boolean containsDag(DagManager.DagId dagId) throws
IOException {
return this.dagStateStore.existsDag(dagId);
}
- public Optional<Pair<Dag.DagNode<JobExecutionPlan>, JobStatus>>
getDagNodeWithJobStatus(DagNodeId dagNodeId) {
- Optional<JobStatus> jobStatus = getJobStatus(dagNodeId);
- if (this.dagNodes.containsKey(dagNodeId) && jobStatus.isPresent()) {
- return Optional.of(ImmutablePair.of(this.dagNodes.get(dagNodeId),
jobStatus.get()));
+ public Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
getDagNodeWithJobStatus(DagNodeId dagNodeId) {
+ if (this.dagNodes.containsKey(dagNodeId)) {
+ // no point of searching for status if the node itself is absent.
+ Optional<JobStatus> jobStatus = getJobStatus(dagNodeId);
Review Comment:
probably personal preference, but I don't see a need to introduce the name
`jobStatus` (since `getJobStatus(dagNodeId)` is already clear)
##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java:
##########
@@ -221,13 +236,13 @@ protected void
processMessage(DecodeableKafkaRecord<byte[],byte[]> message) {
* It fills missing fields in job status and also merge the fields with the
* existing job status in the state store. Merging is required because we
* do not want to lose the information sent by other GobblinTrackingEvents.
- * @param jobStatus
+ * Returns false if adding this state transitions the job status of the job
to final, otherwise returns false.
+ * It will also return false if the job status was already final before
calling this method.
Review Comment:
this is out-of-date for the `Optional`
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/task/ReevaluateDagTask.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration.task;
+
+import org.apache.gobblin.service.modules.orchestration.DagActionStore;
+import org.apache.gobblin.service.modules.orchestration.DagTaskVisitor;
+import org.apache.gobblin.service.modules.orchestration.LeaseAttemptStatus;
+
+
+/**
+ * A {@link DagTask} responsible to handle re-evaluate dag actions.
+ */
+
+public class ReevaluateDagTask extends DagTask {
+ public ReevaluateDagTask(DagActionStore.DagAction dagAction,
LeaseAttemptStatus.LeaseObtainedStatus leaseObtainedStatus,
+ DagActionStore dagActionStore) {
+ super(dagAction, leaseObtainedStatus, dagActionStore);
+ }
+
+ public <T> T host(DagTaskVisitor<T> visitor) {
+ return visitor.meet(this);
+ }
+
+ @Override
+ public boolean conclude() {
+ // todo - release lease
+ return true;
+ }
Review Comment:
since urmi's recent change is in, looks like a "merge oversight", as [we now
conclude
in](https://github.com/apache/gobblin/blob/fb85f07e544dae8cd51edff1387136a55df28aa5/gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/task/DagTask.java#L60)
`DagTask`.
to guard against this possible error, let's make it
```
public *final* boolean DagTask::conclude
```
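a minimal sketch of the guard, using hypothetical, simplified stand-ins (`SketchDagTask`, `SketchReevaluateDagTask`) rather than the real classes, and assuming the base-class `conclude` is where the lease gets completed:

```java
// Hypothetical stand-ins for DagTask and ReevaluateDagTask; the real classes
// carry a DagAction, a lease status, and a DagActionStore.
abstract class SketchDagTask {
  private boolean leaseCompleted = false;

  // `final` keeps a subclass from overriding, and so silently bypassing, the shared logic.
  public final boolean conclude() {
    leaseCompleted = true; // assumption: stands in for completing the lease
    return leaseCompleted;
  }

  public boolean isLeaseCompleted() {
    return leaseCompleted;
  }
}

class SketchReevaluateDagTask extends SketchDagTask {
  // Declaring `public boolean conclude()` here would now be a compile error,
  // because a final method cannot be overridden.
}
```

with `final` in place, a leftover override like the one above gets rejected by the compiler instead of slipping through a merge.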
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProc.java:
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration.proc;
+
+import java.io.IOException;
+import java.util.Optional;
+import java.util.Set;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.metrics.event.TimingEvent;
+import org.apache.gobblin.service.ExecutionStatus;
+import org.apache.gobblin.service.modules.flowgraph.Dag;
+import org.apache.gobblin.service.modules.flowgraph.DagNodeId;
+import
org.apache.gobblin.service.modules.orchestration.DagManagementStateStore;
+import org.apache.gobblin.service.modules.orchestration.DagManagerUtils;
+import org.apache.gobblin.service.modules.orchestration.task.ReevaluateDagTask;
+import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
+import org.apache.gobblin.service.monitoring.FlowStatusGenerator;
+import org.apache.gobblin.service.monitoring.JobStatus;
+
+
+/**
+ * An implementation for {@link DagProc} that launches a new job if there
exists a job whose pre-requisite jobs are
+ * completed successfully. If there are no more jobs to run and no job is
running for the Dag, it cleans up the Dag.
+ * (In future), if there are multiple new jobs to be launched, separate launch
dag actions are created for each of them.
+ */
+@Slf4j
+public class ReevaluateDagProc extends
DagProc<Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>> {
+
+ public ReevaluateDagProc(ReevaluateDagTask reEvaluateDagTask) {
+ super(reEvaluateDagTask);
+ }
+
+ @Override
+ protected Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
initialize(DagManagementStateStore dagManagementStateStore)
+ throws IOException {
+ Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus =
+ dagManagementStateStore.getDagNodeWithJobStatus(this.dagNodeId);
+
+ if (!dagNodeWithJobStatus.getLeft().isPresent() ||
!dagNodeWithJobStatus.getRight().isPresent()) {
+ // this is possible when MALA malfunctions and a duplicated reevaluate
dag proc is launched for a dag node that is
+ // already "reevaluated" and cleaned up.
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ ExecutionStatus executionStatus =
ExecutionStatus.valueOf(dagNodeWithJobStatus.getRight().get().getEventName());
+ if
(!FlowStatusGenerator.FINISHED_STATUSES.contains(executionStatus.name())) {
+ log.warn("Job status for dagNode {} is {}. Re-evaluate dag action should
have been created only for finished status - {}",
+ dagNodeId, executionStatus, FlowStatusGenerator.FINISHED_STATUSES);
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ setStatus(dagManagementStateStore, dagNodeWithJobStatus.getLeft().get(),
executionStatus);
+ return dagNodeWithJobStatus;
+ }
+
+ @Override
+ protected void act(DagManagementStateStore dagManagementStateStore,
Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus)
+ throws IOException {
+ if (!dagNodeWithJobStatus.getLeft().isPresent()) {
+ log.error("DagNode or its job status not found for a Reevaluate
DagAction with dag node id {}", this.dagNodeId);
+ return;
+ }
+
+ Dag.DagNode<JobExecutionPlan> dagNode =
dagNodeWithJobStatus.getLeft().get();
+ JobStatus jobStatus = dagNodeWithJobStatus.getRight().get();
+ ExecutionStatus executionStatus = dagNode.getValue().getExecutionStatus();
+ onJobFinish(dagManagementStateStore, dagNode, executionStatus);
+ Dag<JobExecutionPlan> dag =
dagManagementStateStore.getDag(getDagId()).get();
Review Comment:
rather than the ordering:
1. calling `onJobFinish`
2. inside that callstack accessing `DMSS::getDag`
3. returning from `onJobFinish`
4. here calling `DMSS::getDag`
could you instead have:
1. called `DMSS::getDag` here in `act`
2. passed that to `onJobFinish`, so no need to `DMSS::getDag` within
3. after `onJobFinish` returns, continue using *that same* result of (1.)
?
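roughly, the proposed ordering could look like the sketch below. every type here (`SketchDag`, `SketchStore`, `SketchReevaluateProc`) is a hypothetical stand-in, not the Gobblin API, and the flow-event strings are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for Dag<JobExecutionPlan>.
class SketchDag {
  String flowEvent;
}

// Hypothetical stand-in for DagManagementStateStore.
class SketchStore {
  private final Map<String, SketchDag> dags = new HashMap<>();
  int getDagCalls = 0; // counts lookups so the saving is observable

  void addDag(String dagId, SketchDag dag) {
    dags.put(dagId, dag);
  }

  Optional<SketchDag> getDag(String dagId) {
    getDagCalls++;
    return Optional.ofNullable(dags.get(dagId));
  }
}

class SketchReevaluateProc {
  // (2.) receives the already-fetched dag, so there is no store lookup in here
  static void onJobFinish(SketchDag dag, boolean jobFailed) {
    if (jobFailed) {
      dag.flowEvent = "FLOW_FAILED";
    }
  }

  static String act(SketchStore store, String dagId, boolean jobFailed) {
    // (1.) the single lookup happens here
    SketchDag dag = store.getDag(dagId)
        .orElseThrow(() -> new IllegalStateException("no dag for id " + dagId));
    onJobFinish(dag, jobFailed);
    // (3.) keep working with the same instance after onJobFinish returns
    if (dag.flowEvent == null) {
      dag.flowEvent = "FLOW_SUCCEEDED";
    }
    return dag.flowEvent;
  }
}
```

besides saving a lookup, both methods then see the same `Dag` instance, so updates made inside `onJobFinish` cannot diverge from what `act` goes on to read.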
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProc.java:
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration.proc;
+
+import java.io.IOException;
+import java.util.Optional;
+import java.util.Set;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.metrics.event.TimingEvent;
+import org.apache.gobblin.service.ExecutionStatus;
+import org.apache.gobblin.service.modules.flowgraph.Dag;
+import org.apache.gobblin.service.modules.flowgraph.DagNodeId;
+import
org.apache.gobblin.service.modules.orchestration.DagManagementStateStore;
+import org.apache.gobblin.service.modules.orchestration.DagManagerUtils;
+import org.apache.gobblin.service.modules.orchestration.task.ReevaluateDagTask;
+import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
+import org.apache.gobblin.service.monitoring.FlowStatusGenerator;
+import org.apache.gobblin.service.monitoring.JobStatus;
+
+
+/**
+ * An implementation for {@link DagProc} that launches a new job if there
exists a job whose pre-requisite jobs are
+ * completed successfully. If there are no more jobs to run and no job is
running for the Dag, it cleans up the Dag.
+ * (In future), if there are multiple new jobs to be launched, separate launch
dag actions are created for each of them.
+ */
+@Slf4j
+public class ReevaluateDagProc extends
DagProc<Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>> {
+
+ public ReevaluateDagProc(ReevaluateDagTask reEvaluateDagTask) {
+ super(reEvaluateDagTask);
+ }
+
+ @Override
+ protected Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
initialize(DagManagementStateStore dagManagementStateStore)
+ throws IOException {
+ Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus =
+ dagManagementStateStore.getDagNodeWithJobStatus(this.dagNodeId);
+
+ if (!dagNodeWithJobStatus.getLeft().isPresent() ||
!dagNodeWithJobStatus.getRight().isPresent()) {
+ // this is possible when MALA malfunctions and a duplicated reevaluate
dag proc is launched for a dag node that is
+ // already "reevaluated" and cleaned up.
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ ExecutionStatus executionStatus =
ExecutionStatus.valueOf(dagNodeWithJobStatus.getRight().get().getEventName());
+ if
(!FlowStatusGenerator.FINISHED_STATUSES.contains(executionStatus.name())) {
+ log.warn("Job status for dagNode {} is {}. Re-evaluate dag action should
have been created only for finished status - {}",
+ dagNodeId, executionStatus, FlowStatusGenerator.FINISHED_STATUSES);
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ setStatus(dagManagementStateStore, dagNodeWithJobStatus.getLeft().get(),
executionStatus);
+ return dagNodeWithJobStatus;
+ }
+
+ @Override
+ protected void act(DagManagementStateStore dagManagementStateStore,
Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus)
+ throws IOException {
+ if (!dagNodeWithJobStatus.getLeft().isPresent()) {
+ log.error("DagNode or its job status not found for a Reevaluate
DagAction with dag node id {}", this.dagNodeId);
+ return;
Review Comment:
I agree with logging at `ERROR`, and with still `return`ing rather than
throwing. given the severity, I'd further suggest incrementing a metric we might
be able to alert on.
in addition, please add a code comment to capture the understanding
for maintainers. did you say this arises when the MALA leasing doesn't work
cleanly and another `DagProc::process` has cleaned up the Dag, yet did not
complete the lease before this current one acquired its own?
##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java:
##########
@@ -277,9 +292,8 @@ static void
addJobStatusToStateStore(org.apache.gobblin.configuration.State jobS
modifyStateIfRetryRequired(jobStatus);
stateStore.put(storeName, tableName, jobStatus);
- if (isNewStateTransitionToFinal(jobStatus, states)) {
- eventProducer.emitObservabilityEvent(jobStatus);
- }
+
+ return isNewStateTransitionToFinal(jobStatus, states) ?
Optional.of(jobStatus) : Optional.empty();
Review Comment:
more canonical:
```
return Optional.of(jobStatus).filter(js ->
isNewStateTransitionToFinal(js, states)
);
```
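to illustrate that the two forms agree, here is a small self-contained sketch. `isFinal` is a hypothetical predicate standing in for `isNewStateTransitionToFinal`, and plain strings stand in for the job `State`:

```java
import java.util.Optional;

class OptionalFilterSketch {
  // Hypothetical predicate; the real check also consults previously stored states.
  static boolean isFinal(String status) {
    return status.equals("COMPLETE") || status.equals("FAILED") || status.equals("CANCELLED");
  }

  // Ternary form, as in the diff.
  static Optional<String> ternary(String status) {
    return isFinal(status) ? Optional.of(status) : Optional.empty();
  }

  // filter form: keep the value only when the predicate holds.
  // The lambda parameter needs a name distinct from any enclosing local or parameter.
  static Optional<String> filtered(String status) {
    return Optional.of(status).filter(s -> isFinal(s));
  }
}
```

one thing to watch: a lambda parameter may not reuse the name of a local variable or method parameter already in scope, so inside a method that takes `jobStatus` the lambda needs a different name.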
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/DagActionStore.java:
##########
@@ -35,7 +35,7 @@ enum DagActionType {
LAUNCH, // Launch new flow execution invoked adhoc or through scheduled
trigger
RETRY, // Invoked through DagManager for flows configured to allow retries
CANCEL, // Invoked through DagManager if flow has been stuck in
Orchestrated state for a while
- ADVANCE // Launch next step in multi-hop dag
+ REEVALUATE // Launch next step in multi-hop dag
Review Comment:
nit: let's align this comment w/ the description you just added to
`DagActionStoreChangeEvent.avsc`
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/DagProc.java:
##########
@@ -40,6 +40,8 @@
* actions based on the type of {@link DagTask}. Submitting events in time is
found to be important in
* <a href="https://github.com/apache/gobblin/pull/3641">PR#3641</a>, hence
initialize and act methods submit events as
* they happen.
+ * Parameter T is the type of object that needs an initialisation before the
dag proc does the main work in `act` method.
Review Comment:
great description! the javadoc syntax should be:
```
@param <T> type of the initialization "state" on which to {@link DagProc#act}
```
(the key above is `<T>` syntax for a generic type name, rather than `T`/`t`,
for a regular param name)
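for placement, a sketch of the tag on a generic class. `SketchProc` is a hypothetical, trimmed-down shape, not the actual `DagProc`:

```java
/**
 * Hypothetical, simplified proc with an initialization step followed by the main work.
 *
 * @param <T> type of the initialization "state" on which to {@link #act}
 */
abstract class SketchProc<T> {
  /** Produces the state that {@link #act} consumes. */
  protected abstract T initialize();

  /**
   * Does the main work.
   *
   * @param state the value returned by {@link #initialize} (a regular param, so no angle brackets)
   */
  protected abstract void act(T state);

  public final void process() {
    act(initialize());
  }
}
```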
##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitorFactory.java:
##########
@@ -49,24 +52,29 @@ public class KafkaJobStatusMonitorFactory implements
Provider<KafkaJobStatusMoni
private final JobIssueEventHandler jobIssueEventHandler;
private final MultiContextIssueRepository issueRepository;
private final boolean instrumentationEnabled;
+ private final DagActionStore dagActionStore;
+ private final boolean dagProcEngineEnabled;
Review Comment:
I didn't see this being used in `KafkaJobStatusMonitor`, so you don't seem
to need to have it here.
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProc.java:
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration.proc;
+
+import java.io.IOException;
+import java.util.Optional;
+import java.util.Set;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.metrics.event.TimingEvent;
+import org.apache.gobblin.service.ExecutionStatus;
+import org.apache.gobblin.service.modules.flowgraph.Dag;
+import org.apache.gobblin.service.modules.flowgraph.DagNodeId;
+import
org.apache.gobblin.service.modules.orchestration.DagManagementStateStore;
+import org.apache.gobblin.service.modules.orchestration.DagManagerUtils;
+import org.apache.gobblin.service.modules.orchestration.task.ReevaluateDagTask;
+import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
+import org.apache.gobblin.service.monitoring.FlowStatusGenerator;
+import org.apache.gobblin.service.monitoring.JobStatus;
+
+
+/**
+ * An implementation for {@link DagProc} that launches a new job if there
exists a job whose pre-requisite jobs are
+ * completed successfully. If there are no more jobs to run and no job is
running for the Dag, it cleans up the Dag.
Review Comment:
suggest:
```
A {@link DagProc} to launch any subsequent (dependent) job(s) once all
pre-requisite job(s) in the Dag have succeeded. When there are no more jobs to
run and no more running, it cleans up the Dag.
```
##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java:
##########
@@ -193,7 +197,18 @@ protected void
processMessage(DecodeableKafkaRecord<byte[],byte[]> message) {
org.apache.gobblin.configuration.State jobStatus =
parseJobStatus(gobblinTrackingEvent);
if (jobStatus != null) {
try (Timer.Context context =
getMetricContext().timer(GET_AND_SET_JOB_STATUS).time()) {
- addJobStatusToStateStore(jobStatus, this.stateStore,
this.eventProducer);
+ Optional<org.apache.gobblin.configuration.State> updatedJobStatus
= addJobStatusToStateStore(jobStatus, this.stateStore);
+ boolean isJobStatusUpdated = updatedJobStatus.isPresent();
Review Comment:
I don't see a need for this intermediate name. javadoc on
`addJobStatusToStateStore` should be the place to convey semantics of
`Optional::isPresent()` vs. not
##########
gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/ReevaluateDagProc.java:
##########
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.service.modules.orchestration.proc;
+
+import java.io.IOException;
+import java.util.Optional;
+import java.util.Set;
+
+import org.apache.commons.lang3.tuple.ImmutablePair;
+import org.apache.commons.lang3.tuple.Pair;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.metrics.event.TimingEvent;
+import org.apache.gobblin.service.ExecutionStatus;
+import org.apache.gobblin.service.modules.flowgraph.Dag;
+import org.apache.gobblin.service.modules.flowgraph.DagNodeId;
+import
org.apache.gobblin.service.modules.orchestration.DagManagementStateStore;
+import org.apache.gobblin.service.modules.orchestration.DagManagerUtils;
+import org.apache.gobblin.service.modules.orchestration.task.ReevaluateDagTask;
+import org.apache.gobblin.service.modules.spec.JobExecutionPlan;
+import org.apache.gobblin.service.monitoring.FlowStatusGenerator;
+import org.apache.gobblin.service.monitoring.JobStatus;
+
+
+/**
+ * An implementation for {@link DagProc} that launches a new job if there
exists a job whose pre-requisite jobs are
+ * completed successfully. If there are no more jobs to run and no job is
running for the Dag, it cleans up the Dag.
+ * (In future), if there are multiple new jobs to be launched, separate launch
dag actions are created for each of them.
+ */
+@Slf4j
+public class ReevaluateDagProc extends
DagProc<Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>> {
+
+ public ReevaluateDagProc(ReevaluateDagTask reEvaluateDagTask) {
+ super(reEvaluateDagTask);
+ }
+
+ @Override
+ protected Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
initialize(DagManagementStateStore dagManagementStateStore)
+ throws IOException {
+ Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus =
+ dagManagementStateStore.getDagNodeWithJobStatus(this.dagNodeId);
+
+ if (!dagNodeWithJobStatus.getLeft().isPresent() ||
!dagNodeWithJobStatus.getRight().isPresent()) {
+ // this is possible when MALA malfunctions and a duplicated reevaluate
dag proc is launched for a dag node that is
+ // already "reevaluated" and cleaned up.
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ ExecutionStatus executionStatus =
ExecutionStatus.valueOf(dagNodeWithJobStatus.getRight().get().getEventName());
+ if
(!FlowStatusGenerator.FINISHED_STATUSES.contains(executionStatus.name())) {
+ log.warn("Job status for dagNode {} is {}. Re-evaluate dag action should
have been created only for finished status - {}",
+ dagNodeId, executionStatus, FlowStatusGenerator.FINISHED_STATUSES);
+ return ImmutablePair.of(Optional.empty(), Optional.empty());
+ }
+
+ setStatus(dagManagementStateStore, dagNodeWithJobStatus.getLeft().get(),
executionStatus);
+ return dagNodeWithJobStatus;
+ }
+
+ @Override
+ protected void act(DagManagementStateStore dagManagementStateStore,
Pair<Optional<Dag.DagNode<JobExecutionPlan>>, Optional<JobStatus>>
dagNodeWithJobStatus)
+ throws IOException {
+ if (!dagNodeWithJobStatus.getLeft().isPresent()) {
+ log.error("DagNode or its job status not found for a Reevaluate
DagAction with dag node id {}", this.dagNodeId);
+ return;
+ }
+
+ Dag.DagNode<JobExecutionPlan> dagNode =
dagNodeWithJobStatus.getLeft().get();
+ JobStatus jobStatus = dagNodeWithJobStatus.getRight().get();
+ ExecutionStatus executionStatus = dagNode.getValue().getExecutionStatus();
+ onJobFinish(dagManagementStateStore, dagNode, executionStatus);
+ Dag<JobExecutionPlan> dag =
dagManagementStateStore.getDag(getDagId()).get();
+
+ if (jobStatus.isShouldRetry()) {
+ log.info("Retrying job: {}, current attempts: {}, max attempts: {}",
+ DagManagerUtils.getFullyQualifiedJobName(dagNode),
jobStatus.getCurrentAttempts(), jobStatus.getMaxAttempts());
+ // todo - be careful when unsetting this, it is possible that this is
set to FAILED because some other job in the
+ // dag failed and is also not retryable. in that case if this job's
retry passes, overall status of the dag can be
+ // set to PASS, which would be incorrect.
+ dag.setFlowEvent(null);
+ DagProcUtils.submitJobToExecutor(dagManagementStateStore, dagNode,
getDagId());
+ } else if (!dagManagementStateStore.hasRunningJobs(getDagId())) {
+ if (dag.getFlowEvent() == null) {
+ // If the dag flow event is not set and there are no more jobs
running, then it is successful
+ // also note that `onJobFinish` method does whatever is required to do
after job finish, determining a Dag's
+ // status is not possible on individual job's finish status
+ dag.setFlowEvent(TimingEvent.FlowTimings.FLOW_SUCCEEDED);
+ }
+ String flowEvent = dag.getFlowEvent();
+ DagManagerUtils.emitFlowEvent(eventSubmitter, dag, flowEvent);
+ if (flowEvent.equals(TimingEvent.FlowTimings.FLOW_SUCCEEDED)) {
+ // todo - verify if work from PR#3641 is required
+ dagManagementStateStore.deleteDag(getDagId());
+ } else {
+ dagManagementStateStore.markDagFailed(dag);
+ }
+ }
+ }
+
+ /**
+ * Sets status of a dag node inside the given Dag.
+ * todo - DMSS should support this functionality like an atomic get-and-set
operation.
+ */
+ private void setStatus(DagManagementStateStore dagManagementStateStore,
+ Dag.DagNode<JobExecutionPlan> dagNode, ExecutionStatus executionStatus)
throws IOException {
+ Dag<JobExecutionPlan> dag =
dagManagementStateStore.getDag(getDagId()).get();
+ DagNodeId dagNodeId = dagNode.getValue().getId();
+ for (Dag.DagNode<JobExecutionPlan> node : dag.getNodes()) {
+ if (node.getValue().getId().equals(dagNodeId)) {
+ node.getValue().setExecutionStatus(executionStatus);
+ dagManagementStateStore.checkpointDag(dag);
+ return;
+ }
+ }
+ log.error("DagNode with id {} not found in Dag {}", dagNodeId, getDagId());
+ }
+
+ /**
+ * Method that defines the actions to be performed when a job finishes
either successfully or with failure.
+ * This method updates the state of the dag and performs clean up actions as
necessary.
+ */
+ private void onJobFinish(DagManagementStateStore dagManagementStateStore,
+ Dag.DagNode<JobExecutionPlan> dagNode, ExecutionStatus executionStatus)
+ throws IOException {
+ String jobName = DagManagerUtils.getFullyQualifiedJobName(dagNode);
+ log.info("Job {} of Dag {} has finished with status {}", jobName,
getDagId(), executionStatus.name());
+ // Only decrement counters and quota for jobs that actually ran on the
executor, not from a GaaS side failure/skip event
+ if (dagManagementStateStore.releaseQuota(dagNode)) {
+
dagManagementStateStore.getDagManagerMetrics().decrementRunningJobMetrics(dagNode);
+ }
+
+ Dag<JobExecutionPlan> dag =
dagManagementStateStore.getDag(getDagId()).get();
+
+ switch (executionStatus) {
+ case FAILED:
+ dag.setMessage("Flow failed because job " + jobName + " failed");
+ dag.setFlowEvent(TimingEvent.FlowTimings.FLOW_FAILED);
+
dagManagementStateStore.getDagManagerMetrics().incrementExecutorFailed(dagNode);
+ break;
+ case CANCELLED:
+ dag.setFlowEvent(TimingEvent.FlowTimings.FLOW_CANCELLED);
+ break;
+ case COMPLETE:
+
dagManagementStateStore.getDagManagerMetrics().incrementExecutorSuccess(dagNode);
+ submitNext(dagManagementStateStore);
+ break;
+ default:
+ log.warn("It should not reach here. Job status {} is unexpected.",
executionStatus);
+ }
+
+ //Checkpoint the dag state, it should have an updated value of dag fields
+ dagManagementStateStore.checkpointDag(dag);
+ dagManagementStateStore.deleteDagNodeState(getDagId(), dagNode);
+ }
+
+ /**
+ * Submit next set of Dag nodes in the Dag identified by the provided dagId
+ */
+ void submitNext(DagManagementStateStore dagManagementStateStore) throws
IOException {
Review Comment:
we can wait to refactor until we actually implement `handleMultipleJobs`,
but for now, please keep this method signature
[aligned](https://github.com/apache/gobblin/blob/fb85f07e544dae8cd51edff1387136a55df28aa5/gobblin-service/src/main/java/org/apache/gobblin/service/modules/orchestration/proc/LaunchDagProc.java#L87)
w/ `LaunchDagProc::submitNext`.
my expectation is to eventually have a common base class (e.g.
`AbstractExecutorSubmittingDagProc<T>`), which only those two among the
`DagProc`s would derive from
##########
gobblin-service/src/main/java/org/apache/gobblin/service/monitoring/KafkaJobStatusMonitor.java:
##########
@@ -193,7 +197,18 @@ protected void
processMessage(DecodeableKafkaRecord<byte[],byte[]> message) {
org.apache.gobblin.configuration.State jobStatus =
parseJobStatus(gobblinTrackingEvent);
if (jobStatus != null) {
try (Timer.Context context =
getMetricContext().timer(GET_AND_SET_JOB_STATUS).time()) {
- addJobStatusToStateStore(jobStatus, this.stateStore,
this.eventProducer);
+ Optional<org.apache.gobblin.configuration.State> updatedJobStatus
= addJobStatusToStateStore(jobStatus, this.stateStore);
+ boolean isJobStatusUpdated = updatedJobStatus.isPresent();
+ // todo - retried/resumed jobs *may* not be handled here, we may
want to create their dag action elsewhere
+ if (isJobStatusUpdated) {
+ jobStatus = updatedJobStatus.get();
+ this.eventProducer.emitObservabilityEvent(jobStatus);
+ String flowName =
jobStatus.getProp(TimingEvent.FlowEventConstants.FLOW_NAME_FIELD);
+ String flowGroup =
jobStatus.getProp(TimingEvent.FlowEventConstants.FLOW_GROUP_FIELD);
+ String flowExecutionId =
jobStatus.getProp(TimingEvent.FlowEventConstants.FLOW_EXECUTION_ID_FIELD);
+ String jobName =
jobStatus.getProp(TimingEvent.FlowEventConstants.JOB_NAME_FIELD);
+ this.dagActionStore.addJobDagAction(flowGroup, flowName,
flowExecutionId, jobName, DagActionStore.DagActionType.REEVALUATE);
Review Comment:
seems like a common pattern. where would you suggest adding a util factory
method like:
```
DagAction createFromJobState(State s, DagActionType actionType)
```
?
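one possible shape, as a sketch only: `java.util.Properties` stands in for `State`, the nested `Action` and `ActionType` are hypothetical stand-ins for `DagAction` and `DagActionType`, and the property keys are made-up placeholders for the `TimingEvent.FlowEventConstants` fields:

```java
import java.util.Properties;

class DagActionFactorySketch {
  // Hypothetical stand-in for DagActionStore.DagActionType.
  enum ActionType { LAUNCH, RETRY, CANCEL, REEVALUATE }

  // Hypothetical stand-in for DagActionStore.DagAction.
  static final class Action {
    final String flowGroup;
    final String flowName;
    final String flowExecutionId;
    final String jobName;
    final ActionType type;

    Action(String flowGroup, String flowName, String flowExecutionId, String jobName, ActionType type) {
      this.flowGroup = flowGroup;
      this.flowName = flowName;
      this.flowExecutionId = flowExecutionId;
      this.jobName = jobName;
      this.type = type;
    }
  }

  // Placeholder keys; the real ones would come from TimingEvent.FlowEventConstants.
  static final String FLOW_GROUP_KEY = "flowGroup";
  static final String FLOW_NAME_KEY = "flowName";
  static final String FLOW_EXECUTION_ID_KEY = "flowExecutionId";
  static final String JOB_NAME_KEY = "jobName";

  // Collects the four property reads in one place instead of repeating them at each call site.
  static Action createFromJobState(Properties state, ActionType type) {
    return new Action(
        state.getProperty(FLOW_GROUP_KEY),
        state.getProperty(FLOW_NAME_KEY),
        state.getProperty(FLOW_EXECUTION_ID_KEY),
        state.getProperty(JOB_NAME_KEY),
        type);
  }
}
```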