debabhishek53 commented on code in PR #4187:
URL: https://github.com/apache/gobblin/pull/4187#discussion_r3115366930


##########
gobblin-restli/gobblin-flow-config-service/gobblin-flow-config-service-server/src/main/java/org/apache/gobblin/service/FlowExecutionResource.java:
##########
@@ -220,13 +220,25 @@ public static FlowExecution convertFlowStatus(FlowStatus monitoringFlowStatus,
 
     jobStatusArray.sort(Comparator.comparing((org.apache.gobblin.service.JobStatus js) -> js.getExecutionStatistics().getExecutionStartTime()));
 
+    // $UNKNOWN is a Pegasus in-memory sentinel (not declared in ExecutionStatus.pdl) that can arise
+    // when no flow-level (NA/NA) status event was persisted for this execution (e.g., orchestration
+    // race in ReevaluateDagProc). Serializing it produces HTTP 500 and poisons the whole collection
+    // response. Coerce to PENDING so the record serializes; the flow status is effectively "unknown
+    // but the flow exists", and polling callers will keep polling until a terminal state is known.
+    ExecutionStatus flowExecutionStatus = monitoringFlowStatus.getFlowExecutionStatus();
+    if (flowExecutionStatus == ExecutionStatus.$UNKNOWN) {
+      log.warn("FlowExecution {}/{}/{} has $UNKNOWN flow status; coercing to PENDING. Check state store for data quality issue.",
+          flowId.getFlowGroup(), flowId.getFlowName(), monitoringFlowStatus.getFlowExecutionId());
+      flowExecutionStatus = ExecutionStatus.PENDING;

Review Comment:
   Why PENDING over RUNNING as the coercion target? RUNNING might be a closer 
semantic match than PENDING, since the flow is clearly active enough for 
executions to appear in the collection response.
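
To make the question concrete, here is a minimal sketch (not the PR's code) with the fallback pulled out as a parameter, so choosing PENDING or RUNNING becomes a one-argument decision at the call site. `FlowStatusCoercion`, its nested `Status` enum, and `coerceUnknown` are hypothetical stand-ins for illustration only; the real `ExecutionStatus` is generated by Pegasus and declares more values than shown here.

```java
// Hypothetical, self-contained stand-in for the coercion discussed above.
// None of these names exist in Gobblin; they only illustrate the idea.
final class FlowStatusCoercion {

  // Assumed subset of statuses; $UNKNOWN mimics the in-memory sentinel.
  enum Status { PENDING, RUNNING, COMPLETE, $UNKNOWN }

  private FlowStatusCoercion() {
  }

  // Returns the fallback when the status is the $UNKNOWN sentinel,
  // otherwise returns the status unchanged.
  static Status coerceUnknown(Status status, Status fallback) {
    if (fallback == Status.$UNKNOWN) {
      // A sentinel fallback would reintroduce the unserializable value.
      throw new IllegalArgumentException("fallback must be a declared status");
    }
    return status == Status.$UNKNOWN ? fallback : status;
  }
}
```

With this shape, `coerceUnknown(status, Status.PENDING)` and `coerceUnknown(status, Status.RUNNING)` differ only in the second argument, and a status that is already known passes through untouched.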


