[ https://issues.apache.org/jira/browse/GOBBLIN-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kip Kohn updated GOBBLIN-1509:
------------------------------
    Summary: Ensure flows transition to FAILED and not stuck in COMPILED upon DagManager::addDag error  (was: Ensure flows not stuck in COMPILED upon DagManager::addDag error)

> Ensure flows transition to FAILED and not stuck in COMPILED upon DagManager::addDag error
> -----------------------------------------------------------------------------------------
>
>                 Key: GOBBLIN-1509
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1509
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Kip Kohn
>            Priority: Major
>
> Presently, an addDag failure leaves the flow marooned in the COMPILED state,
> because the warranted FLOW_FAILED event is never sent. Particularly insidious
> is that a scheduled flow whose execution is stuck in COMPILED misses its next
> execution, unless `flow.allowConcurrentExecutions` is set. Thus the scheduled
> flow is stuck in its entirety, not merely a single execution.
>
> One observed cause of addDag failure is a DagStateStore backed by a replicated
> DB (e.g. MySqlDagStateStore) that has just switched leaders. Cached connections
> in the pool may suddenly point to a read-only follower, on which
> DagStateStore::writeCheckpoint cannot succeed.
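
The intended behavior described above can be sketched roughly as follows. This is a minimal illustration only, not Gobblin's actual code: the class `AddDagFailureSketch`, the `DagAdder` interface, the `FlowStatus` enum, the `submit` method, and the event string format are all hypothetical stand-ins, and the real DagManager::addDag signature and event-emission mechanism may differ. The point it shows is that a failure while adding the DAG should produce a FLOW_FAILED event and a FAILED status, rather than leaving the flow in COMPILED.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class AddDagFailureSketch {
    // Hypothetical stand-in for the flow states named in the issue.
    public enum FlowStatus { COMPILED, RUNNING, FAILED }

    // Hypothetical stand-in for DagManager::addDag (not the real signature).
    public interface DagAdder {
        void addDag(String flowId) throws Exception;
    }

    // Tries to add the DAG. On any failure, records a FLOW_FAILED event and
    // returns FAILED, so the flow never stays in COMPILED.
    public static FlowStatus submit(String flowId, DagAdder adder, List<String> events) {
        try {
            adder.addDag(flowId);
            return FlowStatus.RUNNING;
        } catch (Exception e) {
            // Assumed event format, for illustration only.
            events.add("FLOW_FAILED:" + flowId);
            return FlowStatus.FAILED;
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        // Simulates a checkpoint write that hits a read-only follower.
        DagAdder failing = id -> { throw new IOException("read-only follower"); };
        FlowStatus status = submit("flow-1", failing, events);
        System.out.println(status + " " + events);
    }
}
```

With the FAILED transition in place, the stuck-execution condition that blocks the next scheduled run would no longer arise from this failure path. Whether stale pooled connections should also be refreshed after a leader switch is a separate concern that this sketch does not address.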