I opened this thread because users have raised concerns about stale process instances left in the error state. To quote one of them: "I can see the system being flooded with those instances in case of wrong params or a system which is down that fails the flow". So the use case certainly exists and we need to cope with it somehow. Another possible solution, rather than a timeout, is to automatically abort a workflow that hits an error. That makes sense for workflows that are unlikely to be retriggered. Maybe we can add process metadata to indicate that a process should be aborted when an error occurs.
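To make the metadata idea a bit more concrete, here is a rough sketch in plain Java. It is only an illustration: the class, the `abortOnError` and `errorTimeout` metadata keys, and the `onError`/`sweep` methods are all hypothetical and are not part of the current engine API. The assumption is that both behaviours are opt-in per process, so an instance without the metadata stays in the error state for manual recovery, as it does today.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types or keys exist in the engine today.
class ErrorAbortPolicySketch {

    // Assumed metadata keys that a process definition could declare.
    static final String ABORT_ON_ERROR = "abortOnError";
    static final String ERROR_TIMEOUT = "errorTimeout"; // ISO-8601 duration, e.g. "PT24H"

    enum State { ACTIVE, ERROR, ABORTED }

    // Minimal stand-in for a process instance.
    static final class Instance {
        final String id;
        final Map<String, String> metadata;
        State state = State.ACTIVE;
        Instant erroredAt;

        Instance(String id, Map<String, String> metadata) {
            this.id = id;
            this.metadata = metadata;
        }
    }

    // Called where an unhandled node error is intercepted.
    static void onError(Instance pi, Instant now) {
        if (Boolean.parseBoolean(pi.metadata.getOrDefault(ABORT_ON_ERROR, "false"))) {
            pi.state = State.ABORTED; // the process opted in to immediate abort
        } else {
            pi.state = State.ERROR;   // default: keep the instance for manual recovery
            pi.erroredAt = now;
        }
    }

    // Periodic sweep: abort errored instances whose opt-in timeout has elapsed.
    static List<String> sweep(List<Instance> instances, Instant now) {
        List<String> aborted = new ArrayList<>();
        for (Instance pi : instances) {
            String timeout = pi.metadata.get(ERROR_TIMEOUT);
            if (pi.state != State.ERROR || timeout == null) {
                continue; // not in error, or no timeout declared: leave untouched
            }
            Duration idle = Duration.between(pi.erroredAt, now);
            if (idle.compareTo(Duration.parse(timeout)) >= 0) {
                pi.state = State.ABORTED;
                aborted.add(pi.id);
            }
        }
        return aborted;
    }
}
```

With something along these lines, a process that is cheap to rerun could declare `abortOnError=true`, one that is worth recovering could declare a long `errorTimeout`, and everything else would behave exactly as it does now.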

On Fri, May 3, 2024 at 10:01 PM Enrique Gonzalez Martinez <[email protected]> wrote:

> To be honest, an error state in a process is a bit strange. Anyway, at
> this point it means that the process is active and stale for some reason.
>
> Recovery has historically needed some human intervention, so I don't see
> why you would try to clean up anything in the system. The instance is
> alive but stale. Aborting a process should be done manually. We cannot
> make a decision on behalf of the user.
>
> I am open to discussing a retriggering mechanism, but not to aborting
> process instances automatically. It does not cover a real usage scenario.
> I am also open to discussing abort policies that apply when certain
> criteria are met.
>
> -1 to the proposal.
>
> On Fri, May 3, 2024, 13:11, Francisco Javier Tirado Sarti <[email protected]> wrote:
>
> > Hi all,
> > According to my interpretation of the engine code [1], all unexpected
> > and unhandled errors during node execution are currently intercepted
> > and the process state is set to Error, but the process instance remains
> > active to allow users to update the model and retrigger process
> > instance execution.
> > Although this is a clever approach to allow recovery of processes that
> > users do not want to execute again from the start (they might have
> > failed because there was a typo in a human task), it potentially
> > creates a large number of idle process instances that are not going to
> > be deleted from memory/db (depending on whether persistence is
> > configured; in production it will be) unless the users manually abort
> > them. If the user does not monitor them, this policy might jeopardize
> > the performance of the whole application.
> > I would like to explore the possibility of setting a timeout for
> > process instances in error (which will of course be configurable). If
> > the process instance has not been acted upon for a reasonable amount of
> > time, it will be automatically aborted.
> >
> > [1] https://github.com/apache/incubator-kie-kogito-runtimes/blob/main/jbpm/jbpm-flow/src/main/java/org/jbpm/workflow/instance/impl/NodeInstanceImpl.java#L247-L251
