Just a couple of thoughts to add to the discussion...

The concept of the composite state manager doesn't really require any
framework-level changes, right? Even if we chose not to provide it out
of the box, anyone could still implement it in a custom NAR.
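
To make that concrete, here is a rough sketch of the composite idea. It
assumes one plausible behavior (write to a primary store, copy
best-effort to a secondary, fall back to the secondary on read), which
may not match what the proposal intends. SimpleStateStore,
InMemoryStateStore, and CompositeStateStore are hypothetical names made
up for illustration, not NiFi APIs; a real version would implement
NiFi's StateProvider extension point and be packaged in a NAR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a state store; NOT NiFi's provider API.
interface SimpleStateStore {
    void put(String componentId, Map<String, String> state);

    // Returns an empty map when nothing is stored for the component.
    Map<String, String> get(String componentId);
}

// In-memory store used only to exercise the composite in this sketch.
final class InMemoryStateStore implements SimpleStateStore {
    private final Map<String, Map<String, String>> data = new HashMap<>();

    @Override
    public void put(String componentId, Map<String, String> state) {
        data.put(componentId, new HashMap<>(state));
    }

    @Override
    public Map<String, String> get(String componentId) {
        return new HashMap<>(data.getOrDefault(componentId, Map.of()));
    }
}

// Writes go to the primary first, then best-effort to the secondary.
// Reads prefer the primary and fall back to the secondary when it is empty.
final class CompositeStateStore implements SimpleStateStore {
    private final SimpleStateStore primary;
    private final SimpleStateStore secondary;

    CompositeStateStore(SimpleStateStore primary, SimpleStateStore secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    @Override
    public void put(String componentId, Map<String, String> state) {
        primary.put(componentId, state); // a failure here propagates to the caller
        try {
            secondary.put(componentId, state);
        } catch (RuntimeException e) {
            // Assumption: replication is best-effort. A real implementation
            // would at least log this and probably retry.
        }
    }

    @Override
    public Map<String, String> get(String componentId) {
        Map<String, String> state = primary.get(componentId);
        return state.isEmpty() ? secondary.get(componentId) : state;
    }
}
```

With two regional stores, a composite built over a fresh primary and the
surviving secondary would still return the last replicated state, which
is the recovery case under discussion. Consistency between the two
stores is deliberately ignored here.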

For the state id, I do think this is a common problem for anyone
trying to recreate a flow and pick up where a previous instance of the
flow left off. I'm not sure if this would make it any simpler, but I
wonder if something could be done at a process group level when a
process group is under version control. If there was a unique id that
could be set at the versioned process group level, essentially an
instance/deployment id of the flow, then the component state id could
be the versioned component id + the versioned PG deployment id. This
could be done for all stateful components in the versioned PG, rather
than needing to enter it for every stateful component. It would still
require setting this new id manually, but it moves that step one level
up, from individual components to the process group.
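
A minimal sketch of how such an id could be derived, assuming the two
inputs are plain strings. StateIds and componentStateId are
hypothetical names, and hashing the pair into a name-based UUID is just
one option I picked for illustration; simple concatenation would work
too.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper, not part of NiFi.
final class StateIds {
    private StateIds() {
    }

    /**
     * Derives a stable state id from the versioned component id and a
     * deployment id set once on the versioned process group.
     * The same pair always yields the same id, so a re-created flow with
     * the same deployment id would map to the same stored state.
     */
    static String componentStateId(String versionedComponentId, String deploymentId) {
        if (versionedComponentId == null || versionedComponentId.isEmpty()) {
            throw new IllegalArgumentException("versionedComponentId is required");
        }
        if (deploymentId == null || deploymentId.isEmpty()) {
            throw new IllegalArgumentException("deploymentId is required");
        }
        // Assumption: the component id never contains ':', so the
        // separator keeps distinct pairs from producing the same seed.
        String seed = versionedComponentId + ":" + deploymentId;
        return UUID.nameUUIDFromBytes(seed.getBytes(StandardCharsets.UTF_8)).toString();
    }
}
```

Two deployments of the same versioned flow with different deployment
ids would get different state ids for the same component, while a
redeployment that reuses the deployment id would pick up where the
previous instance left off.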


On Sat, Dec 9, 2023 at 5:48 PM David Handermann
<[email protected]> wrote:
>
> Pierre,
>
> Thanks for taking the time to put together the feature proposal with
> additional background on the related Jira issues. The topic of
> disaster recovery is an important and challenging one, so it is
> definitely worth careful consideration.
>
> At a high level, I think it is worth considering some additional use
> case flows, as that would highlight some additional considerations.
>
> Taking the ListS3 to PutDatabaseRecord example, it raises some
> concerns in itself that might be addressed in better ways. For
> example, instead of relying on localized cluster state, a more
> resilient approach could make use of an event queueing system like
> Amazon SQS or SNS based on S3 Event Notifications. That avoids ListS3
> entirely and provides a more fault-tolerant architecture for tracking
> S3 items to be processed.
>
> The other half of the equation is the destination database. Although
> it is also external to NiFi, the example provided implies that the
> destination supports global redundancy such that communication from
> different regions remains possible in the event of a single region
> failure. That is certainly possible with various storage solutions,
> but it highlights the fact that a true disaster recovery
> configuration requires end-to-end design.
>
> In the initial proposal, the diagrams show regional State Management
> solutions. The concept of a composite state management solution is
> interesting, but it seems to be attempting to make up for the lack of
> a true distributed, resilient, and cross-region state management
> solution. Granted, ZooKeeper and Kubernetes ConfigMap storage may not
> be a good fit for a cross-region solution. However, it seems like it
> would be better to evaluate an optimal cross-region state management
> implementation, as opposed to implementing some type of replication or
> leader-follower design in NiFi itself.
>
> To be clear, this is certainly a topic worth considering, but I am not
> confident that the implementation steps outlined in the initial two
> Jira issues will provide a robust or maintainable solution. Supporting
> component-level configuration of a custom state identifier seems prone
> to error, and also requires a lot of manual configuration at the
> individual Processor level. Supporting composite state management
> could have other benefits, but it also adds a layer of complexity that
> may not even achieve the desired outcome, depending on the
> capabilities of the underlying storage implementations.
>
> With that background, I think it would be worth evaluating alternative
> approaches before moving to any kind of implementation. I'm sure there
> are aspects I have not considered, so I welcome additional perspective
> on the positives and negatives of the proposed solution.
>
> Regards,
> David Handermann
>
> On Fri, Dec 8, 2023 at 8:32 AM Pierre Villard
> <[email protected]> wrote:
> >
> > Team,
> >
> > I just published a feature proposal here:
> > https://cwiki.apache.org/confluence/display/NIFI/State+Management+improvements+for+Disaster+Recovery+scenarios
> >
> > This feature proposal provides a more detailed explanation of the
> > two JIRAs below:
> > https://issues.apache.org/jira/browse/NIFI-11776
> > https://issues.apache.org/jira/browse/NIFI-11777
> >
> > I'd love to hear your thoughts before we get started with the actual
> > implementation.
> >
> > Thanks,
> > Pierre
