Pierre,

Yes, what you described sounds like what I was thinking. It was really
just shifting the user-specified state id from the processor level to
the PG level. It feels like we really would want to use the versioned
component ids, but we can't, since the same flow can be imported
multiple times into your NiFi instance. So we need some differentiator,
which could be a user-specified "deployment id" or "flow id" or
whatever the right terminology is to describe an instance of an
imported flow.

There are really three ways to end up with versioned ids...

1. Create PG by importing from registry
2. Create PG by uploading flow definition json
3. Create PG and issue replace request with flow definition json

I guess where this idea breaks down is with nesting/composing of PGs.
Imagine there is a parent PG with child PGs A and B, both created from
the same versioned flow content. You wouldn't be able to set this new
id at the parent PG level; it would have to be set on each of the
child PGs to differentiate the processors inside them. So maybe this
gets kind of confusing too.

I'm not opposed to the component level state id, I was just trying to
think if there were any other ways of doing it.
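For illustration, the PG-level "deployment id" idea could derive a stable state key with a name-based (version 5) UUID. This is only a sketch: the `state_id` function, the setting it implies, and the identifiers below are all made up for the example, and nothing like this exists in NiFi today.

```python
import uuid


def state_id(deployment_id: str, versioned_component_id: str) -> uuid.UUID:
    """Derive a deterministic state id from a hypothetical PG-level
    deployment id plus the component's versioned id."""
    # Build a namespace UUID from the deployment id, then hash the
    # versioned component id within it (RFC 4122 version 5, SHA-1).
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, deployment_id)
    return uuid.uuid5(namespace, versioned_component_id)


# Two instances of the same versioned flow get distinct state ids,
# while re-importing with the same deployment id reproduces them.
us = state_id("us-east-prod", "versioned-component-1234")
dr = state_id("eu-west-dr", "versioned-component-1234")
print(us, dr)
```

Note this only differentiates flow instances if each child PG created from the same versioned content gets its own deployment id, which is exactly the nesting problem described above.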

On Wed, Dec 13, 2023 at 1:33 PM Pierre Villard
<[email protected]> wrote:
>
> >
> > Even with an option for deterministic state identifiers, I am still
> > concerned about a composite state manager provider. It seems prone to
> > getting providers out of sync, as either backing state store could fail,
> > resulting in inconsistent state across regions. This is the primary reason
> > why it seems better to evaluate a resilient storage solution instead of a
> > composite solution.
> >
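The out-of-sync concern quoted above can be made concrete with a toy sketch. This is plain illustrative Python, not NiFi's actual state provider API; the class and function names are invented for the example.

```python
class StateStore:
    """Toy key/value state store; stands in for one region's backing store."""

    def __init__(self):
        self.data = {}
        self.available = True

    def put(self, key, value):
        if not self.available:
            raise IOError("state store unavailable")
        self.data[key] = value


def composite_put(primary: StateStore, secondary: StateStore, key, value):
    """Naive composite write: no transaction spans both stores, so a
    failure on either side leaves the two regions holding different state."""
    primary.put(key, value)
    secondary.put(key, value)  # if this raises, primary has already moved on


primary, secondary = StateStore(), StateStore()
composite_put(primary, secondary, "listing.timestamp", "t1")
secondary.available = False  # simulate a regional outage
try:
    composite_put(primary, secondary, "listing.timestamp", "t2")
except IOError:
    pass
# The stores have now diverged: primary holds t2, secondary still holds t1.
print(primary.data["listing.timestamp"], secondary.data["listing.timestamp"])
```

A storage backend with built-in cross-region replication avoids this class of failure entirely, which is the point being made.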
>
> I 100% agree with this statement. And my preference would be to have
> implementations that leverage solutions where cross-region replication is
> already available.
>
> > Supporting customization of the state identifier at the component level seems
> > like it gets complicated at the user level. Bryan's suggestion about a
> > Process Group setting would be helpful along these lines. Taking that one
> > step further, if this were a configurable mode, deterministic UUID version
> > 5 could be used, which could base the state identifier on the Process Group
> > Name and Component Name. That would yield an identifier that would be the
> > same across flows without component level configuration. It still raises
> > questions about uniqueness, but as a Process Group setting or even
> > application property, it is much less complex for a user.
> >
>
> Having a deterministic UUID based on PG ID and name would be awesome, but I
> share the same concern around uniqueness. I would not be surprised if a
> single process group contained multiple instances of the same component type
> that require state, where the user has not changed the default names.
>
> I'm not sure I completely follow the suggestion by Bryan. Is the below what
> we're suggesting?
> - We would have a custom ID at the process group level that would have to
> be set by the user when starting version control OR when checking out a
> versioned flow? (This would not be part of what is versioned; it could
> default to the UUID of the process group and only be exposed in an
> "advanced section" so as not to confuse normal users.)
> - If this is set, then the state ID would be something like a deterministic
> UUID made of this custom ID + the versioned ID of the component.
>
> Still trying to wrap my head around this in case the same versioned flow is
> instantiated multiple times in both regions.
>
> On Mon, Dec 11, 2023 at 10:17 PM David Handermann <[email protected]>
> wrote:
>
> > Pierre,
> >
> > Thanks for the helpful reply.
> >
> > I am not concerned about complexity where necessary, so the main issue is
> > about the complexity to support custom state identifiers at the framework
> > level, versus a resilient state management solution.
> >
> > Supporting customization of the state identifier at the component level seems
> > like it gets complicated at the user level. Bryan's suggestion about a
> > Process Group setting would be helpful along these lines. Taking that one
> > step further, if this were a configurable mode, deterministic UUID version
> > 5 could be used, which could base the state identifier on the Process Group
> > Name and Component Name. That would yield an identifier that would be the
> > same across flows without component level configuration. It still raises
> > questions about uniqueness, but as a Process Group setting or even
> > application property, it is much less complex for a user.
> >
> > Even with an option for deterministic state identifiers, I am still
> > concerned about a composite state manager provider. It seems prone to
> > getting providers out of sync, as either backing state store could fail,
> > resulting in inconsistent state across regions. This is the primary reason
> > why it seems better to evaluate a resilient storage solution instead of a
> > composite solution.
> >
> > Regards,
> > David Handermann
> >
> > On Mon, Dec 11, 2023, 6:23 AM Pierre Villard <[email protected]>
> > wrote:
> >
> > > Hi David and Bryan, thanks for the feedback.
> > >
> > > David,
> > >
> > > Regarding the source and destination. It was just an example. There are
> > > plenty of different use cases you can think of. Of course, this is
> > > assuming that the user of NiFi has a source and destination which are
> > > highly available, but this is the user's responsibility. You can take the
> > > example of ListSFTP -> FetchSFTP -> doSomething -> PutS3.
> > >
> > > I definitely agree that having a state manager that supports cross-region
> > > replication would be ideal (and IIRC Redis does support this). The
> > > approach of leader/follower makes it easier from a user point of view but
> > > I'm definitely fine considering state manager implementations - for
> > > example, in the past, I suggested a database-based implementation
> > > (Postgres would be a good candidate, for example).
> > >
> > > When you say it's adding a layer of complexity, it definitely does, but
> > > only for users that are looking for options when it comes to DR. In any
> > > case, setting up any piece of software to support DR scenarios requires
> > > a lot of work and introduces a lot of complexity. However, the proposal
> > > does not change anything in the default behavior, and most users would
> > > never care about the custom state ID. The goal is to provide ways for a
> > > user to support DR scenarios.
> > >
> > > The big topic is the state ID: right now, the coupling between the UUID
> > > and the state is making things extremely complex.
> > >
> > > Bryan,
> > >
> > > Yeah that could be an option I guess. Need to think more about it.
> > >
> > >
> > > On Mon, Dec 11, 2023 at 3:39 AM Bryan Bende <[email protected]> wrote:
> > >
> > > > Just a couple of thoughts to add to the discussion...
> > > >
> > > > The concept of the composite state manager really doesn't require any
> > > > framework level changes right? Even if we chose not to provide it out
> > > > of the box, it could still be implemented by anyone in a custom NAR.
> > > >
> > > > For the state id, I do think this is a common problem for anyone
> > > > trying to recreate a flow and pick up where a previous instance of the
> > > > flow left off. I'm not sure if this would make it any simpler, but I
> > > > wonder if something could be done at a process group level when a
> > > > process group is under version control. If there was a unique id that
> > > > could be set at the versioned process group level, essentially an
> > > > instance/deployment id of the flow, then the component state id could
> > > > be the versioned component id + the versioned PG deployment id. This
> > > > could be done for all stateful components in the versioned PG, rather
> > > > than needing to enter it for every stateful component. It would still
> > > > require setting this new id manually, but just moves it a level higher
> > > > from individual components.
> > > >
> > > >
> > > > On Sat, Dec 9, 2023 at 5:48 PM David Handermann
> > > > <[email protected]> wrote:
> > > > >
> > > > > Pierre,
> > > > >
> > > > > Thanks for taking the time to put together the feature proposal with
> > > > > additional background on the related Jira issues. The topic of
> > > > > disaster recovery is an important and challenging one, so it is
> > > > > definitely worth careful consideration.
> > > > >
> > > > > At a high level, I think it is worth considering some additional use
> > > > > case flows, as that would highlight some additional considerations.
> > > > >
> > > > > Taking the ListS3 to PutDatabaseRecord example, it raises some
> > > > > concerns in itself that might be addressed in better ways. For
> > > > > example, instead of relying on localized cluster state, a more
> > > > > resilient approach could make use of an event queueing system like
> > > > > Amazon SQS or SNS based on S3 Event Notifications. That avoids ListS3
> > > > > entirely and provides a more fault-tolerant architecture for tracking
> > > > > S3 items to be processed.
> > > > >
> > > > > The other half of the equation is the destination database. Although
> > > > > it is also external to NiFi, the example provided implies that the
> > > > > destination supports global redundancy such that communication from
> > > > > different regions remains possible in the event of a single region
> > > > > failure. That is certainly possible with various storage solutions; it
> > > > > just highlights the fact that a true disaster recovery configuration
> > > > > requires end-to-end design.
> > > > >
> > > > > In the initial proposal, the diagrams show regional State Management
> > > > > solutions. The concept of a composite state management solution is
> > > > > interesting, but it seems to be attempting to make up for the lack of
> > > > > a true distributed, resilient, and cross-region state management
> > > > > solution. Granted, ZooKeeper and Kubernetes ConfigMap storage may not
> > > > > be a good fit for a cross-region solution. However, it seems like it
> > > > > would be better to evaluate an optimal cross-region state management
> > > > > implementation, as opposed to implementing some type of replication or
> > > > > leader-follower design in NiFi itself.
> > > > >
> > > > > To be clear, this is certainly a topic worth considering, but I am
> > > > > not confident that the implementation steps outlined in the initial
> > > > > two Jira issues will provide a robust or maintainable solution.
> > > > > Supporting component-level configuration of a custom state identifier
> > > > > seems prone to error, and also requires a lot of manual configuration
> > > > > at the individual Processor level. Supporting composite state
> > > > > management could have other benefits, but it also adds a layer of
> > > > > complexity that may not even achieve the desired outcome, depending
> > > > > on the capabilities of the underlying storage implementations.
> > > > >
> > > > > With that background, I think it would be worth evaluating
> > > > > alternative approaches before moving to any kind of implementation.
> > > > > I'm sure there are aspects I have not considered, so I welcome
> > > > > additional perspective on the positives and negatives of the proposed
> > > > > solution.
> > > > >
> > > > > Regards,
> > > > > David Handermann
> > > > >
> > > > > On Fri, Dec 8, 2023 at 8:32 AM Pierre Villard
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Team,
> > > > > >
> > > > > > I just published a feature proposal here:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/NIFI/State+Management+improvements+for+Disaster+Recovery+scenarios
> > > > > >
> > > > > > This feature proposal is to provide a more detailed explanation
> > > > > > around the two below JIRAs:
> > > > > > https://issues.apache.org/jira/browse/NIFI-11776
> > > > > > https://issues.apache.org/jira/browse/NIFI-11777
> > > > > >
> > > > > > I'd love to hear your thoughts before we get started with the
> > > > > > actual implementation.
> > > > > >
> > > > > > Thanks,
> > > > > > Pierre
> > > >
> > >
> >
