Re: Re: a question about nifi's source code
Hi Andy,

It seems I found the problem: I compared the source code I am reading against the source code on GitHub, and they appear to be different. I am confused, because I didn't change it myself. I will keep this thread updated once I find the reason. Thank you very much.

Best Regards
YuNing

From: Andy LoPresto
Date: 2017-08-02 14:01
To: dev
Subject: Re: a question about nifi's source code
[quoted message snipped]
Re: Re: a question about nifi's source code
Here are the updateGroup and doUpdateGroup methods:

    /**
     * The group represented by the provided instance will be updated based on
     * the provided instance.
     *
     * @param group an updated group instance
     * @return the updated group instance, or null if no matching group was found
     * @throws AuthorizationAccessException if there was an unexpected error
     *         performing the operation
     * @throws IllegalStateException if there is already a group with the same name
     */
    public final synchronized Group updateGroup(Group group) throws AuthorizationAccessException {
        if (tenantExists(group.getIdentifier(), group.getName())) {
            throw new IllegalStateException(String.format(
                    "User/user group already exists with the identity '%s'.",
                    group.getName()));
        }
        return doUpdateGroup(group);
    }

    /**
     * The group represented by the provided instance will be updated based on
     * the provided instance.
     *
     * @param group an updated group instance
     * @return the updated group instance, or null if no matching group was found
     * @throws AuthorizationAccessException if there was an unexpected error
     *         performing the operation
     */
    public abstract Group doUpdateGroup(Group group) throws AuthorizationAccessException;

bel...@163.com

From: Andy LoPresto
Date: 2017-08-02 14:01
To: dev
Subject: Re: a question about nifi's source code
[quoted message snipped]
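To make the distinction concrete, here is a simplified, self-contained sketch of the intended add/update semantics. These are not NiFi's actual classes — the names and structure are illustrative only: addGroup rejects a group that already exists, while updateGroup replaces the state of an existing group, which is how users would get added to a group. The exists-check shown in the quoted local copy above would make updating any existing group impossible, consistent with the observation elsewhere in this thread that the local source differed from GitHub.

```java
// Simplified sketch (hypothetical, not NiFi's API): addGroup rejects
// duplicates; updateGroup replaces an existing group's state, including
// its user membership.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Group {
    final String identifier;
    final String name;
    final Set<String> users;

    Group(String identifier, String name, Set<String> users) {
        this.identifier = identifier;
        this.name = name;
        this.users = new HashSet<>(users);
    }
}

class SimpleAuthorizer {
    private final Map<String, Group> groupsById = new HashMap<>();

    /** Adds a new group; rejects an identifier that already exists. */
    Group addGroup(Group group) {
        if (groupsById.containsKey(group.identifier)) {
            throw new IllegalStateException("Group already exists: " + group.name);
        }
        groupsById.put(group.identifier, group);
        return group;
    }

    /** Replaces the stored group; returns null if no matching group exists. */
    Group updateGroup(Group group) {
        if (!groupsById.containsKey(group.identifier)) {
            return null;
        }
        groupsById.put(group.identifier, group);
        return group;
    }
}
```

Under this model, adding a user to an existing group means fetching the group, building a copy with the extra user, and passing it to updateGroup.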
Re: Re: a question about nifi's source code
The PNG picture can't be seen, so I copy it here:

    /**
     * Adds a new group.
     *
     * @param group the Group to add
     * @return the added Group
     * @throws AuthorizationAccessException if there was an unexpected error
     *         performing the operation
     * @throws IllegalStateException if a group with the same name already exists
     */
    public final synchronized Group addGroup(Group group) throws AuthorizationAccessException {
        if (tenantExists(group.getIdentifier(), group.getName())) {
            throw new IllegalStateException(String.format(
                    "User/user group already exists with the identity '%s'.",
                    group.getName()));
        }
        return doAddGroup(group);
    }

From: Andy LoPresto
Date: 2017-08-02 14:01
To: dev
Subject: Re: a question about nifi's source code
[quoted message snipped]
Re: Re: a question about nifi's source code
Hi Andy,

The concrete class I am using is FileAuthorizer, and the updateGroup method of the abstract policy authorizer is listed below. I don't know why it needs to check the group's existence, and I don't know how to add a user to a group. Thanks for your reply.

Best Regards
YuNing

From: Andy LoPresto
Date: 2017-08-02 14:01
To: dev
Subject: Re: a question about nifi's source code
[quoted message snipped]
Re: a question about nifi's source code
Hi YuNing,

The abstract policy authorizer delegates the updateGroup action to doUpdateGroup, which is implemented by the extending concrete class. I'm not sure where you are seeing that it checks for the absence of the group, but there is an addGroup method for adding a new group.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Aug 1, 2017, at 19:48, YuNing wrote:
[quoted message snipped]
a question about nifi's source code
Hello, everyone

I am working on reading NiFi's source code. It confused me when I read the class "AbstractPolicyBasedAuthorizer": I found there is no method to add a user to an existing group. I noticed there is an "updateGroup" method, which I don't think can be used to add users to an existing group, because it checks the input group to make sure it doesn't already exist.

Am I right? Can anyone help me? Thanks a lot!

Best Regards
YuNing
nifi is unmodifiable
Hi,

For some reason when I start NiFi, I am unable to edit anything. I have attached my authorizers, users, and authorizations files. From the console I can log in with LDAP, and the current-user API call is returning this:

    {
      "identity": "my-distinguished-name",
      "anonymous": false,
      "provenancePermissions": {"canRead": true, "canWrite": true},
      "countersPermissions": {"canRead": true, "canWrite": true},
      "tenantsPermissions": {"canRead": true, "canWrite": true},
      "controllerPermissions": {"canRead": true, "canWrite": true},
      "policiesPermissions": {"canRead": true, "canWrite": true},
      "systemPermissions": {"canRead": true, "canWrite": true},
      "restrictedComponentsPermissions": {"canRead": true, "canWrite": true}
    }

In all the files and in the response JSON I replaced my DN with "my-distinguished-name", but I did confirm the values match up exactly. For some reason I still cannot edit the flow from the console. I don't see anything suspicious in the logs.

Please help,

Michael Knapp

Attachments: authorizations.xml, authorizers.xml, users.xml
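One common cause of this symptom (offered as a hedged guess, since the attached files are not reproduced here) is that the global permissions shown by the current-user call do not cover component-level policies: editing the canvas also requires read and write policies on the root process group. With the file-based authorizer those live in authorizations.xml, roughly like this — every identifier and the root group UUID below are placeholders, not values from the attachments:

```xml
<!-- Hypothetical sketch: grant one user read (R) and write (W) on the
     root process group. Replace the UUIDs with the real ones. -->
<policies>
    <policy identifier="policy-read-root"
            resource="/process-groups/ROOT-GROUP-UUID" action="R">
        <user identifier="uuid-of-my-distinguished-name"/>
    </policy>
    <policy identifier="policy-write-root"
            resource="/process-groups/ROOT-GROUP-UUID" action="W">
        <user identifier="uuid-of-my-distinguished-name"/>
    </policy>
</policies>
```

Note that the `<user identifier>` must reference the user's UUID from users.xml, not the DN itself — a DN/UUID mix-up would produce exactly this "everything true, still read-only" behavior.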
Re: [DISCUSS] Increasing durability in MiNiFi C++
Leveraging containerization sounds like a solid testing approach. It could be automated, which fits in nicely for enterprise environments that might want to test configuration changes in an emulated environment before pushing out to every device.

And, yes, certainly more tickets would be welcome rather than piling this all onto MINIFI-356. Let me know if you want me to help put those together or collaborate on the design.

Thanks all,
Kevin

On 8/1/17, 11:09, "Marc" wrote:
[quoted message snipped]
Re: [DISCUSS] Increasing durability in MiNiFi C++
Hey Kevin,

These are really good points. I like the concepts laid out in number six. That helps solidify my belief that there is a greater scope of durability and reliability that is better captured in a series of tickets beyond the original ticket's intent. Certainly a good idea to take a page from the mobile platforms' playbook. I think a notification model ties into Andy's previous response re: sandboxing. I'm not immediately sure of the best way to tackle that, but I'll put some thoughts into a ticket.

Regarding testability: my thought was that we should leverage some of the work being done for containerization to help guide our testing. We can certainly make arbitrary test environments to set a file system into read-only mode, consume all memory in a queue, etc. Whether that is good enough remains to be seen. With our current unit tests and integration tests, this is much more difficult to replicate as opposed to a container where we have the freedom to 'break stuff'. I haven't fully scoped out what is needed for testability, so ideas are certainly welcome. Unfortunately my ideas/plans are in their infancy.

On Tue, Aug 1, 2017 at 10:56 AM, Kevin Doran wrote:
[quoted message snipped]
Re: [DISCUSS] Increasing durability in MiNiFi C++
I feel that MINIFI-356 is pretty key in all things. When I think of jagged-edge use cases that are missing connectivity for days but have large mass storage devices, this feels really limiting. When I consider the variety of devices I have tested with thus far, most of them only have a single storage media mount. RasPis right now seem to be the typical entry IoT device, and most are only using the single media mount. This to me represents an area where even operating in degraded mode won't help us, as the OS will fail on its own eventually without its disk. With that said, is it more valuable to use the storage media we have initially than it is to find a way to run without it?

No doubt there are other scenarios where this is very useful, and I see more of them initially in the 'non-jagged' space. For example, a factory line PC within the enterprise network is always connected; it may never experience backpressure solely because it can send as fast as it collects the data. If we assume that the OS disks and repo disks are not the same, and the repo did fail, there would be value in continuing to operate, collecting and sending data, but for all intents we don't care about backpressure here because we can still send it as fast as it's collected.

~~ Kevin's responses:

> 2. Logging and readme documentation will be important to assist
> troubleshooting/debugging. [...]

I would also expect that initially it's off by default and has to be manually enabled.

> 3. Testing

Just initially thinking: I can re-use a RasPi but attach an eSATA drive. A hard failure of removing the drive itself, or unmounting it at the OS level, may do this, while leaving the OS drive (SD card) still plugged in.

On Tue, Aug 1, 2017 at 9:59 AM, Marc wrote:
[quoted message snipped]

--
Joseph
Re: [DISCUSS] Increasing durability in MiNiFi C++
Hi Marc,

Thanks for the write-up in email and on the linked JIRAs. I took a look just now and have some initial thoughts (a lot of this probably goes without saying):

1. I agree that partial failures (e.g., slower reads/writes, decreased network bandwidth, etc.) are hard to classify and should stay out of scope for now until we tackle complete failures (e.g., no disk, no network).

2. Logging and readme documentation will be important to assist troubleshooting/debugging. If an agent is configured to use a persistent repository, and it has degraded to a volatile repository, that could be really confusing to a novice user/admin who is trying to figure out how the agent is working. Therefore we need to make sure changes to agent behavior that occur as part of continuing operations are logged at some level.

3. Have you given any thought to testability? Forcing environments that would trigger failover capabilities will be difficult, both for developers implementing those capabilities and admins/operations folks who want to test their configurations before deploying them.

4. I think in a lot of cases, graceful degradation / continued operation of the MiNiFi agent will be desirable. However, if we go with that, the corresponding controls over the "bounds of the client", as you put it, are key (e.g., a configuration option for repositories that specifies a failover repository and the parameters for when to fail over).

5. In terms of utilization caps, I think we should definitely have them, and make them configurable where possible. I guess this is another way to express the bounds of the client, e.g., "do whatever you need to keep running, but never use more than XX MB of memory". Disk/memory footprints of persistent/volatile repositories are probably easy ones to start with. There should be default/built-in prioritizers for deciding which flow files to drop when the cap is reached, and over time we can make that extensible. I think this is in line with Joe's comment on the JIRA [1] that data from different sensors will likely have different importance and we need a way to deal with that. At the end of the day, if a flow is failing, but inputs are still coming in, and the agent has a utilization cap... something has to be dropped.

6. There might be some concepts from the mobile platform space that we could carry over to the design of the agent. For example, on iOS, the OS is able to send lots of signals to apps regarding what is happening at the platform level, and the app can be implemented to act appropriately in different scenarios. For example, a memory warning, for which apps are supposed to dispose of any volatile resources that are nonessential or can be recreated, or a signal that the app is about to enter a background state. Maybe there are some good designs that can be carried over so custom processors have push/pull hooks into the state of the platform that is provided by the framework. E.g., maybe a processor wants to have conditional logic based on the state of memory or network I/O and the MiNiFi framework has APIs that make that discoverable (pull), and perhaps all custom processors can implement an interface that allows them to receive notifications from the framework when it detects some of these partial/complete failure conditions or is approaching configured utilization caps (push).

I've watched both JIRAs and will follow this thread as well. I'll chime in with more after I have time to think about this more and as more people respond. I agree input from people with experience in the field would be really useful here.

Kevin

[1] https://issues.apache.org/jira/browse/MINIFI-356?focusedCommentId=16108832#comment-16108832

On 8/1/17, 09:59, "Marc" wrote:
[quoted message snipped]
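The cap-plus-prioritizer idea in point 5 could look roughly like the sketch below. This is illustrative only — the class, the byte-cost model, and the eviction rule are made up for the example, not taken from MiNiFi: a queue with a byte budget that, when full, drops the lowest-priority entry to admit a more important one, and otherwise rejects the offer.

```java
// Illustrative sketch of a utilization cap with a drop prioritizer: the
// queue never holds more than maxBytes; when full, the least important
// entry is evicted to admit a higher-priority one.
import java.util.PriorityQueue;

class CappedFlowFileQueue {
    static final class Entry {
        final String id;
        final int priority;   // higher = more important
        final long bytes;
        Entry(String id, int priority, long bytes) {
            this.id = id; this.priority = priority; this.bytes = bytes;
        }
    }

    private final long maxBytes;
    private long usedBytes = 0;
    // Min-heap on priority so the least important entry is evicted first.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.priority, b.priority));

    CappedFlowFileQueue(long maxBytes) { this.maxBytes = maxBytes; }

    /** Returns true if the entry was admitted (possibly after evictions). */
    boolean offer(Entry e) {
        if (e.bytes > maxBytes) return false;
        while (usedBytes + e.bytes > maxBytes) {
            Entry lowest = queue.peek();
            if (lowest == null || lowest.priority >= e.priority) {
                return false;  // nothing less important to drop
            }
            queue.poll();               // "something has to be dropped"
            usedBytes -= lowest.bytes;
        }
        queue.add(e);
        usedBytes += e.bytes;
        return true;
    }

    long usedBytes() { return usedBytes; }
    int size() { return queue.size(); }
}
```

Making the comparator pluggable would give the extensible prioritizer described above, so different sensors' data can carry different importance.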
Re: [DISCUSS] Increasing durability in MiNiFi C++
In addition to the tickets mentioned, what we probably want to do is isolate custom processors as much as possible. I.e., if a custom processor segfaults, we probably don't want that to bring down the entire MiNiFi process. Achieving that type of isolation might come with some tradeoffs, though. For instance, we may need to implement process-level isolation, similar to how the Chromium browser isolates tab processes, but doing so would come with additional memory and IPC overhead. Maybe there are some modern sandboxing techniques we can look at. Something to consider.

On 8/1/17, 9:59 AM, "Marc" wrote:
[quoted message snipped]
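The process-level isolation idea above can be reduced to a small supervisor loop: run each processor in its own child process and restart it when it dies, with a bounded restart budget to avoid crash loops. The sketch below is purely illustrative (the Supervisor class and its restart policy are invented for the example, and a real agent would spawn OS processes rather than call a function):

```java
// Illustrative supervisor sketch: restart a crashed worker up to a bounded
// number of times. The IntSupplier stands in for running an isolated
// processor process and returning its exit code.
import java.util.function.IntSupplier;

class Supervisor {
    private final int maxRestarts;

    Supervisor(int maxRestarts) { this.maxRestarts = maxRestarts; }

    /**
     * Runs the worker until it exits cleanly (exit code 0) or the restart
     * budget is exhausted. Returns the number of restarts performed.
     */
    int run(IntSupplier worker) {
        int restarts = 0;
        while (worker.getAsInt() != 0) {  // nonzero exit = crash (e.g. segfault)
            if (restarts >= maxRestarts) {
                break;  // give up; a real agent would log and alert here
            }
            restarts++;
        }
        return restarts;
    }
}
```

The IPC and memory overhead mentioned above come from whatever channel the parent and child would use to exchange flow files, which this sketch deliberately leaves out.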
[DISCUSS] Increasing durability in MiNiFi C++
Good Morning,

I've begun capturing some details in a ticket for durability and reliability of MiNiFi C++ clients [1]. The scope of this ticket is continuing operations despite failure within specific components. There is a linked ticket [2] that attempts to address some of the concerns brought up in MINIFI-356, focusing on memory usage.

The spirit of the ticket was meant to capture conditions of known failure; however, given that more discussion has blossomed, I'd like to assess the experience of the mailing list. Continuing operations in any environment is difficult, particularly one in which we likely have little to no control. Simply gathering information to know when a failure is occurring is a major part of the battle. According to the tickets, there needs to be some discussion of how we classify failure.

The ticket addressed the low-hanging fruit, but there are certainly more conditions of failure. If a disk switches to read-only mode, or a disk becomes full and/or runs out of inode entries, etc., we know a complete failure occurred and thus can switch our type of write activity to use a volatile repo. I recognize that partial failures may occur, but how do we classify these? Should we classify these at all, or would this be venturing into a rabbit hole?

For memory we can likely throttle queue sizes as needed. For networking and other components we could likely find other measures of failure. The goal, no matter the component, is to continue operations without human intervention -- with the hope that the configuration makes the bounds of the client obvious.

My gut reaction is to separate out partial failure, as the low-hanging fruit of complete failure is much easier to address, but I would love to hear the reaction of this list. Further, any input on the types of failures to address would be appreciated. I look forward to any and all responses.

Best Regards,
Marc

[1] https://issues.apache.org/jira/browse/MINIFI-356
[2] https://issues.apache.org/jira/browse/MINIFI-360
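The switch to a volatile repo on complete disk failure might be sketched as a failover wrapper like the one below. The interfaces and names are hypothetical, invented for this example rather than taken from the MiNiFi C++ codebase: writes go to the persistent store until a complete failure surfaces as an exception, then the wrapper degrades once to the in-memory store and keeps operating.

```java
// Hypothetical sketch of repository failover: write to a persistent store,
// degrading to an in-memory (volatile) store on complete failure.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

interface Repository {
    void put(String flowFile) throws IOException;
}

class VolatileRepository implements Repository {
    final List<String> entries = new ArrayList<>();
    public void put(String flowFile) { entries.add(flowFile); }
}

class FailoverRepository implements Repository {
    private Repository active;
    private final Repository fallback;
    private boolean degraded = false;

    FailoverRepository(Repository primary, Repository fallback) {
        this.active = primary;
        this.fallback = fallback;
    }

    public void put(String flowFile) throws IOException {
        try {
            active.put(flowFile);
        } catch (IOException e) {
            // Complete failure (disk read-only, full, out of inodes, ...):
            // degrade once and continue operating on the volatile repo.
            // A real agent would log this loudly, per the discussion above.
            degraded = true;
            active = fallback;
            active.put(flowFile);
        }
    }

    boolean isDegraded() { return degraded; }
}
```

A real implementation would also need the utilization caps discussed in this thread, since a volatile repository is bounded only by memory.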