Re: Re: a question about nifi's source code

2017-08-01 Thread YuNing
Hi Andy,
It seems I found the problem. I compared the source code I am viewing
against the source code on GitHub, and they appear to be different. I am
confused, because I didn't change it myself. I will keep this thread updated
once I find the reason.
Thank you very much.



Best Regards
YuNing
 


Re: Re: a question about nifi's source code

2017-08-01 Thread YuNing
Here are the updateGroup method and the doUpdateGroup method:

    /**
     * The group represented by the provided instance will be updated based on
     * the provided instance.
     *
     * @param group an updated group instance
     * @return the updated group instance, or null if no matching group was found
     * @throws AuthorizationAccessException if there was an unexpected error
     *             performing the operation
     * @throws IllegalStateException if there is already a group with the same name
     */
    public final synchronized Group updateGroup(Group group) throws AuthorizationAccessException {
        if (tenantExists(group.getIdentifier(), group.getName())) {
            throw new IllegalStateException(String.format(
                    "User/user group already exists with the identity '%s'.", group.getName()));
        }
        return doUpdateGroup(group);
    }

    /**
     * The group represented by the provided instance will be updated based on
     * the provided instance.
     *
     * @param group an updated group instance
     * @return the updated group instance, or null if no matching group was found
     * @throws AuthorizationAccessException if there was an unexpected error
     *             performing the operation
     */
    public abstract Group doUpdateGroup(Group group) throws AuthorizationAccessException;
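
For illustration, a hypothetical sketch of a concrete doUpdateGroup, e.g. for
a simple in-memory authorizer. This is not actual NiFi code; the groupsById
map is invented. It simply honors the documented contract of returning null
when no matching group was found:

    @Override
    public Group doUpdateGroup(Group group) throws AuthorizationAccessException {
        // groupsById: a hypothetical Map<String, Group> keyed by group identifier.
        if (!groupsById.containsKey(group.getIdentifier())) {
            return null; // contract: null when no matching group was found
        }
        groupsById.put(group.getIdentifier(), group);
        return group;
    }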



bel...@163.com
 


Re: Re: a question about nifi's source code

2017-08-01 Thread YuNing
The PNG picture can't be seen, so I copy it here.
    /**
     * Adds a new group.
     *
     * @param group the Group to add
     * @return the added Group
     * @throws AuthorizationAccessException if there was an unexpected error
     *             performing the operation
     * @throws IllegalStateException if a group with the same name already exists
     */
    public final synchronized Group addGroup(Group group) throws AuthorizationAccessException {
        if (tenantExists(group.getIdentifier(), group.getName())) {
            throw new IllegalStateException(String.format(
                    "User/user group already exists with the identity '%s'.", group.getName()));
        }
        return doAddGroup(group);
    }
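
For illustration, a hedged usage sketch as a fragment; the builder method
names follow the NiFi 1.x Group API as I recall it, and it assumes
java.util.UUID plus an authorizer instance in scope, so verify against your
version:

    // Build a new group with a random identifier and register it.
    final Group engineers = new Group.Builder()
            .identifier(UUID.randomUUID().toString())
            .name("engineers")
            .build();
    authorizer.addGroup(engineers); // throws IllegalStateException on a name collision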

 


Re: Re: a question about nifi's source code

2017-08-01 Thread YuNing
   Hi Andy,
The concrete class I am using is FileAuthorizer, and the updateGroup method
of the abstract policy authorizer is listed below. I don't know why it needs
to check the group's existence, and I don't know how to add a user to a
group. Thanks for your reply.

Best Regards 
YuNing
 


Re: a question about nifi's source code

2017-08-01 Thread Andy LoPresto
Hi YuNing,

The Abstract policy authorizer delegates the updateGroup action to 
doUpdateGroup, which is implemented by the extending concrete class. I'm not 
sure where you are seeing that it checks for the absence of the group, but 
there is an addGroup method for adding a new group. 
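
Note that the guard in updateGroup passes the group's own identifier to
tenantExists, which suggests it is meant to flag name collisions with a
different tenant rather than to check the group's own existence; that is
worth verifying in the source. For a concrete illustration, a hedged sketch
of adding a user to an existing group through this API (method names per the
NiFi 1.x Authorizer/Group classes as I recall them; java.util.Set/HashSet
and the authorizer instance are assumed in scope; treat this as an
assumption, not verified code):

    // Fetch the current group, rebuild it with one more user identifier,
    // and submit it via updateGroup(), which delegates to doUpdateGroup().
    final Group existing = authorizer.getGroup(groupIdentifier);

    final Set<String> users = new HashSet<>(existing.getUsers());
    users.add(userIdentifier);

    final Group updated = new Group.Builder()
            .identifier(existing.getIdentifier())
            .name(existing.getName())
            .addUsers(users)
            .build();

    authorizer.updateGroup(updated);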

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> 


a question about nifi's source code

2017-08-01 Thread YuNing
Hello, everyone,
I am working on reading NiFi's source code. It confused me when I read the
class "AbstractPolicyBasedAuthorizer": I found there is no method to add a user
to an existing group. I noticed there is an "updateGroup" method, which I don't
think can be used to add users to an existing group, because it checks the
input group to make sure it doesn't already exist.
Am I right? Can anyone help me? Thanks a lot!

Best Regards
YuNing



nifi is unmodifiable

2017-08-01 Thread Knapp, Michael
Hi,

For some reason when I start NiFi, I am unable to edit anything.  I have
attached my authorizers, users, and authorizations files.  From the console I
can log in with LDAP, and the current-user API call is returning this:

{"identity":"my-distinguished-name","anonymous":false,"provenancePermissions":{"canRead":true,"canWrite":true},"countersPermissions":{"canRead":true,"canWrite":true},"tenantsPermissions":{"canRead":true,"canWrite":true},"controllerPermissions":{"canRead":true,"canWrite":true},"policiesPermissions":{"canRead":true,"canWrite":true},"systemPermissions":{"canRead":true,"canWrite":true},"restrictedComponentsPermissions":{"canRead":true,"canWrite":true}}

In all the files and in the response JSON I replaced my DN with
“my-distinguished-name”, but I did confirm the values match up exactly.
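
For reference, a hedged sketch of reproducing that check from Java; the
/nifi-api/flow/current-user path matches the NiFi REST API, while the host,
port, and token are placeholders, and a self-signed NiFi certificate would
additionally require a configured truststore:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class CurrentUserCheck {
        public static void main(String[] args) throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://nifi-host:9443/nifi-api/flow/current-user"))
                    .header("Authorization", "Bearer " + System.getenv("NIFI_TOKEN"))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body()); // should match the permissions JSON above
        }
    }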

For some reason I still cannot edit the flow from the console.  I don’t see 
anything suspicious in the logs.

Please help,

Michael Knapp




authorizations.xml
Description: authorizations.xml


authorizers.xml
Description: authorizers.xml


users.xml
Description: users.xml


Re: [DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Kevin Doran
Leveraging containerization sounds like a solid testing approach. It could be
automated, which fits in nicely for enterprise environments that might want to
test configuration changes in an emulated environment before pushing out to
every device.

And yes, certainly more tickets would be welcome, rather than piling this all
onto MINIFI-356. Let me know if you want me to help put those together or
collaborate on the design.

Thanks all,
Kevin


Re: [DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Marc
Hey Kevin,
   These are really good points. I like the concepts laid out in number
six. That helps solidify my belief that there is a greater scope of
durability and reliability that is better captured in a series of tickets
beyond the original ticket's intent. It is certainly a good idea to take a
page from the mobile platforms' playbook. I think a notification model also
ties into Andy's previous response RE sandboxing. I am not immediately sure
of the best way to tackle that, but I'll put some thoughts into a ticket.

   Regarding testability: my thought was that we should leverage some of
the work being done for containerization to help guide our testing. We can
certainly make arbitrary test environments that set a file system into
read-only mode, consume all memory in a queue, etc. Whether that is good
enough remains to be seen. With our current unit and integration tests,
this is much more difficult to replicate, as opposed to a container where
we have the freedom to 'break stuff'. I haven't fully scoped out what is
needed for testability, so ideas are certainly welcome. Unfortunately my
ideas/plans are in their infancy.




Re: [DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Joseph Niemiec
I feel that MINIFI-356 is pretty key in all of this. When I think of
jagged-edge use cases that are missing connectivity for days but have large
mass storage devices, this feels really limiting. When I consider the variety
of devices I have tested with thus far, most of them only have a single
storage media mount. RasPis right now seem to be the typical entry-level IoT
device, and most are only using the single media mount; to me this represents
an area where even operating in degraded mode won't help us, as the OS will
eventually fail on its own without its disk.

With that said, is it more valuable to use the storage media we have in the
first place than it is to find a way to run without it?


No doubt there are other scenarios where this is very useful, and I see more
of them initially in the 'non-jagged' space. For example, a factory-line PC
within the enterprise network is always connected; it may never experience
backpressure solely because it can send as fast as it collects the data. If
we assume that the OS disks and repo disks are not the same, and the repo did
fail, there would be value in continuing to operate, collecting and sending
data; but for all intents we don't care about backpressure here because we
can still send data as fast as it is collected.

Responses to Kevin's points:

2. Logging and readme documentation will be important to assist
troubleshooting / debugging. If an agent is configured to use a persistent
repository, and it has degraded to a volatile repository, that could be
really confusing to a novice user/admin who is trying to figure out how the
agent is working. Therefore we need to make sure changes to agent behavior
that occur as part of continuing operations are logged at some level.

I would also expect that initially it is off by default and has to be
manually enabled.


3. Testing

Just initially thinking: I can re-use a RasPi but attach an eSATA drive. A
hard failure of removing the drive itself, or unmounting it at the OS level,
may do this, while leaving the OS drive (SD card) still plugged in.






-- 
Joseph


Re: [DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Kevin Doran
Hi Marc,

Thanks for the write up in email and on the linked JIRAs. I took a look just
now and have some initial thoughts (a lot of this probably goes without saying):

1. I agree that partial failures (eg, slower reads/writes, decreased network
bandwidth, etc) are hard to classify and should stay out of scope for now until
we tackle complete failures (eg, no disk, no network).

2. Logging and readme documentation will be important to assist troubleshooting 
/ debugging. If an agent is configured to use a persistent repository, and it 
has degraded to a volatile repository, that could be really confusing to a 
novice user/admin who is trying to figure out how the agent is working. 
Therefore we need to make sure changes to agent behavior that occur as part of 
continuing operations are logged at some level.

3. Have you given any thoughts to testability? Forcing environments that would 
trigger failover capabilities will be difficult, both for developers 
implementing those capabilities and admins / operations folks that want to test 
their configurations before deploying them.

4. I think in a lot of cases, graceful degradation / continued operation of the 
MiNiFi agent will be desirable. However, if we go with that, the corresponding 
controls over the "bounds of the client" as you put it are key (e.g., a 
configuration option for repositories that specifies a failover repository and 
the parameters for when to failover).

5. In terms of utilization caps, I think we should definitely have them, and
make them configurable where possible. I guess this is another way to express
the bounds of the clients, eg "do whatever you need to keep running, but never
use more than XXMB of memory". Disk/memory footprints of persistent/volatile
repositories are probably easy ones to start with. There should be
default/built-in prioritizers for deciding which flow files to drop when the
cap is reached, and over time we can make that extensible. I think this is in
line with Joe's comment on the JIRA [1] that data from different sensors will
likely have different importance and we need a way to deal with that. At the
end of the day, if a flow is failing, but inputs are still coming in, and the
agent has a utilization cap... something has to be dropped (see the sketch
after point 6).

6. There might be some concepts from the mobile platform space that we could 
carry over to the design of the agent. For example, on iOS, the OS is able to 
send lots of signals to apps regarding what is happening at the platform level, 
and the app can be implemented to act appropriately in different scenarios. For 
example, a memory warning for which apps are supposed to dispose of any 
volatile resources that are nonessential or can be recreated, or a signal that 
the app is about to enter a background state. Maybe there are some good designs 
that can be carried over so custom processors have push/pull hooks into the 
state of the platform that is provided by the framework. Eg, maybe a processor 
wants to have conditional logic based on the state of memory or network i/o and 
the minifi framework has APIs that make that discoverable (pull), and perhaps 
all custom processors can implement an interface that allows them to receive 
notifications from the framework when it detects some of these
  partial / complete failure conditions or is approaching configured 
utilization caps (push).
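
To make point 5 concrete, a hedged sketch (in Java for brevity; a MiNiFi C++
equivalent would be analogous) of a size-capped queue that drops the
lowest-priority flow files once a configured byte cap is exceeded. All names
are invented for illustration:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    final class CappedFlowFileQueue {
        record Entry(int priority, long sizeBytes) {}

        // Lowest priority sits at the head, so it is the first to be evicted.
        private final PriorityQueue<Entry> queue =
                new PriorityQueue<>(Comparator.comparingInt(Entry::priority));
        private final long maxBytes;
        private long usedBytes;

        CappedFlowFileQueue(long maxBytes) { this.maxBytes = maxBytes; }

        synchronized void offer(Entry entry) {
            queue.add(entry);
            usedBytes += entry.sizeBytes();
            // Evict lowest-priority entries until back under the cap; per
            // point 2 above, a real agent should log every drop it performs.
            while (usedBytes > maxBytes && !queue.isEmpty()) {
                usedBytes -= queue.poll().sizeBytes();
            }
        }
    }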

I've watched both JIRAs and will follow this thread as well. I'll chime in with 
more after I have time to think about this more and as more people respond. I 
agree input from people with experiences from the field would be really useful 
here.

Kevin

[1] 
https://issues.apache.org/jira/browse/MINIFI-356?focusedCommentId=16108832#comment-16108832


Re: [DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Andy Christianson
In addition to the tickets mentioned, one thing we probably want to do is
isolate custom processors as much as possible. I.e., if a custom processor
segfaults, we probably don’t want that to bring down the entire minifi
process. Achieving that type of isolation might come with some tradeoffs,
though. For instance, we may need to implement process-level isolation,
similar to how the Chromium browser isolates tab processes, but doing so
would come with additional memory and IPC overhead. Maybe there are some
modern sandboxing techniques we can look at.

Something to consider.
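
As one hedged illustration of that process-level isolation idea (Java here
for consistency with the other sketches in this digest; the per-processor
host binary and all names are invented):

    import java.util.concurrent.TimeUnit;

    public class ProcessorWatchdog {
        public static void main(String[] args) throws Exception {
            // Supervise a hypothetical per-processor host so a crash in a
            // custom processor cannot take down the main agent process.
            while (true) {
                Process child = new ProcessBuilder("minifi-processor-host", "MyCustomProcessor")
                        .inheritIO()
                        .start();
                int exit = child.waitFor();
                if (exit == 0) {
                    break; // clean shutdown requested; stop supervising
                }
                System.err.println("processor host exited with " + exit + "; restarting");
                TimeUnit.SECONDS.sleep(1); // simple backoff before restarting
            }
        }
    }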





[DISCUSS] Increasing durability in MiNiFi C++

2017-08-01 Thread Marc
Good Morning,

  I've begun capturing some details in a ticket for durability and
reliability of MiNiFi C++ clients [1]. The scope of this ticket is
continuing operations despite failure within specific components. There is
a linked ticket [2] that attempts to address some of the concerns brought up
in MINIFI-356, focusing on memory usage.

  The spirit of the ticket was meant to capture conditions of known
failure; however, given that more discussion has blossomed, I'd like to
assess the experience of the mailing list. Continuing operations in any
environment is difficult, particularly one in which we likely have little
to no control. Simply gathering information to know when a failure is
occurring is a major part of the battle. According to the tickets, there
needs to be some discussion of how we classify failure.

  The ticket addressed the low-hanging fruit, but there are certainly more
conditions of failure. If a disk switches to read-only mode, or disks become
full and/or run out of inode entries, etc., we know a complete failure
occurred and thus can switch our type of write activity to use a volatile
repo (a hedged sketch of that check follows below). I recognize that partial
failures may occur, but how do we classify these? Should we classify these at
all, or would this be venturing into a rabbit hole?
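
A hedged sketch of that complete-failure check (Java for consistency with
the other sketches in this digest; the Repository type and the fallback
wiring are hypothetical, and a probe write is just one cheap way to detect a
read-only or full volume):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    final class RepoFailover {
        interface Repository { /* put/get elided for brevity */ }

        static Repository select(Path repoDir, Repository persistent, Repository volatileRepo) {
            try {
                // A zero-byte probe write fails fast on a read-only or full volume.
                Files.write(repoDir.resolve(".probe"), new byte[0]);
                return persistent;
            } catch (IOException e) {
                // Log loudly: silent degradation confuses operators.
                System.err.println("repo volume unwritable; degrading to volatile repo: " + e);
                return volatileRepo;
            }
        }
    }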

   For memory we can likely throttle queue sizes as needed. For networking
and other components we could likely find other measures of failure. The
goal, no matter the component, is to continue operations without human
intervention -- with the hope that the configuration makes the bounds of
the client obvious.

   My gut reaction is to separate out partial failure, as the low-hanging
fruit of complete failure is much easier to address, but I would love to
hear the reaction of this list. Further, any input on the types of failures
to address would be appreciated. I look forward to any and all responses.

  Best Regards,
  Marc

[1] https://issues.apache.org/jira/browse/MINIFI-356
[2] https://issues.apache.org/jira/browse/MINIFI-360