Re: API to get all Policies

2018-11-09 Thread Lars Francke
Andy,

that's a good question. I have to admit that I thought about it and then
saw that there is already an Authorizable for this scenario so I assumed
that part was already taken care of. So whoever has the permission to view
"access all policies" would also be able to use the API? Were you thinking
of something different?

Cheers,
Lars



On Fri, Nov 9, 2018 at 12:35 AM Andy LoPresto  wrote:

> Lars,
>
> What access controls do you anticipate putting on this API endpoint and
> what potential issues do you see? I could see this being abused if not
> secured very carefully, and it doesn’t seem like a common use case
> (notwithstanding your current requirement). Is this something that can be
> done by using the NiFi CLI to iterate/recurse through the various PGs and
> retrieve these policies?
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> > On Nov 9, 2018, at 3:31 AM, Lars Francke  wrote:
> >
> > Hi,
> >
> > I was tasked with writing a tool to generate a kind of "audit report".
> > For that I need to get all policies that people have across various
> > systems. NiFi is one of them.
> >
> > I see that we have a REST API for Policies but it doesn't provide a
> > method to retrieve _all_ policies. I'd like to add a REST endpoint that
> > allows retrieving all policies.
> > Before I open a Jira I wanted to get feedback whether this addition would
> > be acceptable.
> >
> > Implementation notes
> > This is how I see the current flow of requests from the
> > AccessPolicyResource to the actual AccessPolicyProvider:
> >
> > AccessPolicyResource -> NiFiServiceFacade (StandardNiFiServiceFacade) ->
> > AccessPolicyDAO (StandardPolicyBasedAuthorizerDAO) -> AccessPolicyProvider
> >
> > Fortunately the AccessPolicyProvider already has a method to retrieve all
> > policies. Should there be custom implementations by third-parties they
> > already support the necessary methods and I believe the classes that need
> > to be changed are all NiFi internal:
> >
> > * AccessPolicyResource
> > * NiFiServiceFacade
> > * StandardNiFiServiceFacade
> > * AccessPolicyDAO
> > * StandardPolicyBasedAuthorizerDAO
> > * And probably a bunch of others especially test classes
> >
> > If I don't hear any objections I will open a Jira issue and would try and
> > provide a patch.
> >
> > Cheers,
> > Lars
>
>


Re: API to get all Policies

2018-11-09 Thread Lars Francke
I've just tried implementing my new resource and it seems to work as I
expect it to, also with regard to authorization: users cannot see anything
that they are not allowed to see anyway.

Regarding your other comments: I agree that it's probably not a super
common use case.

Either way I'd love to use an API that I can access remotely, as I need to
connect to other systems as well (e.g. Kafka, HBase, etc.) and don't want
to colocate my service on one of the NiFi machines.
But yes, I could probably get a list of all resources somehow using the API
and then send one request per resource. That just seems overly complicated.
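
To make it concrete, from the audit tool's side all I really want is a
single call along these lines (a sketch only; the /nifi-api/policies path
and the bearer-token handling are my assumptions, nothing is decided yet):

    // Sketch of the audit tool side, plain JDK, no NiFi dependencies.
    // Assumes the proposed endpoint ends up at GET /nifi-api/policies.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class PolicyAuditFetch {
        public static void main(String[] args) throws Exception {
            // Hypothetical host and path, purely for illustration.
            URL url = new URL("https://nifi.example.com:8443/nifi-api/policies");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            // However we end up authenticating; a bearer token is just one option.
            conn.setRequestProperty("Authorization", "Bearer " + System.getenv("NIFI_TOKEN"));

            StringBuilder body = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    body.append(line);
                }
            }
            // One request for the whole policy list, instead of walking every
            // process group and asking for policies per resource.
            System.out.println(body);
        }
    }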

So if you don't object I'd create a Jira.

Cheers,
Lars


On Fri, Nov 9, 2018 at 10:01 AM Lars Francke  wrote:

> Andy,
>
> that's a good question. I have to admit that I thought about it and then
> saw that there is already an Authorizable for this scenario so I assumed
> that part was already taken care of. So whoever has the permission to view
> "access all policies" would also be able to use the API? Were you thinking
> of something different?
>
> Cheers,
> Lars
>
>
>
> On Fri, Nov 9, 2018 at 12:35 AM Andy LoPresto 
> wrote:
>
>> Lars,
>>
>> What access controls do you anticipate putting on this API endpoint and
>> what potential issues do you see? I could see this being abused if not
>> secured very carefully, and it doesn’t seem like a common use case
>> (notwithstanding your current requirement). Is this something that can be
>> done by using the NiFi CLI to iterate/recurse through the various PGs and
>> retrieve these policies?
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> > On Nov 9, 2018, at 3:31 AM, Lars Francke 
>> wrote:
>> >
>> > Hi,
>> >
>> > I was tasked with writing a tool to generate a kind of "audit report".
>> For
>> > that I need to get all policies that people have across various systems.
>> > NiFi is one of them.
>> >
>> > I see that we have a REST API for Policies but that doesn't expose a
>> method
>> > to expose _all_ policies. I'd like to add a REST endpoint that allows
>> > retrieving all policies.
>> > Before I open a Jira I wanted to get feedback whether this addition
>> would
>> > be acceptable.
>> >
>> > Implementation notes
>> > This is how I see the current flow of requests from the
>> > AccessPolicyResource to the actual AccessPolicyProider:
>> >
>> > AccessPolicyResource -> NiFiServiceFacade (StandardNiFiServiceFacade) ->
>> > AccessPolicyDAO (StandardPolicyBasedAuthorizerDAO) ->
>> AccessPolicyProvider
>> >
>> > Fortunately the AccessPolicyProvider already has a method to retrieve
>> all
>> > policies. Should there be custom implementations by third-parties they
>> > already support the necessary methods and I believe the classes that
>> need
>> > to be changed are all NiFi internal:
>> >
>> > * AccessPolicyResource
>> > * NiFiServiceFacade
>> > * StandardNiFiServiceFacade
>> > * AccessPolicyDAO
>> > * StandardPolicyBasedAuthorizerDAO
>> > * And probably a bunch of others especially test classes
>> >
>> > If I don't hear any objections I will open a Jira issue and would try
>> and
>> > provide a patch.
>> >
>> > Cheers,
>> > Lars
>>
>>


Re: API to get all Policies

2018-11-09 Thread Kevin Doran
Hi Lars, 

I think as long as the following are true (it sounds like they are from what 
you have looked at):

1. the proposed endpoint does not require adding any additional Authorizable or 
policy to protect, and 
2. the proposed endpoint does not expose any information that the authenticated 
client/user would not already have access to view, and is merely acting as a 
convenience method to return a list of things they could fetch individually

then this is probably fine. No objection from me.

Any time we are adding a collection endpoint, my main concern is whether 
pagination of the results also needs to be added (i.e., whether, for typical 
usage of NiFi, the response size of the JSON result would be larger than is 
reasonable to transmit in a single HTTP round trip, or whether creating the 
response would put unreasonable load on the server). In typical usage of NiFi, 
I don't think the number of policies is that large (perhaps others can chime 
in if they feel differently?), so it would come down to the size of a policy 
element when returned in a list. If it is very large, you may also want to 
introduce a summary view/perspective of the policy that reduces the amount of 
information to the minimum required for a list view... I think that may 
already exist for NiFi in the AccessPolicySummary object, but it's been a 
while since I've looked at the code, so I may be forgetting the details or 
confusing it with the NiFi Registry implementation, which does have a 
get-all-policies endpoint.
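
To illustrate what I mean by a summary view, something shaped roughly like 
this would be plenty for a list response (illustrative only; not an existing 
NiFi class, just the trimmed-down shape of a list element versus the full 
policy entity):

    // Sketch of a "list view" element: enough to identify the policy and
    // fetch the full entity later, nothing more. Field names are examples.
    public class PolicyListItemExample {
        private String identifier;   // policy id
        private String resource;     // e.g. "/process-groups/<id>"
        private String action;       // "read" or "write"
        private boolean configurable;

        public String getIdentifier() { return identifier; }
        public void setIdentifier(String identifier) { this.identifier = identifier; }
        public String getResource() { return resource; }
        public void setResource(String resource) { this.resource = resource; }
        public String getAction() { return action; }
        public void setAction(String action) { this.action = action; }
        public boolean isConfigurable() { return configurable; }
        public void setConfigurable(boolean configurable) { this.configurable = configurable; }
    }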

Lastly, take care that the Swagger annotations that are used to drive the REST 
API documentation are accurate. If you have any questions on that, let me know. 
Happy to help review a PR if you submit one.
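
For reference, the kind of annotation I mean looks roughly like this (a 
standalone sketch, not the actual NiFi resource class; the authorization note 
wording and method name are just examples):

    // Sketch showing the Swagger metadata the new method should carry.
    import io.swagger.annotations.ApiOperation;
    import io.swagger.annotations.Authorization;
    import javax.ws.rs.GET;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;
    import javax.ws.rs.core.Response;

    public class PoliciesResourceSketch {
        @GET
        @Produces(MediaType.APPLICATION_JSON)
        @ApiOperation(value = "Gets all access policies",
                authorizations = {@Authorization(value = "Read - /policies")})
        public Response getAccessPolicies() {
            // A real implementation would authorize and delegate to the service
            // facade; omitted here since this is only about the documentation.
            return Response.ok().build();
        }
    }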

Regards,
Kevin

On 11/9/18, 06:23, "Lars Francke"  wrote:

I've just tried implementing my new resource and it seems to work fine and
as I expect it to. Also in regards to authorization. Users cannot see
anything that they are not allowed to do anyway.

Regarding your other comments: I agree that it's probably not a super
common use case.

Either way I'd love to use a API that I can access remotely as I need to
connect to other systems as well (e.g. Kafka, HBase etc.) so I don't want
to colocate my service on one of the NiFi machines.
But yes I could probably get a list of all resources somehow using the API
and then send one request per resource. But that seems overly complicated.

So if you don't object I'd create a Jira.

Cheers,
Lars


On Fri, Nov 9, 2018 at 10:01 AM Lars Francke  wrote:

> Andy,
>
> that's a good question. I have to admit that I thought about it and then
> saw that there is already an Authorizable for this scenario so I assumed
> that part was already taken care of. So whoever has the permission to view
> "access all policies" would also be able to use the API? Were you thinking
> of something different?
>
> Cheers,
> Lars
>
>
>
> On Fri, Nov 9, 2018 at 12:35 AM Andy LoPresto 
> wrote:
>
>> Lars,
>>
>> What access controls do you anticipate putting on this API endpoint and
>> what potential issues do you see? I could see this being abused if not
>> secured very carefully, and it doesn’t seem like a common use case
>> (notwithstanding your current requirement). Is this something that can be
>> done by using the NiFi CLI to iterate/recurse through the various PGs and
>> retrieve these policies?
>>
>> Andy LoPresto
>> alopre...@apache.org
>> alopresto.apa...@gmail.com
>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>
>> > On Nov 9, 2018, at 3:31 AM, Lars Francke 
>> wrote:
>> >
>> > Hi,
>> >
>> > I was tasked with writing a tool to generate a kind of "audit report".
>> For
>> > that I need to get all policies that people have across various 
systems.
>> > NiFi is one of them.
>> >
>> > I see that we have a REST API for Policies but that doesn't expose a
>> method
>> > to expose _all_ policies. I'd like to add a REST endpoint that allows
>> > retrieving all policies.
>> > Before I open a Jira I wanted to get feedback whether this addition
>> would
>> > be acceptable.
>> >
>> > Implementation notes
>> > This is how I see the current flow of requests from the
>> > AccessPolicyResource to the actual AccessPolicyProider:
>> >
>> > AccessPolicyResource -> NiFiServiceFacade (StandardNiFiServiceFacade) 
->
>> > AccessPolicyDAO (StandardPolicyBasedAuthorizerDAO) ->
>> AccessPolicyProvider
>> >
>> > Fortunately the AccessPolicyProvider already has a method to retrieve
>> all
>> > policies. Should there be custom implementations by third-parties they
>> > already support the necessary methods and I believe the classes that
>> need
>> > to be c

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-11-09 Thread Peter Wicks (pwicks)
A one week bump on this thread. --Peter

-Original Message-
From: Peter Wicks (pwicks) 
Sent: Friday, November 2, 2018 11:54 AM
To: dev@nifi.apache.org
Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure 
in an attribute

Dev Team,

I don’t think we've reached a conclusion on this discussion, but would like 
to. I had not done enough research when I originally suggested this as a "New 
Pattern". Having done a bit more research now, I'd say this is already a 
well-established pattern.

Examples using this pattern already, with exception types/text written in 
FlowFile Attributes:
 - GenerateTableFetch (Matt added back in 2017) does this for incoming 
FlowFiles that cause a SQL exception
 - PutDatabaseRecord (Matt added also in 2017 with the original version of the 
processor)
 - ValidateCSV and ValidateXML put the validation cause as an Attribute 
(maybe not exactly the same, but feels similar)
 - InvokeHTTP, InvokeGRPC: Exception class name and Exception message
 - Couchbase Processors (Put/Get) provide the exception class name
 - PutLambda (six different exception fields get written to Attributes)
 - Other AWS processors are similar in how they handle this, such as the Dynamo 
processor.
 - ExecuteStreamCommand provides the error message from a script execution as 
an attribute.
 - DeleteHDFS puts the error message as an Attribute
 - ScanHBase puts scanning errors as an Attribute
 - DeleteElasticSearch (both versions) put deletion failure messages as an 
attribute
 - InfluxDB processors do this also (influxdb.error.message)
 - ConvertExcelToCSV tracks conversion errors in an attribute
 - RethinkDB processors do this too 

Thanks,
  Peter

-Original Message-
From: James Srinivasan [mailto:james.sriniva...@gmail.com]
Sent: Tuesday, October 30, 2018 3:00 PM
To: dev@nifi.apache.org
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure 
in an attribute

Apologies if I've missed this in the discussion so far - we use the InvokeHTTP 
processor a lot, and the invokehttp.java.exception.message attribute is really 
handy for diving into why things have failed without having to match up logs with 
flow files (from a system with hundreds of processors making thousands of 
requests). We also route on invokehttp.status.code (e.g. to retry 403s due to a 
race hazard in an external system) but I don't imagine we'd route on
invokehttp.java.exception.* since (as others have mentioned) it looks pretty 
fragile.

--
James
On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks)  wrote:
>
> Sorry for the delayed response, I've been traveling.
>
> Responses in order:
>
> Matt,
> Right now our workaround is to keep retrying errors, usually with a penalty 
> or control rate processor. The problem is that we don't know why it failed, 
> and thus don't know if retry is the correct option. I have not found a way, 
> without code change, to be able to determine if retrying is the correct 
> option or not.
>
> Koji,
> Detailed error handling would indeed be a good workaround to the problems 
> raised by myself and Matt. I have not seen this on other processors, but that 
> does not mean we can't do it of course.  I agree that having some kind of 
> hierarchy system for errors would be a much better solution.
>
> Pierre,
> My primary use case is as you described, a user friendly way to see what 
> actually happened without going through the log files. But while I know 
> it's fragile, routing on exception text stored in an attribute still feels 
> like a very legitimate use case. I know in many systems there are good 
> exception types that can be used to route FlowFile's to appropriate failure 
> relationships, but as Matt mentioned, JDBC has just a handful of exception 
> types for a very large number of possible error types.
>
> I think this is probably the same rationale that was used to justify the 
> inclusion of this feature in ExecuteStreamCommand in the past: too many 
> possible failure conditions to handle with just a few failure 
> relationships.
>
> Uwe,
> That is a fair question, but it doesn't feel like such a bad fit to me. It's 
> like extra metadata on the lineage, "We followed this path through the flow 
> because we had exception "  " which caused the FlowFile to follow the 
> failure route".
>
> But I still prefer the attribute, it could be another option for Detailed 
> error handling; instead of, or in addition to, additional relationships for 
> failures, the exception text could be included in an attribute.
>
> Thanks,
>   Peter
>
> -Original Message-
> From: u...@moosheimer.com [mailto:u...@moosheimer.com]
> Sent: Saturday, October 27, 2018 10:46 AM
> To: dev@nifi.apache.org
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that 
> caused failure in an attribute
>
> Do you really want to mix provenance and data lineage with logging/error 
> information?
>
> Writing exception information/logging information withi

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-11-09 Thread Joe Witt
Peter,

I'm not clear on what exactly you're asking be done and how far
it should be carried.

You're right that there are many processors which store exception/log
details as attributes on flowfiles before they route them
(success, failure, etc.).  This is fine and can be documented with the
WritesAttribute annotations to be helpful.

Where that model breaks down, though, and should never be used, is when
someone wants to use the text of that String to safely/reliably
automate something.  If the 'failure' reason for a given situation is
precisely knowable enough and could reasonably be valuable for routing,
it should be an explicit relationship.  Attributes for exceptions/log
values are useful provided they are 'advisory' only, meaning largely
just intended for users/general awareness, but not for automation or
to define an explicit interface.
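
To be concrete about the version of this that is fine, the advisory
pattern is roughly the following (a minimal sketch, not taken from any
existing processor, and the attribute names are made up):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.exception.ProcessException;

    public class PutToExternalSystemSketch extends AbstractProcessor {

        static final Relationship REL_SUCCESS = new Relationship.Builder()
                .name("success").description("Sent to the external system").build();
        static final Relationship REL_FAILURE = new Relationship.Builder()
                .name("failure").description("Could not be sent").build();

        @Override
        public Set<Relationship> getRelationships() {
            return new HashSet<>(Arrays.asList(REL_SUCCESS, REL_FAILURE));
        }

        @Override
        public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            try {
                // ... interact with the external system here ...
                session.transfer(flowFile, REL_SUCCESS);
            } catch (final Exception e) {
                // Advisory only: record what happened so a human can see it later.
                flowFile = session.putAttribute(flowFile, "failure.exception.class", e.getClass().getName());
                flowFile = session.putAttribute(flowFile, "failure.exception.message", String.valueOf(e.getMessage()));
                getLogger().error("Failed to process " + flowFile, e);
                // The explicit relationship, not the attribute text, is the interface.
                session.transfer(session.penalize(flowFile), REL_FAILURE);
            }
        }
    }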

So, with the above said, can you clarify what exactly you are
requesting and for whom - who is the actor?

Thanks
On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks)  wrote:
>
> A one week bump on this thread. --Peter
>
> -Original Message-
> From: Peter Wicks (pwicks)
> Sent: Friday, November 2, 2018 11:54 AM
> To: dev@nifi.apache.org
> Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that caused 
> failure in an attribute
>
> Dev Team,
>
> I don’t think we've reached a conclusion on this discussion, but would like 
> too. I had not done enough research when I originally suggested this as a, 
> "New Pattern". Having done a bit more research now, I'd say this is already a 
> well established pattern.
>
> Examples using this pattern already, with exception types/text written in 
> FlowFile Attributes:
>  - GenerateTableFetch (Matt added back in 2017) does this for incoming 
> FlowFiles that cause a SQL exception
>  - PutDatabaseRecord (Matt added also in 2017 with the original version of 
> the processor)
>  - ValidateCSV and ValidateXML puts that validation cause as an Attribute 
> (maybe not exactly the same, but feels similar)
>  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
>  - Couchbase Processors (Put/Get) provides the exception class name
>  - PutLambda (six different exception fields get written to Attributes)
>  - Other AWS processors are similar in how they handle this, such as the 
> Dynamo processor.
>  - ExecuteStreamCommand provides the error message from a script execution as 
> an attribute.
>  - DeleteHDFS puts the error message as an Attribute
>  - ScanHBase puts scanning errors as an Attribute
>  - DeleteElasticSearch (both versions) put deletion failure messages as an 
> attribute
>  - InfluxDB processors do this also (influxdb.error.message)
>  - ConvertExcelToCSV tracks conversion errors in an attribute
>  - RethinkDB processors do this too
>
> Thanks,
>   Peter
>
> -Original Message-
> From: James Srinivasan [mailto:james.sriniva...@gmail.com]
> Sent: Tuesday, October 30, 2018 3:00 PM
> To: dev@nifi.apache.org
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused 
> failure in an attribute
>
> Apologies if I've missed this in the discussion so far - we use the 
> InvokeHTTP processor a lot, and the invokehttp.java.exception.message 
> attribute is really handy diving into why things have failed without having 
> to match up logs with flow files (from a system with hundreds of processors 
> making thousands of requests). We also route on invokehttp.status.code (e.g. 
> to retry 403s due to a race hazard in an external system) but I don't imagine 
> we'd route on
> invokehttp.java.exception.* since (as others have mentioned) it looks pretty 
> fragile.
>
> --
> James
> On Tue, 30 Oct 2018 at 16:44, Peter Wicks (pwicks)  wrote:
> >
> > Sorry for the delayed response, I've been traveling.
> >
> > Responses in order:
> >
> > Matt,
> > Right now our work around is to keep retrying errors, usually with a 
> > penalty or control rate processor. The problem is that we don't know why it 
> > failed, and thus don't know if retry is the correct option. I have not 
> > found a way, without code change, to be able to determine if retrying is 
> > the correct option or not.
> >
> > Koji,
> > Detailed error handling would indeed be a good workaround to the problems 
> > raised by myself and Matt. I have not see this on other processors, but 
> > that does not mean we can't do it of course.  I agree that having some kind 
> > of hierarchy system for errors would be a much better solution.
> >
> > Pierre,
> > My primary use case is as you described, a user friendly way to see what 
> > actually happened without going through the log files. But I while I know 
> > it's fragile, routing on exception text stored in an attribute still feels 
> > like a very legitimate use case. I know in many systems there are good 
> > exception types that can be used to route FlowFile's to appropriate failure 
> > relationships, but as Matt mentioned, JDBC has just a handful of exception 
> > types for a very large num

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-11-09 Thread Peter Wicks (pwicks)
Joe,

Several different opinions have been expressed by Matt Burgess, Mark Payne and 
Bryan Bende in this thread about whether we should be storing exception 
information in attributes, and the pros and cons. Those opinions generally 
matched yours: a well-defined relationship is the best approach. I don't 
disagree in any way with the consensus; I agree that the best solution is to 
use a well-defined relationship.

The pattern I see in the Processor list I provided below is that almost all of 
the processors work with external systems (outside of NiFi), and in many cases 
the number of distinct exception classes that can occur is low, but the variety 
of exceptions is high (JDBC/Stream Command). Matt did a good job of discussing 
this for JDBC type processors in his reply.

My users are desperate for these error details, especially on ExecuteSQL; and I 
won't lie, users are absolutely going to parse the exceptions and use 
RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I 
don't know that this will be the primary use case, as troubleshooting using 
this information will also be a major facet of its usefulness.) The problem, 
especially with JDBC, is that I don't see a reasonable alternative. There are 
so many different JDBC drivers, and NiFi will only ever see a SQLException type 
with differing text. Even if we went down the route of putting exception 
parsers into the DBAdapters to provide per-driver failure handling, all we've 
really done is move the fragility into NiFi, where it's hard-coded until the 
next release.
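
For example, everything a driver reports funnels through something like this 
(a sketch only, not the ExecuteSQL code); the only structure we get back is 
the SQL state, the vendor code, and free text:

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;
    import javax.sql.DataSource;

    class SqlErrorSketch {
        // Sketch: shows how little structure a SQLException gives NiFi to route on.
        static void run(DataSource dataSource, String sql) {
            try (Connection conn = dataSource.getConnection();
                 Statement stmt = conn.createStatement()) {
                stmt.execute(sql);
            } catch (SQLException e) {
                String state = e.getSQLState();  // driver specific, often null
                int code = e.getErrorCode();     // vendor specific numeric code
                String message = e.getMessage(); // free text, differs per driver/version
                // A processor can surface these as attributes, but any routing users
                // build on top of the text is inherently driver dependent.
            }
        }
    }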

Thank you,
  Peter

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Friday, November 9, 2018 12:23 PM
To: dev@nifi.apache.org
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure 
in an attribute

Peter,

I'm not clear on what you're are asking be done precisely and how far it be 
carried.

You're right there are many processors which store exception/log details as 
attributes on flowfiles before they route them (success,failure, etc..).  This 
is fine and can be documented with the WritesAttribute annotations to be 
helpful.

Where that model breaks down though and should never be used is when someone 
wants to use the text of that String to safely/reliable automate something.  If 
the 'failure' reason for a given situation is precisely knowable enough and 
could reasonably be valuable for routing it should be an explicit relationship. 
 Attributes for exceptions/log values are useful provided they are 'advisory' 
only meaning largely just intended for users/general awareness.  But not for 
automation or to define an explicit interface.

So, with the above said can you clarify what you are precisely requesting and 
for 'who' - who is the actor.

Thanks
On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks)  wrote:
>
> A one week bump on this thread. --Peter
>
> -Original Message-
> From: Peter Wicks (pwicks)
> Sent: Friday, November 2, 2018 11:54 AM
> To: dev@nifi.apache.org
> Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that 
> caused failure in an attribute
>
> Dev Team,
>
> I don’t think we've reached a conclusion on this discussion, but would like 
> too. I had not done enough research when I originally suggested this as a, 
> "New Pattern". Having done a bit more research now, I'd say this is already a 
> well established pattern.
>
> Examples using this pattern already, with exception types/text written in 
> FlowFile Attributes:
>  - GenerateTableFetch (Matt added back in 2017) does this for incoming 
> FlowFiles that cause a SQL exception
>  - PutDatabaseRecord (Matt added also in 2017 with the original 
> version of the processor)
>  - ValidateCSV and ValidateXML puts that validation cause as an 
> Attribute (maybe not exactly the same, but feels similar)
>  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
>  - Couchbase Processors (Put/Get) provides the exception class name
>  - PutLambda (six different exception fields get written to 
> Attributes)
>  - Other AWS processors are similar in how they handle this, such as the 
> Dynamo processor.
>  - ExecuteStreamCommand provides the error message from a script execution as 
> an attribute.
>  - DeleteHDFS puts the error message as an Attribute
>  - ScanHBase puts scanning errors as an Attribute
>  - DeleteElasticSearch (both versions) put deletion failure messages 
> as an attribute
>  - InfluxDB processors do this also (influxdb.error.message)
>  - ConvertExcelToCSV tracks conversion errors in an attribute
>  - RethinkDB processors do this too
>
> Thanks,
>   Peter
>
> -Original Message-
> From: James Srinivasan [mailto:james.sriniva...@gmail.com]
> Sent: Tuesday, October 30, 2018 3:00 PM
> To: dev@nifi.apache.org
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that 
> caused failure in an attribute
>
> Apologies if I've missed this in the discussion so far - we use the 
> I

Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-11-09 Thread Joe Witt
Peter

Ok cool.  So I think we agree on the state of things.  And for
processors you want to add more failure-case details to, you can do
so (provided we're not just bloating attributes all over).  And we'll
recognize that this model is basically there to help users and will
likely be abused and be brittle.  But I think I'm saying 'there is
nothing new to do now' then, right?

Do you agree?

Thanks
On Fri, Nov 9, 2018 at 3:59 PM Peter Wicks (pwicks)  wrote:
>
> Joe,
>
> Several different opinions have been expressed by Matt Burgess, Mark Payne 
> and Bryan Bende about whether we should be storing exception information in 
> attributes, and the pros and cons, in this thread. Those opinions generally 
> matched yours, which is that a well-defined relationship is the best 
> approach. I don't disagree in anyway with the consensus, I agree that the 
> best solution is to use a well-defined relationship.
>
> The pattern I see in the Processor list I provided below is that almost all 
> of the processors work with external systems (outside of NiFi), and in many 
> cases the number of distinct exception classes that can occur is low, but the 
> variety of exceptions is high (JDBC/Stream Command). Matt did a good job of 
> discussing this for JDBC type processors in his reply.
>
> My users are desperate for these error details, especially on ExecuteSQL; and 
> I won't lie, users are absolutely going to parse the exceptions and use 
> RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I 
> don't know that this will be the primary use case, as troubleshooting using 
> this information will also be a major facet of its usefulness) The problem, 
> especially with JDBC, is that I don't see a reasonable alternative. There are 
> so many different JDBC drivers, and NiFi will only see a SQLException type 
> with differing text. Even if we went down the route of putting exception 
> parsers into the DBAdapters to provide per driver failure handling, all we've 
> really done is move the fragileness into NiFi, where it's hard coded until 
> the next release.
>
> Thank you,
>   Peter
>
> -Original Message-
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Friday, November 9, 2018 12:23 PM
> To: dev@nifi.apache.org
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused 
> failure in an attribute
>
> Peter,
>
> I'm not clear on what you're are asking be done precisely and how far it be 
> carried.
>
> You're right there are many processors which store exception/log details as 
> attributes on flowfiles before they route them (success,failure, etc..).  
> This is fine and can be documented with the WritesAttribute annotations to be 
> helpful.
>
> Where that model breaks down though and should never be used is when someone 
> wants to use the text of that String to safely/reliable automate something.  
> If the 'failure' reason for a given situation is precisely knowable enough 
> and could reasonably be valuable for routing it should be an explicit 
> relationship.  Attributes for exceptions/log values are useful provided they 
> are 'advisory' only meaning largely just intended for users/general 
> awareness.  But not for automation or to define an explicit interface.
>
> So, with the above said can you clarify what you are precisely requesting and 
> for 'who' - who is the actor.
>
> Thanks
> On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks)  wrote:
> >
> > A one week bump on this thread. --Peter
> >
> > -Original Message-
> > From: Peter Wicks (pwicks)
> > Sent: Friday, November 2, 2018 11:54 AM
> > To: dev@nifi.apache.org
> > Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that
> > caused failure in an attribute
> >
> > Dev Team,
> >
> > I don’t think we've reached a conclusion on this discussion, but would like 
> > too. I had not done enough research when I originally suggested this as a, 
> > "New Pattern". Having done a bit more research now, I'd say this is already 
> > a well established pattern.
> >
> > Examples using this pattern already, with exception types/text written in 
> > FlowFile Attributes:
> >  - GenerateTableFetch (Matt added back in 2017) does this for incoming
> > FlowFiles that cause a SQL exception
> >  - PutDatabaseRecord (Matt added also in 2017 with the original
> > version of the processor)
> >  - ValidateCSV and ValidateXML puts that validation cause as an
> > Attribute (maybe not exactly the same, but feels similar)
> >  - InvokeHTTP, InvokeGRPC Exception class name and Exception message
> >  - Couchbase Processors (Put/Get) provides the exception class name
> >  - PutLambda (six different exception fields get written to
> > Attributes)
> >  - Other AWS processors are similar in how they handle this, such as the 
> > Dynamo processor.
> >  - ExecuteStreamCommand provides the error message from a script execution 
> > as an attribute.
> >  - DeleteHDFS puts the error message as an Attribute
> >  - ScanHBase puts scanning erro

RE: [EXT] Re: New Standard Pattern - Put Exception that caused failure in an attribute

2018-11-09 Thread Peter Wicks (pwicks)
Joe,

The only new thing to do is for Matt to finish reviewing the PR for NIFI-5744, 
which adds this as a feature to ExecuteSQL; he was waiting for this discussion 
to come to a close: 
https://github.com/apache/nifi/pull/3107#issuecomment-433260483
But apart from that, nothing new to do now.

Thanks,
  Peter

-Original Message-
From: Joe Witt [mailto:joe.w...@gmail.com] 
Sent: Friday, November 9, 2018 2:36 PM
To: dev@nifi.apache.org
Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that caused failure 
in an attribute

Peter

Ok cool.  So i think we agree on the state of things.  And for processors you 
want to add more details in failure cases to you can do so (provided we're not 
just bloating attributes all over).  And we'll recognize that this model is 
basically to help users and will likely be abused and be brittle.  But I think 
i'm saying 'there is nothing new to do now' then right?

Do you agree?

Thanks
On Fri, Nov 9, 2018 at 3:59 PM Peter Wicks (pwicks)  wrote:
>
> Joe,
>
> Several different opinions have been expressed by Matt Burgess, Mark Payne 
> and Bryan Bende about whether we should be storing exception information in 
> attributes, and the pros and cons, in this thread. Those opinions generally 
> matched yours, which is that a well-defined relationship is the best 
> approach. I don't disagree in anyway with the consensus, I agree that the 
> best solution is to use a well-defined relationship.
>
> The pattern I see in the Processor list I provided below is that almost all 
> of the processors work with external systems (outside of NiFi), and in many 
> cases the number of distinct exception classes that can occur is low, but the 
> variety of exceptions is high (JDBC/Stream Command). Matt did a good job of 
> discussing this for JDBC type processors in his reply.
>
> My users are desperate for these error details, especially on ExecuteSQL; and 
> I won't lie, users are absolutely going to parse the exceptions and use 
> RouteOnAttribute. And yep, it's going to be fragile and break sometimes. (I 
> don't know that this will be the primary use case, as troubleshooting using 
> this information will also be a major facet of its usefulness) The problem, 
> especially with JDBC, is that I don't see a reasonable alternative. There are 
> so many different JDBC drivers, and NiFi will only see a SQLException type 
> with differing text. Even if we went down the route of putting exception 
> parsers into the DBAdapters to provide per driver failure handling, all we've 
> really done is move the fragileness into NiFi, where it's hard coded until 
> the next release.
>
> Thank you,
>   Peter
>
> -Original Message-
> From: Joe Witt [mailto:joe.w...@gmail.com]
> Sent: Friday, November 9, 2018 12:23 PM
> To: dev@nifi.apache.org
> Subject: Re: [EXT] Re: New Standard Pattern - Put Exception that 
> caused failure in an attribute
>
> Peter,
>
> I'm not clear on what you're are asking be done precisely and how far it be 
> carried.
>
> You're right there are many processors which store exception/log details as 
> attributes on flowfiles before they route them (success,failure, etc..).  
> This is fine and can be documented with the WritesAttribute annotations to be 
> helpful.
>
> Where that model breaks down though and should never be used is when someone 
> wants to use the text of that String to safely/reliable automate something.  
> If the 'failure' reason for a given situation is precisely knowable enough 
> and could reasonably be valuable for routing it should be an explicit 
> relationship.  Attributes for exceptions/log values are useful provided they 
> are 'advisory' only meaning largely just intended for users/general 
> awareness.  But not for automation or to define an explicit interface.
>
> So, with the above said can you clarify what you are precisely requesting and 
> for 'who' - who is the actor.
>
> Thanks
> On Fri, Nov 9, 2018 at 2:12 PM Peter Wicks (pwicks)  wrote:
> >
> > A one week bump on this thread. --Peter
> >
> > -Original Message-
> > From: Peter Wicks (pwicks)
> > Sent: Friday, November 2, 2018 11:54 AM
> > To: dev@nifi.apache.org
> > Subject: RE: [EXT] Re: New Standard Pattern - Put Exception that 
> > caused failure in an attribute
> >
> > Dev Team,
> >
> > I don’t think we've reached a conclusion on this discussion, but would like 
> > too. I had not done enough research when I originally suggested this as a, 
> > "New Pattern". Having done a bit more research now, I'd say this is already 
> > a well established pattern.
> >
> > Examples using this pattern already, with exception types/text written in 
> > FlowFile Attributes:
> >  - GenerateTableFetch (Matt added back in 2017) does this for 
> > incoming FlowFiles that cause a SQL exception
> >  - PutDatabaseRecord (Matt added also in 2017 with the original 
> > version of the processor)
> >  - ValidateCSV and ValidateXML puts that validation cause as an 
> > Attribute (maybe no

ListSFTP is hanging

2018-11-09 Thread David Marrow
We have been using NiFi for over a year and we just turned up a new cluster.
We move around 6TB a day of small to large files.   We are having an issue
where ListSFTP misses files.   I know this can happen if a file with an
older date is moved into the directory, because the lister is maintaining
state.   However, it also seems to hang when there are 10k-plus files.   I am
running NiFi 1.6 on Ubuntu 18.  The cluster has plenty of memory, CPU, and
disk space.   I am also using the distributed cache because we haven't
migrated to 1.8 yet.

We have 20 different data flows, all with their own logic.  We connect the
Lister to a remote port that is connected to a remote process group; the
listings are then distributed across the cluster to a FetchSFTP that deletes
the files after they are loaded.

We move files into the input directory so we have permission to delete them
from the NiFi fetch.  We are doing a find that orders the files to make
sure that we don't grab old files.  This could still be an issue and cause
us to miss a few files, but it still doesn't explain why nothing gets pulled
when the lister is running and there are files to pull.

Any suggestions or ideas would be appreciated.

Dave



--
Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/