Re: Routing to Failure relationships and Route provenance events

2016-11-17 Thread Jeff
Hello Michal,

On a previous project, I worked on a flow with hundreds of processors that
handled thousands of files per day, and I had similar questions when I was
trying to work out how to handle failures gracefully: either retry
processing automatically where possible, or at least collect enough
information at the point of failure to stash the file somewhere and notify
my team so someone could manually correct the issue and then resubmit the
flowfile.

What eventually was implemented consisted of several stages of processing
in which the flow would update an external database to log several aspects
of the stage of processing it was in, and several PutEmail processors
representing the ends of stages.  Failure relationships would be
"aggregated" by the appropriate PutEmail processor for each stage of
processing, which in turn had PutHDFS processors to stash the file at a
location representative of the stage of processing that the flowfile could
not complete.  When there was a failure, we logged that in the external
database, sent an email for notification of the failure, and then placed
the file on HDFS so that someone could fix it and put the flowfile back
into the flow so it could retry that stage.
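
To make the per-stage handling above concrete, here is a minimal Python sketch of the bookkeeping only: where a failed file is stashed and what gets logged. It is not the flow we actually ran. The root directory "/data/failed", the stage name "validate" and the record fields are assumptions made up for illustration; only "uuid" and "filename" correspond to standard flowfile attributes.

```python
from datetime import datetime, timezone
from posixpath import join

# Hypothetical base directory; the original flow's location is not stated.
FAILURE_ROOT = "/data/failed"


def stash_path(stage, filename, root=FAILURE_ROOT):
    """Return the HDFS-style path for a flowfile that failed in `stage`."""
    return join(root, stage, filename)


def failure_record(stage, attributes, when=None):
    """Build the row logged to the external database and summarized by email."""
    when = when or datetime.now(timezone.utc)
    return {
        "uuid": attributes["uuid"],
        "filename": attributes["filename"],
        "failed_stage": stage,
        "stashed_at": stash_path(stage, attributes["filename"]),
        "failed_on": when.isoformat(),
    }


if __name__ == "__main__":
    attrs = {"uuid": "hypothetical-uuid", "filename": "orders.csv"}
    print(failure_record("validate", attrs))
```

Because the stage name is part of the stash path, whoever fixes the file can tell from its location alone which stage it has to re-enter.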

I had thought it'd be good for NiFi to track the last processor by which a
flowfile had been processed, but regardless of that, the flow still needs
to be developed to deal with errors in the context in which they happen,
and it takes several iterations of development to get it written that way.
After the flow design had evolved in such a way to be able to handle the
failures gracefully, I found that it was very easy to diagnose where
something went wrong with a flowfile, and I could always go back into Data
Provenance if I needed more detail.

- Jeff

On Thu, Nov 17, 2016 at 4:17 AM Michal Klempa 
wrote:

> Hi,
> thank you both for responses. I understand the scenario with rerouting
> back to processor would cause infinite provenance history. It can also
> cause an infinite loop when the destination system is offline, therefore
> I am not using this approach in this case.
>
> Generally, I have a problem identifying the 'last processor which routed
> the flowfile to failure before entering failure handling'. And yes, I
> was thinking of attaching UpdateAttribute right after each failure
> connection I need to handle and distinguish. This would be really
> messy. Therefore I was thinking I am doing something wrong in general.
>
> My thought was that when I can identify where the FlowFile escaped
> standard execution through failure, I can then just save the flowfile
> somewhere (e.g. HDFS) with metadata (attributes) and leave it for
> future inspection and, especially, manual re-entry into the flow from
> the point of failure. Is this a bad approach? Or how do you design
> flows then? Is it possible to programmatically inspect a flowfile to
> find the processor which was the last in the chain touching it (even
> though this processor did not emit any provenance event at all)? If
> so, tell me; I can afford to code my own processor to accomplish this task.
>
> Thanks. Michal.
>
> On Thu, Nov 10, 2016 at 7:50 PM, Andy LoPresto 
> wrote:
> > Michal,
> >
> > A temporary solution would be to insert an UpdateAttribute processor
> > between the source processor (where the failure occurred) and your
> > general failure handling flow. This processor could add an attribute
> > noting the location of the failure and you could quickly determine that
> > when debugging.
> >
> > If this seems cumbersome, you could also put a single ExecuteScript
> > processor at the beginning of your failure handling flow and query the
> > provenance events for the incoming flowfile, detect the last event that
> > occurred, and then write out an additional, arbitrary provenance event
> > indicating the failure.
> >
> > Neither is an excellent solution, and Mark is right that there should be a
> > better option for diagnosing this. Please submit a Jira capturing your
> > thoughts and we’ll see what is possible.
> >
> >
> > Andy LoPresto
> > alopre...@apache.org
> > alopresto.apa...@gmail.com
> > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> >
> > On Nov 7, 2016, at 6:10 AM, Mark Payne  wrote:
> >
> > Michal,
> >
> > Currently, the guidance that we give is for processors not to emit any
> > sort of ROUTE event for routing a FlowFile to a 'failure' relationship.
> > While this may seem counter-intuitive, we do this because most of the
> > time when a FlowFile is routed to 'failure', the failure relationship is
> > not pointing to some sort of 'failure' flow like you describe here but
> > rather the failure relationship is a self-loop so that the Processor
> > tries again.
> >
> > In the scenario described above, if PostHTTP were to route a FlowFile to
> > failure and failure looped

Re: nifi configuration file for process group

2016-11-17 Thread Yolanda Davis
Bala,

Let me add to the below in saying that in the current implementation of
custom properties, variables are bound at startup. So if the custom
property file is changed then nifi would need to be restarted. However
there are jiras and discussion around improving this in the future.

Yolanda

On Thu, Nov 17, 2016 at 10:42 AM, Yolanda Davis 
wrote:

> Hello Bala,
>
> If you are using a processor that supports expression language for the IP
> address property you can configure NiFi to use a variable within a custom
> property file that you define.  Once that is set you can refer to the
> custom variable in the IP field instead of the raw IP value.  For more
> information on how to set that up here is a link to the admin guide:
>
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#custom_properties
>
> I hope this is helpful.  Please let me know if you have any questions.
>
> Thanks,
>
> Yolanda
>
> On Thu, Nov 17, 2016 at 7:21 AM, balacode63 
> wrote:
>
>> Hi all,
>>
>> My use case is,
>>
>> 1)Process group A is having 10 processors which uses same ip address
>> ex(192.168.1.1) for some processing. ex http post
>> 2) if this ip address is changed, i need to update all the processors.
>> 3) is there any way i can handle this in a configuration file in nifi?
>> ex: this configuration data will be used across different processors
>>
>> Please guide me
>> Thanks,
>> Bala
>>
>> --
>> View this message in context:
>> http://apache-nifi-developer-list.39713.n7.nabble.com/nifi-configuration-file-for-process-group-tp13908.html
>> Sent from the Apache NiFi Developer List mailing list archive at
>> Nabble.com.
>>
>
>
>
> --
> --
> yolanda.m.da...@gmail.com
> @YolandaMDavis
>
>


-- 
--
yolanda.m.da...@gmail.com
@YolandaMDavis


Re: nifi configuration file for process group

2016-11-17 Thread Yolanda Davis
Hello Bala,

If you are using a processor that supports expression language for the IP
address property you can configure NiFi to use a variable within a custom
property file that you define.  Once that is set you can refer to the
custom variable in the IP field instead of the raw IP value.  For more
information on how to set that up here is a link to the admin guide:

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#custom_properties
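
As a rough illustration of the idea (define the IP once, reference it by name everywhere), here is a small Python sketch that models the substitution. It is not NiFi's Expression Language evaluator, and the variable name "target.ip", the file contents and the URL are hypothetical values chosen for the example.

```python
import re


def load_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Matches ${name} references, the form used in processor property values.
_REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value, variables):
    """Replace each ${name} with its variable; leave unknown names untouched."""
    return _REFERENCE.sub(lambda m: variables.get(m.group(1), m.group(0)), value)


if __name__ == "__main__":
    # Hypothetical custom property file contents.
    custom = load_properties("# shared settings\ntarget.ip=192.168.1.1\n")
    print(resolve("http://${target.ip}:8080/ingest", custom))
```

With this arrangement all ten processors carry the same ${target.ip} reference, and a change of address is a one-line edit to the property file.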

I hope this is helpful.  Please let me know if you have any questions.

Thanks,

Yolanda

On Thu, Nov 17, 2016 at 7:21 AM, balacode63 
wrote:

> Hi all,
>
> My use case is,
>
> 1)Process group A is having 10 processors which uses same ip address
> ex(192.168.1.1) for some processing. ex http post
> 2) if this ip address is changed, i need to update all the processors.
> 3) is there any way i can handle this in a configuration file in nifi?
> ex: this configuration data will be used across different processors
>
> Please guide me
> Thanks,
> Bala
>
> --
> View this message in context:
> http://apache-nifi-developer-list.39713.n7.nabble.com/nifi-configuration-file-for-process-group-tp13908.html
> Sent from the Apache NiFi Developer List mailing list archive at
> Nabble.com.
>



-- 
--
yolanda.m.da...@gmail.com
@YolandaMDavis


Re: nifi configuration file for process group

2016-11-17 Thread Joe Percivall
Hello Bala,

Are all your processors HTTP processors? Which processors are you trying to use
to grab data from the IP? You may want to think about whether you could use
just a couple of processors and update them by hand.

If you must have many processors that hit the same IP, and they all accept
incoming connections and all support Expression Language, then you can use
"GetFile" to grab a file containing the IP you need, then ExtractText to put
the IP address in an attribute, and lastly reference that attribute in an
Expression Language expression. You'll be able to modify the input file
whenever you need to update the IP and your processors will automatically use
the new IP.
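
The extraction step could look roughly like the Python sketch below. It only mimics what a regular expression with one capture group would pull out of the file content; it does not use any NiFi API, and the attribute name "target.ip" and the sample file content are assumptions for illustration.

```python
import re

# Matches a dotted-quad IPv4 address; it does not validate octet ranges.
IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def extract_ip(content, attributes, name="target.ip"):
    """Return a copy of `attributes` with the first IPv4 address in `content` added."""
    match = IPV4.search(content)
    if match is None:
        raise ValueError("no IPv4 address found in file content")
    updated = dict(attributes)
    updated[name] = match.group(1)
    return updated


if __name__ == "__main__":
    # Hypothetical content of the file picked up at the start of the flow.
    print(extract_ip("ip = 192.168.1.1\n", {"filename": "ip.txt"}))
```

Downstream processors would then refer to the attribute by name instead of a hard-coded address.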
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: joeperciv...@yahoo.com
 

On Thursday, November 17, 2016 8:51 AM, balacode63 
 wrote:
 

 Hi all,

My use case is,

1)Process group A is having 10 processors which uses same ip address
ex(192.168.1.1) for some processing. ex http post
2) if this ip address is changed, i need to update all the processors.
3) is there any way i can handle this in a configuration file in nifi?
    ex: this configuration data will be used across different processors

Please guide me
Thanks,
Bala

--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/nifi-configuration-file-for-process-group-tp13908.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


   

nifi configuration file for process group

2016-11-17 Thread balacode63
Hi all,

My use case is,

1)Process group A is having 10 processors which uses same ip address
ex(192.168.1.1) for some processing. ex http post
2) if this ip address is changed, i need to update all the processors.
3) is there any way i can handle this in a configuration file in nifi?
ex: this configuration data will be used across different processors

Please guide me
Thanks,
Bala

--
View this message in context: 
http://apache-nifi-developer-list.39713.n7.nabble.com/nifi-configuration-file-for-process-group-tp13908.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.


Re: [DISCUSS] NiFi 1.1.0 release

2016-11-17 Thread Andre
Andy,

Great to see NIFI-3050 implemented and certainly good news that NiFi 1.1.0
is set to include a number of security related improvements.



On Thu, Nov 17, 2016 at 2:38 PM, Andy LoPresto  wrote:

> Just updating this thread that NIFI-3050 [1] and NIFI-3051 [2] have been
> added to my plate for this release. Coordinated with Joe Witt and they
> should both be included.
>
> [1] https://issues.apache.org/jira/browse/NIFI-3050
> [2] https://issues.apache.org/jira/browse/NIFI-3051
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 16, 2016, at 12:08 PM, Joe Witt  wrote:
>
> Team
>
> There is a thread on apache legal-discuss that might allow for a
> grace period of continued usage of the json library.  I am going to keep
> a close eye on this and if VP Legal approves we'll be able to keep the
> twitter processors in, which is definitely a good thing.  Will advise.
>
> Thanks
> Joe
>
> On Wed, Nov 16, 2016 at 10:37 AM, Bryan Bende  wrote:
>
> I've noticed an issue with the per-instance class loading capability
> introduced in NIFI-2909 where the additional classpath resources can get
> incorrectly removed from the class loader.
>
> I was able to reproduce this with a unit test and have a fix ready. I
> believe this is important and needs to go in for the 1.1 release, so I am
> going to re-open NIFI-2909 and submit a PR shortly.
>
> -Bryan
>
> On Wed, Nov 16, 2016 at 8:11 AM, Matt Gilman 
> wrote:
>
> I have two items that I would like to wrap up prior to creating an RC for
> 1.1.0. NIFI-2949 [1] addresses some UX issues around Remote Process Group port
> configuration. The work is already completed and I will be reviewing it
> today. Additionally, following recent interest on the mailing list,
> I'd like to knock out NIFI-3020 [2]. This will allow an admin to configure a
> strategy for user identity when logging in via LDAP. Specifically, it will
> support usage of the DN (the default and current implementation) as well as
> the username the user logged in as. I should be able to have a PR up for
> this work later today.
>
> Thanks!
>
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-2949
> [2] https://issues.apache.org/jira/browse/NIFI-3020
>
>
> On Tue, Nov 15, 2016 at 8:00 PM, Joe Witt  wrote:
>
> The code is within the twitter4j library itself.  I filed a request to
> twitter4j.  The most likely case is we will need to submit a PR to them.
> However, I don't see this as something that should delay the release.  We
> can provide instructions for folks wanting to use the processor during the
> time we cannot make it available in a convenient manner.  I will provide a
> meaningful comment about this in release notes and pointers on what folks
> can do in the meantime.
>
> On Nov 15, 2016 7:41 PM, "Andy LoPresto"  wrote:
>
> I understand there was a discussion thread within the NiFi community for
> this as well and I missed responding to that at that time. It just seems to
> me like JSON processing is necessary for GetTwitter, which is incredibly
> useful for demonstrating NiFi’s ability to read from a high volume stream
> out of the box. With NIFI-3019 [1] (Remove GetTwitter from default build), is
> there any related effort to substitute an acceptable replacement JSON
> library to restore this functionality?
>
> [1] https://issues.apache.org/jira/browse/NIFI-3019
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 15, 2016, at 4:36 PM, Andy LoPresto wrote:
>
>
> I’m working with Bryan Rosander to close out NIFI-3024, NIFI-2655, and
> NIFI-2653. I believe Matt Burgess is working on NIFI-3011 and we
> investigated some alternate TLS config options for the new version of the
> client library.
>
> Is there any alternative to excluding the GetTwitter processor? Using
> Johnzon [1] or the Android re-implementation [2] discussed in the mailing
> list thread?
>
> [1] https://johnzon.apache.org/
> [2] https://developer.android.com/reference/org/json/package-summary.html
>
>
>
> Andy LoPresto
> alopre...@apache.org
> *alopresto.apa...@gmail.com *
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 15, 2016, at 3:58 PM, Joe Witt  wrote:
>
> Team
>
> Very happy to see that we are down to three items remaining tagged to
> 1.1.0.  Solid effort over the recent weeks to close the gap including work
> to get past the now Category X JSON dependency we had.  The most notable
> impact from that is the wildly popular GetTwitter processor, the fav new
> nifi user and demo

Re: Routing to Failure relationships and Route provenance events

2016-11-17 Thread Michal Klempa
Hi,
thank you both for responses. I understand the scenario with rerouting
back to processor would cause infinite provenance history. It can also
cause an infinite loop when the destination system is offline, therefore
I am not using this approach in this case.

Generally, I have a problem identifying the 'last processor which routed
the flowfile to failure before entering failure handling'. And yes, I
was thinking of attaching UpdateAttribute right after each failure
connection I need to handle and distinguish. This would be really
messy. Therefore I was thinking I am doing something wrong in general.

My thought was that when I can identify where the FlowFile escaped
standard execution through failure, I can then just save the flowfile
somewhere (e.g. HDFS) with metadata (attributes) and leave it for
future inspection and, especially, manual re-entry into the flow from
the point of failure. Is this a bad approach? Or how do you design
flows then? Is it possible to programmatically inspect a flowfile to
find the processor which was the last in the chain touching it (even
though this processor did not emit any provenance event at all)? If
so, tell me; I can afford to code my own processor to accomplish this task.

Thanks. Michal.

On Thu, Nov 10, 2016 at 7:50 PM, Andy LoPresto  wrote:
> Michal,
>
> A temporary solution would be to insert an UpdateAttribute processor between
> the source processor (where the failure occurred) and your general failure
> handling flow. This processor could add an attribute noting the location of
> the failure and you could quickly determine that when debugging.
>
> If this seems cumbersome, you could also put a single ExecuteScript
> processor at the beginning of your failure handling flow and query the
> provenance events for the incoming flowfile, detect the last event that
> occurred, and then write out an additional, arbitrary provenance event
> indicating the failure.
>
> Neither is an excellent solution, and Mark is right that there should be a
> better option for diagnosing this. Please submit a Jira capturing your
> thoughts and we’ll see what is possible.
>
>
> Andy LoPresto
> alopre...@apache.org
> alopresto.apa...@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On Nov 7, 2016, at 6:10 AM, Mark Payne  wrote:
>
> Michal,
>
> Currently, the guidance that we give is for processors not to emit any sort
> of ROUTE event for
> routing a FlowFile to a 'failure' relationship. While this may seem
> counter-intuitive, we do this because
> most of the time when a FlowFile is routed to 'failure', the failure
> relationship is not pointing to some
> sort of 'failure' flow like you describe here but rather the failure
> relationship is a self-loop so that the
> Processor tries again.
>
> In the scenario described above, if PostHTTP were to route a FlowFile to
> failure and failure looped back
> to PostHTTP, we may see that the FlowFile was routed to failure hundreds (or
> more) of times. As a result,
> the Provenance lineage would not really be very easy to follow because it
> would be filled with a huge number
> of 'ROUTE' events.
>
> That being said, there are things that we could do to be smart about this at
> the framework level. For instance,
> we could notice that the ROUTE event indicates that the FlowFile is being
> routed back to the same queue that
> it came from, so we could just discard the ROUTE event.
>
> Unfortunately, this doesn't always solve the problem, because we also often
> see scenarios where there is perhaps
> a DistributeLoad processor that load balances between 5 different PostHTTP
> processors for instance. If a PostHTTP
> fails, it routes back to the DistributeLoad. So we'd need to keep track of
> the fact that it's been to this connection before,
> even though it wasn't the last connection, and so on.
>
> So that was a really long-winded way to say: We intentionally do not emit
> ROUTE events for 'failure' because it can create
> some very complicated, hard-to-follow lineages. But we can - and should - do
> better.
>
> If this is something that you are interested in digging into, in the
> codebase, the community would be more than happy
> to help guide you along the way!
>
> Also, if you have other feedback about how you think we can handle these
> cases better, please feel free to elaborate on
> the thread.
>
> Thanks
> -Mark
>
>
>
> On Nov 7, 2016, at 5:46 AM, Michal Klempa  wrote:
>
> Hi,
> I am maintaining several dataflows and I am facing this issue in practice:
> Let's say I have several points of possible failure within the
> dataflow (nearly every processor has a failure output). I route all of
> these into my general failure handler subgroup, which basically does
> some filtering and formatting before issuing a notification by email.
>
> From my email notifications, I get the FlowFile UUID and in case I am
> curious about what happened, I go into NiFi and