How to lock GetFile until PutFile has finished writing to the same file?

2017-05-15 Thread prabhu Mahendran
I have scheduled the GetFile processor at 0 sec to watch a local folder.

The issue I am facing: PutFile appends several flowfiles into a single file. GetFile is
configured with Keep Source File set to false, so GetFile fetches partial content before
PutFile has finished writing to the local location.

How can I handle this? Can we lock the file until PutFile (or a custom
processor) has completely written it, and release the lock once it is done??
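
One common workaround here is to have the writer create the file under a temporary name and atomically rename it into the watched directory only once the write is complete, so GetFile never sees a half-written file. A minimal sketch of that pattern, assuming the writer is a small Java program and that /data/in is the directory GetFile polls (both paths are illustrative):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class AtomicDrop {
        public static void main(String[] args) throws IOException {
            Path watched = Paths.get("/data/in");          // directory GetFile polls (assumed)
            Path tmp = watched.resolve("report.csv.tmp");  // temporary name, excluded by GetFile's File Filter
            Path target = watched.resolve("report.csv");

            // Write the complete content to the temporary file first.
            Files.write(tmp, "complete content\n".getBytes(StandardCharsets.UTF_8));

            // Atomic rename within the same directory: GetFile only ever sees the finished file.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        }
    }

GetFile's File Filter property can then be set to a regex that excludes *.tmp names, so only completed files are picked up.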


How to hold off FetchFile until PutFile is complete?

2017-05-15 Thread prabhu Mahendran
I am using a FetchFile processor after PutFile. PutFile's Conflict Resolution
Strategy is set to 'append' so that similar lines are merged into a
particular file.

Overall, nearly 200 success FlowFiles are sent to the FetchFile processor for only
5 completely appended files. Because FetchFile is configured to delete the file
after fetching, this leads to 'file not found' for the remaining 195 success FlowFiles.

I want to send success only once per filename, after grouping
similar flowfiles. Since MergeContent does not work for my logic, is there any
other option in NiFi to do this??
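
One way to collapse the ~200 success FlowFiles down to a single signal per file is to de-duplicate on filename before FetchFile, either with DetectDuplicate backed by a distributed map cache, or in a small scripted/custom step. A minimal sketch of the de-duplication logic itself (plain Java, not an actual NiFi API; names are illustrative):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class OncePerFilename {
        // Filenames for which a fetch signal has already been emitted.
        private static final Set<String> seen = ConcurrentHashMap.newKeySet();

        /** Returns true only the first time a given filename is offered. */
        public static boolean shouldEmit(String filename) {
            return seen.add(filename);
        }

        public static void main(String[] args) {
            String[] incoming = {"a.csv", "a.csv", "b.csv", "a.csv", "b.csv"};
            for (String name : incoming) {
                if (shouldEmit(name)) {
                    System.out.println("emit fetch signal for " + name);     // route to FetchFile
                } else {
                    System.out.println("drop duplicate signal for " + name);
                }
            }
        }
    }

In a real flow the same idea can be expressed with DetectDuplicate keyed on ${filename}, routing only the non-duplicate relationship on to FetchFile.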


Re: Queue incoherent state

2017-05-15 Thread Matt Gilman
Sorry for the delayed response. Similar behavior has been reported by some
other users [1]. Does the connection have any back pressure threshold
configured? Can new flowfiles be enqueued? Do the expiration settings have
any effect?

Lastly, if you restart the cluster does it claim the connection still has
flowfiles enqueued?

Matt

[1] https://issues.apache.org/jira/browse/NIFI-3897

On Fri, May 12, 2017 at 5:47 AM, Arnaud G  wrote:

> Hi again!
>
> I currently have another issue with an incoherent queue status.
>
> Following the upgrade of a cluster to 1.2, I have a couple of queues that
> display a high number of flowfiles in the GUI.
>
> As the queues were not emptying despite tuning, I tried to list the content
> of the queue. This action returns that the queue contains no flowfiles,
> which is not what I expected, as the GUI displays a different value.
>
> If I try to empty the queue, I receive the message: 0 FlowFiles (0 bytes)
> out of 210'000 (92.71MB) were removed from the queue.
>
> And of course I cannot delete the queue, as this action reports that the
> queue is not empty.
>
> So somehow it seems that the queues are empty but that the current display
> of the queues does not reflect it (it is very likely that some data were lost
> during the upgrade procedure, as we had to reboot a few nodes to change the
> heap property).
>
> What would be the best method to restore a proper state and be able to edit
> the flow again?
>
> Thank you!
>
> Arnaud
>
>
>
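
For reference, the listing Arnaud performed through the GUI can also be driven from the REST API, which is sometimes useful when the UI and the underlying queue disagree. A rough sketch of starting a listing request (host, connection id and security are placeholders; an unsecured instance is assumed):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ListQueue {
        public static void main(String[] args) throws Exception {
            String base = "http://localhost:8080/nifi-api";
            String connectionId = "your-connection-uuid";    // placeholder

            // Ask NiFi to start listing the FlowFiles in this connection's queue.
            URL url = new URL(base + "/flowfile-queues/" + connectionId + "/listing-requests");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.getOutputStream().close();                   // empty request body

            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);                 // JSON with the listing-request id and queue size
                }
            }
        }
    }

The returned listing-request id can then be polled with a GET against the same resource to see what the framework itself believes is sitting in the queue.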


Re: Reliable operation use case

2017-05-15 Thread Pierre Villard
Hi Andy,

Regarding bulletins, you can use the new SiteToSiteBulletinReportingTask
[1] to achieve what you are expecting (in NiFi 1.2.0). For provenance, you
can definitely use the REST API, but you might also be interested in the
SiteToSiteProvenanceReportingTask [2] with the addition of NIFI-3859 [3].
You may also be interested in this article [4].

From a high-level perspective, everything you described is perfectly doable.
Obviously, it would be necessary to go into the details of which monitoring
tools you want to integrate with, but, again, NiFi sounds entirely suitable
for your use case.

[1]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-site-to-site-reporting-nar/1.2.0/org.apache.nifi.reporting.SiteToSiteBulletinReportingTask/index.html
[2]
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-site-to-site-reporting-nar/1.2.0/org.apache.nifi.reporting.SiteToSiteProvenanceReportingTask/index.html
[3] https://issues.apache.org/jira/browse/NIFI-3859
[4]
https://pierrevillard.com/2017/05/13/monitoring-nifi-site2site-reporting-tasks/

Pierre
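
For the bulletin side specifically: the reporting task pushes bulletins as JSON over site-to-site to an input port, and if the receiving flow exposes them again on an output port, an external monitoring process can pull them with the nifi-site-to-site-client library. A rough sketch, assuming an unsecured instance and an output port named "bulletins" (both are assumptions):

    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;

    import org.apache.nifi.remote.Transaction;
    import org.apache.nifi.remote.TransferDirection;
    import org.apache.nifi.remote.client.SiteToSiteClient;
    import org.apache.nifi.remote.protocol.DataPacket;

    public class PullBulletins {
        public static void main(String[] args) throws Exception {
            try (SiteToSiteClient client = new SiteToSiteClient.Builder()
                    .url("http://localhost:8080/nifi")    // assumed URL
                    .portName("bulletins")                // assumed output port name
                    .build()) {
                Transaction transaction = client.createTransaction(TransferDirection.RECEIVE);
                if (transaction == null) {
                    return;   // port not reachable / nothing to pull right now
                }
                DataPacket packet;
                while ((packet = transaction.receive()) != null) {
                    ByteArrayOutputStream bos = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int len;
                    try (InputStream in = packet.getData()) {
                        while ((len = in.read(buffer)) != -1) {
                            bos.write(buffer, 0, len);
                        }
                    }
                    System.out.println(bos.toString("UTF-8"));   // bulletin JSON, ready to persist
                }
                transaction.confirm();
                transaction.complete();
            }
        }
    }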



2017-05-15 19:54 GMT+02:00 Andy Yar :

> Hello,
> I'd like to know if NiFi is suitable for my slightly unusual use case.
>
> Basically, the system should route files/events/etc. from various
> sources, validate, process them and route somewhere. However, I need
> to do this in a reliable way. This means, the system needs to have
> extensive monitoring/reprocessing capabilities. Various errors can
> occur in the system - an outbound destination becomes unavailable, a
> malformed event doesn't match schema, an exception occurred during
> processing, etc. In my case, the system has to allow discovering +
> fixing + reprocessing of any failed event.
>
> I've figured that it is possible to route every processor's "failure"
> state to a dead-letter-queue, fire up a notification (Slack, API call,
> etc.), reroute the DLQ to the input again and so on. It seems to be a
> bit complicated with adding a few retry steps before sending to the
> DLQ everywhere but OK.
>
> However, I'm a bit worried about the monitoring side. For instance, is
> NiFi, in its current state, able/intended to persist Bulletin Board
> messages? I can imagine that after an operator sees an error on the
> Bulletin Board, he/she traces the flow file, fixes the issue and marks
> the incident as solved. Currently the BB messages seem to be rather
> transient. I guess the intended way is to create a custom Reporting
> Task in order to handle this.
>
> The Data Provenance itself is great, but I still can't imagine it
> being usable without major customization. In my case I would like to
> monitor mostly a set of "inbound oriented" processors, so it would be
> possible to list, e.g., all current events on inbound processors in a
> (pseudo) real-time way. Again, I guess custom API/tasks could handle
> this.
>
> To summarize: is NiFi usable for this particular use case?
>
> Thanks for any answers and also for your effort with this project!
>
> A.
>


Reliable operation use case

2017-05-15 Thread Andy Yar
Hello,
I'd like to know if NiFi is suitable for my slightly unusual use case.

Basically, the system should route files/events/etc. from various
sources, validate, process them and route somewhere. However, I need
to do this in a reliable way. This means, the system needs to have
extensive monitoring/reprocessing capabilities. Various errors can
occur in the system - an outbound destination becomes unavailable, a
malformed event doesn't match schema, an exception occurred during
processing, etc. In my case, the system has to allow discovering +
fixing + reprocessing of any failed event.

I've figured that it is possible to route every processor's "failure"
state to a dead-letter-queue, fire up a notification (Slack, API call,
etc.), reroute the DLQ to the input again and so on. It seems to be a
bit complicated with adding a few retry steps before sending to the
DLQ everywhere but OK.

However, I'm a bit worried about the monitoring side. For instance, is
NiFi, in its current state, able/intended to persist Bulletin Board
messages? I can imagine that after an operator sees an error on the
Bulletin Board, he/she traces the flow file, fixes the issue and marks
the incident as solved. Currently the BB messages seem to be rather
transient. I guess the intended way is to create a custom Reporting
Task in order to handle this.

The Data Provenance itself is great, but I still can't imagine it
being usable without major customization. In my case I would like to
monitor mostly a set of "inbound oriented" processors, so it would be
possible to list, e.g., all current events on inbound processors in a
(pseudo) real-time way. Again, I guess custom API/tasks could handle
this.

To summarize: is NiFi usable for this particular use case?

Thanks for any answers and also for your effort with this project!

A.
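
On the question of persisting Bulletin Board messages: besides a custom ReportingTask (or the site-to-site reporting tasks Pierre mentions in his reply), a small external poller against the REST bulletin-board endpoint is another low-effort option. A rough sketch (host is a placeholder, an unsecured instance is assumed, and the JSON parsing and durable storage are left out):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class BulletinPoller {
        public static void main(String[] args) throws Exception {
            long lastSeenId = 0;   // highest bulletin id already stored
            while (true) {
                // Fetch only bulletins newer than the ones already persisted.
                URL url = new URL("http://localhost:8080/nifi-api/flow/bulletin-board?after=" + lastSeenId);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        System.out.println(line);   // JSON payload; parse and write to a durable store here
                    }
                }
                // In a real poller, lastSeenId would be updated from the parsed response.
                Thread.sleep(10_000L);
            }
        }
    }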


Re: Storage buffer separation in a multi-tenant cluster.

2017-05-15 Thread Joe Witt
Kris,

Now that five months have passed on this thread, I suppose it is time
to reply :-)  Sorry about the delay.

To the core of the question: Apache NiFi today does not provide any
built-in or integrated quota management for the various tenants of
a given system.  The multi-tenancy today is largely oriented around
isolation from an authorization/entitlements perspective.  Thus far
we've not built in anything to restrict a given portion of the flow on
CPU, memory, disk, or network usage, though it does seem like we're
trending toward such controls being useful; at the very least, we can
better describe how to effectively accomplish that and close gaps
where necessary.

When we're talking about buffer/storage separation, is the concern
about isolating usage for live flow, usage for retained objects (for
replay, click-to-content), or is it perhaps a security isolation
consideration?  In your idea, would the isolation be made available at
a process group level?

Certainly something we can have a good discussion on for what makes
sense to introduce.

Thanks
Joe

On Thu, Jan 19, 2017 at 1:33 PM, Kristopher Kane  wrote:
> I'm thinking that NiFi deployments in the wild are project-oriented with regard to
> resources and are not run as an enterprise platform where the multi-tenant
> risks are a concern.
>
> On Thu, Jan 12, 2017 at 3:28 PM, Kristopher Kane 
> wrote:
>>
>> I work with a medium-sized Storm cluster that is used by many tenants.  As
>> the admins of the Storm cluster we must be mindful of network and CPU IO and
>> adjust, manually, based on usage.  Many of these Storm uses would be a
>> better fit for NiFi's built-in capabilities and ease of use whilst leaving
>> the high-throughput work in Storm.  Storm works really well out of the box
>> with many (dozens of) separate users across hundreds of topologies. We
>> simply add more nodes and don't have to worry much about load and users
>> walking over each other, since our failure replay is always from Kafka.
>>
>> What isn't obvious to me is how local buffer storage is handled in a
>> multi-tenant NiFi cluster, and I am wondering if others have patterns out there
>> to prevent a NiFi user from eating up the available disk and thus taking down
>> other users' workflows.
>>
>> My initial thought is a management layer outside of NiFi that invokes
>> Linux FS quotas by user account.  Does NiFi have anything built in for this
>> type of preventive measure?
>>
>> Thanks,
>>
>> Kris
>
>
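
On the "management layer outside of NiFi" idea: pending any built-in quotas, an external watchdog that watches the repository volumes is straightforward. A minimal sketch, assuming the content repository sits on its own mount at /data/nifi/content_repository (path and threshold are illustrative):

    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class RepoDiskWatchdog {
        public static void main(String[] args) throws Exception {
            Path repo = Paths.get("/data/nifi/content_repository");   // assumed location
            double alertThreshold = 0.80;                              // alert at 80% usage

            while (true) {
                FileStore store = Files.getFileStore(repo);
                long total = store.getTotalSpace();
                long used = total - store.getUsableSpace();
                double ratio = (double) used / total;
                if (ratio >= alertThreshold) {
                    // Hook a notification (mail, Slack, pager) in here.
                    System.out.printf("WARN: content repository volume at %.0f%% capacity%n", ratio * 100);
                }
                Thread.sleep(60_000L);
            }
        }
    }

Keeping each repository on its own partition also limits the blast radius: a runaway flow can fill that volume, but not the disk the OS or other services depend on.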


Re: NiFi relaunching every few minutes

2017-05-15 Thread Janaki Joshi
Hi,

Thank you. This seems to have solved the problem. I am still wondering,
though, whether this could eventually lead to memory leaks. Will it just
fall back to another default garbage collector?

Thanks in advance.

Regards,
Janaki
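
With the -XX:+UseG1GC line commented out (as Mark suggests below), the JVM simply falls back to its built-in default collector (the Parallel collector on Java 8); switching collectors does not by itself introduce a memory leak. A quick way to confirm which collectors a JVM actually ends up using is to ask its GarbageCollectorMXBeans; a minimal sketch:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class ShowGc {
        public static void main(String[] args) {
            // Prints the collectors active in this JVM, e.g. "G1 Young Generation" /
            // "G1 Old Generation" when G1 is enabled, or "PS Scavenge" / "PS MarkSweep"
            // for the Java 8 default Parallel collector.
            for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
            }
        }
    }

Running it once with java -XX:+UseG1GC ShowGc and once without the flag shows the difference.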

On Fri, May 5, 2017 at 8:55 PM, Mark Payne  wrote:

> Janaki,
>
> In addition to looking at the nifi-bootstrap.log file, I would check the
> $NIFI_HOME directory
> and see if there are any files that have a name like hs_err_pid_.log.
> These indicate that
> the entire JVM suddenly crashed, generally due to a bug in Java. We have
> seen this sometimes
> when using the G1 Garbage Collector, which is currently the default
> Garbage Collector. If you are
> seeing these types of files, I would recommend commenting out the
> following line in $NIFI_HOME/conf/bootstrap.conf:
>
> java.arg.13=-XX:+UseG1GC
>
> (This is around line 49 or so in the file).
>
> I hope this helps!
>
> Thanks
> -Mark
>
> On May 5, 2017, at 7:41 AM, Joe Witt  wrote:
>
> Is there anything interesting in the nifi-bootstrap.log?
>
> On May 5, 2017 6:52 AM, "Janaki Joshi"  wrote:
>
>> Hello all,
>>
>> I've been using NiFi for a while now. Since last evening I've started
>> facing an issue that I am unable to get to the root of. My NiFi (deployed
>> on an EC2 instance running Ubuntu) keeps relaunching every 3-4 minutes.
>> Below you can find a snippet of my app.log file. I have highlighted the
>> area where NiFi suddenly restarts without warning (the portion between the
>> dashed lines). One change which could be a possible cause is the JVM heap
>> size: I increased it from 4g to 5g yesterday (I have 8 GB of RAM). Any help
>> would be appreciated. Thanks in advance.
>>
>> Regards,
>> Janaki
>>
>> nifi-app.log:
>>
>> 2017-05-05 07:01:47,537 INFO [Write-Ahead Local State Provider
>> Maintenance] org.wali.MinimalLockingWriteAheadLog
>> org.wali.MinimalLockingWriteAheadLog@28ffa527 checkpointed with 17
>> Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 1
>> milliseconds, Clear Edit Logs time = 1 millis), max Transaction ID 50
>>
>> 2017-05-05 07:01:51,028 INFO [pool-8-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile
>> Repository
>>
>> 2017-05-05 07:01:51,093 INFO [pool-8-thread-1]
>> org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@77614e7c checkpointed with 0 Records and 0 Swap Files in 65
>> milliseconds (Stop-the-world time = 31 milliseconds, Clear Edit Logs time =
>> 26 millis), max Transaction ID 7732406
>>
>> 2017-05-05 07:01:51,094 INFO [pool-8-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed
>> FlowFile Repository with 0 records in 65 milliseconds
>>
>> 2017-05-05 07:02:02,743 INFO [Flow Service Tasks Thread-1]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@64c45102 // Another save
>> pending = false
>>
>> 2017-05-05 07:02:49,899 INFO [Flow Service Tasks Thread-1]
>> o.a.nifi.controller.StandardFlowService Saved flow controller
>> org.apache.nifi.controller.FlowController@64c45102 // Another save
>> pending = false
>>
>> 2017-05-05 07:03:48,302 INFO [Write-Ahead Local State Provider
>> Maintenance] org.wali.MinimalLockingWriteAheadLog
>> org.wali.MinimalLockingWriteAheadLog@28ffa527 checkpointed with 17
>> Records and 0 Swap Files in 20 milliseconds (Stop-the-world time = 0
>> milliseconds, Clear Edit Logs time = 0 millis), max Transaction ID 50
>>
>> 2017-05-05 07:03:51,094 INFO [pool-8-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile
>> Repository
>>
>> 2017-05-05 07:03:51,748 INFO [pool-8-thread-1]
>> org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@77614e7c checkpointed with 0 Records and 0 Swap Files in 653
>> milliseconds (Stop-the-world time = 108 milliseconds, Clear Edit Logs time
>> = 536 millis), max Transaction ID 7732406
>>
>>
>> ---
>> 2017-05-05 07:03:51,748 INFO [pool-8-thread-1]
>> o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed
>> FlowFile Repository with 0 records in 653 milliseconds
>>
>> 2017-05-05 07:04:14,113 INFO [main] org.apache.nifi.NiFi Launching NiFi...
>> ---
>>
>>
>> 2017-05-05 07:04:14,587 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader
>> Determined default nifi.properties path to be '/home/ubuntu/nifi-1.1.2/./conf/nifi.properties'
>> 2017-05-05 07:04:14,592 INFO [main] o.a.nifi.properties.NiFiPropertiesLoader
>> Loaded 121 properties from /home/ubuntu/nifi-1.1.2/./conf/nifi.properties
>>
>> 2017-05-05 07:04:14,602 INFO [main] org.apache.nifi.NiFi Loaded 121
>> properties
>>
>> 2017-05-05 07:04:14,610 INFO [main] org.apache.nifi.BootstrapListener
>> Started Bootstrap Listener, Listening for incoming requests on port 42755
>>
>> 2017-05-05 07:04:14,628 INFO [main] org.apache.nifi.BootstrapListener
>>