DetectDuplicate Question

2021-05-07 Thread Elli Schwarz
Hello,
The DetectDuplicate processor takes a Cache Entry Identifier as the value that 
is cached, and the Age Off Duration specifies a TTL for that cache entry. What 
happens if that same Cache Entry Identifier comes in a second time after the
value is already cached - is the TTL reset as of the time the second entry was
encountered, or does it remain the TTL of the original entry?
It seems from my testing that each time that Cache Entry Identifier is
encountered, the TTL is reset as of the latest occurrence. In this case, if I have the
Age Off set to one hour, but I encounter this identifier multiple times an 
hour, the entry will never age off, since it is constantly being updated. I 
suppose I can solve this problem by prepending part of a date (like the date 
plus the hour) so that guarantees that at least once an hour I'll have a new 
entry. But this is an imperfect solution. Is there a way to tell
DetectDuplicate that if it encounters a value that is already cached, it
should leave the original TTL in place?
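
(For reference, the prefix workaround would look something like this as the
Cache Entry Identifier value - just a sketch, and it assumes the deduplication
key lives in a flowfile attribute named dedupe.key:

    ${now():format('yyyyMMddHH')}-${dedupe.key}

That way the cache key changes every hour even for a repeated dedupe.key, so
an entry can never be kept alive indefinitely - at the cost of letting one
duplicate through per hour.)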
Thank you,
Elli

Re: Potential 1.11.X showstopper

2020-02-06 Thread Elli Schwarz
We ran that command - it appears it's the site-to-sites that are causing the
issue. We had a lot of remote process groups that weren't even being used (no
data was being sent to that part of the dataflow), yet when running the lsof
command they each had a large number of open files - almost 2k! - showing
CLOSE_WAIT. Again, there were no flowfiles being sent to them, so could it be
some kind of bug where simply keeping a remote process group enabled opens
files and never closes them? (BTW, the reason we had to upgrade from 1.9.2 to
1.11.0 was because we had upgraded our Java version and that caused an
IllegalBlockingModeException - is it possible that whatever fixed that problem
is now causing an issue with open files?)
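
In case it's useful, this is roughly how we counted them (a sketch - substitute
the actual Nifi pid):

    lsof -p <pid> | grep CLOSE_WAIT | wc -l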

We now disabled all of the unused remote process groups. We still have several 
remote process groups that we are using so if this is the issue it might be 
difficult to avoid, but at least we decreased the number of remote process 
groups we have. Another approach we are trying is a MergeContent before we
send to the Nifi having the most issues, so that fewer flowfiles are sent at
once over site-to-site, and then splitting them back apart after they are
received.
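
Roughly the MergeContent settings we're experimenting with (a sketch - the
exact numbers are assumptions we're still tuning):

    Merge Strategy: Bin-Packing Algorithm
    Merge Format: FlowFile Stream, v3
    Minimum Number of Entries: 1000
    Max Bin Age: 30 sec

with an UnpackContent (Packaging Format: flowfile-stream-v3) on the receiving
side to split them back apart.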
Thank you!

On Thursday, February 6, 2020, 2:19:48 PM EST, Mike Thomsen 
 wrote:  
 
 Can you share a description of your flows in terms of average flowfile size, 
queue size, data velocity, etc.?
Thanks,
Mike

On Thu, Feb 6, 2020 at 1:59 PM Elli Schwarz  
wrote:

 We seem to be experiencing the same problems. We recently upgraded several of 
our Nifis from 1.9.2 to 1.11.0, and now many of them are failing with "too many 
open files". Nothing else changed other than the upgrade, and our data volume 
is the same as before. The only solution we've been able to come up with is to 
run a script to check for this condition and restart the Nifi. Any other ideas?
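
(The check-and-restart script is nothing fancy - roughly the following sketch,
where the pid file location and the threshold are assumptions to adjust for
your install:

    #!/bin/bash
    # Restart Nifi if its open-file count crosses a threshold.
    PID=$(cat /opt/nifi/run/nifi.pid)              # assumed pid file location
    COUNT=$(ls /proc/$PID/fd 2>/dev/null | wc -l)  # open descriptors for that pid
    if [ "$COUNT" -gt 40000 ]; then
        /opt/nifi/bin/nifi.sh restart
    fi
)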
Thank you!

    On Sunday, February 2, 2020, 9:11:34 AM EST, Mike Thomsen 
 wrote:  

Without further details, this is what I did to see if it was something
other than the usual issue of not having enough file handles available -
something like a legitimate case of someone forgetting to close file
objects somewhere in the code itself.

1. Set up an 8-core/32GB VM on AWS w/ Amazon AMI.
2. Pushed 1.11.1RC1.
3. Pushed the RAM settings to 6/12GB.
4. Disabled flowfile archiving because I only allocated 8GB of storage.
5. Set up a flow that used 2 GenerateFlowFile instances to generate massive
amounts of garbage data using all available cores. (All queues were set up
to hold 250k flowfiles; rough generator settings sketched below.)
6. Kicked it off and let it run for about 20 minutes.
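
For reference, the generator settings were along these lines (a sketch - the
exact values matter less than keeping every core busy):

    GenerateFlowFile
      File Size: 1 KB
      Batch Size: 1000
      Concurrent Tasks: 4 (per instance, on the Scheduling tab)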

No apparent problem with closing and releasing resources here.

On Sat, Feb 1, 2020 at 8:00 AM Joe Witt  wrote:

> these are usually very easy to find.
>
> run lsof -p <pid> and share the results
>
>
> thanks
>
> On Sat, Feb 1, 2020 at 7:56 AM Mike Thomsen 
> wrote:
>
> >
> >
> https://stackoverflow.com/questions/59991035/nifi-1-11-opening-more-than-50k-files/60017064#60017064
> >
> > No idea if this is valid or not. I asked for clarification to see if
> there
> > might be a specific processor or something that is triggering this.
> >
>

Provenance question

2016-01-26 Thread Elli Schwarz
I've been reading up on how provenance works (thank you, Thad, for pointing me
to that video - it was very helpful). It looks like it could definitely
replace the many PutFiles I've been using in the past. (In fact, the PutFiles
now clutter the lineage diagram, so I want to get rid of them!)

I have a couple of questions:
1) Is there a way to see the name of the processor on the Provenance graph
diagram (show lineage)? I know if I click on show details I can see the name
of the processor, but I think it might be helpful to see it in the graph along
with the event that occurred. (An "expand all" button might also be helpful,
though I suppose that can get out of hand if there are a lot of processors
that processed a flowfile.)
2) I use Nifi site-to-site to send some flowfiles to another Nifi for special
processing, and then the flowfile (actually a child flowfile) is sent back to
the first Nifi. It looks like the lineage ends at the "send" event, when the
flowfile is sent from Nifi 1 to the remote Nifi 2. I would like to be able to
see when that flowfile is received again by Nifi 1, so that on Nifi 1 I can
see the entire lineage - flowfile processed by several processors on Nifi 1,
sent to Nifi 2, then returned back to Nifi 1 and processed by more processors.
I know I won't see from Nifi 1 the lineage that occurred on Nifi 2, and that's
OK - I just want to see that the flowfile came back instead of having to
search for it.
I am quite impressed by what Nifi provenance can do, and wish I'd known more
about it sooner. Thank you!
-Elli

On Wednesday, January 20, 2016 6:16 PM, Thad Guidry  
wrote:
 
 

 Elli,

Joe gave an excellent talk last year at OSCON 2015.

Shows a bit of the provenance features and searchability of NiFi.

https://www.youtube.com/watch?v=sQCgtCoZyFQ

Full videos and tutorials are here: https://nifi.apache.org/videos.html


Thad
+ThadGuidry 

 
  

Re: Auto-Organize Layout

2016-01-20 Thread Elli Schwarz
Joe, 

Responses to your bullets:

1) I wasn't aware of the details of how provenance works. After reading about
it, it seems quite powerful and it may help for some cases. I especially like
the ability to replay files and track the timing of events. However, there are
some problems I think I've experienced in regards to provenance: we have
thousands of flowfiles going through daily. We like to keep the flowfiles from
certain processors for several days, but if we keep them from all processors
we will quickly run out of disk space. When I first upgraded to 0.3.0, it had
provenance on by default (I think that was a bug, since in 0.2.0 it was off by
default) and I quickly and unexpectedly ran out of disk space on my production
server. Furthermore, there are many cases where we store a collection of
flowfile content in a directory and grep it looking for certain strings to
debug something - can you do something similar to a grep on content through
provenance? Can you select which processors you want to store flowfiles for,
so as not to fill up the disk with flowfile content from processors we don't
care as much about? If the answer to these questions could be yes, then I
think provenance would work for the kind of tracking I've done with PutFile
and LogAttribute.
2) For emphasizing the main parts of a flow, I think the ability to align
selected processors would help. I would select the main processors, line them
up horizontally, and maybe it would automatically move other processors above
or below. Also, maybe I didn't fully explain my processor minimization idea -
I would like an option on a processor to "show icon only", so that instead of
a full processor box with all the stats and details, it would look something
like a Process Group port - no stats, just text and some icons indicating
started/stopped/disabled. (It should still look different from a port so no
one gets confused.)
Of course, I look forward to hearing what others have to say about these ideas!
-Elli

On Wednesday, January 20, 2016 4:49 PM, Joe Witt  wrote:
 
 

 Elli,

Ok, great feedback.  I'll categorize these in the following ways (if
you disagree, please share):

1) How best to handle log/debug sort of flows

One thing to consider is that our provenance and content archival features
eliminated the need for many uses of LogAttribute and PutFile in cases like
these.

2) How best to de-emphasize 'special case' handling of flows or parts
of a flow which are necessary but not the primary logic thread

I 'see' the idea but not sure what a good user experience for it would
be.  Anyone have visuals/UX concept in mind for this?

3) Modes of edit / Lock-out

Makes sense.  This has been asked for before.  Basically allow the
user to express that they want to go into an edit mode.  How do others
feel?

Thanks
Joe

On Wed, Jan 20, 2016 at 4:00 PM, Elli Schwarz
 wrote:
> Joe,
> What I'm referring to as far as emphasizing "important" processors is that 
> there are many places in my workflow where I do a PutFile for logging 
> purposes, and I have one for success and a separate one for failure. So for 
> many of my main processors, there are two connectors to PutFiles. This makes 
> the workflow look very cluttered, and when doing a demo it is difficult to 
> see the main path that flowfiles are taking without explanation, when I'd 
> like it to be intuitive. In fact, there are some cases that I really want to 
> add PutFiles for logging but since it will make the workflow look so 
> cluttered I don't do it. Maybe this problem can be solved by an option to 
> route all failures or even successes from some or all processors to do a 
> PutFile to a specific folder? Maybe we could minimize some processors, such
> as these PutFiles, to a small icon instead of the whole box to save screen space?
> I often don't care about processor stats for these processors anyway, but 
> maybe they'd be displayed on mouse hover.
>
> Furthermore, in some of my workflows I have my main tasks that deal with the 
> actual processing of the incoming data, and other "side tasks" that I do with 
> the data like collect special metrics,  storing some metadata in a special 
> database, sending status emails, etc. I handle this now with Processor 
> Groups, and that helps, but I find it a bit unwieldy to create many processor 
> groups that only have two or three processors in them (and then the processor 
> group needs the input/output ports, further complicated an otherwise simple 
> workflow). There are also cases where for some failures, I have a ControlRate 
> processor and then retry the flowfile after a certain period of time - this 
> is not a "main" part of the workflow, but it clutters it and it's not
> intuitive to see what's happening.

Re: Auto-Organize Layout

2016-01-20 Thread Elli Schwarz
Joe,
What I'm referring to as far as emphasizing "important" processors is that 
there are many places in my workflow where I do a PutFile for logging purposes, 
and I have one for success and a separate one for failure. So for many of my 
main processors, there are two connectors to PutFiles. This makes the workflow 
look very cluttered, and when doing a demo it is difficult to see the main path 
that flowfiles are taking without explanation, when I'd like it to be 
intuitive. In fact, there are some cases that I really want to add PutFiles for 
logging but since it will make the workflow look so cluttered I don't do it. 
Maybe this problem can be solved by an option to route all failures or even 
successes from some or all processors to do a PutFile to a specific folder? 
Maybe we could minimize some processors, such as these PutFiles, to a small
icon instead of the whole box to save screen space? I often don't care about
processor stats for these processors anyway, but maybe they'd be displayed on
mouse hover.

Furthermore, in some of my workflows I have my main tasks that deal with the
actual processing of the incoming data, and other "side tasks" that I do with
the data, like collecting special metrics, storing some metadata in a special
database, sending status emails, etc. I handle this now with Process Groups,
and that helps, but I find it a bit unwieldy to create many process groups
that only have two or three processors in them (and then the process group
needs the input/output ports, further complicating an otherwise simple
workflow). There are also cases where for some failures, I have a ControlRate 
processor and then retry the flowfile after a certain period of time - this is 
not a "main" part of the workflow, but it clutters it and it's not intuitive to 
see what's happening. I think I'd like to solve this problem by just being able 
to align selected processors that I consider more important, and having the 
others off to a side.

As far as accidentally dragging processors is concerned, sometimes I intend
to pan the screen and end up moving a processor, or moving a label that I used
to highlight a section of my flow. For demos, it would be nice to lock the entire
page layout. Maybe this can be accomplished with a button on the top right 
corner to enable/disable layout changes or disable adding new processors, and 
only allow viewing processor properties - a sort of "read only" mode.
Thanks for taking my feedback into consideration. I still find Nifi incredibly 
useful for handling my complicated workflows and appreciate your work in 
developing it!
-Elli



 

On Tuesday, January 19, 2016 5:03 PM, Joe Witt  wrote:
 
 

 Elli

"it is sometimes too easy to mis-align processors by dragging them accidentally"
  Great point.  I must admit I too do that, often in really important
demos.  I've gotten good at making jokes about it.  Probably should
have gotten good at submitting a JIRA :-)

I'd like to understand more about your other idea for emphasizing
processors which are more important.  I can understand the idea I
think but I'm worried about how we could make the user experience
worth the effort for the person signaling the emphasis to be of use
for the people consuming that detail.

Thanks
Joe

On Tue, Jan 19, 2016 at 4:46 PM, Matt Burgess  wrote:
> +1 for "snap to grid" feature
>
> Sent from my iPhone
>
>> On Jan 19, 2016, at 4:20 PM, dan bress  wrote:
>>
>> Maybe not exactly "auto-layout" but I would back a notion of having the
>> components snap to a coarser grain grid than what we currently have.
>> Sometimes I care a lot about having everything line up in the graph
>> horizontally and vertically, and it always takes a long time to achieve
>> this.
>>
>> I could see this being achieved by snapping the component to the same spot
>> horizontally as the component above it when you move it underneath another
>> component.  Or some magical "auto snap" button that does its best to align
>> everything with its nearest neighbors.
>>
>>> On Tue, Jan 19, 2016 at 12:37 PM Ryan H  wrote:
>>>
>>> I like your idea Rob, that would help with lining up relationships too
>>> (straight lines).
>>>
>>> On Matt's note, I don't think there should be a "standard" either, although
>>> best practices are always out there.
>>>
>>> On Matt's note of putting failures up above processes, we do that too.
>>> Totally depends on who made the flow first.  Sometimes, people don't even
>>> follow a convention in the same flow.xml file.
>>>
>>> For these reasons, I'd recommend alternate views to the flow.
>>>
>>> We have a couple projects that just allow you to rearrange a node-based
>>> graph, based on your preference, hierarchy, circular, pyramid, etc.
>>>
>>> Applying this to NiFi, having a couple different default auto-layout
>>> options that you can swap your current view to, but NOT change the original
>>> flow, would be nice.
>>>
>>> It would let you walk into someone else's, potentially large, dataflow
>>> and have a familiar way to view the flow.

Re: Auto-Organize Layout

2016-01-19 Thread Elli Schwarz
I also think that some way to help lay out the flow would be very useful. One
of the hardest parts of creating an easy-to-read workflow is designing a good
layout, and in many cases I simply want to be able to select a few processors
and line them up (horizontally or vertically). Also, it is sometimes too easy
to mis-align processors by dragging them accidentally. It would be nice if
there were a way to "lock" the layout of a group of selected processors so
they can't be accidentally moved. Also, the ability to "undo" a layout change,
or to save a layout (i.e., just save the positions of a group of selected
processors) so that we can revert to it, would be helpful.

Another thing that could be helpful is a way to emphasize the more important
processors, i.e., the ones integral to the flow, as opposed to the many
"PutFile" processors I have that are simply for logging (although I realize
that sometimes a PutFile can be an integral part of a flow). I know I can
select a group of processors and change the color, and also create a label to
highlight a certain group of processors, but it might be nice to be able to
make selected processors appear larger on the screen than others, to emphasize
their importance. It is sometimes hard to follow the trail of a flow simply
because there are many connections, so a way to highlight the "main" path
might be useful.

-Elli


On Tuesday, January 19, 2016 4:20 PM, dan bress  wrote:
 
 

 Maybe not exactly "auto-layout" but I would back a notion of having the
components snap to a coarser grain grid than what we currently have.
Sometimes I care a lot about having everything line up in the graph
horizontally and vertically, and it always takes a long time to achieve
this.

I could see this being achieved by snapping the component to the same spot
horizontally as the component above it when you move it underneath another
component.  Or some magical "auto snap" button that does its best to align
everything with its nearest neighbors.

On Tue, Jan 19, 2016 at 12:37 PM Ryan H  wrote:

> I like your idea Rob, that would help with lining up relationships too
> (straight lines).
>
> On Matt's note, I don't think there should be a "standard" either, although
> best practices are always out there.
>
> On Matt's note of putting failures up above processes, we do that too.
> Totally depends on who made the flow first.  Sometimes, people don't even
> follow a convention in the same flow.xml file.
>
> For these reasons, I'd recommend alternate views to the flow.
>
> We have a couple projects that just allow you to rearrange a node-based
> graph, based on your preference, hierarchy, circular, pyramid, etc.
>
> Applying this to NiFi, having a couple different default auto-layout
> options that you can swap your current view to, but NOT change the original
> flow, would be nice.
>
> It would let you walk into someone else's, potentially large, dataflow and
> have a familiar way to view the flow.
>
> Ryan
>
>
> On Tue, Jan 19, 2016 at 2:03 PM, Rob Moran  wrote:
>
> > I agree with Matt's points. I was just replying with something similar
> > basically saying I think trying to set a standard would not be
> > well-received.
> >
> > I believe what could be more useful are layout tools that would help
> users
> > place components to help achieve their preferred layouts. For example,
> the
> > ability to align (left, right, center) components
> > or horizontally/vertically distribute components evenly. Other features
> > such as snap-to and/or smart-guides could make it easier for users to
> > follow their organization's best practices when designing a flow.
> >
> > Rob
> >
> > On Tue, Jan 19, 2016 at 1:49 PM, Matthew Clarke <
> matt.clarke@gmail.com
> > >
> > wrote:
> >
> > > Ryan,
> > >
> > >          Setting a standard is a difficult thing to do.  The
> complexity
> > > that can exist in many flows would make enforcing a standard difficult.
> > The
> > > first example you provide of success to points right while failures
> point
> > > up is not recommended. It would be better to have failures point down
> > since
> > > it is common to put labels over processor(s). Any relationships
> pointing
> > up
> > > would pass through these labels making both the relationship box and
> the
> > > label hard to read.  It is often common to see flows designed with a
> > > combination of left to right and top to bottom design.
> > >
> > > Matt
> > >
> > > On Tue, Jan 19, 2016 at 12:07 PM, Ryan H 
> > > wrote:
> > >
> > > > Hi Rob,
> > > >    Yea we did, it was at the end of the meeting.
> > > >
> > > >    I think it would be useful to have a couple default type views
> that
> > > > help standardize flow layout across the community.
> > > >
> > > >    For example, when we organize processors left-to-right, failure
> > > > relationships always point up, and success always point right.
> > > >    Alternatively, when we organize processors up-and-down, failure
> > > > relationships alway

Re: Content Repo Large.. Archive in there?

2015-10-23 Thread Elli Schwarz
We had a max storage size of 1GB, but that's for the provenance repo, and our
problem was with the content repo. Our disk is 60GB, all on one partition, and
55GB was taken up by content_repository. Now it only contains 233MB.
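
For reference, the relevant lines in our nifi.properties look roughly like
this (values approximate):

    nifi.provenance.repository.max.storage.size=1 GB
    nifi.content.repository.archive.max.retention.period=12 hours
    nifi.content.repository.archive.max.usage.percentage=50%
    nifi.content.repository.archive.enabled=true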
 


 On Friday, October 23, 2015 2:50 PM, Mark Payne  
wrote:
   
 

 OK, so this is interesting. Do you have your content repository and provenance 
repository
both pointing to the same partition? What do you have the 
"nifi.provenance.repository.max.storage.size"
property set to? How large is the actual disk?

Thanks
-Mark


> On Oct 23, 2015, at 2:45 PM, Ryan H  wrote:
> 
> I've got this one... let me look for that
> 
> 2015-10-23 09:00:33,625 WARN [Provenance Maintenance Thread-1]
> o.a.n.p.PersistentProvenanceRepository
> java.io.IOException: No space left on device
>        at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.8.0_51]
>        at java.io.FileOutputStream.write(FileOutputStream.java:326)
> ~[na:1.8.0_51]
>        at
> org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:390)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> java.util.zip.CheckedOutputStream.write(CheckedOutputStream.java:73)
> ~[na:1.8.0_51]
>        at
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
> ~[na:1.8.0_51]
>        at
> org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:51)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:53)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.codecs.lucene40.BitVector.writeBits(BitVector.java:272)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.codecs.lucene40.BitVector.write(BitVector.java:227)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.codecs.lucene40.Lucene40LiveDocsFormat.writeLiveDocs(Lucene40LiveDocsFormat.java:107)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.ReadersAndUpdates.writeLiveDocs(ReadersAndUpdates.java:326)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:520)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter$ReaderPool.release(IndexWriter.java:505)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:299)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter.applyAllDeletesAndUpdates(IndexWriter.java:3312)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:3303)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2989)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3134)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3101)
> ~[lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
>        at
> org.apache.nifi.provenance.lucene.DeleteIndexAction.execute(DeleteIndexAction.java:66)
> ~[nifi-persistent-provenance-repository-0.3.0.jar:0.3.0]
>        at
> org.apache.nifi.provenance.PersistentProvenanceRepository.purgeOldEvents(PersistentProvenanceRepository.java:906)
> ~[nifi-persistent-provenance-repository-0.3.0.jar:0.3.0]
>        at
> org.apache.nifi.provenance.PersistentProvenanceRepository$2.run(PersistentProvenanceRepository.java:260)
> [nifi-persistent-provenance-repository-0.3.0.jar:0.3.0]
> 
> On Fri, Oct 23, 2015 at 2:44 PM, Mark Payne  wrote:
> 
>> Ryan, Elli,
>> 
>> Do you by chance have any error messages in your logs from the
>> FileSystemRepository?
>> 
>> I.e., if you perform:
>> 
>> grep FileSystemRepository logs/*
>> 
>> Do you get anything interesting in there?
>> 
>> Thanks
>> -Mark
>> 
>> 
>>> On Oct 23, 2015, at 2:38 PM, Elli Schwarz
>>  wrote:
>>> 
>>> I've been working with Ryan. There appear to be a few issues here

Re: Content Repo Large.. Archive in there?

2015-10-23 Thread Elli Schwarz
I've been working with Ryan. There appear to be a few issues here:

- We upgraded from 0.2.0 to 0.3.0, and it appears that content_repository
archiving is now true by default. In 0.2.0 it was false, and the documentation
still states it is false by default.
- When we ran out of disk space overnight, the problem was solved by me simply
restarting Nifi, which cleared out the archive by itself.
- In order to clear out the archive, I had to set archive to true, set max
usage to 1%, and restart Nifi. That cleared it up, and then I set archive to
false and restarted again so we don't run out of space (the settings are
sketched below).
- Based on the above, it appears that something happened yesterday that
prevented Nifi from clearing out the archive even though disk usage reached
100%. However, restarting Nifi apparently enabled it to perform the clearing
of the archive. So apparently the max usage setting doesn't work under some
conditions, but we don't know what conditions occurred overnight to cause this
problem.
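
For anyone who hits the same thing, the two passes looked roughly like this in
nifi.properties (a sketch of what we set, restarting Nifi after each change):

First pass, to force the archive to clear:

    nifi.content.repository.archive.enabled=true
    nifi.content.repository.archive.max.usage.percentage=1%

Second pass, to keep the archive from filling the disk again:

    nifi.content.repository.archive.enabled=false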

Thanks!
-Elli
 


 On Friday, October 23, 2015 2:29 PM, Ryan H  
wrote:
   
 

 Agree, they concern the archive... although it sounds like there are 2
archives?

Within the content_repository folder, there are subfolders with the name
'archive' and files inside them.

Example:
./nfii/content_repository/837/archive/1445611320767-837

Settings:
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true

Last night, our server ran out of disk space because the content_repository
grew too large.  Nifi didn't crash, but the log file contained errors
saying the disk was full.

We're not sure how, but the content_repository did not respect the above
settings.

We restarted Nifi, and it only then started to remove files, such as:
./nfii/content_repository/837/archive/1445611320767-837

We've turned off archiving for now.

Ryan




On Fri, Oct 23, 2015 at 1:51 PM, Aldrin Piri  wrote:

> Ryan,
>
> Those items only concern the archive.  Did you have data enqueued in
> connections in your flow?  If so, these items are not eligible and could
> explain why your disk was filled.  Otherwise, can you please provide some
> additional information so we can dig into why this may have arisen.
>
> Thanks!
>
> On Fri, Oct 23, 2015 at 10:25 AM, Ryan H 
> wrote:
>
> > I've got the following set:
> >
> > nifi.content.repository.archive.max.retention.period=12 hours
> > nifi.content.repository.archive.max.usage.percentage=50%
> > nifi.content.repository.archive.enabled=true
> >
> > Yet, the content repo filled my disk last night...
> >
> >
> > On Fri, Oct 23, 2015 at 1:16 PM, Aldrin Piri 
> wrote:
> >
> > > Ryan,
> > >
> > > Those archive folders map to the
> nifi.content.repository.archive.enabled
> > > property.
> > >
> > > What this property provides is a retention of files no longer in the
> > system
> > > for historical context of your flow's processing and the ability for
> > > viewing this in conjunction with provenance events as well as allowing
> > > replay.  The amount of the archive when enabled is bounded by the
> > > properties nifi.content.repository.archive.max.retention.period and
> > > nifi.content.repository.archive.max.usage.percentage.
> > >
> > > Additional detail is available in the system properties of our
> > > Administration Guide [1]
> > >
> > > Let us know if you have additional questions.
> > >
> > > --aldrin
> > >
> > > [1]
> > >
> > >
> >
> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#system_properties
> > >
> > > On Fri, Oct 23, 2015 at 10:09 AM, Ryan H 
> > > wrote:
> > >
> > > > Interesting.. So what would
> > > >
> > > > ./nfii/content_repository/837/archive/1445611320767-837
> > > >
> > > > typically be?
> > > >
> > > > On Fri, Oct 23, 2015 at 12:56 PM, Andrew Grande <
> > agra...@hortonworks.com
> > > >
> > > > wrote:
> > > >
> > > > > Attachments don't go through, view at imagebin:
> > > > > http://ibin.co/2K3SwR0z8yWX
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 10/23/15, 12:52 PM, "Andrew Grande" 
> > > wrote:
> > > > >
> > > > > >Ryan,
> > > > > >
> > > > > >./conf/archive is to create a snapshot of your entire flow, not
> the
> > > > > content repository data. See the attached screenshot (Settings menu
> > on
> > > > the
> > > > > right).
> > > > > >
> > > > > >Andrew
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >On 10/23/15, 12:47 PM, "ryan.andrew.hendrick...@gmail.com on
> behalf
> > > of
> > > > > Ryan H"  > > > > rhendrickson.w...@gmail.com> wrote:
> > > > > >
> > > > > >>Hi,
> > > > > >>  I'm noticing my Content Repo growing large.  There's a number
> of
> > > > > files...
> > > > > >>
> > > > > >>content_repo/837/archive/144...-837
> > > > > >>
> > > > > >>  Is this new in 3.0?  My conf file says any archiving should be
> > > going
> > > > > >>into ./conf/archive, but i don't see anything in there.
> > > > > >>
> > > > > >>Thanks,