Hey Mark -

Yeah, agreed. I'm only moving some of the 15+ day old files out because this is kind of an emergency.

And yes, the 0-byte FlowFiles are expected, even if that's not exactly "normal": I have a new pipeline that batches up errors, and each of those FlowFiles is basically 0 bytes, with attributes describing the error so it can be retried at a later time.
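For reference, a rough sketch of that kind of move. The function name (move_stale_files), its arguments, and the paths in the usage line are made up for illustration and are not part of NiFi. It defaults to a dry run, and as Mark notes below, pulling files out from under NiFi can cause problems, so treat it as a last resort.

```shell
# Hypothetical helper (not part of NiFi): move regular files older than
# DAYS days from REPO into HOLDING, keeping relative paths so they can be
# put back later. Prints what it would do unless the 4th argument is "move".
# HOLDING should live outside REPO.
move_stale_files() (
    repo=${1%/}            # strip a trailing slash so relative paths work
    holding=$2
    days=${3:-15}
    mode=${4:-dry-run}
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$repo" -type f -mtime +"$days" -exec sh -c '
        repo=$1; holding=$2; mode=$3; shift 3
        for f in "$@"; do
            rel=${f#"$repo"/}
            if [ "$mode" = move ]; then
                mkdir -p "$holding/$(dirname "$rel")" && mv -- "$f" "$holding/$rel"
            else
                printf "would move %s\n" "$rel"
            fi
        done
    ' sh "$repo" "$holding" "$mode" {} +
)
```

Usage would look something like: move_stale_files /path/to/content_repository /path/to/holding 15 for a dry run, and the same command with a trailing "move" to actually relocate the files.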
There are definitely some very large files in that content repo; the sizes vary from KB to several GB.

$ find . -size +1G | wc -l
163

On Wed, Jun 15, 2016 at 5:13 PM, Mark Payne <marka...@hotmail.com> wrote:

> Deleting the old files could certainly cause some problems.
>
> The weird thing is that it shows that you have 10,000+ FlowFiles, each of
> which is 0 bytes.
> Is that normal for your flow?
>
> Could you try running the following against your content repo:
>
> find . -size +1M
>
> find . | wc -l
>
> Curious how many files there are and how many are "large" files.
>
> On Jun 15, 2016, at 5:02 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >
> > Is it safe to manually remove some of the older files in the repository
> > to avoid our disk from filling up?
> >
> > On Wed, Jun 15, 2016 at 4:55 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >
> >> Just a reminder, I just today noticed the "archive.enabled" option was
> >> false and changed it to true.
> >>
> >> $ find . -type f -ls | grep archive | wc -l
> >> 0
> >>
> >> On Wed, Jun 15, 2016 at 4:53 PM, Mark Payne <marka...@hotmail.com> wrote:
> >>
> >>> OK, thanks. It doesn't appear that it believes there is anything to
> >>> reclaim.
> >>>
> >>> Can you try going to your content repository and running:
> >>>
> >>> find . -type f -ls | grep archive
> >>>
> >>> Curious as to how much data it has archived.
> >>>
> >>>> On Jun 15, 2016, at 4:48 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>
> >>>> Oh sorry! Trying again
> >>>>
> >>>> [1]
> >>>> https://gist.githubusercontent.com/rickysaltzer/b00196a3881c052df9b38b418722cd02/raw/279a1bc8c60530426732eb7b653de1f3f74574e2/gistfile1.txt
> >>>>
> >>>> On Wed, Jun 15, 2016 at 4:38 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>
> >>>>> I should also mention, I just realized that our worker nodes are on
> >>>>> 0.5.1, and for some reason I missed updating the master from 0.4.0.
> >>>>> I'm sure that is not helping.
> >>>>>
> >>>>> On Wed, Jun 15, 2016 at 4:36 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>>
> >>>>>> Looks like the threads are parked and waiting [1]
> >>>>>>
> >>>>>> [1]
> >>>>>> http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt
> >>>>>>
> >>>>>> On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>>>>>
> >>>>>>> thanks Ricky - then please take a look at mark's note as that is
> >>>>>>> probably more relevant to your case.
> >>>>>>>
> >>>>>>> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>>>>> Hey Joe -
> >>>>>>>>
> >>>>>>>> The NiFi web UI currently reads as:
> >>>>>>>>
> >>>>>>>> Active threads: 3
> >>>>>>>> Queued: 10,173 / 0 bytes
> >>>>>>>> Connected nodes: 2 / 2
> >>>>>>>> Stats last refreshed: 13:31:28 PDT
> >>>>>>>>
> >>>>>>>> On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> And the data remains? If so that is an interesting data point I
> >>>>>>>>> think. So to mark's point how much data do you have queued up
> >>>>>>>>> actively in the flow then on that nodes? Number of objects you
> >>>>>>>>> mention is 3273 files corresponding to 825GB in the content
> >>>>>>>>> repository. Does NiFi see those 825GB worth of data as being in
> >>>>>>>>> the flow/queued up? And then if that is the case are we talking
> >>>>>>>>> about a roughly 1TB repo and so the reported value seems correct
> >>>>>>>>> and this is simply a case of queueing near to the limit your
> >>>>>>>>> system can hold?
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>>>>>>> I have two nodes in clustered mode. I have the other node that
> >>>>>>>>>> isn't filling up as my primary. I've actually already restarted
> >>>>>>>>>> nifi on the node which has the large repository a few times.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt <joe.w...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Ricky,
> >>>>>>>>>>>
> >>>>>>>>>>> If you restart nifi and then find that it cleans those things
> >>>>>>>>>>> up I believe then it is related to the defects corrected in the
> >>>>>>>>>>> 0.5/0.6 timeframe.
> >>>>>>>>>>>
> >>>>>>>>>>> Is restarting an option for you at this time. You agree mark?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Joe
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>>>>>>>>> Hey Mark -
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the quick reply! This is our production system so
> >>>>>>>>>>>> it's unfortunately running 0.4.0. There are currently 3273
> >>>>>>>>>>>> files, with some files dating back to May 18th. The content
> >>>>>>>>>>>> repository itself is 825G.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ricky
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne <marka...@hotmail.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hey Ricky
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The reclaim process is pretty much continuous. What version
> >>>>>>>>>>>>> of NiFi are you running?
> >>>>>>>>>>>>> I know there was an issue with this a while back that caused
> >>>>>>>>>>>>> it not to cleanup properly.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, how much data & how many FlowFiles do you have queued
> >>>>>>>>>>>>> up in your flow?
> >>>>>>>>>>>>> Data won't be archived or reclaimed if in the flow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> -Mark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 15, 2016, at 4:04 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey guys -
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I recently discovered I didn't have my "archive.enabled"
> >>>>>>>>>>>>>> option set to true after my disk filled up to 95%. I enabled
> >>>>>>>>>>>>>> it and then set the retention period to 12 hours and 50%
> >>>>>>>>>>>>>> (default values). However, after restarting NiFi, I am not
> >>>>>>>>>>>>>> seeing any disk space reclaimed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm curious, is the reclaiming process periodic or continuous?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>> ricky
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Ricky Saltzer
> >>>>>>>>>>>> http://www.cloudera.com

--
Ricky Saltzer
http://www.cloudera.com
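
For anyone following along, the find checks from this thread can be bundled into one helper. This is only a sketch: the function name (repo_summary) and its output labels are made up here, it assumes a find that accepts the M and G size suffixes (as the commands above do), and it assumes archived content sits under directories named "archive", which is what the grep above looks for. Unlike "find . | wc -l", it counts regular files only.

```shell
# Hypothetical helper (not part of NiFi): summarize a content repository
# directory with the same kind of find checks used in the thread.
repo_summary() (
    repo=${1:-.}
    # Count regular files under $repo matching any extra find tests given.
    count() { find "$repo" -type f "$@" | wc -l | tr -d ' '; }
    printf 'files_total=%s\n'    "$(count)"
    printf 'files_over_1M=%s\n'  "$(count -size +1M)"
    printf 'files_over_1G=%s\n'  "$(count -size +1G)"
    printf 'files_archived=%s\n' "$(count -path '*/archive/*')"
    # Total on-disk usage of the repository, in kilobytes.
    printf 'disk_usage_kb=%s\n'  "$(du -sk "$repo" | cut -f1)"
)
```

Run from the content repository as "repo_summary ." or pass the repository path as the first argument. A large files_total with files_archived=0 would match what was reported above, where nothing had been archived yet.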