Hey Mark -

Yeah, agreed. I'm moving some of the files that are 15+ days old out of the
repository only because this is kind of an emergency. As for the 0-byte
FlowFiles: that's not exactly "normal", but I have a new pipeline that
batches up errors, and each FlowFile is basically 0 bytes with attributes
describing the error so that it can be retried at a later time.

There are definitely some very large files in that content repo; the sizes
vary from KB to several GB.

$ find . -size +1G | wc -l
163
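
For repeating these checks, here is a minimal sketch of two helper functions
that wrap the same kind of find calls used in this thread. The function names
are made up for illustration, and the sketch assumes GNU find and file names
without embedded newlines (wc -l would miscount those). It only counts files;
it does not move or delete anything.

```shell
# Hypothetical helpers (names are illustrative, not part of NiFi).

# Count regular files under DIR larger than SIZE.
# SIZE uses find's own syntax, e.g. +1M or +1G.
count_larger_than() {
    find "$1" -type f -size "$2" | wc -l | tr -d ' '
}

# Count regular files under DIR last modified more than DAYS days ago.
count_older_than() {
    find "$1" -type f -mtime "+$2" | wc -l | tr -d ' '
}
```

Run from the content repository, count_larger_than . +1G corresponds to the
command above (restricted to regular files), and count_older_than . 15 counts
the 15+ day old files.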

On Wed, Jun 15, 2016 at 5:13 PM, Mark Payne <marka...@hotmail.com> wrote:

> Deleting the old files could certainly cause some problems.
>
> The weird thing is that it shows that you have 10,000+ FlowFiles, each of
> which is 0 bytes.
> Is that normal for your flow?
>
> Could you try running the following against your content repo:
>
> find . -size +1M
>
> find . | wc -l
>
> Curious how many files there are and how many are "large" files.
>
>
>
> > On Jun 15, 2016, at 5:02 PM, Ricky Saltzer <ri...@cloudera.com> wrote:
> >
> > Is it safe to manually remove some of the older files in the repository
> > to keep our disk from filling up?
> >
> > On Wed, Jun 15, 2016 at 4:55 PM, Ricky Saltzer <ri...@cloudera.com>
> wrote:
> >
> >> Just a reminder, I just today noticed the "archive.enabled" option was
> >> false and changed it to true.
> >>
> >> $ find . -type f -ls | grep archive | wc -l
> >> 0
> >>
> >>
> >>
> >> On Wed, Jun 15, 2016 at 4:53 PM, Mark Payne <marka...@hotmail.com>
> wrote:
> >>
> >>> OK, thanks. It doesn't appear that it believes there is anything to
> >>> reclaim.
> >>>
> >>> Can you try going to your content repository and running:
> >>>
> >>> find . -type f -ls | grep archive
> >>>
> >>> Curious as to how much data it has archived.
> >>>
> >>>> On Jun 15, 2016, at 4:48 PM, Ricky Saltzer <ri...@cloudera.com>
> wrote:
> >>>>
> >>>> Oh sorry! Trying again
> >>>>
> >>>> [1]
> >>>>
> >>>
> https://gist.githubusercontent.com/rickysaltzer/b00196a3881c052df9b38b418722cd02/raw/279a1bc8c60530426732eb7b653de1f3f74574e2/gistfile1.txt
> >>>>
> >>>>
> >>>> On Wed, Jun 15, 2016 at 4:38 PM, Ricky Saltzer <ri...@cloudera.com>
> >>> wrote:
> >>>>
> >>>>> I should also mention, I just realized that our worker nodes are on
> >>> 0.5.1,
> >>>>> and for some reason I missed updating the master from 0.4.0. I'm sure
> >>> that
> >>>>> is not helping.
> >>>>>
> >>>>> On Wed, Jun 15, 2016 at 4:36 PM, Ricky Saltzer <ri...@cloudera.com>
> >>> wrote:
> >>>>>
> >>>>>> Looks like the threads are parked and waiting [1]
> >>>>>>
> >>>>>> [1]
> >>>>>>
> >>>
> http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt
> >>>>>>
> >>>>>> On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt <joe.w...@gmail.com>
> wrote:
> >>>>>>
> >>>>>>> thanks Ricky - then please take a look at mark's note as that is
> >>>>>>> probably more relevant to your case.
> >>>>>>>
> >>>>>>> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer <ri...@cloudera.com
> >
> >>>>>>> wrote:
> >>>>>>>> Hey Joe -
> >>>>>>>>
> >>>>>>>> The NiFi web UI currently reads as:
> >>>>>>>>
> >>>>>>>> Active threads: 3
> >>>>>>>> Queued: 10,173 / 0 bytes
> >>>>>>>> Connected nodes: 2 / 2
> >>>>>>>> Stats last refreshed: 13:31:28 PDT
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt <joe.w...@gmail.com>
> >>> wrote:
> >>>>>>>>
> >>>>>>>>> And the data remains?  If so that is an interesting data point I
> >>>>>>>>> think.  So to mark's point how much data do you have queued up
> >>>>>>>>> actively in the flow then on that node?  Number of objects you
> >>>>>>>>> mention is 3273 files corresponding to 825GB in the content
> >>>>>>>>> repository.  Does NiFi see those 825GB worth of data as being in
> >>> the
> >>>>>>>>> flow/queued up?  And then if that is the case are we talking
> about
> >>> a
> >>>>>>>>> roughly 1TB repo and so the reported value seems correct and this
> >>> is
> >>>>>>>>> simply a case of queueing near to the limit your system can hold?
> >>>>>>>>>
> >>>>>>>>> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer <
> ri...@cloudera.com
> >>>>
> >>>>>>> wrote:
> >>>>>>>>>> I have two nodes in clustered mode. I have the other node that
> >>> isn't
> >>>>>>>>>> filling up as my primary. I've actually already restarted nifi
> on
> >>>>>>> the
> >>>>>>>>> node
> >>>>>>>>>> which has the large repository a few times.
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt <joe.w...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Ricky,
> >>>>>>>>>>>
> >>>>>>>>>>> If you restart nifi and then find that it cleans those things
> up
> >>> I
> >>>>>>>>>>> believe then it is related to the defects corrected in the
> >>> 0.5/0.6
> >>>>>>>>>>> timeframe.
> >>>>>>>>>>>
> >>>>>>>>>>> Is restarting an option for you at this time?  Do you agree, Mark?
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Joe
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer <
> >>> ri...@cloudera.com
> >>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>>> Hey Mark -
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks for the quick reply! This is our production system so
> >>> it's
> >>>>>>>>>>>> unfortunately running 0.4.0. There are currently 3273 files,
> >>>>>>> with some
> >>>>>>>>>>>> files dating back to May 18th. The content repository itself
> is
> >>>>>>> 825G.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ricky
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne <
> >>>>>>> marka...@hotmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hey Ricky
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The reclaim process is pretty much continuous. What version
> of
> >>>>>>> NiFi
> >>>>>>>>> are
> >>>>>>>>>>>>> you running?
> >>>>>>>>>>>>> I know there was an issue with this a while back that caused
> it
> >>>>>>> not
> >>>>>>>>> to
> >>>>>>>>>>>>> cleanup properly.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Also, how much data & how many FlowFiles do you have queued
> up
> >>>>>>> in
> >>>>>>>>> your
> >>>>>>>>>>>>> flow?
> >>>>>>>>>>>>> Data won't be archived or reclaimed if in the flow.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks
> >>>>>>>>>>>>> -Mark
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jun 15, 2016, at 4:04 PM, Ricky Saltzer <
> >>>>>>> ri...@cloudera.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hey guys -
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I recently discovered I didn't have my "archive.enabled"
> >>>>>>> option
> >>>>>>>>> set to
> >>>>>>>>>>>>> true
> >>>>>>>>>>>>>> after my disk filled up to 95%. I enabled it and then set
> the
> >>>>>>>>>>> retention
> >>>>>>>>>>>>>> period to 12 hours and 50% (default values). However, after
> >>>>>>>>> restarting
> >>>>>>>>>>>>>> NiFi, I am not seeing any disk space reclaimed.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm curious, is the reclaiming process periodic or
> continuous?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> ---
> >>>>>>>>>>>>>> ricky
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Ricky Saltzer
> >>>>>>>>>>>> http://www.cloudera.com
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> --
> >>>>>>>>>> Ricky Saltzer
> >>>>>>>>>> http://www.cloudera.com
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Ricky Saltzer
> >>>>>>>> http://www.cloudera.com
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Ricky Saltzer
> >>>>>> http://www.cloudera.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Ricky Saltzer
> >>>>> http://www.cloudera.com
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> Ricky Saltzer
> >>>> http://www.cloudera.com
> >>>
> >>>
> >>
> >>
> >> --
> >> Ricky Saltzer
> >> http://www.cloudera.com
> >>
> >>
> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>
>


-- 
Ricky Saltzer
http://www.cloudera.com
