Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Mark -

Yeah, agreed. I'm moving some of the 15+ day old files out only because
this is kind of an emergency. And no, the 0-byte FlowFiles aren't exactly
"normal", but I have a new pipeline that batches up errors: each FlowFile
is basically 0 bytes, with attributes describing the error so it can be
retried at a later time.
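
A minimal sketch of how such files could be listed before anything is moved, assuming a POSIX `find` and treating modification time as the age signal. The function name `old_files` and the 15-day default are illustrative assumptions, not something NiFi provides. The sketch only prints paths; it never moves or deletes anything.

```shell
# Hypothetical helper (not part of NiFi): list regular files whose
# modification time is more than DAYS days in the past.
# Usage: old_files [repo_dir] [days]
old_files() {
    repo="${1:-.}"    # directory to scan; defaults to the current directory
    days="${2:-15}"   # age cutoff in days; 15 is an assumed default
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$repo" -type f -mtime +"$days"
}
```

For example, `old_files . 15 | wc -l` would report how many files fall past the cutoff.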

There are definitely some very large files in that content repo; the sizes
vary from KB to several GB.

$ find . -size +1G | wc -l
163

On Wed, Jun 15, 2016 at 5:13 PM, Mark Payne  wrote:

> Deleting the old files could certainly cause some problems.
>
> The weird thing is that it shows that you have 10,000+ FlowFiles, each of
> which is 0 bytes.
> Is that normal for your flow?
>
> Could you try running the following against your content repo:
>
> find . -size +1M
>
> find . | wc -l
>
> Curious how many files there are and how many are "large" files.

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Deleting the old files could certainly cause some problems.

The weird thing is that it shows that you have 10,000+ FlowFiles, each of
which is 0 bytes.
Is that normal for your flow?

Could you try running the following against your content repo:

find . -size +1M

find . | wc -l

Curious how many files there are and how many are "large" files.
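
The two commands above can be folded into one pass that reports both numbers. This is only a sketch: `count_repo_files` is a hypothetical name, the `+1M` default mirrors the command above, and it counts regular files only (`-type f`), which the bare `find . | wc -l` does not.

```shell
# Hypothetical helper: count all regular files under a directory and how
# many of them exceed a size threshold given in find's -size syntax.
# Usage: count_repo_files [repo_dir] [threshold]
count_repo_files() {
    repo="${1:-.}"         # directory to scan; defaults to the current directory
    threshold="${2:-+1M}"  # size test passed to find -size
    total=$(find "$repo" -type f | wc -l)
    large=$(find "$repo" -type f -size "$threshold" | wc -l)
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "total=$((total)) large=$((large))"
}
```

Running `count_repo_files . +1G` from the content repository root would give the same "large" count as `find . -size +1G | wc -l`, provided everything over the threshold is a regular file.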



> On Jun 15, 2016, at 5:02 PM, Ricky Saltzer  wrote:
> 
> Is it safe to manually remove some of the older files in the repository to
> avoid our disk from filling up?

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Is it safe to manually remove some of the older files in the repository to
keep our disk from filling up?

On Wed, Jun 15, 2016 at 4:55 PM, Ricky Saltzer  wrote:

> Just a reminder, I just today noticed the "archive.enabled" option was
> false and changed it to true.
>
> $ find . -type f -ls | grep archive | wc -l
> 0

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Just a reminder: I only noticed today that the "archive.enabled" option was
false, and changed it to true.

$ find . -type f -ls | grep archive | wc -l
0



On Wed, Jun 15, 2016 at 4:53 PM, Mark Payne  wrote:

> OK, thanks. It doesn't appear that it believes there is anything to
> reclaim.
>
> Can you try going to your content repository and running:
>
> find . -type f -ls | grep archive
>
> Curious as to how much data it has archived.

Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
OK, thanks. It doesn't appear that it believes there is anything to reclaim.

Can you try going to your content repository and running:

find . -type f -ls | grep archive

Curious as to how much data it has archived.
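
If the listing is long, a total may be easier to read than the raw `-ls` output. A rough sketch, assuming archived content sits under directories literally named `archive`. That is stricter than the `grep` above, which matches "archive" anywhere in the line. `archive_usage` is a hypothetical helper name.

```shell
# Hypothetical helper: count files under any directory named "archive"
# and sum their sizes in bytes.
# Usage: archive_usage [repo_dir]
archive_usage() {
    repo="${1:-.}"   # directory to scan; defaults to the current directory
    # wc -c prints "<bytes> <path>" per file, plus a "total" line whenever
    # it is handed more than one file; the awk filter skips those totals.
    find "$repo" -type f -path '*/archive/*' -exec wc -c {} + |
        awk '$2 != "total" { n++; bytes += $1 }
             END { printf "files=%d bytes=%.0f\n", n, bytes }'
}
```

Run from the content repository root as `archive_usage .`; an empty archive shows up as `files=0 bytes=0`.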

> On Jun 15, 2016, at 4:48 PM, Ricky Saltzer  wrote:
> 
> Oh sorry! Trying again
> 
> [1]
> https://gist.githubusercontent.com/rickysaltzer/b00196a3881c052df9b38b418722cd02/raw/279a1bc8c60530426732eb7b653de1f3f74574e2/gistfile1.txt



Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Oh sorry! Trying again

[1]
https://gist.githubusercontent.com/rickysaltzer/b00196a3881c052df9b38b418722cd02/raw/279a1bc8c60530426732eb7b653de1f3f74574e2/gistfile1.txt


On Wed, Jun 15, 2016 at 4:38 PM, Ricky Saltzer  wrote:

> I should also mention, I just realized that our worker nodes are on 0.5.1,
> and for some reason I missed updating the master from 0.4.0. I'm sure that
> is not helping.
>
> On Wed, Jun 15, 2016 at 4:36 PM, Ricky Saltzer  wrote:
>
>> Looks like the threads are parked and waiting [1]
>>
>> [1]
>> http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt
>>
>> On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt  wrote:
>>
>>> thanks Ricky - then please take a look at mark's note as that is
>>> probably more relevant to your case.
>>>
>>> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer 
>>> wrote:
>>> > Hey Joe -
>>> >
>>> > The NiFi web UI currently reads as:
>>> >
>>> > Active threads: 3
>>> > Queued: 10,173 / 0 bytes
>>> > Connected nodes: 2 / 2
>>> > Stats last refreshed: 13:31:28 PDT
>>> >
>>> >
>>> > On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:
>>> >
>>> >> And the data remains?  If so that is an interesting data point I
>>> >> think.  So to mark's point how much data do you have queued up
>>> >> actively in the flow then on that nodes?  Number of objects you
>>> >> mention is 3273 files corresponding to 825GB in the content
>>> >> repository.  Does NiFi see those 825GB worth of data as being in the
>>> >> flow/queued up?  And then if that is the case are we talking about a
>>> >> roughly 1TB repo and so the reported value seems correct and this is
>>> >> simply a case of queueing near to the limit your system can hold?
>>> >>
>>> >> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer 
>>> wrote:
>>> >> > I have two nodes in clustered mode. I have the other node that isn't
>>> >> > filling up as my primary. I've actually already restarted nifi on
>>> the
>>> >> node
>>> >> > which has the large repository a few times.
>>> >> >
>>> >> > On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt 
>>> wrote:
>>> >> >
>>> >> >> Ricky,
>>> >> >>
>>> >> >> If you restart nifi and then find that it cleans those things up I
>>> >> >> believe then it is related to the defects corrected in the 0.5/0.6
>>> >> >> timeframe.
>>> >> >>
>>> >> >> Is restarting an option for you at this time.  You agree mark?
>>> >> >>
>>> >> >> Thanks
>>> >> >> Joe
>>> >> >>
>>> >> >> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer >> >
>>> >> wrote:
>>> >> >> > Hey Mark -
>>> >> >> >
>>> >> >> > Thanks for the quick reply! This is our production system so it's
>>> >> >> > unfortunately running 0.4.0. There are currently 3273 files,
>>> with some
>>> >> >> > files dating back to May 18th. The content repository itself is
>>> 825G.
>>> >> >> >
>>> >> >> > Ricky
>>> >> >> >
>>> >> >> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne <
>>> marka...@hotmail.com>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >> Hey Ricky
>>> >> >> >>
>>> >> >> >> The reclaim process is pretty much continuous. What version of
>>> NiFi
>>> >> are
>>> >> >> >> you running?
>>> >> >> >> I know there was an issue with this a while back that caused it
>>> not
>>> >> to
>>> >> >> >> cleanup properly.
>>> >> >> >>
>>> >> >> >> Also, how much data & how many FlowFiles do you have queued up
>>> in
>>> >> your
>>> >> >> >> flow?
>>> >> >> >> Data won't be archived or reclaimed if in the flow.
>>> >> >> >>
>>> >> >> >> Thanks
>>> >> >> >> -Mark
>>> >> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer <
>>> ri...@cloudera.com>
>>> >> >> wrote:
>>> >> >> >> >
>>> >> >> >> > Hey guys -
>>> >> >> >> >
>>> >> >> >> > I recently discovered I didn't have my "archive.enabled"
>>> option
>>> >> set to
>>> >> >> >> true
>>> >> >> >> > after my disk filled up to 95%. I enabled it and then set the
>>> >> >> retention
>>> >> >> >> > period to 12 hours and 50% (default values). However, after
>>> >> restarting
>>> >> >> >> > NiFi, I am not seeing any disk space reclaimed.
>>> >> >> >> >
>>> >> >> >> > I'm curious, is the reclaiming process periodic or continuous?
>>> >> >> >> >
>>> >> >> >> > ---
>>> >> >> >> > ricky
>>> >> >> >>
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> > --
>>> >> >> > Ricky Saltzer
>>> >> >> > http://www.cloudera.com
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Ricky Saltzer
>>> >> > http://www.cloudera.com
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Ricky Saltzer
>>> > http://www.cloudera.com
>>>
>>
>>
>>
>> --
>> Ricky Saltzer
>> http://www.cloudera.com
>>
>>
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com
>
>


-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
It is definitely best to try to keep those in sync, but that won't affect
this, as the NCM (NiFi Cluster Manager) isn't involved in the nodes'
internal maintenance, etc.

> On Jun 15, 2016, at 4:38 PM, Ricky Saltzer  wrote:
> 
> I should also mention, I just realized that our worker nodes are on 0.5.1,
> and for some reason I missed updating the master from 0.4.0. I'm sure that
> is not helping.



Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Ricky - can't get to that URL, unfortunately. It tells me "This site can't
be reached".

It may be easier to just copy & paste those particular threads here.

Thanks
-Mark


> On Jun 15, 2016, at 4:36 PM, Ricky Saltzer  wrote:
> 
> Looks like the threads are parked and waiting [1]
> 
> [1]
> http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt
> 
> On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt  wrote:
> 
>> thanks Ricky - then please take a look at mark's note as that is
>> probably more relevant to your case.
>> 
>> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer  wrote:
>>> Hey Joe -
>>> 
>>> The NiFi web UI currently reads as:
>>> 
>>> Active threads: 3
>>> Queued: 10,173 / 0 bytes
>>> Connected nodes: 2 / 2
>>> Stats last refreshed: 13:31:28 PDT
>>> 
>>> 
>>> On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:
>>> 
>>>> And the data remains?  If so that is an interesting data point I
>>>> think.  So to mark's point how much data do you have queued up
>>>> actively in the flow then on that nodes?  Number of objects you
>>>> mention is 3273 files corresponding to 825GB in the content
>>>> repository.  Does NiFi see those 825GB worth of data as being in the
>>>> flow/queued up?  And then if that is the case are we talking about a
>>>> roughly 1TB repo and so the reported value seems correct and this is
>>>> simply a case of queueing near to the limit your system can hold?
>>>>
>>>> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer wrote:
>>>>> I have two nodes in clustered mode. I have the other node that isn't
>>>>> filling up as my primary. I've actually already restarted nifi on the node
>>>>> which has the large repository a few times.
>>>>>
>>>>> On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt wrote:
>>>>>
>>>>>> Ricky,
>>>>>>
>>>>>> If you restart nifi and then find that it cleans those things up I
>>>>>> believe then it is related to the defects corrected in the 0.5/0.6
>>>>>> timeframe.
>>>>>>
>>>>>> Is restarting an option for you at this time.  You agree mark?
>>>>>>
>>>>>> Thanks
>>>>>> Joe
>>>>>>
>>>>>> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer wrote:
>>>>>>> Hey Mark -
>>>>>>>
>>>>>>> Thanks for the quick reply! This is our production system so it's
>>>>>>> unfortunately running 0.4.0. There are currently 3273 files, with some
>>>>>>> files dating back to May 18th. The content repository itself is 825G.
>>>>>>>
>>>>>>> Ricky
>>>>>>>
>>>>>>> On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne wrote:
>>>>>>>
>>>>>>>> Hey Ricky
>>>>>>>>
>>>>>>>> The reclaim process is pretty much continuous. What version of NiFi are
>>>>>>>> you running?
>>>>>>>> I know there was an issue with this a while back that caused it not to
>>>>>>>> cleanup properly.
>>>>>>>>
>>>>>>>> Also, how much data & how many FlowFiles do you have queued up in your
>>>>>>>> flow?
>>>>>>>> Data won't be archived or reclaimed if in the flow.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> -Mark
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jun 15, 2016, at 4:04 PM, Ricky Saltzer wrote:
>>>>>>>>>
>>>>>>>>> Hey guys -
>>>>>>>>>
>>>>>>>>> I recently discovered I didn't have my "archive.enabled" option set to true
>>>>>>>>> after my disk filled up to 95%. I enabled it and then set the retention
>>>>>>>>> period to 12 hours and 50% (default values). However, after restarting
>>>>>>>>> NiFi, I am not seeing any disk space reclaimed.
>>>>>>>>>
>>>>>>>>> I'm curious, is the reclaiming process periodic or continuous?
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> ricky
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ricky Saltzer
>>>>>>> http://www.cloudera.com
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ricky Saltzer
>>>>> http://www.cloudera.com
>>>>
>>> 
>>> 
>>> 
>>> --
>>> Ricky Saltzer
>>> http://www.cloudera.com
>> 
> 
> 
> 
> -- 
> Ricky Saltzer
> http://www.cloudera.com



Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
I should also mention, I just realized that our worker nodes are on 0.5.1,
and for some reason I missed updating the master from 0.4.0. I'm sure that
is not helping.

On Wed, Jun 15, 2016 at 4:36 PM, Ricky Saltzer  wrote:

> Looks like the threads are parked and waiting [1]
>
> [1]
> http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt
>
> On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt  wrote:
>
>> thanks Ricky - then please take a look at mark's note as that is
>> probably more relevant to your case.
>>
>> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer 
>> wrote:
>> > Hey Joe -
>> >
>> > The NiFi web UI currently reads as:
>> >
>> > Active threads: 3
>> > Queued: 10,173 / 0 bytes
>> > Connected nodes: 2 / 2
>> > Stats last refreshed: 13:31:28 PDT
>> >
>> >
>> > On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:
>> >
>> >> And the data remains?  If so that is an interesting data point I
>> >> think.  So to mark's point how much data do you have queued up
>> >> actively in the flow then on that nodes?  Number of objects you
>> >> mention is 3273 files corresponding to 825GB in the content
>> >> repository.  Does NiFi see those 825GB worth of data as being in the
>> >> flow/queued up?  And then if that is the case are we talking about a
>> >> roughly 1TB repo and so the reported value seems correct and this is
>> >> simply a case of queueing near to the limit your system can hold?
>> >>
>> >> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer 
>> wrote:
>> >> > I have two nodes in clustered mode. I have the other node that isn't
>> >> > filling up as my primary. I've actually already restarted nifi on the
>> >> node
>> >> > which has the large repository a few times.
>> >> >
>> >> > On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt 
>> wrote:
>> >> >
>> >> >> Ricky,
>> >> >>
>> >> >> If you restart nifi and then find that it cleans those things up I
>> >> >> believe then it is related to the defects corrected in the 0.5/0.6
>> >> >> timeframe.
>> >> >>
>> >> >> Is restarting an option for you at this time.  You agree mark?
>> >> >>
>> >> >> Thanks
>> >> >> Joe
>> >> >>
>> >> >> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer 
>> >> wrote:
>> >> >> > Hey Mark -
>> >> >> >
>> >> >> > Thanks for the quick reply! This is our production system so it's
>> >> >> > unfortunately running 0.4.0. There are currently 3273 files, with
>> some
>> >> >> > files dating back to May 18th. The content repository itself is
>> 825G.
>> >> >> >
>> >> >> > Ricky
>> >> >> >
>> >> >> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne > >
>> >> >> wrote:
>> >> >> >
>> >> >> >> Hey Ricky
>> >> >> >>
>> >> >> >> The reclaim process is pretty much continuous. What version of
>> NiFi
>> >> are
>> >> >> >> you running?
>> >> >> >> I know there was an issue with this a while back that caused it
>> not
>> >> to
>> >> >> >> cleanup properly.
>> >> >> >>
>> >> >> >> Also, how much data & how many FlowFiles do you have queued up in
>> >> your
>> >> >> >> flow?
>> >> >> >> Data won't be archived or reclaimed if in the flow.
>> >> >> >>
>> >> >> >> Thanks
>> >> >> >> -Mark
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer > >
>> >> >> wrote:
>> >> >> >> >
>> >> >> >> > Hey guys -
>> >> >> >> >
>> >> >> >> > I recently discovered I didn't have my "archive.enabled" option
>> >> set to
>> >> >> >> true
>> >> >> >> > after my disk filled up to 95%. I enabled it and then set the
>> >> >> retention
>> >> >> >> > period to 12 hours and 50% (default values). However, after
>> >> restarting
>> >> >> >> > NiFi, I am not seeing any disk space reclaimed.
>> >> >> >> >
>> >> >> >> > I'm curious, is the reclaiming process periodic or continuous?
>> >> >> >> >
>> >> >> >> > ---
>> >> >> >> > ricky
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Ricky Saltzer
>> >> >> > http://www.cloudera.com
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Ricky Saltzer
>> >> > http://www.cloudera.com
>> >>
>> >
>> >
>> >
>> > --
>> > Ricky Saltzer
>> > http://www.cloudera.com
>>
>
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com
>
>


-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Looks like the threads are parked and waiting [1]

[1]
http://github.mtv.cloudera.com/gist/ricky/7a5d89f2eeba58e2206d/raw/0e2b446ca049a8b5f27298c700ac709772d2847c/gistfile1.txt

On Wed, Jun 15, 2016 at 4:33 PM, Joe Witt  wrote:

> thanks Ricky - then please take a look at mark's note as that is
> probably more relevant to your case.
>
> On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer  wrote:
> > Hey Joe -
> >
> > The NiFi web UI currently reads as:
> >
> > Active threads: 3
> > Queued: 10,173 / 0 bytes
> > Connected nodes: 2 / 2
> > Stats last refreshed: 13:31:28 PDT
> >
> >
> > On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:
> >
> >> And the data remains?  If so that is an interesting data point I
> >> think.  So to mark's point how much data do you have queued up
> >> actively in the flow then on that nodes?  Number of objects you
> >> mention is 3273 files corresponding to 825GB in the content
> >> repository.  Does NiFi see those 825GB worth of data as being in the
> >> flow/queued up?  And then if that is the case are we talking about a
> >> roughly 1TB repo and so the reported value seems correct and this is
> >> simply a case of queueing near to the limit your system can hold?
> >>
> >> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer 
> wrote:
> >> > I have two nodes in clustered mode. I have the other node that isn't
> >> > filling up as my primary. I've actually already restarted nifi on the
> >> node
> >> > which has the large repository a few times.
> >> >
> >> > On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:
> >> >
> >> >> Ricky,
> >> >>
> >> >> If you restart nifi and then find that it cleans those things up I
> >> >> believe then it is related to the defects corrected in the 0.5/0.6
> >> >> timeframe.
> >> >>
> >> >> Is restarting an option for you at this time.  You agree mark?
> >> >>
> >> >> Thanks
> >> >> Joe
> >> >>
> >> >> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer 
> >> wrote:
> >> >> > Hey Mark -
> >> >> >
> >> >> > Thanks for the quick reply! This is our production system so it's
> >> >> > unfortunately running 0.4.0. There are currently 3273 files, with
> some
> >> >> > files dating back to May 18th. The content repository itself is
> 825G.
> >> >> >
> >> >> > Ricky
> >> >> >
> >> >> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne 
> >> >> wrote:
> >> >> >
> >> >> >> Hey Ricky
> >> >> >>
> >> >> >> The reclaim process is pretty much continuous. What version of
> NiFi
> >> are
> >> >> >> you running?
> >> >> >> I know there was an issue with this a while back that caused it
> not
> >> to
> >> >> >> cleanup properly.
> >> >> >>
> >> >> >> Also, how much data & how many FlowFiles do you have queued up in
> >> your
> >> >> >> flow?
> >> >> >> Data won't be archived or reclaimed if in the flow.
> >> >> >>
> >> >> >> Thanks
> >> >> >> -Mark
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer 
> >> >> wrote:
> >> >> >> >
> >> >> >> > Hey guys -
> >> >> >> >
> >> >> >> > I recently discovered I didn't have my "archive.enabled" option
> >> set to
> >> >> >> true
> >> >> >> > after my disk filled up to 95%. I enabled it and then set the
> >> >> retention
> >> >> >> > period to 12 hours and 50% (default values). However, after
> >> restarting
> >> >> >> > NiFi, I am not seeing any disk space reclaimed.
> >> >> >> >
> >> >> >> > I'm curious, is the reclaiming process periodic or continuous?
> >> >> >> >
> >> >> >> > ---
> >> >> >> > ricky
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Ricky Saltzer
> >> >> > http://www.cloudera.com
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Ricky Saltzer
> >> > http://www.cloudera.com
> >>
> >
> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>



-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Joe Witt
Thanks Ricky - then please take a look at Mark's note, as that is
probably more relevant to your case.

On Wed, Jun 15, 2016 at 4:32 PM, Ricky Saltzer  wrote:
> Hey Joe -
>
> The NiFi web UI currently reads as:
>
> Active threads: 3
> Queued: 10,173 / 0 bytes
> Connected nodes: 2 / 2
> Stats last refreshed: 13:31:28 PDT
>
>
> On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:
>
>> And the data remains?  If so that is an interesting data point I
>> think.  So to mark's point how much data do you have queued up
>> actively in the flow then on that nodes?  Number of objects you
>> mention is 3273 files corresponding to 825GB in the content
>> repository.  Does NiFi see those 825GB worth of data as being in the
>> flow/queued up?  And then if that is the case are we talking about a
>> roughly 1TB repo and so the reported value seems correct and this is
>> simply a case of queueing near to the limit your system can hold?
>>
>> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer  wrote:
>> > I have two nodes in clustered mode. I have the other node that isn't
>> > filling up as my primary. I've actually already restarted nifi on the
>> node
>> > which has the large repository a few times.
>> >
>> > On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:
>> >
>> >> Ricky,
>> >>
>> >> If you restart nifi and then find that it cleans those things up I
>> >> believe then it is related to the defects corrected in the 0.5/0.6
>> >> timeframe.
>> >>
>> >> Is restarting an option for you at this time.  You agree mark?
>> >>
>> >> Thanks
>> >> Joe
>> >>
>> >> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer 
>> wrote:
>> >> > Hey Mark -
>> >> >
>> >> > Thanks for the quick reply! This is our production system so it's
>> >> > unfortunately running 0.4.0. There are currently 3273 files, with some
>> >> > files dating back to May 18th. The content repository itself is 825G.
>> >> >
>> >> > Ricky
>> >> >
>> >> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne 
>> >> wrote:
>> >> >
>> >> >> Hey Ricky
>> >> >>
>> >> >> The reclaim process is pretty much continuous. What version of NiFi
>> are
>> >> >> you running?
>> >> >> I know there was an issue with this a while back that caused it not
>> to
>> >> >> cleanup properly.
>> >> >>
>> >> >> Also, how much data & how many FlowFiles do you have queued up in
>> your
>> >> >> flow?
>> >> >> Data won't be archived or reclaimed if in the flow.
>> >> >>
>> >> >> Thanks
>> >> >> -Mark
>> >> >>
>> >> >>
>> >> >>
>> >> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer 
>> >> wrote:
>> >> >> >
>> >> >> > Hey guys -
>> >> >> >
>> >> >> > I recently discovered I didn't have my "archive.enabled" option
>> set to
>> >> >> true
>> >> >> > after my disk filled up to 95%. I enabled it and then set the
>> >> retention
>> >> >> > period to 12 hours and 50% (default values). However, after
>> restarting
>> >> >> > NiFi, I am not seeing any disk space reclaimed.
>> >> >> >
>> >> >> > I'm curious, is the reclaiming process periodic or continuous?
>> >> >> >
>> >> >> > ---
>> >> >> > ricky
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> > --
>> >> > Ricky Saltzer
>> >> > http://www.cloudera.com
>> >>
>> >
>> >
>> >
>> > --
>> > Ricky Saltzer
>> > http://www.cloudera.com
>>
>
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Joe -

The NiFi web UI currently reads as:

Active threads: 3
Queued: 10,173 / 0 bytes
Connected nodes: 2 / 2
Stats last refreshed: 13:31:28 PDT


On Wed, Jun 15, 2016 at 4:29 PM, Joe Witt  wrote:

> And the data remains?  If so that is an interesting data point I
> think.  So to mark's point how much data do you have queued up
> actively in the flow then on that nodes?  Number of objects you
> mention is 3273 files corresponding to 825GB in the content
> repository.  Does NiFi see those 825GB worth of data as being in the
> flow/queued up?  And then if that is the case are we talking about a
> roughly 1TB repo and so the reported value seems correct and this is
> simply a case of queueing near to the limit your system can hold?
>
> On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer  wrote:
> > I have two nodes in clustered mode. I have the other node that isn't
> > filling up as my primary. I've actually already restarted nifi on the
> node
> > which has the large repository a few times.
> >
> > On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:
> >
> >> Ricky,
> >>
> >> If you restart nifi and then find that it cleans those things up I
> >> believe then it is related to the defects corrected in the 0.5/0.6
> >> timeframe.
> >>
> >> Is restarting an option for you at this time.  You agree mark?
> >>
> >> Thanks
> >> Joe
> >>
> >> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer 
> wrote:
> >> > Hey Mark -
> >> >
> >> > Thanks for the quick reply! This is our production system so it's
> >> > unfortunately running 0.4.0. There are currently 3273 files, with some
> >> > files dating back to May 18th. The content repository itself is 825G.
> >> >
> >> > Ricky
> >> >
> >> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne 
> >> wrote:
> >> >
> >> >> Hey Ricky
> >> >>
> >> >> The reclaim process is pretty much continuous. What version of NiFi
> are
> >> >> you running?
> >> >> I know there was an issue with this a while back that caused it not
> to
> >> >> cleanup properly.
> >> >>
> >> >> Also, how much data & how many FlowFiles do you have queued up in
> your
> >> >> flow?
> >> >> Data won't be archived or reclaimed if in the flow.
> >> >>
> >> >> Thanks
> >> >> -Mark
> >> >>
> >> >>
> >> >>
> >> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer 
> >> wrote:
> >> >> >
> >> >> > Hey guys -
> >> >> >
> >> >> > I recently discovered I didn't have my "archive.enabled" option
> set to
> >> >> true
> >> >> > after my disk filled up to 95%. I enabled it and then set the
> >> retention
> >> >> > period to 12 hours and 50% (default values). However, after
> restarting
> >> >> > NiFi, I am not seeing any disk space reclaimed.
> >> >> >
> >> >> > I'm curious, is the reclaiming process periodic or continuous?
> >> >> >
> >> >> > ---
> >> >> > ricky
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Ricky Saltzer
> >> > http://www.cloudera.com
> >>
> >
> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>



-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
I do agree. Unfortunately, I was a bit off, apparently, when I said "an issue a 
while back."
It turns out that the ticket was 1726 [1], which was fixed in 0.6.1.

To determine if this is what is biting you, could you do a thread-dump 
(bin/nifi.sh dump thread-dump.txt)
and then look in that file for "FileSystemRepository" and see what it is doing?

Thanks
-Mark


[1] https://issues.apache.org/jira/browse/NIFI-1726 
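
The check described above can be scripted. This is a minimal sketch, not an official NiFi tool: the function name repo_threads and the 15 lines of context are arbitrary choices, and it assumes the dump was already written with the bin/nifi.sh command quoted above.

```shell
# Hypothetical helper: print every line of a thread dump that mentions
# FileSystemRepository, plus the stack frames that follow it.
# Capture the dump first with the command from the message above:
#   bin/nifi.sh dump thread-dump.txt
repo_threads() {
  dump_file="$1"
  # -n prefixes line numbers; -A 15 keeps 15 lines of trailing context
  # (15 is an arbitrary depth, raise it if the stacks are cut off)
  grep -n -A 15 'FileSystemRepository' "$dump_file"
}
```

Run it as `repo_threads thread-dump.txt`. The frames printed under each match show what the repository threads are doing at the time of the dump.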




> On Jun 15, 2016, at 4:24 PM, Ricky Saltzer  wrote:
> 
> I have two nodes in clustered mode. I have the other node that isn't
> filling up as my primary. I've actually already restarted nifi on the node
> which has the large repository a few times.
> 
> On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:
> 
>> Ricky,
>> 
>> If you restart nifi and then find that it cleans those things up I
>> believe then it is related to the defects corrected in the 0.5/0.6
>> timeframe.
>> 
>> Is restarting an option for you at this time.  You agree mark?
>> 
>> Thanks
>> Joe
>> 
>> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer  wrote:
>>> Hey Mark -
>>> 
>>> Thanks for the quick reply! This is our production system so it's
>>> unfortunately running 0.4.0. There are currently 3273 files, with some
>>> files dating back to May 18th. The content repository itself is 825G.
>>> 
>>> Ricky
>>> 
>>> On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne wrote:
>>>
>>>> Hey Ricky
>>>>
>>>> The reclaim process is pretty much continuous. What version of NiFi are
>>>> you running?
>>>> I know there was an issue with this a while back that caused it not to
>>>> cleanup properly.
>>>>
>>>> Also, how much data & how many FlowFiles do you have queued up in your
>>>> flow?
>>>> Data won't be archived or reclaimed if in the flow.
>>>>
>>>> Thanks
>>>> -Mark
>>>>
>>>>
>>>>
>>>>> On Jun 15, 2016, at 4:04 PM, Ricky Saltzer wrote:
>>>>>
>>>>> Hey guys -
>>>>>
>>>>> I recently discovered I didn't have my "archive.enabled" option set to true
>>>>> after my disk filled up to 95%. I enabled it and then set the retention
>>>>> period to 12 hours and 50% (default values). However, after restarting
>>>>> NiFi, I am not seeing any disk space reclaimed.
>>>>>
>>>>> I'm curious, is the reclaiming process periodic or continuous?
>>>>>
>>>>> ---
>>>>> ricky
>>>>
>>>>
>>> 
>>> 
>>> --
>>> Ricky Saltzer
>>> http://www.cloudera.com
>> 
> 
> 
> 
> -- 
> Ricky Saltzer
> http://www.cloudera.com



Re: Content Repository Cleanup Schedule

2016-06-15 Thread Joe Witt
And the data remains? If so, that is an interesting data point, I
think. To Mark's point, how much data do you have actively queued up
in the flow on that node? You mention 3273 files corresponding to
825GB in the content repository. Does NiFi see those 825GB of data
as being in the flow/queued up? If so, are we talking about a
roughly 1TB repo, so that the reported value is correct and this is
simply a case of queueing close to the limit your system can hold?

On Wed, Jun 15, 2016 at 4:24 PM, Ricky Saltzer  wrote:
> I have two nodes in clustered mode. I have the other node that isn't
> filling up as my primary. I've actually already restarted nifi on the node
> which has the large repository a few times.
>
> On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:
>
>> Ricky,
>>
>> If you restart nifi and then find that it cleans those things up I
>> believe then it is related to the defects corrected in the 0.5/0.6
>> timeframe.
>>
>> Is restarting an option for you at this time.  You agree mark?
>>
>> Thanks
>> Joe
>>
>> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer  wrote:
>> > Hey Mark -
>> >
>> > Thanks for the quick reply! This is our production system so it's
>> > unfortunately running 0.4.0. There are currently 3273 files, with some
>> > files dating back to May 18th. The content repository itself is 825G.
>> >
>> > Ricky
>> >
>> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne 
>> wrote:
>> >
>> >> Hey Ricky
>> >>
>> >> The reclaim process is pretty much continuous. What version of NiFi are
>> >> you running?
>> >> I know there was an issue with this a while back that caused it not to
>> >> cleanup properly.
>> >>
>> >> Also, how much data & how many FlowFiles do you have queued up in your
>> >> flow?
>> >> Data won't be archived or reclaimed if in the flow.
>> >>
>> >> Thanks
>> >> -Mark
>> >>
>> >>
>> >>
>> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer 
>> wrote:
>> >> >
>> >> > Hey guys -
>> >> >
>> >> > I recently discovered I didn't have my "archive.enabled" option set to
>> >> true
>> >> > after my disk filled up to 95%. I enabled it and then set the
>> retention
>> >> > period to 12 hours and 50% (default values). However, after restarting
>> >> > NiFi, I am not seeing any disk space reclaimed.
>> >> >
>> >> > I'm curious, is the reclaiming process periodic or continuous?
>> >> >
>> >> > ---
>> >> > ricky
>> >>
>> >>
>> >
>> >
>> > --
>> > Ricky Saltzer
>> > http://www.cloudera.com
>>
>
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
I have two nodes in clustered mode. I have the other node that isn't
filling up as my primary. I've actually already restarted nifi on the node
which has the large repository a few times.

On Wed, Jun 15, 2016 at 4:22 PM, Joe Witt  wrote:

> Ricky,
>
> If you restart nifi and then find that it cleans those things up I
> believe then it is related to the defects corrected in the 0.5/0.6
> timeframe.
>
> Is restarting an option for you at this time.  You agree mark?
>
> Thanks
> Joe
>
> On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer  wrote:
> > Hey Mark -
> >
> > Thanks for the quick reply! This is our production system so it's
> > unfortunately running 0.4.0. There are currently 3273 files, with some
> > files dating back to May 18th. The content repository itself is 825G.
> >
> > Ricky
> >
> > On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne 
> wrote:
> >
> >> Hey Ricky
> >>
> >> The reclaim process is pretty much continuous. What version of NiFi are
> >> you running?
> >> I know there was an issue with this a while back that caused it not to
> >> cleanup properly.
> >>
> >> Also, how much data & how many FlowFiles do you have queued up in your
> >> flow?
> >> Data won't be archived or reclaimed if in the flow.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>
> >> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer 
> wrote:
> >> >
> >> > Hey guys -
> >> >
> >> > I recently discovered I didn't have my "archive.enabled" option set to
> >> true
> >> > after my disk filled up to 95%. I enabled it and then set the
> retention
> >> > period to 12 hours and 50% (default values). However, after restarting
> >> > NiFi, I am not seeing any disk space reclaimed.
> >> >
> >> > I'm curious, is the reclaiming process periodic or continuous?
> >> >
> >> > ---
> >> > ricky
> >>
> >>
> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>



-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Joe Witt
Ricky,

If you restart NiFi and find that it cleans those things up, then I
believe it is related to the defects corrected in the 0.5/0.6
timeframe.

Is restarting an option for you at this time? Do you agree, Mark?

Thanks
Joe

On Wed, Jun 15, 2016 at 4:21 PM, Ricky Saltzer  wrote:
> Hey Mark -
>
> Thanks for the quick reply! This is our production system so it's
> unfortunately running 0.4.0. There are currently 3273 files, with some
> files dating back to May 18th. The content repository itself is 825G.
>
> Ricky
>
> On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne  wrote:
>
>> Hey Ricky
>>
>> The reclaim process is pretty much continuous. What version of NiFi are
>> you running?
>> I know there was an issue with this a while back that caused it not to
>> cleanup properly.
>>
>> Also, how much data & how many FlowFiles do you have queued up in your
>> flow?
>> Data won't be archived or reclaimed if in the flow.
>>
>> Thanks
>> -Mark
>>
>>
>>
>> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer  wrote:
>> >
>> > Hey guys -
>> >
>> > I recently discovered I didn't have my "archive.enabled" option set to
>> true
>> > after my disk filled up to 95%. I enabled it and then set the retention
>> > period to 12 hours and 50% (default values). However, after restarting
>> > NiFi, I am not seeing any disk space reclaimed.
>> >
>> > I'm curious, is the reclaiming process periodic or continuous?
>> >
>> > ---
>> > ricky
>>
>>
>
>
> --
> Ricky Saltzer
> http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Ricky Saltzer
Hey Mark -

Thanks for the quick reply! This is our production system so it's
unfortunately running 0.4.0. There are currently 3273 files, with some
files dating back to May 18th. The content repository itself is 825G.

Ricky
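
Figures like the ones above (file count, total size, oldest file) can be gathered in one pass. This is a rough sketch that assumes GNU find and du; the function name repo_summary is made up for illustration and is not part of NiFi.

```shell
# Hypothetical helper: summarize a content repository directory.
# Assumes GNU find (for -printf) and du.
repo_summary() {
  repo_dir="$1"
  # Number of regular files under the repository
  echo "files: $(find "$repo_dir" -type f | wc -l)"
  # Total on-disk size, human readable
  echo "size: $(du -sh "$repo_dir" | cut -f1)"
  # Modification date (local time) of the oldest regular file
  echo "oldest: $(find "$repo_dir" -type f -printf '%TY-%Tm-%Td\n' | sort | head -n 1)"
}
```

Call it with the path to the content repository, for example `repo_summary .` from inside that directory.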

On Wed, Jun 15, 2016 at 4:17 PM, Mark Payne  wrote:

> Hey Ricky
>
> The reclaim process is pretty much continuous. What version of NiFi are
> you running?
> I know there was an issue with this a while back that caused it not to
> cleanup properly.
>
> Also, how much data & how many FlowFiles do you have queued up in your
> flow?
> Data won't be archived or reclaimed if in the flow.
>
> Thanks
> -Mark
>
>
>
> > On Jun 15, 2016, at 4:04 PM, Ricky Saltzer  wrote:
> >
> > Hey guys -
> >
> > I recently discovered I didn't have my "archive.enabled" option set to
> true
> > after my disk filled up to 95%. I enabled it and then set the retention
> > period to 12 hours and 50% (default values). However, after restarting
> > NiFi, I am not seeing any disk space reclaimed.
> >
> > I'm curious, is the reclaiming process periodic or continuous?
> >
> > ---
> > ricky
>
>


-- 
Ricky Saltzer
http://www.cloudera.com


Re: Content Repository Cleanup Schedule

2016-06-15 Thread Mark Payne
Hey Ricky

The reclaim process is pretty much continuous. What version of NiFi are you 
running?
I know there was an issue with this a while back that caused it not to clean up 
properly.

Also, how much data & how many FlowFiles do you have queued up in your flow?
Data won't be archived or reclaimed if in the flow.

Thanks
-Mark
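
To confirm which archive settings a node is actually running with, the relevant lines can be pulled out of nifi.properties. A small sketch, assuming the standard nifi.content.repository.archive.* property names (worth verifying against the Admin Guide for the version in use); the function name archive_settings is hypothetical.

```shell
# Hypothetical helper: list the content-repository archive settings
# from a nifi.properties file (path passed as the first argument).
archive_settings() {
  props_file="$1"
  # Anchoring at ^ skips commented-out lines that start with '#'
  grep -E '^nifi\.content\.repository\.archive\.' "$props_file"
}
```

Running `archive_settings conf/nifi.properties` on a node configured as described in this thread should list the enabled flag alongside the 12 hours retention period and the 50% usage limit.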



> On Jun 15, 2016, at 4:04 PM, Ricky Saltzer  wrote:
> 
> Hey guys -
> 
> I recently discovered I didn't have my "archive.enabled" option set to true
> after my disk filled up to 95%. I enabled it and then set the retention
> period to 12 hours and 50% (default values). However, after restarting
> NiFi, I am not seeing any disk space reclaimed.
> 
> I'm curious, is the reclaiming process periodic or continuous?
> 
> ---
> ricky