Joe,

That definitely sounds like a bug preventing the eviction from
happening. Can you grep your logs for the phrase "checkpointed with"?
You should have a line that tells you how many records were written to
the snapshot. You will see a few of these messages, though, because
there is one for the FlowFile Repository, one for Local State
Management, and another for the DistributedMapCacheServer. I am curious
whether you also see a log message indicating 3 million+ records.
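
For example (the exact message wording and log location can vary by
version, and the log lines below are simulated purely for illustration):

```shell
# Simulated nifi-app.log lines for illustration only; in practice,
# point grep at your real logs/nifi-app.log.
printf '%s\n' \
  '2017-03-08 19:00:01 INFO FlowFile Repository checkpointed with 1024 records' \
  '2017-03-08 19:00:02 INFO DistributedMapCacheServer checkpointed with 3100000 records' \
  > /tmp/nifi-app-sample.log

# Find the checkpoint messages and their record counts
# (-a treats the file as text even if it contains binary data):
grep -a 'checkpointed with' /tmp/nifi-app-sample.log
```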

Thanks
-Mark


> On Mar 8, 2017, at 7:13 PM, Joe Gresock <[email protected]> wrote:
> 
> Looking through the PersistentMapCache and SimpleMapCache, it seems like
> lots of these records should have been evicted by now.  We're up to 3.1
> million records on disk in the snapshot file.  My understanding is that
> when wali.checkpoint() is called, it collapses all the DELETE records in
> the journaled log and removes them before writing the snapshot file.  Is
> that accurate?
> 
> I feel like something is not going quite right with the eviction process.
> I am using 1.1.1, though, and I have noticed that the PersistentMapCache
> has changed in [1], so I might apply that patch and try some more
> experiments.
> 
> Would anyone be willing to try to replicate this behavior in NiFi 1.1.1?
> You should be able to do it as follows:
> Services:
> DistributedMapCacheServer, maximum cache entries = 100,000, FIFO eviction,
> persistence directory specified
> DistributedMapCacheClientService, point to the same host and port
> 
> Flow:
> GenerateFlowFile (randomize 1K binary files in batches of 10, schedule 10
> threads) -> HashContent (md5) into hash.value -> DetectDuplicate with
> identifier = ${hash.value}, description = ., no age off, select your cache
> client, cache identifier = true
> 
> This should cause the snapshot file to exceed 100,000 keys pretty quickly,
> and as far as I can tell, it never goes back down.  This in itself is not a
> problem, but when the cache gets really big, it tends to crash our cluster
> when NiFi reloads it into memory.
> 
> [1] https://issues.apache.org/jira/browse/NIFI-3214
> 
> 
> On Wed, Mar 8, 2017 at 11:06 AM, Joe Gresock <[email protected]> wrote:
> 
>> Thanks Bryan, I'll start looking through the PersistentMapCache.  This
>> morning I checked back and the snapshot file now has 2.9 million keys in it.
>> 
>> On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:
>> 
>>> Joe,
>>> 
>>> I'm not that familiar with the persistence part of the DMCS, although
>>> I do know that it uses the write-ahead-log that is also used by the
>>> flow file repo.
>>> 
>>> The code for PersistentMapCache is here:
>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
>>> 
>>> It looks like the WAL is check-pointed during puts here:
>>> 
>>> final long modCount = modifications.getAndIncrement();
>>> if ( modCount > 0 && modCount % 100000 == 0 ) {
>>>    wali.checkpoint();
>>> }
>>> 
>>> And during deletes here:
>>> 
>>> final long modCount = modifications.getAndIncrement();
>>> if (modCount > 0 && modCount % 1000 == 0) {
>>>    wali.checkpoint();
>>> }
>>> 
>>> Not sure if it was intentional that put operations checkpoint every
>>> 100k modifications while deletes checkpoint every 1k.
>>> 
>>> Maybe Mark or others could shed some light on why the snapshot is
>>> reaching 3GB in size.
>>> 
>>> -Bryan
>>> 
>>> 
>>> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
>>>> Hi folks,
>>>> 
>>>> Is there a technical description of how the DistributedMapCacheServer
>>>> (DMCS) persistence works?  I've noticed the following on our cluster:
>>>> 
>>>> - I have the DMCS configured on port 4557 as FIFO with max 100,000
>>>> entries, and have specified a persistence directory
>>>> - I am using DetectDuplicate with the DMCS, and the individual key
>>>> length is 80 bytes, with a Description length of 1 byte.  By my count,
>>>> this should result in a pure data size of 7.7MB.
>>>> - I notice that the snapshot file in the persistence directory appears
>>>> to continue growing past the 100,000 limit, though this may be expected
>>>> depending on the implementation.  Since I know that the key will contain
>>>> "json" in it, I can run the following command to count the number of
>>>> possible keys in the snapshot file (though I'm not sure if this is a
>>>> good way of measuring how many keys are actually cached):
>>>> grep -oa json snapshot | wc -l
>>>> - When the snapshot file reaches around 3GB, the DMCS has a hard time
>>>> staying up, and frequently becomes unreachable (netstat -tulpn | grep
>>>> 4557 shows nothing).  At this point, in order to restore functionality
>>>> I delete the persistence directory and let it start over.
>>>> 
>>>> So my main questions are:
>>>> - How are the snapshot and partition files structured, and how can I
>>>> estimate how many keys are actually cached at a given time?
>>>> - Is the described behavior indicative of the cache exceeding the
>>>> configured max number of keys?
>>>> 
>>>> Thanks,
>>>> Joe
>>>> 
>>>> --
>>>> I know what it is to be in need, and I know what it is to have plenty.
>>>> I have learned the secret of being content in any and every situation,
>>>> whether well fed or hungry, whether living in plenty or in want.  I can
>>>> do all this through him who gives me strength.    *-Philippians 4:12-13*
>>> 
>> 
>> 
>> 
>> 
> 
> 
> 
