Looking through PersistentMapCache and SimpleMapCache, it seems like
many of these records should have been evicted by now.  We're up to 3.1
million records on disk in the snapshot file.  My understanding is that
when wali.checkpoint() is called, it collapses all the DELETE records in
the journaled log and removes them before writing the snapshot file.  Is
that accurate?
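To be concrete, here is a minimal sketch of my mental model of that collapse step (this is not the actual WALI code; the JournalEntry record and collapse method are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CheckpointSketch {
    // A journal entry is either a PUT/UPDATE (value != null) or a DELETE (value == null).
    record JournalEntry(String key, String value) {}

    // Replaying the journal collapses updates and deletes: a DELETE removes
    // the key entirely, so the written snapshot holds only live keys.
    static Map<String, String> collapse(Iterable<JournalEntry> journal) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        for (JournalEntry e : journal) {
            if (e.value() == null) {
                snapshot.remove(e.key());        // DELETE: drop the record
            } else {
                snapshot.put(e.key(), e.value()); // PUT: keep the latest value
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        List<JournalEntry> journal = List.of(
            new JournalEntry("a", "1"),
            new JournalEntry("b", "2"),
            new JournalEntry("a", null));   // delete "a"
        Map<String, String> snap = collapse(journal);
        if (snap.size() != 1 || !"2".equals(snap.get("b"))) {
            throw new AssertionError("expected only key b to survive");
        }
        System.out.println("snapshot keys: " + snap.keySet());
    }
}
```

If checkpointing works this way, evicted/deleted keys should never reappear in a freshly written snapshot, which is why the 3.1 million on-disk records surprise me.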

I feel like something is not going right with the eviction process.
I am using NiFi 1.1.1, though, and I have noticed that PersistentMapCache
has changed in [1], so I might apply that patch and try some more
experiments.

Would anyone be willing to try to replicate this behavior in NiFi 1.1.1?
You should be able to do it as follows:
Services:
DistributedMapCacheServer, maximum cache entries = 100,000, FIFO eviction,
persistence directory specified
DistributedMapCacheClientService, point to the same host and port

Flow:
GenerateFlowFile (random 1 KB binary files in batches of 10, scheduled with
10 threads) -> HashContent (MD5) into hash.value -> DetectDuplicate with
identifier = ${hash.value}, description = ., no age-off, select your cache
client, cache identifier = true

This should cause the snapshot file to exceed 100,000 keys pretty quickly,
and as far as I can tell, it never goes back down.  This in itself is not a
problem, but when the cache gets really big, it tends to crash our cluster
when NiFi reloads it into memory.
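For context, the eviction behavior I would expect from a FIFO cache capped at 100,000 entries can be sketched as follows (illustrative only, not the actual SimpleMapCache code, with the cap shrunk to 2 for readability):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FifoCacheSketch {
    // A LinkedHashMap in insertion order (accessOrder = false) plus
    // removeEldestEntry gives FIFO eviction: once the cap is exceeded,
    // the oldest key is dropped on each put.
    static <K, V> Map<K, V> fifoCache(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = fifoCache(2);
        cache.put("k1", "a");
        cache.put("k2", "b");
        cache.put("k3", "c"); // exceeds the cap, so k1 should be evicted
        if (cache.containsKey("k1") || cache.size() != 2) {
            throw new AssertionError("FIFO eviction failed");
        }
        System.out.println("cached keys: " + cache.keySet());
    }
}
```

If the in-memory cache is bounded like this, the question is why the persisted snapshot keeps every key ever inserted instead of shrinking at checkpoint time.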

[1] https://issues.apache.org/jira/browse/NIFI-3214


On Wed, Mar 8, 2017 at 11:06 AM, Joe Gresock <[email protected]> wrote:

> Thanks Bryan, I'll start looking through the PersistentMapCache.  This
> morning I checked back and the snapshot file now has 2.9 million keys in it.
>
> On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:
>
>> Joe,
>>
>> I'm not that familiar with the persistence part of the DMCS, although
>> I do know that it uses the write-ahead log that is also used by the
>> flow file repo.
>>
>> The code for PersistentMapCache is here:
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
>>
>> It looks like the WAL is check-pointed during puts here:
>>
>> final long modCount = modifications.getAndIncrement();
>> if ( modCount > 0 && modCount % 100000 == 0 ) {
>>     wali.checkpoint();
>> }
>>
>> And during deletes here:
>>
>> final long modCount = modifications.getAndIncrement();
>> if (modCount > 0 && modCount % 1000 == 0) {
>>     wali.checkpoint();
>> }
>>
>> Not sure if it was intentional that put operations checkpoint every
>> 100k modifications while deletes checkpoint every 1k.
>>
>> Maybe Mark or others could shed some light on why the snapshot is
>> reaching 3GB in size.
>>
>> -Bryan
>>
>>
>> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
>> > Hi folks,
>> >
>> > Is there a technical description of how the DistributedMapCacheServer
>> > (DMCS) persistence works?  I've noticed the following on our cluster:
>> >
>> > - I have the DMCS configured on port 4557 as FIFO with max 100,000
>> > entries, and have specified a persistence directory
>> > - I am using DetectDuplicate with the DMCS, and the individual key
>> > length is 80 bytes, with a Description length of 1 byte.  By my count,
>> > this should result in a pure data size of 7.7MB.
>> > - I notice that the snapshot file in the persistence directory appears
>> > to continue growing past the 100,000 limit, though this may be expected
>> > depending on the implementation.  Since I know that the key will
>> > contain "json" in it, I can run the following command to count the
>> > number of possible keys in the snapshot file (though I'm not sure if
>> > this is a good way of measuring how many keys are actually cached):
>> > grep -oa json snapshot | wc -l
>> > - When the snapshot file reaches around 3GB, the DMCS has a hard time
>> > staying up, and frequently becomes unreachable (netstat -tulpn | grep
>> > 4557 shows nothing).  At this point, in order to restore functionality,
>> > I delete the persistence directory and let it start over.
>> >
>> > So my main questions are:
>> > - How are the snapshot and partition files structured, and how can I
>> > estimate how many keys are actually cached at a given time?
>> > - Is the described behavior indicative of the cache exceeding the
>> > configured max number of keys?
>> >
>> > Thanks,
>> > Joe
>> >
>> > --
>> > I know what it is to be in need, and I know what it is to have plenty.
>> > I have learned the secret of being content in any and every situation,
>> > whether well fed or hungry, whether living in plenty or in want.  I can
>> > do all this through him who gives me strength.    *-Philippians 4:12-13*
>>
>
>
>
>



