Thanks Bryan, I'll start looking through PersistentMapCache. When I checked back this morning, the snapshot file had 2.9 million keys in it.
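
For anyone who wants to reproduce that count without grep, here is a minimal Java sketch of the same idea as the `grep -oa json snapshot | wc -l` command from my original message: it counts non-overlapping occurrences of a marker in the raw bytes and also prints the expected pure data size for 100,000 entries. The class and method names (SnapshotKeyEstimate, countOccurrences, expectedDataBytes) are made up for illustration and are not part of NiFi. It does not parse the real snapshot format, so the result is only a rough proxy for the number of cached keys.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of NiFi: a rough key-count proxy for a snapshot file.
final class SnapshotKeyEstimate {

    // Builds the KMP failure table so the scan never has to re-read input bytes.
    static int[] failureTable(byte[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Counts non-overlapping occurrences of marker, scanning the stream once.
    public static long countOccurrences(InputStream in, byte[] marker) throws IOException {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        int[] fail = failureTable(marker);
        long count = 0;
        int k = 0; // number of marker bytes matched so far
        int b;
        while ((b = in.read()) != -1) {
            byte cur = (byte) b;
            while (k > 0 && cur != marker[k]) {
                k = fail[k - 1];
            }
            if (cur == marker[k]) {
                k++;
            }
            if (k == marker.length) {
                count++;
                k = 0; // restart so matches do not overlap
            }
        }
        return count;
    }

    // Convenience overload for in-memory data; no checked exception to handle.
    public static long countOccurrences(byte[] data, byte[] marker) {
        try {
            return countOccurrences(new ByteArrayInputStream(data), marker);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Pure payload size, ignoring any per-record overhead in the file format.
    public static long expectedDataBytes(long entries, int keyBytes, int valueBytes) {
        return entries * (keyBytes + valueBytes);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SnapshotKeyEstimate <path-to-snapshot>");
            return;
        }
        byte[] marker = "json".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println("marker occurrences: " + countOccurrences(in, marker));
        }
        // 100,000 entries x (80-byte key + 1-byte value) = 8,100,000 bytes, about 7.7 MiB
        System.out.println("expected pure data bytes: " + expectedDataBytes(100_000, 80, 1));
    }
}
```

If the marker count comes out far above the configured 100,000 entries, that only shows the file holds more key-like byte sequences than the cache limit; it does not by itself show that all of them are live cache entries.
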

On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:

> Joe,
>
> I'm not that familiar with the persistence part of the DMCS, although
> I do know that it uses the write-ahead log that is also used by the
> flow file repo.
>
> The code for PersistentMapCache is here:
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
>
> It looks like the WAL is checkpointed during puts here:
>
>     final long modCount = modifications.getAndIncrement();
>     if ( modCount > 0 && modCount % 100000 == 0 ) {
>         wali.checkpoint();
>     }
>
> And during deletes here:
>
>     final long modCount = modifications.getAndIncrement();
>     if (modCount > 0 && modCount % 1000 == 0) {
>         wali.checkpoint();
>     }
>
> Not sure if it was intentional that put operations checkpoint every
> 100k and deletes checkpoint every 1k.
>
> Maybe Mark or others could shed some light on why the snapshot is
> reaching 3GB in size.
>
> -Bryan
>
>
> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
> > Hi folks,
> >
> > Is there a technical description of how the DistributedMapCacheServer
> > (DMCS) persistence works? I've noticed the following on our cluster:
> >
> > - I have the DMCS configured on port 4557 as FIFO with max 100,000
> > entries, and have specified a persistence directory
> > - I am using DetectDuplicate with the DMCS, and the individual key
> > length is 80 bytes, with a Description length of 1 byte. By my count,
> > this should result in a pure data size of 7.7MB.
> > - I notice that the snapshot file in the persistence directory appears
> > to continue growing past the 100,000 limit, though this may be expected
> > depending on the implementation. Since I know that the key will contain
> > "json" in it, I can run the following command to count the number of
> > possible keys in the snapshot file (though I'm not sure if this is a
> > good way of measuring how many keys are actually cached):
> > grep -oa json snapshot | wc -l
> > - When the snapshot file reaches around 3GB, the DMCS has a hard time
> > staying up, and frequently becomes unreachable
> > (netstat -tulpn | grep 4557 shows nothing). At this point, in order to
> > restore functionality I delete the persistence directory and let it
> > start over.
> >
> > So my main questions are:
> > - How are the snapshot and partition files structured, and how can I
> > estimate how many keys are actually cached at a given time?
> > - Is the described behavior indicative of the cache exceeding the
> > configured max number of keys?
> >
> > Thanks,
> > Joe
> >
> > --
> > I know what it is to be in need, and I know what it is to have plenty.
> > I have learned the secret of being content in any and every situation,
> > whether well fed or hungry, whether living in plenty or in want. I can
> > do all this through him who gives me strength. *-Philippians 4:12-13*
>

--
I know what it is to be in need, and I know what it is to have plenty. I have learned the secret of being content in any and every situation, whether well fed or hungry, whether living in plenty or in want. I can do all this through him who gives me strength. *-Philippians 4:12-13*
