Correct -- I'll create it now and add you as a watcher.

On Thu, Mar 9, 2017 at 3:31 PM, Mark Payne <[email protected]> wrote:
> Excellent, thanks! Definitely looks like old records are not getting
> evicted. You have not yet created a JIRA for this, correct?
>
> Thanks
> -Mark
>
>
> > On Mar 9, 2017, at 10:24 AM, Joe Gresock <[email protected]> wrote:
> >
> > Good instinct -- here's what I get:
> >
> > nifi-app.log:2017-03-09 15:03:00,670 INFO [Distributed Cache Server
> > Communications Thread: ac907dec-49a4-439e-99f5-1558f2358d87]
> > org.wali.MinimalLockingWriteAheadLog
> > org.wali.MinimalLockingWriteAheadLog@40569408 checkpointed with *4262902*
> > Records and 0 Swap Files in 256302 milliseconds (Stop-the-world time = 1378
> > milliseconds, Clear Edit Logs time = 19 millis), max Transaction ID 4263237
> >
> > Looks like it's over 4.2 million records now.
> >
> > On Thu, Mar 9, 2017 at 3:13 PM, Mark Payne <[email protected]> wrote:
> >
> >> Joe,
> >>
> >> That definitely sounds like a bug causing the eviction to not happen. Can
> >> you grep your logs for the phrase "checkpointed with"? You should have a
> >> line that tells you how many records were written to the Snapshot.
> >> You will certainly see a few of these types of messages, though, because
> >> you have one for the FlowFile Repository, one for Local State Management,
> >> and another one for the DistributedMapCacheServer. I am curious to see if
> >> you see the log message indicating 3 million+ records also.
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >>> On Mar 8, 2017, at 7:13 PM, Joe Gresock <[email protected]> wrote:
> >>>
> >>> Looking through the PersistentMapCache and SimpleMapCache, it seems like
> >>> lots of these records should have been evicted by now. We're up to 3.1
> >>> million records on disk in the snapshot file. My understanding is that
> >>> when wali.checkpoint() is called, it collapses all the DELETE records in
> >>> the journaled log and removes them before writing the snapshot file. Is
> >>> that accurate?
> >>>
> >>> I feel like something is not going quite right with the eviction process.
> >>> I am using 1.1.1, though, and I have noticed that the PersistentMapCache
> >>> has changed in [1], so I might apply that patch and try some more
> >>> experiments.
> >>>
> >>> Would anyone be willing to try to replicate this behavior in NiFi 1.1.1?
> >>> You should be able to do it as follows:
> >>> Services:
> >>> DistributedMapCacheServer, maximum cache entries = 100,000, FIFO eviction,
> >>> persistence directory specified
> >>> DistributedMapCacheClientService, point to the same host and port
> >>>
> >>> Flow:
> >>> GenerateFlowFile (randomize 1K binary files in batches of 10, schedule 10
> >>> threads) -> HashContent (md5) into hash.value -> DetectDuplicate with
> >>> identifier = ${hash.value}, description = ., no age off, select your cache
> >>> client, cache identifier = true
> >>>
> >>> This should cause the snapshot file to exceed 100,000 keys pretty quickly,
> >>> and as far as I can tell, it never goes back down. This in itself is not a
> >>> problem, but when the cache gets really big, it tends to crash our cluster
> >>> when NiFi reloads it into memory.
> >>>
> >>> [1] https://issues.apache.org/jira/browse/NIFI-3214
> >>>
> >>>
> >>> On Wed, Mar 8, 2017 at 11:06 AM, Joe Gresock <[email protected]> wrote:
> >>>
> >>>> Thanks Bryan, I'll start looking through the PersistentMapCache. This
> >>>> morning I checked back and the snapshot file now has 2.9 million keys in it.
> >>>>
> >>>> On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:
> >>>>
> >>>>> Joe,
> >>>>>
> >>>>> I'm not that familiar with the persistence part of the DMCS, although
> >>>>> I do know that it uses the write-ahead-log that is also used by the
> >>>>> flow file repo.
> >>>>>
> >>>>> The code for PersistentMapCache is here:
> >>>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
> >>>>>
> >>>>> It looks like the WAL is check-pointed during puts here:
> >>>>>
> >>>>> final long modCount = modifications.getAndIncrement();
> >>>>> if ( modCount > 0 && modCount % 100000 == 0 ) {
> >>>>>     wali.checkpoint();
> >>>>> }
> >>>>>
> >>>>> And during deletes here:
> >>>>>
> >>>>> final long modCount = modifications.getAndIncrement();
> >>>>> if (modCount > 0 && modCount % 1000 == 0) {
> >>>>>     wali.checkpoint();
> >>>>> }
> >>>>>
> >>>>> Not sure whether it was intentional that put operations checkpoint every
> >>>>> 100k and deletes checkpoint every 1k.
> >>>>>
> >>>>> Maybe Mark or others could shed some light on why the snapshot is
> >>>>> reaching 3GB in size.
> >>>>>
> >>>>> -Bryan
> >>>>>
> >>>>>
> >>>>> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
> >>>>>> Hi folks,
> >>>>>>
> >>>>>> Is there a technical description of how the DistributedMapCacheServer
> >>>>>> (DMCS) persistence works? I've noticed the following on our cluster:
> >>>>>>
> >>>>>> - I have the DMCS configured on port 4557 as FIFO with max 100,000 entries,
> >>>>>> and have specified a persistence directory
> >>>>>> - I am using DetectDuplicate with the DMCS, and the individual key length
> >>>>>> is 80 bytes, with a Description length of 1 byte. By my count, this should
> >>>>>> result in a pure data size of 7.7MB.
> >>>>>> - I notice that the snapshot file in the persistence directory appears to
> >>>>>> continue growing past the 100,000 limit, though this may be expected
> >>>>>> depending on the implementation.
> >>>>>> Since I know that the key will contain "json" in it, I can run the
> >>>>>> following command to count the number of possible keys in the snapshot
> >>>>>> file (though I'm not sure if this is a good way of measuring how many
> >>>>>> keys are actually cached): grep -oa json snapshot | wc -l
> >>>>>> - When the snapshot file reaches around 3GB, the DMCS has a hard time
> >>>>>> staying up, and frequently becomes unreachable (netstat -tulpn | grep 4557
> >>>>>> shows nothing). At this point, in order to restore functionality I delete
> >>>>>> the persistence directory and let it start over.
> >>>>>>
> >>>>>> So my main questions are:
> >>>>>> - How are the snapshot and partition files structured, and how can I
> >>>>>> estimate how many keys are actually cached at a given time?
> >>>>>> - Is the described behavior indicative of the cache exceeding the
> >>>>>> configured max number of keys?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Joe
> >>>>>>
> >>>>>> --
> >>>>>> I know what it is to be in need, and I know what it is to have plenty. I
> >>>>>> have learned the secret of being content in any and every situation,
> >>>>>> whether well fed or hungry, whether living in plenty or in want. I can do
> >>>>>> all this through him who gives me strength. *-Philippians 4:12-13*
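
For readers following the thread, the behavior Joe expected (FIFO eviction writes a DELETE record, and a checkpoint collapses PUT and DELETE records so the snapshot never holds more than the configured maximum) can be sketched roughly as below. This is only an illustration of that expectation, not the WALI or PersistentMapCache code: the class `FifoJournalSketch`, its `Edit` type, and all method names are hypothetical, and the real write-ahead log is considerably more involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these types come from NiFi or WALI.
class FifoJournalSketch {
    enum Op { PUT, DELETE }

    // One journaled edit; value is null for DELETE records.
    static final class Edit {
        final Op op;
        final String key;
        final String value;

        Edit(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    private final int maxEntries;
    // LinkedHashMap iterates in insertion order, which gives FIFO eviction order.
    private final Map<String, String> cache = new LinkedHashMap<>();
    private final List<Edit> journal = new ArrayList<>();
    private final Map<String, String> snapshot = new LinkedHashMap<>();

    FifoJournalSketch(int maxEntries) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
    }

    // Adds an entry; when the cache is full, the oldest entry is evicted
    // and a DELETE record is journaled for it.
    void put(String key, String value) {
        if (!cache.containsKey(key) && cache.size() >= maxEntries) {
            String oldest = cache.keySet().iterator().next();
            cache.remove(oldest);
            journal.add(new Edit(Op.DELETE, oldest, null));
        }
        cache.put(key, value);
        journal.add(new Edit(Op.PUT, key, value));
    }

    // Replays the journal onto the snapshot so deleted keys drop out,
    // then clears the journal. Returns the number of snapshot records.
    int checkpoint() {
        for (Edit edit : journal) {
            if (edit.op == Op.DELETE) {
                snapshot.remove(edit.key);
            } else {
                snapshot.put(edit.key, edit.value);
            }
        }
        journal.clear();
        return snapshot.size();
    }

    int journalSize() {
        return journal.size();
    }

    public static void main(String[] args) {
        FifoJournalSketch sketch = new FifoJournalSketch(100);
        for (int i = 0; i < 1000; i++) {
            sketch.put("key-" + i, ".");
        }
        System.out.println("journal edits before checkpoint: " + sketch.journalSize());
        System.out.println("snapshot records after checkpoint: " + sketch.checkpoint());
    }
}
```

Under this model the snapshot stays at the configured maximum no matter how many keys pass through, which is why a "checkpointed with" count of several million against a 100,000-entry limit suggests that evictions are not reaching the log.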

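
The 7.7MB figure quoted in the original message can be reproduced with a short calculation: 100,000 entries, each with an 80-byte key and a 1-byte description. The sketch below assumes no per-record overhead at all, which the real snapshot format almost certainly adds, so treat it as a lower bound. The class and method names are made up for illustration.

```java
// Hypothetical helper for the back-of-the-envelope estimate; not part of NiFi.
class CacheSizeEstimate {
    // Raw payload size in bytes, ignoring any serialization overhead.
    static long rawBytes(long entries, int keyBytes, int valueBytes) {
        return entries * (keyBytes + valueBytes);
    }

    // Converts bytes to binary megabytes (MiB).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = rawBytes(100_000, 80, 1);
        System.out.printf("%d bytes is about %.1f MiB%n", bytes, toMiB(bytes));
    }
}
```

The result is 8,100,000 bytes, or roughly 7.7 when divided by 1024 twice, matching the estimate in the thread. A snapshot of around 3GB is several hundred times larger than that.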