Re: StoreJanitor

Reinhard Poetz Tue, 03 Apr 2007 13:35:31 -0700

Ard Schrijvers wrote:

> Before I say things that are wrong, please consider that the StoreJanitor was invented long before I looked into the cocoon code, so probably a lot of discussion and good ideas have been around which I am not aware of. But still, my ideas about the StoreJanitor (and sorry for the long mail, but perhaps it might contain something useful):

> 1) How it works and its intention (I think :-) ): The StoreJanitor was originally invented to monitor cocoon's memory usage and does this by checking some memory values every X (default 10) seconds. Besides the fact that I doubt users know that it is quite important to configure the store janitor correctly, I stick to the defaults and use a heapsize of just a little lower than JVM max memory.

> Now, every 10 seconds, the StoreJanitor checks whether (getJVM().totalMemory() >= getMaxHeapSize()) && (getJVM().freeMemory() < getMinFreeMemory()) is true, and if so, the next store is chosen (compared to the previous one) and entries are removed from this store (I saw a post that in trunk not one single store is chosen anymore, but an equal part of all of them is being removed, right?...probably you can configure which stores to use, I don't know)


AFAICS there are two freeing algorithms in trunk: round-robin and all-stores.
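
A minimal sketch of the check and the two freeing strategies as described above. This is not the actual StoreJanitor code: the class `JanitorSketch`, its method names and the use of a `Deque` as a stand-in for a store are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class JanitorSketch {
    // Hypothetical stand-in: each store is just a queue of cache keys.
    private final List<Deque<String>> stores = new ArrayList<>();
    private int next = 0; // index of the store the round-robin strategy frees next

    /** The low-memory condition from the quoted expression, with the values passed in. */
    static boolean memoryLow(long totalMemory, long freeMemory,
                             long maxHeapSize, long minFreeMemory) {
        return totalMemory >= maxHeapSize && freeMemory < minFreeMemory;
    }

    void register(Deque<String> store) {
        stores.add(store);
    }

    /** Removes percent% of the entries, starting at the head, ignoring any eviction policy. */
    static int free(Deque<String> store, int percent) {
        int toRemove = store.size() * percent / 100;
        for (int i = 0; i < toRemove; i++) {
            store.pollFirst();
        }
        return toRemove;
    }

    /** Round-robin: free from one store per run, then move on to the next one. */
    int freeRoundRobin(int percent) {
        if (stores.isEmpty()) {
            return 0;
        }
        int removed = free(stores.get(next), percent);
        next = (next + 1) % stores.size();
        return removed;
    }

    /** All-stores: free the same share from every registered store in one run. */
    int freeAllStores(int percent) {
        int removed = 0;
        for (Deque<String> store : stores) {
            removed += free(store, percent);
        }
        return removed;
    }
}
```

For a store holding 200 keys, `freeRoundRobin(10)` removes 20 of them, regardless of how much memory those entries actually occupy.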

> 2) My Observations: When running high traffic sites and rendering them live (only mod_cache in between which holds pages for 5 to 10 min) like [1] or [2], then checking every X sec for a JVM to be low on memory doesn't make sense to me. At the moment of checking, the JVM might be perfectly sound but just needed some extra memory for a moment; in that case, the Store Janitor is removing items from cache while not needed. Also, when the JVM is really in trouble, but the Store Janitor is not checking for 5 more sec....this might be too long for a JVM in a high traffic site when it is low on memory. Problems that result from it are:

> - Since there is no way to remove cache entries from the used cache impl by the cache's eviction policy, the cache entries from memory are removed by starting from entry 0, whatever this might be in the cache. It is very likely that, at the very next request, the same cache entries are added again.

> - Once the JVM gets low on memory, and the StoreJanitor is needed, it is quite likely that from that moment on, the StoreJanitor runs *every* 10 seconds, and keeps removing cache entries which you perhaps don't want to be removed, like compiled stylesheets.


yep, that's a problem

> 1) suppose, from one store (or since trunk from multiple stores) 10% (default) is removed. This 10% is from the number of memory cache entries. I quite frequently happen to have only 200 entries in memory for each store (I have added *many* different stores to enable all we wanted in a high traffic environment) and the rest is disk store. Now, suppose, the JVM which has 512 Mb of memory, is low on memory, and removes 10% of 200 entries = 20 entries, helping me zero!


agreed

> These memory entries are my most important ones, so, on the next request, they are either added again, or, from diskcache I have a hit, implying that the cache will put this cache entry in memory again. If I would use 2000 memory items, I am very sure, the 200 items which are cleaned are put back in memory before the next StoreJanitor runs.

> 2) I am not sure if in trunk you can configure whether the StoreJanitor should leave one store alone, like the DefaultTransientStore. In this store, typically, compiled stylesheets end up, and i18n resource bundles. Since these files are needed virtually on every request, I would rather the StoreJanitor did not remove entries from this store. I think the StoreJanitor does so, leaving my "critical app" in an even worse state, and on the next request, the hardly improved JVM needs to recompile stylesheets and i18n resource bundles.


agreed

> 3) What if the JVM being low is not because of the stores....For example, you have added some component which has some problems you did not know about, and that component is the real reason for your OOM. The StoreJanitor sees your low memory, and starts removing entries from your perfectly sound cache, leaving your app in a much worse situation than it already was. Your component with the memory leak has some more memory it now can fill, and happily does this, making the StoreJanitor remove more and more entries from cache, until it ends up with an empty cache. You could blame the faulty component for this behavior. One of these faulty components in use is the event registry for event caching, which made our high traffic sites with 512 Mb crash every two days. Better that I write in another mail what I did to the event cache registry, why I did not yet post about it, and if others are interested and how to include it in the trunk. Bottom line is that there was a major OOM problem if the registry grows, resulting in a StoreJanitor removing cache entries while this was actually increasing the problem.

> 4) By default, probably most people are using ehcache. Naturally, overflow-to-disk is true. In a high traffic site, the number of cache keys can grow enormously (I have seen mails around of people complaining about disk caches growing to multiple Gbytes). Certainly, when the not very experienced user uses something like a session attr (or timestamp and many more possibilities) in a stylesheet parameter which ends up in the cache key (but perhaps, should cocoon be the target for high traffic sites for the average user, I don't know). Now, and this is IMO one of the major weaknesses of ehcache (or I missed it completely), I did not find any way to limit the number of disk store entries.

Actually we don't configure this value. According to http://ehcache.sourceforge.net/documentation/configuration.html the default value is 0, meaning unlimited. We should use the 1.2.4 constructor that allows setting a maxElementsOnDisk parameter.

> This implies that the disk store can grow indefinitely. For the ones ever looking at the status page, cache keys in memory of about 2 kb are quite common in cocoon (actually, the depth of the folder structure of your app is of influence). The disk store cache keys are kept in *memory*. So, suppose you run your app with 128 Mb, and you have overflow-to-disk=true, your app runs into problems when there are about 50.000 keys in cache. Then your StoreJanitor keeps removing entries from your memory cache, which are refilled with disk store entries just a few moments later. Now, if you really know how to configure your stores, you use a time2liveSeconds and time2IdleSeconds to let your store clear unused cache entries. This is good to do, unless you depend on something like an event registry which is currently in cocoon trunk. The problem is that the StoreJanitor removes cache entries by calling the free from the correct store, which might for example be the eventaware store. This event aware store updates (cleans) its registry before removing the cache entry from its delegate. Now, when you use the internal cleaning of caches by a time2liveSeconds or time2IdleSeconds, the event registry is not cleaned and will lead to OOM in the long run.
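
A back-of-the-envelope check of the numbers above, assuming roughly 2 KB per key and counting nothing else on the heap (both are simplifications, and the class name `KeyFootprint` exists only for this illustration):

```java
class KeyFootprint {
    /** Heap used by the cache keys alone, in megabytes (1 MB = 1024 KB). */
    static double keyHeapMb(long keys, double kbPerKey) {
        return keys * kbPerKey / 1024.0;
    }

    public static void main(String[] args) {
        // 50,000 disk store keys of about 2 KB each, all of them held in memory
        double mb = keyHeapMb(50_000, 2.0);
        System.out.println(Math.round(mb) + " MB of keys on a 128 MB heap");
    }
}
```

The keys alone come to a bit under 100 MB, which leaves very little of a 128 MB heap for the application itself.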

> I have more things about it, but probably nobody will read it anymore, but in short, my conclusion is that the StoreJanitor never helped me out, but merely impoverished my app when it ran

I wonder what StoreJanitor is good for at all. EHCache takes care that the number of items in the memory cache doesn't grow indefinitely and starts its own cleanup threads for the disk store (http://ehcache.sourceforge.net/documentation/storage_options.html#DiskStore). JCS will probably do the same. I guess that the original purpose of StoreJanitor was when Cocoon had its own store implementations (transient, persistent) and we had to take care of cleaning them up in our code.

Only the persistent store can grow unlimited but since it should only be used for special usecases, it shouldn't be a real problem.


>
>                                            --------o0o--------
>
> The rules I try to follow to avoid the Store Janitor to run
>

> 1) use readers in noncaching pipelines and use expires on them to avoid cache/memory pollution

> 2) use a different store for repository binary sources which has only a disk store part and no memory part (cached-binary: protocol added)

> 3) use a different store for repository sources than for pipeline cache

> 4) replaced the abstract double mapping event registry to use weakreferences and let the JVM clean up my event registry

> 5) (4) gave me undesired behavior by removing weakrefs in combination with ehcache when overflowing items to disk (I could not reproduce this, but it seems that my references to cachekeys got lost). Testing with JCSCache solved this problem, gave me faster response times and gave me for free the ability to limit the number of disk cache entries. The disadvantage of the weakreferences is that I disabled persistent caches for jvm restarts, but I never wanted this anyway (this might be implemented quite easily, but might take long start up times)

> 6) JCSCache has a complex configuration IMO. Therefore, I added default configurations to choose from, for example:

> <store logger="core.store">
>    <parameter name="region-name" value="store"/>
>    <parameter name="size" value="small"/>
> </store>
>
> where size might be small, medium, large or huge.
>

> I think we have created in this way a setup for cocoon where it is harder for inexperienced users to have memory problems when trying to implement larger sites.
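
A minimal sketch of the weak-reference idea in rule 4, using only `java.util`. It is not the registry from Ard's setup, which is not shown in this mail: the class `WeakEventRegistry`, its method names and the use of plain strings as events are all assumptions for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

class WeakEventRegistry<K> {
    // key -> events: the mapping can be dropped once the cache key is no longer strongly referenced
    private final Map<K, Set<String>> eventsByKey = new WeakHashMap<>();
    // event -> keys: the key sets are weak as well, so this direction does not pin the keys
    private final Map<String, Set<K>> keysByEvent = new HashMap<>();

    synchronized void register(String event, K key) {
        eventsByKey.computeIfAbsent(key, k -> new HashSet<String>()).add(event);
        keysByEvent.computeIfAbsent(event,
                e -> Collections.newSetFromMap(new WeakHashMap<K, Boolean>())).add(key);
    }

    /** Returns a snapshot of the keys currently registered for the event. */
    synchronized Set<K> keysFor(String event) {
        Set<K> copy = new HashSet<>();
        Set<K> keys = keysByEvent.get(event);
        if (keys != null) {
            copy.addAll(keys);
        }
        return copy;
    }

    /** Drops the event and returns the keys whose cache entries should be invalidated. */
    synchronized Set<K> removeEvent(String event) {
        Set<K> copy = new HashSet<>();
        Set<K> keys = keysByEvent.remove(event);
        if (keys != null) {
            copy.addAll(keys);
            for (K key : copy) {
                Set<String> events = eventsByKey.get(key);
                if (events != null) {
                    events.remove(event);
                    if (events.isEmpty()) {
                        eventsByKey.remove(key);
                    }
                }
            }
        }
        return copy;
    }
}
```

With this layout the registry only keeps a mapping for as long as something else, typically the cache, holds a strong reference to the key object, so it shrinks together with the cache instead of growing without bound.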

> Hopefully somebody read my mail until here :-) I am curious about what others think,

> [1] http://www.minfin.nl
> [2] http://www.minbuza.nl

What do we want to do in order to improve the situation? After reading your mail and from my own experience I'd say


 - introduce a maxPersistentObjects parameter and use it in EHDefaultCache to
   set maxElementsOnDisk
 - make the registration of stores at StoreJanitor configurable
   (Though I wonder what the default value should be, true or false?)
 - fix EventRegistry
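
To make the first two points concrete, here is a minimal sketch of how such per-store settings could be read. Only the name maxPersistentObjects comes from the list above; the class `StoreSettings`, the parameter name `use-store-janitor` and the default of `true` for it are assumptions for the sake of the example, not a decision on the open question.

```java
import java.util.Map;

class StoreSettings {
    /** 0 means unlimited, mirroring the documented ehcache default for maxElementsOnDisk. */
    final int maxPersistentObjects;
    /** Whether the store registers itself with the StoreJanitor (hypothetical switch). */
    final boolean registerWithJanitor;

    StoreSettings(Map<String, String> params) {
        this.maxPersistentObjects =
                Integer.parseInt(params.getOrDefault("maxPersistentObjects", "0"));
        // Assumed default: true, which would keep today's behaviour unless a store opts out.
        this.registerWithJanitor =
                Boolean.parseBoolean(params.getOrDefault("use-store-janitor", "true"));
    }
}
```

A store holding compiled stylesheets could then opt out with `use-store-janitor=false`, while a pipeline cache could cap its disk part with `maxPersistentObjects=50000`.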

Any further ideas?

P.S. Ard, answering your mails is very difficult because there are no line breaks. Is anybody else experiencing the same problem or is it only me?

--

Reinhard Pötz Independent Consultant, Trainer & (IT)-Coach

{Software Engineering, Open Source, Web Applications, Apache Cocoon}

                                       web(log): http://www.poetz.cc
--------------------------------------------------------------------
