Re: [HippoCMS-dev] Cocoon caching problem

Ard Schrijvers Fri, 15 Jan 2010 04:05:56 -0800

Hello Reinier,

thanks for your feedback and for writing down all your findings and analyses.
Great that it fits your use case so well! Note that this feature
is also really valuable for external sources, such as an external GET of
an RSS feed.


Regards Ard

On Fri, Jan 15, 2010 at 12:21 PM, Reinier van den Born
<[email protected]> wrote:
> Hi Ard,
>
> I wanted to see what is really happening, so I had to fight with the logging
> system (classloading problems).
> But I am happy to confirm: yes, it is working and doing pretty much what I
> need.
>
> For others to decide whether async: can be useful for them:
> - See manual page:
> http://wiki.onehippo.com/display/CMS/Using+asynchronic+get+for+cached+content
> - A cocoon: source is internally converted to an HTTP request, so it cannot
> refer to a match in an internal-only="true" pipeline.
>
> Properties:
> - It will immediately return the page from the cache (even if the page
> expired long ago and the cache is completely outdated).
>  If this is a problem, one can periodically generate requests for the page
> to refresh the cache.
>
> - Only if the cached page has expired will it request a background refresh
> of the cache.
>  Refresh requests for distinct URLs are first collected in a queue.
>
> - Once per second it will try to refresh as many URLs from the queue as
> possible (limited by the number of threads the refresher is allowed to use,
> the threads already running, etc.).
>  Note that this cycle delays the availability of the refreshed page by
> 1/2 second on average.
>
> - If a page is being refreshed, another refresh request for the same page
> may be queued again.
>  So multiple refresh attempts for the same page may end up running in
> parallel (started 1 second apart), but this is limited by:
>  1. The number of threads the refresher may use.
>  2. The time it takes to refresh the page (as soon as the first refresh has
> finished, the page is no longer expired, so no more requests will be queued).
>
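The properties listed above boil down to a "serve stale, refresh in the background" read path. The Java sketch below models it in a single thread so the rules are easy to check. It is hypothetical: the class and method names are not Hippo's CachingSource or its refresher, timestamps are passed in instead of read from a clock, and `maxPerPass` only stands in for the thread limit.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-threaded model of the async: behaviour described above.
class StaleWhileRefreshCache {
    private static final class Entry {
        final String body;
        final long expiresAt;
        Entry(String body, long expiresAt) { this.body = body; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    // FIFO queue of keys to refresh; the set semantics drop duplicate keys.
    private final LinkedHashSet<String> refreshQueue = new LinkedHashSet<>();
    private final long ttlMillis;

    StaleWhileRefreshCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    void put(String key, String body, long now) {
        entries.put(key, new Entry(body, now + ttlMillis));
    }

    // Always answers from the cache, even when the entry expired long ago;
    // an expired hit only queues the key for a background refresh.
    String get(String key, long now) {
        Entry e = entries.get(key);
        if (e == null) return null; // the first-ever fetch is out of scope here
        if (now >= e.expiresAt) refreshQueue.add(key);
        return e.body;
    }

    // One refresher pass: regenerate at most maxPerPass queued keys, oldest first.
    List<String> refreshPass(int maxPerPass, Function<String, String> regenerate, long now) {
        List<String> refreshed = new ArrayList<>();
        Iterator<String> it = refreshQueue.iterator();
        while (it.hasNext() && refreshed.size() < maxPerPass) {
            String key = it.next();
            it.remove();
            put(key, regenerate.apply(key), now);
            refreshed.add(key);
        }
        return refreshed;
    }

    int queued() { return refreshQueue.size(); }

    public static void main(String[] args) {
        StaleWhileRefreshCache cache = new StaleWhileRefreshCache(1_000);
        cache.put("/lazysource", "v1", 0);
        System.out.println(cache.get("/lazysource", 5_000)); // stale "v1", refresh queued
        cache.refreshPass(4, key -> "v2", 5_500);
        System.out.println(cache.get("/lazysource", 5_600)); // "v2"
    }
}
```

In the real source the pass runs on a timer and hands keys to worker threads; here it is called explicitly so the ordering is deterministic.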
> One minor thought: the 1-second refresh period (and the delay it causes)
> does not seem to be a problem, but I (or my users) may change my mind about
> that.
> It might be useful to make this configurable (as max-threads is).
>
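Making the period configurable could look roughly like the Java sketch below. It assumes a plain `ScheduledExecutorService` drives the refresher and that a `tick` callback performs one pass over the queue; neither the class name nor the callback comes from the Hippo code. If refresh requests arrive at random moments within a period, a queued URL waits on average half a period for the next tick, which is where the 1/2 second average for a 1-second period comes from.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical refresher timer with the period as a constructor parameter
// instead of a hard-coded 1 second.
class ConfigurableRefreshTimer implements AutoCloseable {
    private final long periodMillis;
    private final ScheduledExecutorService scheduler;

    ConfigurableRefreshTimer(long periodMillis) {
        if (periodMillis <= 0) throw new IllegalArgumentException("period must be positive");
        this.periodMillis = periodMillis;
        this.scheduler = Executors.newSingleThreadScheduledExecutor();
    }

    // Average wait of a queued refresh until the next tick, assuming requests
    // arrive uniformly at random within a period.
    double averageQueueDelayMillis() { return periodMillis / 2.0; }

    // Runs one refresher pass (the tick) every periodMillis.
    ScheduledFuture<?> start(Runnable tick) {
        return scheduler.scheduleAtFixedRate(tick, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() { scheduler.shutdownNow(); }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threeTicks = new CountDownLatch(3);
        try (ConfigurableRefreshTimer timer = new ConfigurableRefreshTimer(250)) {
            System.out.println("average delay: " + timer.averageQueueDelayMillis() + " ms");
            timer.start(threeTicks::countDown);
            System.out.println("ticked 3 times: " + threeTicks.await(5, TimeUnit.SECONDS));
        }
    }
}
```

A shorter period lowers the average delay at the cost of waking the refresher more often; the thread cap still bounds how much work each tick can start.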
> Thanks for the excellent support!
>
> regards,
>
> Reinier
>
>
>
> On Tue, Jan 12, 2010 at 10:33 PM, Ard Schrijvers
> <[email protected]> wrote:
>
>> On Tue, Jan 12, 2010 at 5:59 PM, Reinier van den Born
>> <[email protected]> wrote:
>> > Hi Ard,
>> >
>> > Shoot, I am blind. The refresher is already dropping requests for keys
>> > that are already in the queue.
>> > So then I don't think there is a serious problem left :-).
>>
>> Great! It just works then? I like it when people actually use the
>> async stuff I wrote back then, it wasn't easy :-))
>>
>> Thanks for letting me know it works,
>>
>> Cheers
>>
>> >
>> >
>> > Reinier
>> >
>> >
>> > On Tue, Jan 12, 2010 at 5:50 PM, Reinier van den Born <
>> > [email protected]> wrote:
>> >
>> >> Hi Ard,
>> >>
>> >> This is becoming interesting :-).
>> >>
>> >> On Tue, Jan 12, 2010 at 1:16 PM, Ard Schrijvers <[email protected]> wrote:
>> >>
>> >>> On Tue, Jan 12, 2010 at 11:40 AM, Reinier van den Born
>> >>> <[email protected]> wrote:
>> >>> > Hello Ard,
>> >>> >
>> >>> > I followed your proposal to take the outside (http:) route.
>> >>> > So I have two pipeline matchers now: "lazysource" and "lazysource_direct",
>> >>> > where "lazysource" generates "async:http://host/lazysource_direct" and
>> >>> > the latter does the original work.
>> >>> >
>> >>> > This seems to work, i.e. I get cached results in return until it
>> >>> > expires, while the _direct returns an updated result instantly.
>> >>> > The only odd thing is that when I log the matchers, "lazysource_direct"
>> >>> > appears to be invoked every time lazysource is.
>> >>>
>> >>> This is to compute the cache key. The cached result should still be
>> >>> returned, so it is correct that the call is made.
>> >>>
>> >>
>> >> It is still a bit confusing (to me) that the logging action is also
>> >> called when only the key is generated.
>> >> I would expect it to be called only when entering the pipeline to
>> >> generate().
>> >> Maybe I should configure the actions differently (I am not very
>> >> experienced with Cocoon).
>> >>
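A simplified model of what Ard describes: before the cached response can be looked up, the inner pipeline is set up far enough to compute its cache key, so anything that runs at that stage (such as a logging action) shows up on every request, while the expensive generation only runs on a miss. In Cocoon itself components expose this through `CacheableProcessingComponent.getKey()` and `getValidity()`; the Java below is a hypothetical stand-in that ignores validity checks, not Cocoon's pipeline code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: key computation runs on every request,
// generation only when the key is not cached yet.
class KeyThenGenerate {
    interface CacheablePipeline {
        String cacheKey(); // cheap; evaluated on every request
        String generate(); // expensive; evaluated on a cache miss only
    }

    private final Map<String, String> cache = new HashMap<>();

    String process(CacheablePipeline pipeline) {
        String key = pipeline.cacheKey();
        return cache.computeIfAbsent(key, k -> pipeline.generate());
    }

    public static void main(String[] args) {
        KeyThenGenerate caching = new KeyThenGenerate();
        CacheablePipeline direct = new CacheablePipeline() {
            public String cacheKey() {
                System.out.println("lazysource_direct: key requested"); // logged on every request
                return "lazysource_direct";
            }
            public String generate() {
                System.out.println("lazysource_direct: generating");   // logged once
                return "<page/>";
            }
        };
        for (int i = 0; i < 3; i++) caching.process(direct);
    }
}
```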
>> >>
>> >>> > So it seems the cached result is used, but the result still gets
>> >>> > generated. Sounds odd, because that would defy the purpose of the whole
>> >>> > exercise.
>> >>>
>> >>> No, the cache key is generated, see
>> >>> http://cocoon.apache.org/2.2/core-modules/core/2.2/690_1_1.html
>> >>
>> >> Interesting, but it goes a bit too deep to fully understand with my
>> >> limited knowledge :-)
>> >> The page they claim to make more intelligible is actually easier to
>> >> follow... but is probably outdated.
>> >> But does this one really describe what is happening for an async'ed
>> >> Source?
>> >>
>> >> I am still not really sure why Cocoon would need to generate a cache key
>> >> for "lazysource_direct" as long as "lazysource" is in the cache and valid.
>> >> I can see it being necessary for normal caching to go down a full cache
>> >> tree. But for an expiring, time-triggered cache that doesn't seem
>> >> necessary, does it?
>> >> Unless it is checking for existence. According to the page you refer to,
>> >> that might be it...
>> >>
>> >>> >
>> >>> > I tried to put lazysource in a "caching", as opposed to "ecaching",
>> >>> > pipeline, but that doesn't seem to make a difference.
>> >>> >
>> >>> > It made me wonder further whether you have foreseen a mechanism that
>> >>> > keeps simultaneous requests from kicking off parallel requests for
>> >>> > "_direct" once it is outdated.
>> >>>
>> >>> You can try to create your own generator (configured in cocoon.xconf
>> >>> with a pool size limit of 1) containing a synchronized method, which
>> >>> blocks other requests.
>> >>>
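Ard's suggestion, reduced to plain Java: a `synchronized` method lets only one thread into the slow part at a time. The sketch below is hypothetical (it uses no Cocoon generator API and no xconf pooling) and only counts how often the slow work runs and how many threads are inside it at once; every caller still runs it, one after another.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a generator with a synchronized generate().
class SerializedGenerator {
    private final AtomicInteger runs = new AtomicInteger();
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger maxActive = new AtomicInteger();

    // The monitor admits one thread at a time; waiting threads still
    // execute the body once they get the lock.
    synchronized String generate() throws InterruptedException {
        maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
        try {
            Thread.sleep(20); // stands in for the slow repository work
            return "page #" + runs.incrementAndGet();
        } finally {
            active.decrementAndGet();
        }
    }

    int runs() { return runs.get(); }
    int maxActive() { return maxActive.get(); }

    // Fires n concurrent requests and waits until all of them are done.
    static SerializedGenerator fire(int n) throws InterruptedException {
        SerializedGenerator gen = new SerializedGenerator();
        ExecutorService clients = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            clients.submit(() -> gen.generate());
        }
        clients.shutdown();
        clients.awaitTermination(10, TimeUnit.SECONDS);
        return gen;
    }

    public static void main(String[] args) throws InterruptedException {
        SerializedGenerator gen = fire(4);
        System.out.println("runs=" + gen.runs() + ", max concurrent=" + gen.maxActive());
    }
}
```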
>> >>
>> >> Isn't having pool-size=1 and also using synchronized doing the same thing
>> >> twice?
>> >>
>> >> Also it seems to me this would serialize all generate/refresh requests
>> >> (yes, this would prevent us from running out of memory)
>> >> but not suppress them. See below.
>> >>
>> >>
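Reinier's distinction between serializing and suppressing is what the request-coalescing ("single-flight") pattern addresses: concurrent callers for the same key wait for one shared in-flight computation instead of each running their own. The Java sketch below is a generic illustration of that technique, not something Cocoon or Hippo provides; `SingleFlight` and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical request coalescing: at most one load per key is in flight.
class SingleFlight<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final AtomicInteger coalesced = new AtomicInteger();

    V get(K key, Function<K, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> running = inFlight.putIfAbsent(key, mine);
        if (running != null) {
            coalesced.incrementAndGet();
            return running.join();           // wait for the leader's result
        }
        try {
            V value = loader.apply(key);     // only the leader does the work
            mine.complete(value);
            return value;
        } catch (RuntimeException | Error e) {
            mine.completeExceptionally(e);   // waiters see the failure too
            throw e;
        } finally {
            inFlight.remove(key, mine);      // later callers start a fresh load
        }
    }

    // Number of calls that piggy-backed on another caller's load.
    int coalesced() { return coalesced.get(); }

    public static void main(String[] args) throws InterruptedException {
        SingleFlight<String, String> flights = new SingleFlight<>();
        AtomicInteger loads = new AtomicInteger();
        Runnable request = () -> flights.get("lazysource_direct", k -> {
            loads.incrementAndGet();
            try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "<page/>";
        });
        Thread a = new Thread(request), b = new Thread(request), c = new Thread(request);
        a.start(); b.start(); c.start();
        a.join(); b.join(); c.join();
        System.out.println("loads=" + loads.get() + ", coalesced=" + flights.coalesced());
    }
}
```

Unlike the synchronized variant, callers that arrive while a load is running do no work of their own; a caller arriving after the load finished starts a new one, so it still pairs naturally with the "only refresh when expired" rule.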
>> >>> > Because that is what my original problem was.
>> >>> >
>> >>> > I made an attempt to locate the source code to take a peek myself. I
>> >>> > only found something to do with async in the repository block of Cocoon
>> >>> > itself (CachingSource.java).
>> >>> > Is that where I should be looking?
>> >>>
>> >>> You should take a look at the Hippo cachingsource block, but be
>> >>> warned, Cocoon's caching is a very complex thing.
>> >>
>> >>
>> >> Just looking around the code, I ran into the refresher, the one that
>> >> controls the actual generating work on async'ed stuff.
>> >> What if it were modified to drop refresh() requests that are already
>> >> in the queue (match on cacheKey)?
>> >> Since the refresher starts processing no more than one request per second
>> >> (which in some cases may be limiting; might want to make that a
>> >> parameter?) and is limited by threadCount (which is a parameter),
>> >> unnecessary generation will not be completely avoided, but it will be
>> >> kept under tight control.
>> >> Quick upper-limit estimate: number of threads + 1 or so??
>> >>
>> >> It doesn't seem like a risky or complicated modification, but my view of
>> >> the world may be too simplified :-)
>> >> What do you think?
>> >>
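Reinier's proposal and his "threads + 1" estimate can be checked with a small single-threaded Java model. It assumes that a request for a key that is already queued is dropped, that each timer tick starts queued keys only while fewer than `maxThreads` refreshes are running, and that a key being refreshed is no longer in the queue, so it can be queued again. `RefreshTickModel` is hypothetical and ignores the timing details of the real refresher.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical model of a deduplicating, thread-capped refresher.
class RefreshTickModel {
    private final int maxThreads;
    private final LinkedHashSet<String> queue = new LinkedHashSet<>();
    private final List<String> running = new ArrayList<>();

    RefreshTickModel(int maxThreads) { this.maxThreads = maxThreads; }

    // Called on an expired hit; returns false when the key is already queued.
    boolean requestRefresh(String key) { return queue.add(key); }

    // One timer tick: move queued keys into free worker slots, oldest first.
    void tick() {
        Iterator<String> it = queue.iterator();
        while (it.hasNext() && running.size() < maxThreads) {
            running.add(it.next());
            it.remove();
        }
    }

    // A worker finished one refresh of key.
    void finish(String key) { running.remove(key); }

    // Queued plus running refreshes of one key.
    int outstanding(String key) {
        int n = queue.contains(key) ? 1 : 0;
        for (String k : running) if (k.equals(key)) n++;
        return n;
    }

    public static void main(String[] args) {
        RefreshTickModel model = new RefreshTickModel(2);
        int worst = 0;
        for (int second = 0; second < 5; second++) {
            // The page stays expired because no refresh has finished yet.
            for (int hit = 0; hit < 10; hit++) model.requestRefresh("/lazysource");
            worst = Math.max(worst, model.outstanding("/lazysource"));
            model.tick();
        }
        System.out.println("worst case outstanding refreshes: " + worst); // 3 = maxThreads + 1
    }
}
```

With a slow page, each tick adds one more parallel refresh until the thread cap is hit, and at most one further copy can wait in the queue, which matches the "threads + 1" estimate for a single key.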
>> >> Regards,
>> >>
>> >> Reinier
>> >>
>> >>
>> >>
>> >>>
>> >>> Regards Ard
>> >>>
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > Reinier
>> >>> >
>> >>> > Reinier van den Born
>> >>> > HintTech B.V.
>> >>> > The Netherlands
>> >>> >
>> >>> > T: +31(0)88 268 25 00
>> >>> > F: +31(0)88 268 25 01
>> >>> > M: +31(0)6 494 171 36
>> >>> > HintTech is a specialist in eBusiness Technology (.Net, Java platform,
>> >>> > Tridion) and IT-Projects.
>> >>> > Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr. NL8062.16.396.B01
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Fri, Jan 8, 2010 at 2:45 PM, Ard Schrijvers <[email protected]> wrote:
>> >>> >
>> >>> >> Hello Reinier,
>> >>> >>
>> >>> >> On Fri, Jan 8, 2010 at 2:37 PM, Reinier van den Born
>> >>> >> <[email protected]> wrote:
>> >>> >> > Hi Ard,
>> >>> >> >
>> >>> >> > I have followed the instructions, but it doesn't seem to work, at
>> >>> >> > least not for a cocoon: resource.
>> >>> >> >
>> >>> >> > I created a separate pipeline (the original was calling a resource,
>> >>> >> > so that was easier).
>> >>> >> >
>> >>> >> > Without async: it works like before, but with async: the cache seems
>> >>> >> > not to expire at all.
>> >>> >> > I tried both with and without a cache-expires argument.
>> >>> >> > This is what I am using:
>> >>> >> > <map:generate src="async:cocoon://lazysource"/>
>> >>> >> >
>> >>> >> > I also tried
>> >>> >> > <map:generate src="async:cocoon://lazysource?cocoon:cache-expires=10"/>
>> >>> >> >
>> >>> >> > When I switch back and forth between with async: and without async:,
>> >>> >> > I get the first cached version and the latest version respectively.
>> >>> >> >
>> >>> >> > >> Just found out this is not entirely correct: after a really long
>> >>> >> > while (at least 20 minutes, but more like an hour) the cache is being
>> >>> >> > updated. <<
>> >>> >> >
>> >>> >> > However, when I replace the cocoon: URL with an http: one, things
>> >>> >> > start to work as you described.
>> >>> >> > So for instance
>> >>> >> > <map:generate src="async:http://www.anwb.nl/verkeer/verkeersinformatie_files_nl"/>
>> >>> >> > is updated properly (with the delay)
>> >>> >> > (except that the first request will show the old cached version, as
>> >>> >> > we discussed below, showing it is actually active :-))
>> >>> >> >
>> >>> >> > I hope this is just me overlooking something, because it looks very
>> >>> >> > promising.
>> >>> >> > Any ideas?
>> >>> >>
>> >>> >> It might have been broken for the cocoon:// protocol. I vaguely
>> >>> >> remember that it was extremely hard to accomplish for the cocoon://
>> >>> >> protocol.
>> >>> >>
>> >>> >> But can't you just do an http call to the Cocoon instance? Thus,
>> >>> >> instead of:
>> >>> >>
>> >>> >> <map:generate src="async:cocoon://lazysource?cocoon:cache-expires=10"/>
>> >>> >>
>> >>> >> use
>> >>> >>
>> >>> >> <map:generate src="async:http://www.mydomain.com/lazysource?cocoon:cache-expires=10"/>
>> >>> >>
>> >>> >> Regards Ard
>> >>> >>
>> >>> >> >
>> >>> >> > Reinier
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > On Thu, Jan 7, 2010 at 5:36 PM, Ard Schrijvers <[email protected]> wrote:
>> >>> >> >
>> >>> >> >> hello,
>> >>> >> >>
>> >>> >> >> On Thu, Jan 7, 2010 at 5:17 PM, Reinier van den Born
>> >>> >> >> <[email protected]> wrote:
>> >>> >> >> > Hi Ard,
>> >>> >> >> >
>> >>> >> >> > Thanks for your reply.
>> >>> >> >> >
>> >>> >> >> > Your suggestion looks quite like what we need, but I am not sure
>> >>> >> >> > whether it will really work for us.
>> >>> >> >> >
>> >>> >> >> > You are talking about an external resource where the (async)
>> >>> >> >> > regeneration is simply triggered when the cache has expired.
>> >>> >> >>
>> >>> >> >> It also works for internal ones.
>> >>> >> >>
>> >>> >> >> > In our case we are dealing with a repository resource, where cache
>> >>> >> >> > invalidations are triggered by resource changes.
>> >>> >> >> > So normally, update events invalidate all caches up to our final
>> >>> >> >> > page. If this keeps happening, the cache will be invalidated in
>> >>> >> >> > between expires, so the problem would remain.
>> >>> >> >>
>> >>> >> >> I think I addressed this: the old response is still being served.
>> >>> >> >>
>> >>> >> >> > Or will the cache-expires option protect the "slowpart" cache from
>> >>> >> >> > those cache invalidations?
>> >>> >> >>
>> >>> >> >> I think so, but it was a long time ago that I built it.
>> >>> >> >>
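Ard is not certain here, so the following is only a hypothetical Java sketch of the behaviour Reinier is hoping for, not a description of what the async source actually does: a repository change event marks the entry stale instead of evicting it, readers keep getting the last good copy, and repeated events for the same key collapse into one pending regeneration. `StaleOnInvalidateCache` and its methods are illustrative names.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical cache in which invalidation flags an entry instead of dropping it.
class StaleOnInvalidateCache {
    private static final class Entry {
        final String body;
        final boolean stale;
        Entry(String body, boolean stale) { this.body = body; this.stale = stale; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Set<String> pending = new LinkedHashSet<>(); // keys awaiting regeneration

    void put(String key, String body) { entries.put(key, new Entry(body, false)); }

    // Repository change event: keep serving the old body, remember to rebuild it.
    void invalidate(String key) {
        Entry e = entries.get(key);
        if (e == null) return;
        entries.put(key, new Entry(e.body, true));
        pending.add(key);
    }

    // Readers never wait: they get whatever body is cached, stale or not.
    String get(String key) {
        Entry e = entries.get(key);
        return e == null ? null : e.body;
    }

    boolean isStale(String key) {
        Entry e = entries.get(key);
        return e != null && e.stale;
    }

    // Background pass: regenerate every flagged key once; returns how many.
    int regeneratePending(Function<String, String> regenerate) {
        int count = 0;
        for (String key : pending) {
            put(key, regenerate.apply(key));
            count++;
        }
        pending.clear();
        return count;
    }

    public static void main(String[] args) {
        StaleOnInvalidateCache cache = new StaleOnInvalidateCache();
        cache.put("slowpart", "old");
        cache.invalidate("slowpart");
        cache.invalidate("slowpart");
        System.out.println(cache.get("slowpart") + " stale=" + cache.isStale("slowpart"));
        System.out.println("regenerated " + cache.regeneratePending(k -> "new") + " key(s)");
        System.out.println(cache.get("slowpart") + " stale=" + cache.isStale("slowpart"));
    }
}
```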
>> >>> >> >> >
>> >>> >> >> > If not, maybe it is possible to set up an intermediate pipeline to
>> >>> >> >> > provide the necessary isolation?
>> >>> >> >>
>> >>> >> >> I would just test whether it does what you expect from it... I know
>> >>> >> >> Cocoon caching is quite complex, and this part was much more complex
>> >>> >> >> than standard Cocoon caching.
>> >>> >> >>
>> >>> >> >> >
>> >>> >> >> > Some more general questions:
>> >>> >> >> >
>> >>> >> >> > I've been looking around a bit further and ran into the following
>> >>> >> >> > page:
>> >>> >> >> > http://wiki.onehippo.com/display/CMS/Using+asynchronic+get+for+cached+content
>> >>> >> >> > If I am not mistaken, this is using the async option on Cocoon's
>> >>> >> >> > CachingSourceFactory.
>> >>> >> >> > Is this related to what you mention, or are these entirely
>> >>> >> >> > different things?
>> >>> >> >>
>> >>> >> >> Heey, I wrote that wiki page, great! This is exactly what I meant. I
>> >>> >> >> only forgot I added a separate "async" source factory for it... wow, I
>> >>> >> >> forgot I did all that back then :-))
>> >>> >> >>
>> >>> >> >> You should follow that page!
>> >>> >> >>
>> >>> >> >> >
>> >>> >> >> > Btw, if I understand the async option correctly, the regeneration
>> >>> >> >> > is still initiated by an incoming request.
>> >>> >> >>
>> >>> >> >> Yes... (you could have a cron script calling it, if you want)
>> >>> >> >>
>> >>> >> >> > Does that mean that the first request to arrive after an expiry
>> >>> >> >> > still gets the old content served?
>> >>> >> >>
>> >>> >> >> Exactly....
>> >>> >> >>
>> >>> >> >> > In other words, when page traffic is sufficiently low, the site will
>> >>> >> >> > always produce expired content?
>> >>> >> >>
>> >>> >> >> Well, that's what you get with an async fetch... IIRC, I did not add a
>> >>> >> >> cron job that checks the cache for expired async entries and refetches
>> >>> >> >> them...
>> >>> >> >>
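Ard's cron suggestion could be a small Java job like the sketch below. Because an async: hit on an expired page returns the old copy but queues a refresh, a periodic request limits how long an expired copy can linger to roughly one warming period plus the refresh time, even when there are no real visitors. The URL, the period and the `CacheWarmer` class are assumptions; the HTTP call uses the JDK's `java.net.http.HttpClient` (Java 11+), and the fetch function is injectable so the job can be tested without a network.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;

// Hypothetical cache warmer: periodically requests pages served through async:
class CacheWarmer implements AutoCloseable {
    private final List<String> urls;
    private final ToIntFunction<String> fetchStatus;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    CacheWarmer(List<String> urls, ToIntFunction<String> fetchStatus) {
        this.urls = List.copyOf(urls);
        this.fetchStatus = fetchStatus;
    }

    // Real fetch: GET the page and discard the body; -1 on I/O failure.
    static int httpGetStatus(String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.discarding())
                    .statusCode();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // One warming round; returns how many URLs answered with a 2xx status.
    int warmOnce() {
        int ok = 0;
        for (String url : urls) {
            int status = fetchStatus.applyAsInt(url);
            if (status >= 200 && status < 300) ok++;
        }
        return ok;
    }

    // Repeats warmOnce() every periodSeconds, starting immediately.
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::warmOnce, 0, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() { scheduler.shutdownNow(); }

    public static void main(String[] args) {
        // Hypothetical URL; keep the period below the page's cache-expires value.
        CacheWarmer warmer = new CacheWarmer(
                List.of("http://www.mydomain.com/lazysource"), CacheWarmer::httpGetStatus);
        warmer.start(60);
    }
}
```

An external cron entry calling curl would do the same job; keeping it in Java merely makes the period and URL list configurable next to the application.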
>> >>> >> >> >
>> >>> >> >> > This shouldn't be a problem for the case where we're experiencing
>> >>> >> >> > our problems, but it would be something to take into consideration
>> >>> >> >> > for other cases.
>> >>> >> >>
>> >>> >> >> Well....yes, I see your point, but it was quite hard already :-))
>> >>> >> >>
>> >>> >> >> I think just trying it is the best option!
>> >>> >> >>
>> >>> >> >> Regards Ard
>> >>> >> >>
>> >>> >> >> >
>> >>> >> >> > Regards,
>> >>> >> >> >
>> >>> >> >> > Reinier
>> >>> >> >> >
>> >>> >> >> >
>> >>> >> >> >
>> >>> >>
>> >>> >> Deleted tail of the conversation here.
>> >>> > ********************************************
>> >>> > Hippocms-dev: Hippo CMS development public mailinglist
>> >>> >
>> >>> > Searchable archives can be found at:
>> >>> > MarkMail: http://hippocms-dev.markmail.org
>> >>> > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
>> >>> >
>> >>> >