Re: [HippoCMS-dev] Cocoon caching problem

Reinier van den Born Fri, 15 Jan 2010 03:22:01 -0800

Hi Ard,

I wanted to see what is really happening, so I first had to fight with the
logging system (classloading problems).
But I am happy to confirm: yes, it is working and doing pretty much what I
need.


For others to decide whether async: can be useful for them:
- See manual page:
http://wiki.onehippo.com/display/CMS/Using+asynchronic+get+for+cached+content
- A cocoon: source is internally converted to an HTTP request, so it cannot
refer to a match in an internal-only="true" pipeline.

Properties:
- It will immediately return the page from the cache (even if the page
expired long ago and the cache is completely outdated).
  If this is a problem, one can periodically generate requests for the page
to refresh the cache.

- Only if the cached page has expired it will request a background refresh
of the cache.
  Refresh requests for distinct URLs are first collected in a queue.

- Once per second it will try to refresh as many URLs from the queue as
possible (limited by number of threads the refresher is allowed to use,
threads running, etc).
  Note that this cycle adds a delay of 1/2 second on average before the
refreshed page becomes available.

- If a page is being refreshed, another refresh request for the same page
may be queued again.
  So multiple refresh attempts for the same page may end up running in
parallel (started 1 second apart), but this will be limited by:
  1. The number of threads the refresher may use.
  2. The time it takes to refresh the page (as soon as the first refresh is
finished, the page is no longer expired so no more requests will be queued).
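
To make the properties above concrete, here is a minimal sketch in Java. It is
not the Hippo cachingsource code: the class AsyncCacheSketch, its methods and
the fetcher function are all made up for illustration, and time is passed in
explicitly so the behaviour is easy to follow. It also assumes that a cache
miss is fetched synchronously, which I did not verify. It shows stale content
being served immediately, expired URLs being collected once each in a queue,
and one refresher cycle handling at most max-threads URLs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only; NOT the Hippo cachingsource implementation.
class AsyncCacheSketch {
    private static final class Entry {
        final String content;
        final long expiresAt;
        Entry(String content, long expiresAt) {
            this.content = content;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    // A set holds each URL at most once, so repeated refresh requests
    // for a URL that is already queued are dropped.
    private final Set<String> refreshQueue = new LinkedHashSet<>();
    private final Function<String, String> fetcher; // stands in for the HTTP request
    private final long ttlMillis;                   // plays the role of cache-expires
    private final int maxThreads;                   // upper bound per refresher cycle

    AsyncCacheSketch(Function<String, String> fetcher, long ttlMillis, int maxThreads) {
        this.fetcher = fetcher;
        this.ttlMillis = ttlMillis;
        this.maxThreads = maxThreads;
    }

    // Returns the cached page at once, even when it has expired.
    // An expired page only triggers a queued background refresh.
    String get(String url, long now) {
        Entry entry = cache.get(url);
        if (entry == null) {
            // Assumption: nothing cached yet, so fetch synchronously.
            String content = fetcher.apply(url);
            cache.put(url, new Entry(content, now + ttlMillis));
            return content;
        }
        if (now >= entry.expiresAt) {
            synchronized (refreshQueue) {
                refreshQueue.add(url);
            }
        }
        return entry.content;
    }

    // One refresher cycle; the real refresher would run this about once per
    // second on worker threads. Here the refreshes run sequentially.
    int runRefreshCycle(long now) {
        List<String> batch = new ArrayList<>();
        synchronized (refreshQueue) {
            Iterator<String> it = refreshQueue.iterator();
            while (it.hasNext() && batch.size() < maxThreads) {
                batch.add(it.next());
                it.remove();
            }
        }
        for (String url : batch) {
            cache.put(url, new Entry(fetcher.apply(url), now + ttlMillis));
        }
        return batch.size();
    }

    int queuedCount() {
        synchronized (refreshQueue) {
            return refreshQueue.size();
        }
    }
}
```

Because the sketch refreshes sequentially inside one cycle, it does not show
the case where a page is queued again while its refresh is still running; with
real worker threads that can happen, bounded by the two limits listed above.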

One minor thought: the 1 second refresh period (and the delay it causes)
does not seem to be a problem, but I (or my users) may change our minds about
that.
It might be useful to make this configurable (as max-threads is).

Thanks for the excellent support!

regards,

Reinier



On Tue, Jan 12, 2010 at 10:33 PM, Ard Schrijvers
<[email protected]>wrote:

> On Tue, Jan 12, 2010 at 5:59 PM, Reinier van den Born
> <[email protected]> wrote:
> > Hi Ard,
> >
> > Shoot, I am blind. The refresher is already dropping requests for keys
> that
> > are already in the queue.
> > So then I don't think there is a serious problem left :-).
>
> Great! It just works then? I like it when people actually use the
> async stuff I wrote back then, it wasn't easy :-))
>
> Thanks for letting me know it works,
>
> Cheers
>
> >
> >
> > Reinier
> >
> >
> > On Tue, Jan 12, 2010 at 5:50 PM, Reinier van den Born <
> > [email protected]> wrote:
> >
> >> Hi Ard,
> >>
> >> This is becoming interesting :-).
> >>
> >> On Tue, Jan 12, 2010 at 1:16 PM, Ard Schrijvers <
> [email protected]
> >> > wrote:
> >>
> >>> On Tue, Jan 12, 2010 at 11:40 AM, Reinier van den Born
> >>> <[email protected]> wrote:
> >>> > Hello Ard,
> >>> >
> >>> > I followed your proposal to take the outside (http:) route.
> >>> > So I have two pipeline matchers now: "lazysource" and
> >>> "lazysource_direct",
> >>> > where "lazysource" generates "async:http://host/lazysource_direct";
> and
> >>> the
> >>> > latter does the original work.
> >>> >
> >>> > This seems to work, ie. I get cached results in return until it
> expires,
> >>> > while the _direct returns an updated result instantly.
> >>> > The only odd thing is that when I log the matchers,
> "lazysource_direct"
> >>> > appears to be invoked every time lazysource is.
> >>>
> >>> this is to compute the cachekey. The cached result should be returned
> >>> still. So, that the call is done is correct
> >>>
> >>
> >> Still a bit confusing (to me) that  the logging action is also called
> when
> >> only the key is generated.
> >> I would expect it to be only called when entering the pipeline to
> >> generate(),
> >> Maybe I should configure the actions differently (I am not very
> experienced
> >> with Cocoon).
> >>
> >>
> >> > So it seems the cached result is used, but still gets the result to be
> >> > generated. Sounds odd, because that would defy the purpose of the
> whole
> >>
> >>  No, the cachekey is generated, see
> >>> http://cocoon.apache.org/2.2/core-modules/core/2.2/690_1_1.html
> >>
> >>
> >> Interesting, but goes a bit too deep to fully understand with my limited
> >> knowledge :-)
> >> The page they claim to make more intelligible is actually easier to
> >> follow...but is probably outdated.
> >> But does this one describe really what is happening for an async-ed
> Source?
> >>
> >> Still not really sure why cocoon would need to generate a cachekey for
> >> "lazysource_direct" as long as "lazysource" is in cache and valid.
> >> I can see it being necessary for normal caching go down a full cache
> tree.
> >> But for an expiring, time-triggered cache that doesn't seem necessary,
> or?
> >> Unless it is checking for existence. According to the page you refer to
> >> that might be it...
> >>
> >> > exercise.
> >>> >
> >>> > I tried to put lazysource in a "caching", as opposed to "ecaching",
> >>> pipeline
> >>> > but that doesn't seem to make a difference.
> >>> >
> >>> > Made me wonder further, whether you have foreseen a mechanism that
> keeps
> >>> > simultaneous requests from being able to kick off parallel requests
> for
> >>> > "_direct" once it is outdated.
> >>>
> >>> you can try to create your own generator (configured in the
> >>> cocoon.xconf to have a pool limit size of 1) and have in here a
> >>> synchronized method, which blocks other requests
> >>>
> >>
> >> Isn't having pool-size=1 and using synchronized, somehow doing things
> >> double?
> >>
> >> Also it seems to me this would serialize all generate/refresh requests
> >> (yes, this would prevent us from running out of memory)
> >> but not suppress them. See below.
> >>
> >>
> >> > Because that is what my original problem was.
> >> >
> >> > I made an attempt to locate the source code to take a peek myself.
> Only
> >> > found something doing with async in the repository block of cocoon
> itself
> >> > (CachingSource.java).
> >> > Is that where I should be looking?
> >>
> >>  you should take a look at the hippo cachingsource block, but be
> >>> warned, Cocoon's caching is a very complex thing.
> >>
> >>
> >> Just looking around the code I ran into refresher, the one that controls
> >> the actual generating work on asynced stuff.
> >> What if it were to be modified to drop refresh() requests that are
> already
> >> in the queue (match on cacheKey)?
> >> Since the refresher starts processing no more than one request per
> second
> >> (which in cases may be limiting, might want to make a parameter of
> that?)
> >> and is threadCount (which is a parameter) limited, unnecessary
> generation
> >> will not be completely avoided, but they will be kept under tight
> control.
> >> Quick upper limit estimate: number of threads+1 or so??
> >>
> >> Doesn't seem like a risky or complicated modification, but my view of
> the
> >> world may be too simplified :-)
> >> What do you think?
> >>
> >> Groeten,
> >>
> >> Reinier
> >>
> >>
> >>
> >>>
> >>> Regards Ard
> >>>
> >>> >
> >>> > Thanks,
> >>> >
> >>> > Reinier
> >>> >
> >>> > Reinier van den Born
> >>> > HintTech B.V.
> >>> > The Netherlands
> >>> >
> >>> > T: +31(0)88 268 25 00
> >>> > F: +31(0)88 268 25 01
> >>> > M: +31(0)6 494 171 36
> >>> > HintTech is a specialist in eBusiness Technology ( .Net, Java
> platform,
> >>> > Tridion ) and IT-Projects.
> >>> > Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr.
> >>> NL8062.16.396.B01
> >>> >
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > On Fri, Jan 8, 2010 at 2:45 PM, Ard Schrijvers <
> >>> [email protected]>wrote:
> >>> >
> >>> >> Hello Reinier,
> >>> >>
> >>> >> On Fri, Jan 8, 2010 at 2:37 PM, Reinier van den Born
> >>> >> <[email protected]> wrote:
> >>> >> > Hi Ard,
> >>> >> >
> >>> >> > I have followed the instructions, but it doesn't seem to work. At
> >>> least
> >>> >> not
> >>> >> > for a cocoon: resource.
> >>> >> >
> >>> >> > I created a separate pipeline (original was calling a resource, so
> >>> that
> >>> >> was
> >>> >> > easier).
> >>> >> >
> >>> >> > Without async: it works like before, but with async the cache
> seems
> >>> not
> >>> >> to
> >>> >> > expire at all.
> >>> >> > I tried both with and without a cache-expires argument.
> >>> >> > This is what I am using:
> >>> >> > <map:generate src="async:cocoon://lazysource"/>
> >>> >> >
> >>> >> > I also tried
> >>> >> > <map:generate
> >>> src="async:cocoon://lazysource?cocoon:cache-expires=10"/>
> >>> >> >
> >>> >> > When I switch back and forth between with async: and without
> async: I
> >>> get
> >>> >> > the first cached version and the latest version respectively.
> >>> >> >
> >>> >> >>> just found out this is not entirely correct: after a real long
> >>> while -
> >>> >> at
> >>> >> > least 20 minutes but more like an hour - the cache is being
> updated
> >>> <<
> >>> >> >
> >>> >> > However when I replace the cocoon: url with a http: one things
> start
> >>> to
> >>> >> work
> >>> >> > like you described.
> >>> >> > So for instance
> >>> >> > <map:generate src="async:
> >>> >> > http://www.anwb.nl/verkeer/verkeersinformatie_files_nl"/>
> >>> >> > is updated properly (with the delay).
> >>> >> > (except that the first request will show the old cached version,
> like
> >>> we
> >>> >> > discussed below, showing it is actually active :-)
> >>> >> >
> >>> >> > I hope this is just me overlooking something, because it looks
> very
> >>> >> > promising.
> >>> >> > Any ideas?
> >>> >>
> >>> >> It might have been broken for the cocoon:// protocol. I vaguely
> >>> >> remember that it was extremely hard to accomplish for the cocoon://
> >>> >> protocol.
> >>> >>
> >>> >> But can't you just do an http call to the cocoon instance? Thus,
> >>> instead
> >>> >> of:
> >>> >>
> >>> >> <map:generate
> src="async:cocoon://lazysource?cocoon:cache-expires=10"/>
> >>> >>
> >>> >> use
> >>> >>
> >>> >> <map:generate src="async:
> >>> >> http://www.mydomain.com/lazysource?cocoon:cache-expires=10"/>
> >>> >>
> >>> >> Regards Ard
> >>> >>
> >>> >> >
> >>> >> > Reinier
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > On Thu, Jan 7, 2010 at 5:36 PM, Ard Schrijvers <
> >>> >> [email protected]>wrote:
> >>> >> >
> >>> >> >> hello,
> >>> >> >>
> >>> >> >> On Thu, Jan 7, 2010 at 5:17 PM, Reinier van den Born
> >>> >> >> <[email protected]> wrote:
> >>> >> >> > Hi Ard,
> >>> >> >> >
> >>> >> >> > Thanks for your reply.
> >>> >> >> >
> >>> >> >> > Your suggestion looks quite like what we need, but I am not
> sure
> >>> >> whether
> >>> >> >> it
> >>> >> >> > will really work for us.
> >>> >> >> >
> >>> >> >> > You are talking about an external resource where the (async)
> >>> >> regeneration
> >>> >> >> is
> >>> >> >> > simply triggered when the cache has expired.
> >>> >> >>
> >>> >> >> it works also for internal
> >>> >> >>
> >>> >> >> > In our case we are dealing with a repository resource, where
> cache
> >>> >> >> > invalidations are triggered by resource changes.
> >>> >> >> > So normally update events are invalidating all caches upto our
> >>> final
> >>> >> >> page.
> >>> >> >> > If this continues to happen the cache
> >>> >> >> > will be invalidated in between expires, so the problem would
> >>> remain.
> >>> >> >>
> >>> >> >> I think I addressed this, where the old response is still being
> >>> served.
> >>> >> >>
> >>> >> >> > Or will the cache-expires option protect the "slowpart" cache
> from
> >>> >> those
> >>> >> >> > cache invalidations?
> >>> >> >>
> >>> >> >> think so, but it was long time ago i built it
> >>> >> >>
> >>> >> >> >
> >>> >> >> > If not, maybe it is possible to set up an intermediate pipeline
> to
> >>> >> >> provide
> >>> >> >> > the necessary isolation?
> >>> >> >>
> >>> >> >> I would just test whether it does what you expect from it...I
> know
> >>> the
> >>> >> >> Cocoon caching is quite complex, and this part was much more
> >>> >> >> complex than standard Cocoon caching
> >>> >> >>
> >>> >> >> >
> >>> >> >> > Some more general questions:
> >>> >> >> >
> >>> >> >> > I've been looking around a bit further and ran into the
> following
> >>> >> page:
> >>> >> >> >
> >>> >> >>
> >>> >>
> >>>
> http://wiki.onehippo.com/display/CMS/Using+asynchronic+get+for+cached+content
> >>> >> >> > If I am not mistaken this is using the async option on Cocoon's
> >>> >> >> > CachingSourceFactory.
> >>> >> >> > Is this related to what you mention or are these entirely
> >>> different
> >>> >> >> things?
> >>> >> >>
> >>> >> >> Heey, I wrote that wiki page, great! This is exactly what I
> meant. I
> >>> >> >> only forgot I added a separate source-factory async for
> it...whow, I
> >>> >> >> forgot I did all that back then :-))
> >>> >> >>
> >>> >> >> You should follow that page!
> >>> >> >>
> >>> >> >> >>
> >>> >> >> > Btw, if I understand the async option correctly, the
> regeneration
> >>> is
> >>> >> >> still
> >>> >> >> > initiated by an incoming request.
> >>> >> >>
> >>> >> >> yes... (you can have some cron script calling it if you want?)
> >>> >> >>
> >>> >> >> > Does that mean that if a request arrives first after an expire
> it
> >>> >> still
> >>> >> >> gets
> >>> >> >> > the old content served?
> >>> >> >>
> >>> >> >> Exactly....
> >>> >> >>
> >>> >> >> > In other words, when page traffic is sufficiently low, the site
> >>> will
> >>> >> >> always
> >>> >> >> > produce expired content?
> >>> >> >>
> >>> >> >> Tja, that's what you get with a async fetch...iirc, I did not add
> a
> >>> >> >> cron job checking the cache for async expired entries and refetch
> >>> >> >> them...
> >>> >> >>
> >>> >> >> >
> >>> >> >> > This shouldn't be a problem for the case we're experiencing our
> >>> >> problems,
> >>> >> >> > but it would be something to take into consideration for
> different
> >>> >> cases.
> >>> >> >>
> >>> >> >> Well....yes, I see your point, but it was quite hard already :-))
> >>> >> >>
> >>> >> >> Just try it I think is the best!
> >>> >> >>
> >>> >> >> Regards Ard
> >>> >> >>
> >>> >> >> >
> >>> >> >> > Regards,
> >>> >> >> >
> >>> >> >> > Reinier
> >>> >> >> >
> >>> >> >> > Reinier van den Born
> >>> >> >> > HintTech B.V.
> >>> >> >> >
> >>> >> >> > T: +31(0)88 268 25 00
> >>> >> >> > F: +31(0)88 268 25 01
> >>> >> >> > M: +31(0)6 494 171 36
> >>> >> >> >
> >>> >> >> > Delftechpark 37i | 2628 XJ Delft | The Netherlands
> >>> >> >> > www.hinttech.com
> >>> >> >> >
> >>> >> >> > HintTech is a specialist in eBusiness Technology ( .Net, Java
> >>> >> platform,
> >>> >> >> > Tridion ) and IT-Projects.
> >>> >> >> > Chamber of Commerce The Hague nr. 27242282 | Sales Tax nr.
> >>> >> >> NL8062.16.396.B01
> >>> >> >> >
> >>> >> >> >
> >>> >>
> >>> >> Deleted tail of the conversation here.
> >>> > ********************************************
> >>> > Hippocms-dev: Hippo CMS development public mailinglist
> >>> >
> >>> > Searchable archives can be found at:
> >>> > MarkMail: http://hippocms-dev.markmail.org
> >>> > Nabble: http://www.nabble.com/Hippo-CMS-f26633.html
> >>> >
> >>> >
> >>