Re: Event caching and CachedSource

2004-03-03 Thread Unico Hommes
Vadim Gritsenko wrote:

Unico Hommes wrote:

Carsten Ziegeler wrote:

Unico Hommes wrote:

I'd also like to change the protocol URL a little bit. Since the 
timeout parameter will only be applicable to the delay refresher 
implementation and not to the event aware one I think it would be 
better to specify it with a query parameter instead.

Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
Proposed syntax: 
cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with well 
established conventions such as in jdbc for instance.

Let me know if you have any objections or comments.
  


No objections from me, but the parameters must have clear names, 
which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500 

(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?
Yeah I had been thinking along the same lines. I like the colon 
notation because it resembles familiar namespace notation. So I'll go 
with your latter suggestion.


Does it make sense to have it both ways? So, say, you can use either:
   cache:main:[EMAIL PROTECTED]://www.apache.org/
or:
   cache:@http://www.apache.org/?cache:name=main&cache:expires=60
?

Hmm, I would prefer to settle on just one syntax. That prevents confusion and 
minimizes the amount of code to maintain. Also, what should we do when the 
expiration value is not applicable: ignore it or throw an exception? I think 
we should keep it as simple as possible.
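
To make the single-syntax option concrete, here is a minimal Java sketch of how a cache: URI might be split into cache parameters and the wrapped source URI. The class name CacheUri and the "cache:" parameter prefix (borrowed from the cache:name / cache:expires example above) are assumptions for illustration, not the actual CachedSource code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: separates cache parameters from the wrapped source URI. */
final class CacheUri {
    static final String SCHEME = "cache:";
    // Assumed prefix marking parameters that belong to the cache source.
    static final String PARAM_PREFIX = "cache:";

    final String delegateUri;
    final Map<String, String> cacheParams;

    private CacheUri(String delegateUri, Map<String, String> cacheParams) {
        this.delegateUri = delegateUri;
        this.cacheParams = cacheParams;
    }

    static CacheUri parse(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not a cache: URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        Map<String, String> cacheParams = new LinkedHashMap<>();
        int q = rest.indexOf('?');
        if (q < 0) {
            return new CacheUri(rest, cacheParams);
        }
        String base = rest.substring(0, q);
        StringBuilder kept = new StringBuilder();
        for (String pair : rest.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            if (pair.startsWith(PARAM_PREFIX)) {
                // Prefixed parameters are consumed by the cache source.
                int eq = pair.indexOf('=');
                String name = eq < 0
                        ? pair.substring(PARAM_PREFIX.length())
                        : pair.substring(PARAM_PREFIX.length(), eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                cacheParams.put(name, value);
            } else {
                // Everything else is passed through to the real source.
                kept.append(kept.length() == 0 ? '?' : '&').append(pair);
            }
        }
        return new CacheUri(base + kept, cacheParams);
    }
}
```

With this sketch, cache:http://www.apache.org/?cache:name=main&cache:expires=60&expires=500 yields the delegate http://www.apache.org/?expires=500 plus the cache parameters name=main and expires=60. Whether an inapplicable expires value is ignored or rejected is left to the caller here.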

Unico


Re: Event caching and CachedSource

2004-03-03 Thread Vadim Gritsenko
Unico Hommes wrote:

Carsten Ziegeler wrote:

Unico Hommes wrote:

I'd also like to change the protocol URL a little bit. Since the 
timeout parameter will only be applicable to the delay refresher 
implementation and not to the event aware one I think it would be 
better to specify it with a query parameter instead.

Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
Proposed syntax: 
cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with well 
established conventions such as in jdbc for instance.

Let me know if you have any objections or comments.
  
No objections from me, but the parameters must have clear names, 
which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500 

(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?
Yeah I had been thinking along the same lines. I like the colon 
notation because it resembles familiar namespace notation. So I'll go 
with your latter suggestion.


Does it make sense to have it both ways? So, say, you can use either:
   cache:main:[EMAIL PROTECTED]://www.apache.org/
or:
   cache:@http://www.apache.org/?cache:name=main&cache:expires=60
?
Vadim




Re: Event caching and CachedSource

2004-03-03 Thread Unico Hommes
Carsten Ziegeler wrote:

Unico Hommes wrote:

 

I'd also like to change the protocol URL a little bit. Since 
the timeout parameter will only be applicable to the delay 
refresher implementation and not to the event aware one I 
think it would be better to specify it with a query parameter instead.

Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
Proposed syntax: 
cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with 
well established conventions such as in jdbc for instance.

Let me know if you have any objections or comments.

   

No objections from me, but the parameters must have clear names, 
which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500

(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?
 

Yeah I had been thinking along the same lines. I like the colon notation 
because it resembles familiar namespace notation. So I'll go with your 
latter suggestion.

Unico


RE: Event caching and CachedSource

2004-03-03 Thread Carsten Ziegeler
Unico Hommes wrote:

> 
> I'd also like to change the protocol URL a little bit. Since 
> the timeout parameter will only be applicable to the delay 
> refresher implementation and not to the event aware one I 
> think it would be better to specify it with a query parameter instead.
> 
> Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
> Proposed syntax: 
> cache:http://www.apache.org/?cache-expires=60&cache-name=main
> 
> The protocol:subprotocol syntax is also more in line with 
> well established conventions such as in jdbc for instance.
> 
> Let me know if you have any objections or comments.
> 
No objections from me, but the parameters must have clear names, 
which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500

(Dumb example, I know) But what I mean is that the real url/source
could also have parameters and it must be clear which ones are
for the cache source and which ones are for the real source,
so perhaps something like "cocoon-cache..." or perhaps better
using invalid names like "cocoon:cache=60"?

Carsten



Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Carsten Ziegeler wrote:

Unico Hommes wrote:
 

Hi gang :-)

A drawback I have been running into lately with the eventcache 
mechanism is that it lacks the ability to remove heavy 
processing from the critical path. An event will simply 
remove a set of cached pipelines from the cache completely, 
making the subsequent request for such a pipeline potentially 
very slow. In applications where isolation is not a 
requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in 
the scratchpad area ATM and am wondering how it fits together 
with the eventcache stuff. One thing I am looking into right 
now is to write an EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source 
wrapper that can cache its delegate. Refreshing can be done 
either synchronously or asynchronously but currently only 
based upon a specified time-out. What I'd like to do is 
generalize this a bit in order to add the ability to 
externally trigger invalidation.

For this however I think a modification to the Refresher 
interface is needed.

Instead of:

Refresher {
 refresh(key,uri,timeout);
 periodicallyRefresh(key,uri,timeout);
}
I'd like to remove timeout semantics from the interface:

Refresher {
 refresh(key,uri,params);
}
I don't think there is currently a reason for there being two 
separate methods. So I think we can safely combine them 
into one. But I guess I am looking at Carsten for confirmation... :-)
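
A rough Java sketch of the combined interface, for illustration only. The Map-based params, the "expires" and "event" parameter names, and both implementing classes are hypothetical; the real Refresher in the scratchpad may look different.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Single entry point; the refresh policy is carried in params (assumed shape). */
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

/** Hypothetical time-based implementation: reads the delay from params. */
final class DelayRefresher implements Refresher {
    private final Map<String, Long> intervals = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String expires = params.get("expires"); // assumed parameter name
        if (expires == null) {
            throw new IllegalArgumentException("expires is required for " + uri);
        }
        // A real implementation would schedule a job; here we only record the interval.
        intervals.put(key, Long.parseLong(expires));
    }

    long intervalSeconds(String key) {
        return intervals.get(key);
    }
}

/** Hypothetical event-based implementation: no timeout, only an event name. */
final class EventAwareRefresher implements Refresher {
    private final Map<String, List<String>> keysByEvent = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        // Fall back to the URI itself as the event name when none is given.
        String event = params.getOrDefault("event", uri);
        keysByEvent.computeIfAbsent(event, e -> new ArrayList<>()).add(key);
    }

    /** Keys that should be refreshed when the given event arrives. */
    List<String> keysFor(String event) {
        return keysByEvent.getOrDefault(event, List.of());
    }
}
```

The point of the sketch is that the timeout no longer appears in the interface: each implementation picks out of params only what it understands.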

   

Although you actually don't need my confirmation as it's not my
but *our* source, here it is :)
I think this makes sense and I think we should also move this
out of the scratchpad afterwards as well.
 

I'd also like to change the protocol URL a little bit. Since the timeout 
parameter will only be applicable to the delay refresher implementation 
and not to the event aware one I think it would be better to specify it 
with a query parameter instead.

Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
Proposed syntax: 
cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with well 
established conventions such as in jdbc for instance.

Let me know if you have any objections or comments.

Unico



RE: Event caching and CachedSource

2004-03-02 Thread Hunsberger, Peter
Unico Hommes <[EMAIL PROTECTED]> writes:

> 
> Hi gang :-)
> 
> A drawback I have been running into lately with the eventcache 
> mechanism is that it lacks the ability to remove heavy processing 
> from the critical path. An event will simply remove a set of cached 
> pipelines from the cache completely, making the subsequent request 
> for such a pipeline potentially very slow. In applications where 
> isolation is not a requirement this is an unnecessary drawback.
> 
> I am looking at the excellent CachedSource stuff that is in the 
> scratchpad area ATM and am wondering how it fits together with the 
> eventcache stuff. One thing I am looking into right now is to write 
> an EventAware Refresher implementation.

Cool, in our case, for much of our data, we know we can, in theory,
repopulate the cache the moment after the data is invalidated (which is
the moment before the new version is committed).  However, we need to do
this asynchronously if possible.  We haven't started to look at this
issue, but it sounds like this might be the way to go?

> 
> For those unfamiliar with CachedSource, it is a Source wrapper that 
> can cache its delegate. Refreshing can be done either synchronously 
> or asynchronously but currently only based upon a specified time-out. 
> What I'd like to do is generalize this a bit in order to add the 
> ability to externally trigger invalidation.
>





Re: Event caching and CachedSource

2004-03-02 Thread Geoff Howard
Unico Hommes wrote:

Geoff Howard wrote:

Unico Hommes wrote:

Corin Moss wrote:

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la 
cron
(although that could be what you mean).

I'm not entirely sure how this helps with external validation directly
:)
What I've been playing around with in this class is a 
"refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)

Does that help you at all? I'm happy to contribute it if it would.

 

Hmm, I guess my explanation was a little bit dense. Let me put this 
into context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav 
repository. Obviously this introduces some network overhead and 
particular parts of the site can get slow to generate. Especially 
those generated using TraversableGenerators.

Traditional caching requires the objects on behalf of which it 
caches to provide a so called Validity object in order to determine 
whether its cached objects are still valid upon subsequent request. 
Most sources will try to determine this by providing a last modified 
timestamp the cache can compare. Since retrieving the last 
modification time is an expensive operation in the case of a webdav 
source determining the validity of a cached response would be almost 
as expensive as generating a new response, requiring a webdav 
propfind for each source that is a member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the 
Cache that the source is always valid using a special Validity 
object. Cache invalidation will be accomplished by an external event 
triggering the removal of all pipelines the Source is associated 
with. This means though that a subsequent request will be slow since 
nothing is cached anymore.

Perhaps even more importantly, since the pipelines can be huge 
objects the generation of which potentially requires many network 
calls it is much better to cache objects at the most atomic level: 
the source. Hence my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so 
that an external event (say, someone saving a document in the webdav 
repository) will trigger the retrieval of a fresh one. But in the 
background, away from the critical path (asynchronous).

Hope that explains it better.
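
The proposal above (serve the cached source, and let an external event trigger a background re-fetch) can be sketched roughly as follows. This is a minimal illustration under assumptions: EventRefreshedCache, its String-based store, and the fetcher function are made up for the example and are not the CachedSource API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical source-level cache that is refreshed by events, not by timeouts. */
final class EventRefreshedCache {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // e.g. an expensive webdav read
    private final Executor background;              // runs refreshes off the request path

    EventRefreshedCache(Function<String, String> fetcher, Executor background) {
        this.fetcher = fetcher;
        this.background = background;
    }

    /** Request path: only the very first access pays for the fetch. */
    String get(String uri) {
        return store.computeIfAbsent(uri, fetcher);
    }

    /** Event path: keep serving the old entry until the fresh one is ready. */
    void onSourceChanged(String uri) {
        background.execute(() -> store.put(uri, fetcher.apply(uri)));
    }
}
```

Because the old entry is replaced rather than removed, a request that arrives between the event and the end of the re-fetch still gets a (stale) cached response instead of waiting.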


Ok, I'm done being dense - I get this now. 


Sorry it wasn't meant to suggest you or anybody was being dense. Just 
that my explanation was overly concise.;-)


No, I saw where you placed blame (on your explanation) -- I just 
disagreed with your diagnosis.  :)  In my case, it was the "explainee" 
not the explanation that was the problem!

My last response about pluggable cache strategies at the pipeline 
level is totally mismatched.  At the Source level you don't have to 
worry about re-assembling the pipeline, so all you would have to do 
is re-cache the source when it is invalidated externally.


Exactly.

Can you just handle this when the JMS/other event comes in?  
Currently it's just translated to an Event and sent to the Cache, but 
you could also contact the Source as well at that point?  Where does 
the Source cache its data?  In memory in a private member, or in the 
Store?


You got it! (in the Cache/Store)


Whew!  Ok, I can't give this much more thought ATM, but sounds like a 
good direction to me, for what that's worth.

Geoff


Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Corin Moss wrote:

Hi,

That makes perfect sense. I implemented exactly that this week, although
I have to admit it is nowhere near as elegant as I would like.
Basically in my case it's a database update / delete / insert triggering
a cache clear, it is then a specific request generating the re-cache.
I'm then employing something similar to the "I'm sorry" method mentioned
previously.  I've re-implemented the "test" cron job, using a new method
on RefresherImpl which as I mentioned before, caches in the future (by
adding a new scheduled job.) 

Effectively the request to trigger the re-cache might take a few
milliseconds, but the job to recache the page goes on in the background
for as long as it takes.
At this point, it is simply a call to an internal URL on loopback (very
messy.)  However, I see no reason that something like the XSPUtil
include source couldn't be used.  It would also pay to have a look at
the BackgroundEnvironment - its only weakness is (as commented) that it
doesn't support objects which try to access specific HTTPenvironment
things (object maps etc.)
 

I am not sure I get the whole picture yet but why do you need internal 
processing? Does this mean you cache cocoon: sources? That does not 
sound appropriate.

Like I said - I've got this working at the moment - let me know if you
want the code :)
 

Yes please, "show me the code!" ;-)

Unico

Corin

-Original Message-
From: Unico Hommes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 March 2004 2:39 a.m.
To: [EMAIL PROTECTED]
Subject: Re: Event caching and CachedSource
Corin Moss wrote:

 

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la cron
(although that could be what you mean).

I'm not entirely sure how this helps with external validation directly
:)
What I've been playing around with in this class is a "refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)

Does that help you at all? I'm happy to contribute it if it would.

Hmm, I guess my explanation was a little bit dense. Let me put this into
context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav repository.
Obviously this introduces some network overhead and particular parts of
the site can get slow to generate. Especially those generated using
TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches to
provide a so called Validity object in order to determine whether its
cached objects are still valid upon subsequent request. Most sources
will try to determine this by providing a last modified timestamp the
cache can compare. Since retrieving the last modification time is an
expensive operation in the case of a webdav source determining the
validity of a cached response would be almost as expensive as generating
a new response, requiring a webdav propfind for each source that is a
member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the
Cache that the source is always valid using a special Validity object.
Cache invalidation will be accomplished by an external event triggering
the removal of all pipelines the Source is associated with. This means
though that a subsequent request will be slow since nothing is cached
anymore.

Perhaps even more importantly, since the pipelines can be huge objects
the generation of which potentially requires many network calls it is
much better to cache objects at the most atomic level: the source. Hence
my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so that
an external event (say, someone saving a document in the webdav
repository) will trigger the retrieval of a fresh one. But in the
background, away from the critical path (asynchronous).

Hope that explains it better.

Unico


CAUTION: This e-mail and any attachment(s) contains information
that is intended to be read only by the named recipient(s). It
may contain information that is confidential, proprietary or the
subject of legal privilege. This information is not to be used by
any other person and/or organisation. If you are not the intended
recipient, please advise us immediately and delete this e-mail
from your system. Do not use any information contained in it.

For more information on the Television New Zealand Group, visit
us online at http://www.tvnz.co.nz

 




Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Geoff Howard wrote:

Unico Hommes wrote:

Corin Moss wrote:

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la cron
(although that could be what you mean).
I'm not entirely sure how this helps with external validation directly
:)
What I've been playing around with in this class is a "refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)
Does that help you at all? I'm happy to contribute it if it would.

 

Hmm, I guess my explanation was a little bit dense. Let me put this 
into context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav repository. 
Obviously this introduces some network overhead and particular parts 
of the site can get slow to generate. Especially those generated 
using TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches 
to provide a so called Validity object in order to determine whether 
its cached objects are still valid upon subsequent request. Most 
sources will try to determine this by providing a last modified 
timestamp the cache can compare. Since retrieving the last 
modification time is an expensive operation in the case of a webdav 
source determining the validity of a cached response would be almost 
as expensive as generating a new response, requiring a webdav 
propfind for each source that is a member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the 
Cache that the source is always valid using a special Validity 
object. Cache invalidation will be accomplished by an external event 
triggering the removal of all pipelines the Source is associated 
with. This means though that a subsequent request will be slow since 
nothing is cached anymore.

Perhaps even more importantly, since the pipelines can be huge 
objects the generation of which potentially requires many network 
calls it is much better to cache objects at the most atomic level: 
the source. Hence my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so 
that an external event (say, someone saving a document in the webdav 
repository) will trigger the retrieval of a fresh one. But in the 
background, away from the critical path (asynchronous).

Hope that explains it better.


Ok, I'm done being dense - I get this now. 


Sorry it wasn't meant to suggest you or anybody was being dense. Just 
that my explanation was overly concise.;-)

My last response about pluggable cache strategies at the pipeline 
level is totally mismatched.  At the Source level you don't have to 
worry about re-assembling the pipeline, so all you would have to do is 
re-cache the source when it is invalidated externally.


Exactly.

Can you just handle this when the JMS/other event comes in?  Currently 
it's just translated to an Event and sent to the Cache, but you could 
also contact the Source as well at that point?  Where does the Source 
cache its data?  In memory in a private member, or in the Store?


You got it! (in the Cache/Store)

Unico


Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Geoff Howard wrote:

Unico Hommes wrote:

Geoff Howard wrote:

Unico Hommes wrote:

Hi gang :-)

A drawback I have been running into lately with the eventcache 
mechanism is that it lacks the ability to remove heavy processing 
from the critical path. An event will simply remove a set of cached 
pipelines from the cache completely, making the subsequent request 
for such a pipeline potentially very slow. In applications where 
isolation is not a requirement this is an unnecessary drawback.


Below sounds interesting and good but I haven't understood how event 
cache is related.  AFAICS the only difference with eventcache and 
the other validity types is that for the others an invalid response 
is found in cache, but not used because it is found invalid after 
retrieval, but the event cache removes the entry at invalidation 
time since it knows it will never be useful.  Both cases mean that 
the next person to request that resource will have to wait for the 
full generation.  Maybe because I've only glanced at the refresher 
stuff?

I guess you are right that at the Cache level nothing really changes. 
I overlooked that fact. I will do some more research on what is 
required to accomplish that in the case of the Refresher, but my idea 
was that the cached response would be served until a newly generated 
one could replace the stale one. Since the Refresher talks to the 
Cache directly, given the correct Validity strategy it can exercise 
full control over it.


So, stale entries are served until they can be regenerated?  I've 
looked for this in the past (someone called it the "I'm Sorry" pattern 
:) ) and at the time thought it might be better implemented by a 
pluggable strategy at the pipeline execution level.  Currently we have:

- Assemble Pipeline
- Gather key from Pipeline
- Check cache for key
- If object for key found, check its validity
- If valid, serve the cached response
- Else, execute pipeline and serve it.

The cache point pipeline and the non-caching pipeline are other 
implementations of different strategies, but are accomplished by 
inheritance instead of composing a Strategy.  I haven't ever thought 
it through carefully but it seems like making those last 5 steps (as a 
group) a pluggable strategy would allow things like this "I'm Sorry" 
pattern, as well as more powerful concepts like Stefano's proposed 
adaptive cache.  Just raw thoughts at this point...
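
As a rough illustration of what composing a strategy (rather than inheriting a pipeline class) could look like, here is a small Java sketch. All names (ServingStrategy, CachedResponse, RegularStrategy, ImSorryStrategy) are invented for the example, the key is assumed to be gathered already, and none of this is existing Cocoon code.

```java
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** Hypothetical cache entry: a response body plus the result of its validity check. */
final class CachedResponse {
    final String body;
    final boolean valid;

    CachedResponse(String body, boolean valid) {
        this.body = body;
        this.valid = valid;
    }
}

/** The lookup / validate / serve-or-execute steps as one pluggable unit. */
interface ServingStrategy {
    String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline);
}

/** Current behaviour: serve if valid, otherwise execute the pipeline and serve it. */
final class RegularStrategy implements ServingStrategy {
    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit != null && hit.valid) {
            return hit.body;
        }
        String fresh = pipeline.get();
        cache.put(key, new CachedResponse(fresh, true));
        return fresh;
    }
}

/** "I'm Sorry" behaviour: serve the stale entry now, regenerate in the background. */
final class ImSorryStrategy implements ServingStrategy {
    private final Executor background;

    ImSorryStrategy(Executor background) {
        this.background = background;
    }

    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit == null) {
            // Nothing to apologise with: generate synchronously.
            String fresh = pipeline.get();
            cache.put(key, new CachedResponse(fresh, true));
            return fresh;
        }
        if (!hit.valid) {
            // With a real thread pool the cache map must be thread-safe.
            background.execute(() -> cache.put(key, new CachedResponse(pipeline.get(), true)));
        }
        return hit.body;
    }
}
```

A pipeline would then hold a ServingStrategy reference instead of hard-coding the steps, so an adaptive strategy could be swapped in the same way.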


I see two things at stake in my use case: the strategy pattern, as you 
call it (regular, inverted, 'I'm sorry', adaptive, etc.), and the 
granularity of objects in the cache. In my case it is very inefficient 
to only cache complete pipelines and I need to have multiple levels of 
caching to optimize performance: besides caching the complete pipeline, 
also the individual sources that comprise a traversable generation.

I am not sure I understand what you mean with 'pluggable strategy'. 
Isn't this what we already have with the different pipeline implementations?

Unico


Re: Event caching and CachedSource

2004-03-02 Thread Geoff Howard
Unico Hommes wrote:

Corin Moss wrote:

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la cron
(although that could be what you mean).
I'm not entirely sure how this helps with external validation directly
:)
What I've been playing around with in this class is a "refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)
Does that help you at all? I'm happy to contribute it if it would.

 

Hmm, I guess my explanation was a little bit dense. Let me put this 
into context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav repository. 
Obviously this introduces some network overhead and particular parts 
of the site can get slow to generate. Especially those generated using 
TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches 
to provide a so called Validity object in order to determine whether 
its cached objects are still valid upon subsequent request. Most 
sources will try to determine this by providing a last modified 
timestamp the cache can compare. Since retrieving the last 
modification time is an expensive operation in the case of a webdav 
source determining the validity of a cached response would be almost 
as expensive as generating a new response, requiring a webdav propfind 
for each source that is a member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the 
Cache that the source is always valid using a special Validity object. 
Cache invalidation will be accomplished by an external event 
triggering the removal of all pipelines the Source is associated with. 
This means though that a subsequent request will be slow since nothing 
is cached anymore.

Perhaps even more importantly, since the pipelines can be huge objects 
the generation of which potentially requires many network calls it is 
much better to cache objects at the most atomic level: the source. 
Hence my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so 
that an external event (say, someone saving a document in the webdav 
repository) will trigger the retrieval of a fresh one. But in the 
background, away from the critical path (asynchronous).

Hope that explains it better.


Ok, I'm done being dense - I get this now.  My last response about 
pluggable cache strategies at the pipeline level is totally mismatched.  
At the Source level you don't have to worry about re-assembling the 
pipeline, so all you would have to do is re-cache the source when it is 
invalidated externally.

Can you just handle this when the JMS/other event comes in?  Currently 
it's just translated to an Event and sent to the Cache, but you could 
also contact the Source as well at that point?  Where does the Source 
cache its data?  In memory in a private member, or in the Store?

Geoff


RE: Event caching and CachedSource

2004-03-02 Thread Corin Moss

Hi,

That makes perfect sense. I implemented exactly that this week, although
I have to admit it is nowhere near as elegant as I would like.

Basically in my case it's a database update / delete / insert triggering
a cache clear, it is then a specific request generating the re-cache.

I'm then employing something similar to the "I'm sorry" method mentioned
previously.  I've re-implemented the "test" cron job, using a new method
on RefresherImpl which as I mentioned before, caches in the future (by
adding a new scheduled job.) 

Effectively the request to trigger the re-cache might take a few
milliseconds, but the job to recache the page goes on in the background
for as long as it takes.
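
For readers who want a picture of the one-time "cache in the future" idea, here is a minimal Java sketch. It uses the JDK's ScheduledExecutorService purely as a stand-in for the cron/scheduler component; the class FutureRefresher and its method signature are assumptions, not the actual RefresherImpl code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot refresher: schedules a single re-cache job in the future. */
final class FutureRefresher {
    // Daemon thread so a forgotten shutdown() does not keep the JVM alive.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "future-refresher");
                t.setDaemon(true);
                return t;
            });

    /** Returns immediately; the recache job runs once after the delay. */
    ScheduledFuture<?> refreshInFuture(Runnable recache, long delaySeconds) {
        return scheduler.schedule(recache, delaySeconds, TimeUnit.SECONDS);
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

The triggering request only pays for the schedule call; the recache job itself can take as long as it needs on the scheduler thread.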

At this point, it is simply a call to an internal URL on loopback (very
messy.)  However, I see no reason that something like the XSPUtil
include source couldn't be used.  It would also pay to have a look at
the BackgroundEnvironment - its only weakness is (as commented) that it
doesn't support objects which try to access specific HTTPenvironment
things (object maps etc.)

Like I said - I've got this working at the moment - let me know if you
want the code :)

Corin


-Original Message-
From: Unico Hommes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 March 2004 2:39 a.m.
To: [EMAIL PROTECTED]
Subject: Re: Event caching and CachedSource


Corin Moss wrote:

>Hiya,
>
>I'm probably wrong here, but my understanding of the RefresherImpl is
>that the "timeout" is used to cache the page on a timed basis a la cron
>(although that could be what you mean).
>
>I'm not entirely sure how this helps with external validation directly
>:)
>
>What I've been playing around with in this class is a "refreshInFuture"
>method which does a one-time-only refresh in x seconds (probably 1
>minimum to be safe from expiry problems :)
>
>Does that help you at all? I'm happy to contribute it if it would.

Hmm, I guess my explanation was a little bit dense. Let me put this into
context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav repository.
Obviously this introduces some network overhead and particular parts of
the site can get slow to generate. Especially those generated using
TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches to
provide a so called Validity object in order to determine whether its
cached objects are still valid upon subsequent request. Most sources
will try to determine this by providing a last modified timestamp the
cache can compare. Since retrieving the last modification time is an
expensive operation in the case of a webdav source determining the
validity of a cached response would be almost as expensive as generating
a new response, requiring a webdav propfind for each source that is a
member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the
Cache that the source is always valid using a special Validity object.
Cache invalidation will be accomplished by an external event triggering
the removal of all pipelines the Source is associated with. This means
though that a subsequent request will be slow since nothing is cached
anymore.

Perhaps even more importantly, since the pipelines can be huge objects
the generation of which potentially requires many network calls it is
much better to cache objects at the most atomic level: the source. Hence
my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so that
an external event (say, someone saving a document in the webdav
repository) will trigger the retrieval of a fresh one. But in the
background, away from the critical path (asynchronous).

Hope that explains it better.

Unico





Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Corin Moss wrote:

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la cron
(although that could be what you mean).
I'm not entirely sure how this helps with external validation directly
:)
What I've been playing around with in this class is a "refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)
Does that help you at all? I'm happy to contribute it if it would.

 

Hmm, I guess my explanation was a little bit dense. Let me put this into 
context of the things I am trying to solve.

We recently deployed a website that is backed by a webdav repository. 
Obviously this introduces some network overhead and particular parts of 
the site can get slow to generate. Especially those generated using 
TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches to 
provide a so called Validity object in order to determine whether its 
cached objects are still valid upon subsequent request. Most sources 
will try to determine this by providing a last modified timestamp the 
cache can compare. Since retrieving the last modification time is an 
expensive operation in the case of a webdav source determining the 
validity of a cached response would be almost as expensive as generating 
a new response, requiring a webdav propfind for each source that is a 
member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the 
Cache that the source is always valid using a special Validity object. 
Cache invalidation will be accomplished by an external event triggering 
the removal of all pipelines the Source is associated with. This means 
though that a subsequent request will be slow since nothing is cached 
anymore.

Perhaps even more importantly, since the pipelines can be huge objects 
the generation of which potentially requires many network calls it is 
much better to cache objects at the most atomic level: the source. Hence 
my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so that 
an external event (say, someone saving a document in the webdav 
repository) will trigger the retrieval of a fresh one. But in the 
background, away from the critical path (asynchronous).
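A minimal sketch of that idea, assuming a plain in-memory map rather than the real Cocoon Cache: requests are always answered from the cached copy, and an external event only schedules a re-fetch on a background executor. The class EventRefreshedCache and its methods get and onEvent are hypothetical names used for illustration, not part of CachedSource.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch; names are illustrative and not Cocoon's CachedSource API.
final class EventRefreshedCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>();
    private final Function<String, String> fetcher; // slow call to the delegate source, must not return null
    private final Executor background;

    EventRefreshedCache(Function<String, String> fetcher, Executor background) {
        this.fetcher = fetcher;
        this.background = background;
    }

    // Request path: serve the cached copy; only the very first access pays for a fetch.
    String get(String uri) {
        return entries.computeIfAbsent(uri, fetcher);
    }

    // Event path: the old entry stays in place until the fresh copy replaces it.
    CompletableFuture<Void> onEvent(String uri) {
        return CompletableFuture.runAsync(() -> entries.put(uri, fetcher.apply(uri)), background);
    }
}
```

In this sketch a reader never waits on the repository after the first fetch; the trade-off is that a stale copy may be served for as long as the background fetch takes.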

Hope that explains it better.

Unico


Re: Event caching and CachedSource

2004-03-02 Thread Geoff Howard
Unico Hommes wrote:

Geoff Howard wrote:

Unico Hommes wrote:

Hi gang :-)

A drawback I have been running into lately with eventcache mechanism 
is that it lacks the ability to remove heavy processing from the 
critical path. An event will simply remove a set of cached pipelines 
from the cache completely. Making the subsequent request for such a 
pipeline potentially very slow. In applications where isolation is 
not a requirement this is an unnecessary drawback.


Below sounds interesting and good but I haven't understood how event 
cache is related.  AFAICS the only difference with eventcache and the 
other validity types is that for the others an invalid response is 
found in cache, but not used because it is found invalid after 
retrieval, but the event cache removes the entry at invalidation time 
since it knows it will never be useful.  Both cases mean that the 
next person to request that resource will have to wait for the full 
generation.  Maybe because I've only glanced at the refresher stuff?

I guess you are right that at the Cache level nothing really changes. 
I overlooked that fact. I will do some more research on what is 
required to accomplish that in the case of the Refresher, but my idea 
was that the cached response would be served until a newly generated 
one could replace the stale one. Since the Refresher talks to the 
Cache directly, given the correct Validity strategy it can exercise 
full control over it.


So, stale entries are served until they can be regenerated?  I've looked 
for this in the past (someone called it the "I'm Sorry" pattern :) ) and 
at the time thought it might be better implemented by a pluggable 
strategy at the pipeline execution level.  Currently we have:

- Assemble Pipeline
- Gather key from Pipeline
- Check cache for key
- If object for key found, check its validity
- If valid, serve the cached response
- Else, execute pipeline and serve it.

The cache point pipeline, and the non-caching pipeline are other 
implementations of different strategies, but are accomplished by 
inheritance instead of composing a Strategy.  I haven't ever thought it 
through carefully but it seems like making those last 5 steps (as a 
group) a pluggable strategy would allow things like this "I'm Sorry" 
pattern, as well as more powerful concepts like Stefano's proposed 
adaptive cache.  Just raw thoughts at this point...
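As a rough illustration of what such a pluggable strategy could look like, here is a sketch under stated assumptions: validity is reduced to a boolean flag, the pipeline is a Supplier, and every type name (CachedResponse, ServingStrategy, RegenerateWhenInvalid, ServeStaleWhileRegenerating) is hypothetical rather than taken from Cocoon. The first strategy mirrors the steps listed above; the second is the "I'm Sorry" variant that serves the stale entry and hands regeneration to a scheduler.

```java
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical types for illustration only.
final class CachedResponse {
    final String body;
    final boolean valid; // stands in for a real Validity check
    CachedResponse(String body, boolean valid) { this.body = body; this.valid = valid; }
}

interface ServingStrategy {
    String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline);
}

// Mirrors the current steps: an invalid or missing entry means the caller waits.
final class RegenerateWhenInvalid implements ServingStrategy {
    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit != null && hit.valid) return hit.body;
        String fresh = pipeline.get();
        cache.put(key, new CachedResponse(fresh, true));
        return fresh;
    }
}

// "I'm Sorry" variant: serve the stale entry now, regenerate off the request path.
final class ServeStaleWhileRegenerating implements ServingStrategy {
    private final Consumer<Runnable> scheduler; // e.g. a thread pool's execute method

    ServeStaleWhileRegenerating(Consumer<Runnable> scheduler) { this.scheduler = scheduler; }

    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit == null) { // nothing to serve yet, so generate synchronously
            String fresh = pipeline.get();
            cache.put(key, new CachedResponse(fresh, true));
            return fresh;
        }
        if (!hit.valid) { // a real implementation would avoid scheduling duplicates
            scheduler.accept(() -> cache.put(key, new CachedResponse(pipeline.get(), true)));
        }
        return hit.body;
    }
}
```

Composing the strategy instead of inheriting from a pipeline class keeps the assemble and key-gathering steps shared, which is the point of the suggestion above.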

Bottom line for me at moment is: do you foresee a need to modify the 
eventcache API to accommodate this need?  I'm getting ready to start a 
discussion on changing the eventcache unstable status -- should I 
hold off?

I don't think my current work will influence the eventcache API 
directly, although I am not sure the eventcache stuff can be 
considered stable enough. I still have some doubts about the ease of 
use of parts of it, especially the way events are associated with 
cached objects. But let's discuss that separately.


Ah, good.  Ok, I'll pick up on another thread.

Geoff


RE: Event caching and CachedSource

2004-03-02 Thread Carsten Ziegeler
Unico Hommes wrote:

> > BTW, how does CachedSource accomplish something different from the 
> > caching point pipeline (which seems to accomplish more, though I've 
> > never used it).
> >
> I never used it either. So I really don't know. Perhaps 
> someone else could comment on this?
> 
The CachedSource caches a source :) whereas the caching point pipeline
caches part of a pipeline. They could be used in combination but have
different purposes.
The caching point pipeline can cache the beginning of a pipeline up to
the point, but this only works if all components in the pipeline
support the caching; if not, nothing is cached.

Now, imagine that you have a database source that fetches content
from a slow database (or cms). The usual caching alg. tries to
look if the source read by the generator has changed since the last call.
In the case of the database source this is not possible and the
pipeline is never cached.
With the cached source the content fetched from the db is cached,
reducing the requests to the back-end system and the generator
can use this to test if the source has changed, allowing the
pipeline (or a part of it) to be cached as well.
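To make that concrete, here is a small sketch, assuming a time-based expiry and an injected clock: the wrapper remembers when it last fetched from the slow back-end, and that timestamp is something a generator could compare against. TimedCachedSource and its methods are hypothetical names, not the actual CachedSource implementation.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical sketch of a caching source wrapper; not Cocoon's CachedSource.
final class TimedCachedSource {
    private final Supplier<String> delegate; // slow back-end, e.g. a database or CMS
    private final long expiresMillis;
    private final LongSupplier clock;        // injected so the sketch can be tested
    private String content;
    private long lastModified = -1L;         // -1 means nothing has been fetched yet

    TimedCachedSource(Supplier<String> delegate, long expiresMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.expiresMillis = expiresMillis;
        this.clock = clock;
    }

    // Hits the back-end only on first use or after the expiry interval has passed.
    synchronized String getContent() {
        long now = clock.getAsLong();
        if (lastModified < 0 || now - lastModified >= expiresMillis) {
            content = delegate.get();
            lastModified = now;
        }
        return content;
    }

    // Time of the last fetch; a caller can compare this to decide whether its own cache is valid.
    synchronized long getLastModified() {
        return lastModified;
    }
}
```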

HTH
Carsten



Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Geoff Howard wrote:

Unico Hommes wrote:

Hi gang :-)

A drawback I have been running into lately with eventcache mechanism 
is that it lacks the ability to remove heavy processing from the 
critical path. An event will simply remove a set of cached pipelines 
from the cache completely. Making the subsequent request for such a 
pipeline potentially very slow. In applications where isolation is not 
a requirement this is an unnecessary drawback.


Below sounds interesting and good but I haven't understood how event 
cache is related.  AFAICS the only difference with eventcache and the 
other validity types is that for the others an invalid response is 
found in cache, but not used because it is found invalid after 
retrieval, but the event cache removes the entry at invalidation time 
since it knows it will never be useful.  Both cases mean that the next 
person to request that resource will have to wait for the full 
generation.  Maybe because I've only glanced at the refresher stuff?

I guess you are right that at the Cache level nothing really changes. I 
overlooked that fact. I will do some more research on what is required 
to accomplish that in the case of the Refresher, but my idea was that 
the cached response would be served until a newly generated one could 
replace the stale one. Since the Refresher talks to the Cache directly, 
given the correct Validity strategy it can exercise full control over it.

Bottom line for me at moment is: do you foresee a need to modify the 
eventcache API to accommodate this need?  I'm getting ready to start a 
discussion on changing the eventcache unstable status -- should I hold 
off?

I don't think my current work will influence the eventcache API 
directly, although I am not sure the eventcache stuff can be considered 
stable enough. I still have some doubts about the ease of use of parts 
of it, especially the way events are associated with cached objects. 
But let's discuss that separately.

I am looking at the excellent CachedSource stuff that is in the 
scratchpad area ATM and am wondering how it fits together with the 
eventcache stuff. One thing I am looking into right now is to write 
an EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that 
can cache its delegate. Refreshing can be done either synchronously 
or asynchronously but currently only based upon a specified time-out. 
What I'd like to do is generalize this a bit in order to add the 
ability to  externally trigger invalidation.

For this however I think a modification to the Refresher interface is 
needed.


BTW, how does CachedSource accomplish something different from the 
caching point pipeline (which seems to accomplish more, though I've 
never used it).

I never used it either. So I really don't know. Perhaps someone else 
could comment on this?

Cheers,
Unico



Re: Event caching and CachedSource

2004-03-02 Thread Unico Hommes
Carsten Ziegeler wrote:

Unico Hommes wrote:
 

Hi gang :-)

A drawback I have been running into lately with eventcache 
mechanism is that it lacks the ability to remove heavy 
processing from the critical path. An event will simply 
remove a set of cached pipelines from the cache completely. 
Making the subsequent request for such a pipeline potentially 
very slow. In applications where isolation is not a 
requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in 
the scratchpad area ATM and am wondering how it fits together 
with the eventcache stuff. One thing I am looking into right 
now is to write an EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source 
wrapper that can cache its delegate. Refreshing can be done 
either synchronously or asynchronously but currently only 
based upon a specified time-out. What I'd like to do is 
generalize this a bit in order to add the ability to 
externally trigger invalidation.

For this however I think a modification to the Refresher 
interface is needed.

Instead of:

Refresher {
 refresh(key,uri,timeout);
 periodicallyRefresh(key,uri,timeout);
}

I'd like to remove timeout semantics from the interface:

Refresher {
 refresh(key,uri,params);
}

I don't think there is currently a reason for there being two 
separate methods. So I think we can safely combine them 
into one. But I guess I am looking at Carsten for confirmation... :-)

   

Although you actually don't need my confirmation as it's not my
but *our* source, here it is :)
 

OK, thanks. Just trying to exclude the possibility of overlooking something 
and allowing you the opportunity to comment on any changes beforehand.

I think this makes sense and I think we should also move this
out of the scratchpad afterwards as well.
 

OK, agreed. But where should it go?

Unico


Re: Event caching and CachedSource

2004-03-02 Thread Geoff Howard
Unico Hommes wrote:

Hi gang :-)

A drawback I have been running into lately with eventcache mechanism 
is that it lacks the ability to remove heavy processing from the 
critical path. An event will simply remove a set of cached pipelines 
from the cache completely. Making the subsequent request for such a 
pipeline potentially very slow. In applications where isolation is not 
a requirement this is an unnecessary drawback.


Below sounds interesting and good but I haven't understood how event 
cache is related.  AFAICS the only difference with eventcache and the 
other validity types is that for the others an invalid response is found 
in cache, but not used because it is found invalid after retrieval, but 
the event cache removes the entry at invalidation time since it knows it 
will never be useful.  Both cases mean that the next person to request 
that resource will have to wait for the full generation.  Maybe because 
I've only glanced at the refresher stuff?

Bottom line for me at moment is: do you foresee a need to modify the 
eventcache API to accommodate this need?  I'm getting ready to start a 
discussion on changing the eventcache unstable status -- should I hold off?

I am looking at the excellent CachedSource stuff that is in the 
scratchpad area ATM and am wondering how it fits together with the 
eventcache stuff. One thing I am looking into right now is to write an 
EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that 
can cache its delegate. Refreshing can be done either synchronously 
or asynchronously but currently only based upon a specified time-out. 
What I'd like to do is generalize this a bit in order to add the 
ability to  externally trigger invalidation.

For this however I think a modification to the Refresher interface is 
needed.


BTW, how does CachedSource accomplish something different from the 
caching point pipeline (which seems to accomplish more, though I've 
never used it).

Geoff


RE: Event caching and CachedSource

2004-03-02 Thread Carsten Ziegeler
Unico Hommes wrote:
> 
> Hi gang :-)
> 
> A drawback I have been running into lately with eventcache 
> mechanism is that it lacks the ability to remove heavy 
> processing from the critical path. An event will simply 
> remove a set of cached pipelines from the cache completely. 
> Making the subsequent request for such a pipeline potentially 
> very slow. In applications where isolation is not a 
> requirement this is an unnecessary drawback.
> 
> I am looking at the excellent CachedSource stuff that is in 
> the scratchpad area ATM and am wondering how it fits together 
> with the eventcache stuff. One thing I am looking into right 
> now is to write an EventAware Refresher implementation.
> 
> For those unfamiliar with CachedSource, it is a Source 
> wrapper that can cache its delegate. Refreshing can be done 
> either synchronously or asynchronously but currently only 
> based upon a specified time-out. What I'd like to do is 
> generalize this a bit in order to add the ability to 
> externally trigger invalidation.
> 
> For this however I think a modification to the Refresher 
> interface is needed.
> 
> Instead of:
> 
> Refresher {
>   refresh(key,uri,timeout);
>   periodicallyRefresh(key,uri,timeout);
> }
> 
> I'd like to remove timeout semantics from the interface:
> 
> Refresher {
>   refresh(key,uri,params);
> }
> 
> I don't think there is currently a reason for there being two 
> separate methods. So I think we can safely combine them 
> into one. But I guess I am looking at Carsten for confirmation... :-)
> 
Although you actually don't need my confirmation as it's not my
but *our* source, here it is :)
I think this makes sense and I think we should also move this
out of the scratchpad afterwards as well.

Carsten



RE: Event caching and CachedSource

2004-03-02 Thread Corin Moss

Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is
that the "timeout" is used to cache the page on a timed basis a la cron
(although that could be what you mean).

I'm not entirely sure how this helps with external validation directly
:)

What I've been playing around with in this class is a "refreshInFuture"
method which does a one-time-only refresh in x seconds (probably 1
minimum to be safe from expiry problems :)

Does that help you at all? I'm happy to contribute it if it would.

Corin


-Original Message-
From: Unico Hommes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 March 2004 12:44 a.m.
To: [EMAIL PROTECTED]
Subject: Event caching and CachedSource


Hi gang :-)

A drawback I have been running into lately with eventcache mechanism is
that it lacks the ability to remove heavy processing from the critical
path. An event will simply remove a set of cached pipelines from the
cache completely. Making the subsequent request for such a pipeline
potentially very slow. In applications where isolation is not a
requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in the
scratchpad area ATM and am wondering how it fits together with the
eventcache stuff. One thing I am looking into right now is to write an
EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that can
cache its delegate. Refreshing can be done either synchronously or
asynchronously but currently only based upon a specified time-out. What
I'd like to do is generalize this a bit in order to add the ability to 
externally trigger invalidation.

For this however I think a modification to the Refresher interface is
needed.

Instead of:

Refresher {
  refresh(key,uri,timeout);
  periodicallyRefresh(key,uri,timeout);
}

I'd like to remove timeout semantics from the interface:

Refresher {
  refresh(key,uri,params);
}

I don't think there is currently a reason for there being two
separate methods. So I think we can safely combine them into one. But I
guess I am looking at Carsten for confirmation... :-)
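For illustration, the single-method interface might look roughly like the sketch below. It assumes string keys and URIs and a plain Map for params; the parameter names "expires" and "event" and both implementation classes are hypothetical. The timeout becomes just one parameter that the delay-based implementation reads and the event-aware one ignores.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Java rendering of the proposed interface; the types are assumptions.
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

// Time-based implementation: reads an assumed "expires" parameter (seconds).
final class DelayRefresher implements Refresher {
    final Map<String, Long> intervalSeconds = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String expires = params.get("expires");
        if (expires == null) {
            throw new IllegalArgumentException("delay refresher needs an 'expires' parameter");
        }
        intervalSeconds.put(key, Long.parseLong(expires)); // a real one would schedule a timer here
    }
}

// Event-based implementation: ignores timeouts and remembers which keys an event touches.
final class EventAwareRefresher implements Refresher {
    private final Map<String, Set<String>> keysByEvent = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String event = params.getOrDefault("event", uri); // assumed: the URI doubles as the event name
        keysByEvent.computeIfAbsent(event, e -> new HashSet<>()).add(key);
    }

    // Keys whose cached content should be re-fetched when the named event arrives.
    Set<String> keysForEvent(String event) {
        return keysByEvent.getOrDefault(event, new HashSet<>());
    }
}
```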

Cheers,
Unico


