Re: Event caching and CachedSource
Vadim Gritsenko wrote:
> Unico Hommes wrote:
> > Carsten Ziegeler wrote:
> > > Unico Hommes wrote:
> > > > I'd also like to change the protocol URL a little bit. Since the
> > > > timeout parameter will only be applicable to the delay refresher
> > > > implementation and not to the event aware one I think it would be
> > > > better to specify it with a query parameter instead.
> > > >
> > > > Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
> > > > Proposed syntax:
> > > > cache:http://www.apache.org/?cache-expires=60&cache-name=main
> > > >
> > > > The protocol:subprotocol syntax is also more in line with well
> > > > established conventions such as in jdbc for instance.
> > > >
> > > > Let me know if you have any objections or comments.
> > >
> > > No objections from me, but the parameters must have clear names,
> > > which means there shouldn't be a conflict. Imagine:
> > >
> > > cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
> > >
> > > (Dumb example, I know)
> > >
> > > But what I mean is that the real url/source could also have
> > > parameters and it must be clear which ones are for the cache source
> > > and which ones are for the real source, so perhaps something like
> > > "cocoon-cache..." or perhaps better using invalid names like
> > > "cocoon:cache=60"?
> >
> > Yeah I had been thinking along the same lines. I like the colon
> > notation because it resembles familiar namespace notation. So I'll go
> > with your latter suggestion.
>
> Does it make sense to have it both ways? So, say, you can use either:
>
> cache:main:[EMAIL PROTECTED]://www.apache.org/
>
> or:
>
> cache:@http://www.apache.org/?cache:name=main&cache:expires=60
>
> ?

Hmm, I would prefer to settle on just one syntax. That prevents confusion and minimizes the amount of code to maintain. Also, what should happen when the expiration value is not applicable: ignore it or throw an exception? I think we should keep it as simple as possible.

Unico
Re: Event caching and CachedSource
Unico Hommes wrote:
> Carsten Ziegeler wrote:
> > Unico Hommes wrote:
> > > I'd also like to change the protocol URL a little bit. Since the
> > > timeout parameter will only be applicable to the delay refresher
> > > implementation and not to the event aware one I think it would be
> > > better to specify it with a query parameter instead.
> > >
> > > Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
> > > Proposed syntax:
> > > cache:http://www.apache.org/?cache-expires=60&cache-name=main
> > >
> > > The protocol:subprotocol syntax is also more in line with well
> > > established conventions such as in jdbc for instance.
> > >
> > > Let me know if you have any objections or comments.
> >
> > No objections from me, but the parameters must have clear names,
> > which means there shouldn't be a conflict. Imagine:
> >
> > cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
> >
> > (Dumb example, I know)
> >
> > But what I mean is that the real url/source could also have
> > parameters and it must be clear which ones are for the cache source
> > and which ones are for the real source, so perhaps something like
> > "cocoon-cache..." or perhaps better using invalid names like
> > "cocoon:cache=60"?
>
> Yeah I had been thinking along the same lines. I like the colon
> notation because it resembles familiar namespace notation. So I'll go
> with your latter suggestion.

Does it make sense to have it both ways? So, say, you can use either:

cache:main:[EMAIL PROTECTED]://www.apache.org/

or:

cache:@http://www.apache.org/?cache:name=main&cache:expires=60

?

Vadim
Re: Event caching and CachedSource
Carsten Ziegeler wrote:
> Unico Hommes wrote:
> > I'd also like to change the protocol URL a little bit. Since the
> > timeout parameter will only be applicable to the delay refresher
> > implementation and not to the event aware one I think it would be
> > better to specify it with a query parameter instead.
> >
> > Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
> > Proposed syntax:
> > cache:http://www.apache.org/?cache-expires=60&cache-name=main
> >
> > The protocol:subprotocol syntax is also more in line with well
> > established conventions such as in jdbc for instance.
> >
> > Let me know if you have any objections or comments.
>
> No objections from me, but the parameters must have clear names,
> which means there shouldn't be a conflict. Imagine:
>
> cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500
>
> (Dumb example, I know)
>
> But what I mean is that the real url/source could also have
> parameters and it must be clear which ones are for the cache source
> and which ones are for the real source, so perhaps something like
> "cocoon-cache..." or perhaps better using invalid names like
> "cocoon:cache=60"?

Yeah I had been thinking along the same lines. I like the colon notation because it resembles familiar namespace notation. So I'll go with your latter suggestion.

Unico
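The colon-prefixed parameters agreed on above could be separated from the wrapped source's own query string roughly as follows. This is only a sketch: the class name `CacheUriParser`, the `cache:` parameter prefix taken from the later examples in this thread, and the choice to strip cache parameters before handing the rest to the real source are all assumptions, not Cocoon's actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits "cache:<uri>?..." into the wrapped URI and the cache:* parameters. */
final class CacheUriParser {
    static final String SCHEME = "cache:";
    static final String PARAM_PREFIX = "cache:";   // assumed namespace-style prefix

    final String delegateUri;                // URI handed to the real source
    final Map<String, String> cacheParams;   // e.g. name=main, expires=60

    private CacheUriParser(String delegateUri, Map<String, String> cacheParams) {
        this.delegateUri = delegateUri;
        this.cacheParams = cacheParams;
    }

    static CacheUriParser parse(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not a cache: URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        int q = rest.indexOf('?');
        String base = q < 0 ? rest : rest.substring(0, q);
        String query = q < 0 ? "" : rest.substring(q + 1);

        Map<String, String> cacheParams = new LinkedHashMap<>();
        List<String> passThrough = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            if (pair.startsWith(PARAM_PREFIX)) {
                // Parameter meant for the cache source: strip the prefix and keep it aside.
                String kv = pair.substring(PARAM_PREFIX.length());
                int eq = kv.indexOf('=');
                String key = eq < 0 ? kv : kv.substring(0, eq);
                String value = eq < 0 ? "" : kv.substring(eq + 1);
                cacheParams.put(key, value);
            } else {
                // Parameter meant for the real source: leave it untouched.
                passThrough.add(pair);
            }
        }
        String delegate = passThrough.isEmpty()
                ? base
                : base + "?" + String.join("&", passThrough);
        return new CacheUriParser(delegate, cacheParams);
    }
}
```

With this split, an unprefixed `expires=500` belonging to the real source can no longer collide with the cache's own `cache:expires`.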
RE: Event caching and CachedSource
Unico Hommes wrote:
>
> I'd also like to change the protocol URL a little bit. Since
> the timeout parameter will only be applicable to the delay
> refresher implementation and not to the event aware one I
> think it would be better to specify it with a query parameter instead.
>
> Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
> Proposed syntax:
> cache:http://www.apache.org/?cache-expires=60&cache-name=main
>
> The protocol:subprotocol syntax is also more in line with
> well established conventions such as in jdbc for instance.
>
> Let me know if you have any objections or comments.
>

No objections from me, but the parameters must have clear names, which means there shouldn't be a conflict. Imagine:

cache:http://www.apache.org/?cache-expires=60&cache-name=main&expires=500

(Dumb example, I know)

But what I mean is that the real url/source could also have parameters and it must be clear which ones are for the cache source and which ones are for the real source, so perhaps something like "cocoon-cache..." or perhaps better using invalid names like "cocoon:cache=60"?

Carsten
Re: Event caching and CachedSource
Carsten Ziegeler wrote:
> Unico Hommes wrote:
> > Hi gang :-)
> >
> > A drawback I have been running into lately with the eventcache
> > mechanism is that it lacks the ability to remove heavy processing
> > from the critical path. An event will simply remove a set of cached
> > pipelines from the cache completely, making the subsequent request
> > for such a pipeline potentially very slow. In applications where
> > isolation is not a requirement this is an unnecessary drawback.
> >
> > I am looking at the excellent CachedSource stuff that is in the
> > scratchpad area ATM and am wondering how it fits together with the
> > eventcache stuff. One thing I am looking into right now is to write
> > an EventAware Refresher implementation.
> >
> > For those unfamiliar with CachedSource, it is a Source wrapper that
> > can cache its delegate. Refreshing can be done either synchronously
> > or asynchronously but currently only based upon a specified
> > time-out. What I'd like to do is generalize this a bit in order to
> > add the ability to externally trigger invalidation. For this however
> > I think a modification to the Refresher interface is needed.
> > Instead of:
> >
> >   Refresher {
> >     refresh(key,uri,timeout);
> >     periodicallyRefresh(key,uri,timeout);
> >   }
> >
> > I'd like to remove timeout semantics from the interface:
> >
> >   Refresher {
> >     refresh(key,uri,params);
> >   }
> >
> > I don't think there is currently a reason for having two separate
> > methods. So I think we can safely combine them into one. But I guess
> > I am looking at Carsten for confirmation... :-)
>
> Although you actually don't need my confirmation as it's not my but
> *our* source, here it is :) I think this makes sense and I think we
> should also move this out of the scratchpad afterwards as well.

I'd also like to change the protocol URL a little bit. Since the timeout parameter will only be applicable to the delay refresher implementation and not to the event aware one I think it would be better to specify it with a query parameter instead.

Current syntax: cache://[EMAIL PROTECTED]@http://www.apache.org/
Proposed syntax: cache:http://www.apache.org/?cache-expires=60&cache-name=main

The protocol:subprotocol syntax is also more in line with well established conventions such as in jdbc for instance.

Let me know if you have any objections or comments.

Unico
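For illustration, the combined interface might look like this in Java. The thread gives only `refresh(key,uri,params)` without types, so the `String` and `Map` parameter types, the two implementation class names, and the `expires` parameter name are assumptions. Whether an implementation should ignore or reject a parameter it does not understand is left open in the thread; the sketch arbitrarily ignores it in the event-aware case.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the proposed single-method interface; the parameter types are assumptions. */
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

/** Hypothetical delay-based implementation: the timeout is just one parameter among others. */
final class DelayRefresherSketch implements Refresher {
    final Map<String, Long> delaysByKey = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String expires = params.get("expires");   // assumed parameter name
        if (expires == null) {
            throw new IllegalArgumentException("delay refresher needs an 'expires' parameter");
        }
        // A real implementation would schedule a periodic re-fetch of 'uri' here.
        delaysByKey.put(key, Long.parseLong(expires));
    }
}

/** Hypothetical event-aware implementation: no timeout, it only remembers what to re-fetch. */
final class EventAwareRefresherSketch implements Refresher {
    final Map<String, String> urisByKey = new HashMap<>();

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        // Any 'expires' value is ignored; refreshes are triggered by external events instead.
        urisByKey.put(key, uri);
    }
}
```

Passing a generic parameter map keeps timeout semantics out of the interface, so callers do not need to know which kind of refresher is configured.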
RE: Event caching and CachedSource
Unico Hommes <[EMAIL PROTECTED]> writes:
>
> Hi gang :-)
>
> A drawback I have been running into lately with eventcache
> mechanism is that it lacks the ability to remove heavy processing
> from the critical path. An event will simply remove a set of cached
> pipelines from the cache completely. Making the subsequent request
> for such a pipeline potentially very slow. In applications where
> isolation is not a requirement this is an unnecessary drawback.
>
> I am looking at the excellent CachedSource stuff that is in the
> scratchpad area ATM and am wondering how it fits together with the
> eventcache stuff. One thing I am looking into right now is to
> write an EventAware Refresher implementation.

Cool, in our case, for much of our data, we know we can, in theory, repopulate the cache the moment after the data is invalidated (which is the moment before the new version is committed). However, we need to do this asynchronously if possible. We haven't started to look at this issue, but it sounds like this might be the way to go?

>
> For those unfamiliar with CachedSource, it is a Source
> wrapper that can cache its delegate. Refreshing can be done either
> synchronously or asynchronously but currently only based upon a
> specified time-out. What I'd like to do is generalize this a bit in
> order to add the ability to externally trigger invalidation.
>
Re: Event caching and CachedSource
Unico Hommes wrote: Geoff Howard wrote: Unico Hommes wrote: Corin Moss wrote: Hiya, I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). I'm not entirely sure how this helps with external validation directly :) What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :) Does that help you at all? I'm happy to contribute it if it would. Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators. Traditional caching requires the objects on behalf of which it caches to provide a so called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modifed timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline. Instead we employ a different strategy altoghether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore. 
Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource. What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous). Hope that explains it better. Ok, I'm done being dense - I get this now. Sorry it wasn't meant to suggest you or anybody was being dense. Just that my explanation was overly concise.;-) No, I saw where you placed blame (on your explanation) -- I just disagreed with your diagnosis. :) In my case, it was the "explainee" not the explanation that was the problem! My last response about pluggable cache strategies at the pipeline level is totally mismatched. At the Source level you don't have to worry about re-assembling the pipeline, so all you would have to do is re-cache the source when it is invalidated externally. Exactly. Can you just handle this when the JMS/other event comes in? Currently it's just translated to an Event and sent to the Cache, but you could also contact the Source as well at that point? Where does the Source cache its data? In memory in a private member, or in the Store? You got it! (in the Cache/Store) Whew! Ok, I can't give this much more thought ATM, but sounds like a good direction to me, for what that's worth. Geoff
Re: Event caching and CachedSource
Corin Moss wrote: Hi, That makes perfect sense. I implemented exactly that this week, although I have to admit it is nowhere near as elegant as I would like. Basically in my case it's a database update / delete / insert triggering a cache clear, it is then a specific request generating the re-cache. I'm then employing something similar to the "I'm sorry" method mentioned previously. I've re-implemented the "test" cron job, using a new method on RefresherImpl which as I mentioned before, caches in the future (by adding a new scheduled job.) Effectively the request to trigger the re-cache might take a few milliseconds, but the job to recache the page goes on in the background for as longs as it takes. At this point, it is simply a call to an internal URL on loopback (very messy.) However, I see no reason that something like the XSPUtil include source couldn't be used. It would also pay to have a look at the BackgroundEnvironment - its only weakness is (as commented) that it doesn't support objects which try to access specific HTTPenvironment things (object maps etc.) I am not sure I get the whole picture yet but why do you need internal processing? Does this mean you cache cocoon: sources? That does not sound appropriate. Like I said - I've got this working at the moment - let me know if you want the code :) Yes please, "show me the code!" ;-) Unico Corin -Original Message- From: Unico Hommes [mailto:[EMAIL PROTECTED] Sent: Wednesday, 3 March 2004 2:39 a.m. To: [EMAIL PROTECTED] Subject: Re: Event caching and CachedSource Corin Moss wrote: Hiya, I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). 
I'm not entirely sure how this helps with external validation directly :) What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :) Does that help you at all? I'm happy to contribute it if it would. Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators. Traditional caching requires the objects on behalf of which it caches to provide a so called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modifed timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline. Instead we employ a different strategy altoghether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore. Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource. 
What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous). Hope that explains it better. Unico CAUTION: This e-mail and any attachment(s) contains information that is intended to be read only by the named recipient(s). It may contain information that is confidential, proprietary or the subject of legal privilege. This information is not to be used by any other person and/or organisation. If you are not the intended recipient, please advise us immediately and delete this e-mail from your system. Do not use any information contained in it. For more information on the Television New Zealand Group, visit us online at http://www.tvnz.co.nz
Re: Event caching and CachedSource
Geoff Howard wrote: Unico Hommes wrote: Corin Moss wrote: Hiya, I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). I'm not entirely sure how this helps with external validation directly :) What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :) Does that help you at all? I'm happy to contribute it if it would. Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators. Traditional caching requires the objects on behalf of which it caches to provide a so called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modifed timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline. Instead we employ a different strategy altoghether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore. 
Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource. What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous). Hope that explains it better. Ok, I'm done being dense - I get this now. Sorry it wasn't meant to suggest you or anybody was being dense. Just that my explanation was overly concise.;-) My last response about pluggable cache strategies at the pipeline level is totally mismatched. At the Source level you don't have to worry about re-assembling the pipeline, so all you would have to do is re-cache the source when it is invalidated externally. Exactly. Can you just handle this when the JMS/other event comes in? Currently it's just translated to an Event and sent to the Cache, but you could also contact the Source as well at that point? Where does the Source cache its data? In memory in a private member, or in the Store? You got it! (in the Cache/Store) Unico
Re: Event caching and CachedSource
Geoff Howard wrote: Unico Hommes wrote: Geoff Howard wrote: Unico Hommes wrote: Hi gang :-) A drawback I have been running into lately with eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely. Making the subsequent request for such a pipeline potentialy very slow. In applications where isolation is not a requirement this is an unnecessary drawback. Below sounds interesting and good but I haven't understood how event cache is related. AFAICS the only difference with eventcache and the other validity types is that for the others an invalid response is found in cache, but not used because it is found invalid after retrieval, but the event cache removes the entry at invalidation time since it knows it will never be useful. Both cases mean that the next person to request that resource will have to wait for the full generation. Maybe because I've only glanced at the refresher stuff? I guess you are right that at the Cache level nothing really changes. I overlooked that fact. I will do some more research on what is required to accomplish that in the case of the Refresher, but my idea was that the cached response would be served until a newly generated one could replace the stale one. Since the Refresher talks to the Cache directly, given the correct Validity strategy it can exercise full control over it. So, stale entries are served until they can be regenerated? I've looked for this in the past (someone called it the "I'm Sorry" pattern :) ) and at the time thought it might be better implemented by a pluggable strategy at the pipeline execution level. Currently we have: - Assemble Pipeline - Gather key from Pipeline - Check cache for key - If object for key found, check its validity - If valid, serve the cached response - Else, execute pipeline and serve it. 
the cache point pipeline, and the non-caching pipeline are other implementations of different strategies, but are accomplished by inheritance instead of composing a Strategy. I haven't ever thought it through carefully but it seems like making those last 5 steps (as a group) a pluggable strategy would allow things like this "I'm Sorry" pattern, as well as more powerful concepts like Stefano's proposed adaptive cache. Just raw thoughts at this point... I see two things at stake in my use case. The strategy pattern as you call it (regular,inverted,'i'm sorry', adaptive,etc.) and the granularity of objects in the cache. In my case it is very inefficient to only cache complete pipelines and I need to have multiple levels of caching to optimize performance: besides caching the complete pipeline, also the individual sources that compise a traversable generation. I am not sure I understand what you mean with 'pluggable strategy'. Isn't this what we already have with the different pipeline implementations? Unico
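One possible reading of the "pluggable strategy" idea, purely as a sketch: the steps after pipeline assembly (look up the key, check validity, serve or regenerate) sit behind one interface, and the "I'm Sorry" behaviour becomes a second implementation. All names here (`ServeStrategy`, `CachedEntry`, `RegularStrategy`, `ServeStaleStrategy`) are hypothetical, and a boolean flag stands in for the real validity check.

```java
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** A cached response plus a flag standing in for the validity check. */
final class CachedEntry {
    final String response;
    boolean valid = true;

    CachedEntry(String response) {
        this.response = response;
    }
}

/** Hypothetical strategy covering the steps that follow pipeline assembly. */
interface ServeStrategy {
    String serve(String key, Map<String, CachedEntry> cache, Supplier<String> pipeline);
}

/** Current behaviour: a missing or invalid entry means executing the pipeline in the request. */
final class RegularStrategy implements ServeStrategy {
    @Override
    public String serve(String key, Map<String, CachedEntry> cache, Supplier<String> pipeline) {
        CachedEntry entry = cache.get(key);
        if (entry != null && entry.valid) {
            return entry.response;
        }
        String fresh = pipeline.get();
        cache.put(key, new CachedEntry(fresh));
        return fresh;
    }
}

/** "I'm Sorry" behaviour: serve the stale entry now and regenerate on another executor. */
final class ServeStaleStrategy implements ServeStrategy {
    private final Executor background;

    ServeStaleStrategy(Executor background) {
        this.background = background;
    }

    @Override
    public String serve(String key, Map<String, CachedEntry> cache, Supplier<String> pipeline) {
        CachedEntry entry = cache.get(key);
        if (entry == null) {
            // Nothing to apologise with: the first request still has to wait.
            String fresh = pipeline.get();
            cache.put(key, new CachedEntry(fresh));
            return fresh;
        }
        if (!entry.valid) {
            // Replace the stale entry off the request path.
            background.execute(() -> cache.put(key, new CachedEntry(pipeline.get())));
        }
        return entry.response;
    }
}
```

A real version would need a concurrent map and some guard against several stale requests each scheduling their own regeneration; both are left out to keep the sketch short.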
Re: Event caching and CachedSource
Unico Hommes wrote: Corin Moss wrote: Hiya, I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). I'm not entirely sure how this helps with external validation directly :) What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :) Does that help you at all? I'm happy to contribute it if it would. Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators. Traditional caching requires the objects on behalf of which it caches to provide a so called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modifed timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline. Instead we employ a different strategy altoghether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore. 
Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource. What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous). Hope that explains it better. Ok, I'm done being dense - I get this now. My last response about pluggable cache strategies at the pipeline level is totally mismatched. At the Source level you don't have to worry about re-assembling the pipeline, so all you would have to do is re-cache the source when it is invalidated externally. Can you just handle this when the JMS/other event comes in? Currently it's just translated to an Event and sent to the Cache, but you could also contact the Source as well at that point? Where does the Source cache its data? In memory in a private member, or in the Store? Geoff
RE: Event caching and CachedSource
Hi, That makes perfect sense. I implemented exactly that this week, although I have to admit it is nowhere near as elegant as I would like. Basically in my case it's a database update / delete / insert triggering a cache clear, it is then a specific request generating the re-cache. I'm then employing something similar to the "I'm sorry" method mentioned previously. I've re-implemented the "test" cron job, using a new method on RefresherImpl which as I mentioned before, caches in the future (by adding a new scheduled job.) Effectively the request to trigger the re-cache might take a few milliseconds, but the job to recache the page goes on in the background for as longs as it takes. At this point, it is simply a call to an internal URL on loopback (very messy.) However, I see no reason that something like the XSPUtil include source couldn't be used. It would also pay to have a look at the BackgroundEnvironment - its only weakness is (as commented) that it doesn't support objects which try to access specific HTTPenvironment things (object maps etc.) Like I said - I've got this working at the moment - let me know if you want the code :) Corin -Original Message- From: Unico Hommes [mailto:[EMAIL PROTECTED] Sent: Wednesday, 3 March 2004 2:39 a.m. To: [EMAIL PROTECTED] Subject: Re: Event caching and CachedSource Corin Moss wrote: >Hiya, > >I'm probably wrong here, but my understanding of the RefresherImpl is >that the "timeout" is used to cache the page on a timed basis a la cron >(although that could be what you mean). > >I'm not entirely sure how this helps with external validation directly >:) > >What I've been playing around with in this class is a "refreshInFuture" >method which does a one-time-only refresh in x seconds (probably 1 >minimum to be safe from expiry problems :) > >Does that help you at all? I'm happy to contribute it if it would. > > > Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. 
We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators.

Traditional caching requires the objects on behalf of which it caches to provide a so-called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modified timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source, determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline.

Instead we employ a different strategy altogether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore.

Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls, it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource.

What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous).

Hope that explains it better.

Unico
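The one-time "refresh in the future" idea described in this message could be approximated with a plain `ScheduledExecutorService`. This is not the actual `RefresherImpl` code, which is not shown in the thread; the class name `OneShotRefresherSketch` and the method signature are assumptions, and only the method name `refreshInFuture` is taken from the message.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical one-shot refresher built on the JDK scheduler. */
final class OneShotRefresherSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "one-shot-refresher");
                t.setDaemon(true);   // do not keep the JVM alive for pending refreshes
                return t;
            });

    /** Runs the refresh job once, delaySeconds from now, off the calling thread. */
    ScheduledFuture<?> refreshInFuture(Runnable refreshJob, long delaySeconds) {
        return scheduler.schedule(refreshJob, delaySeconds, TimeUnit.SECONDS);
    }

    /** Blocks until the job has run; checked exceptions are wrapped for brevity. */
    static void await(ScheduledFuture<?> pending) {
        try {
            pending.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("refresh did not complete", e);
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

The request that triggers the re-cache returns as soon as the job is scheduled, while the job itself can take as long as it needs in the background.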
Re: Event caching and CachedSource
Corin Moss wrote: Hiya, I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). I'm not entirely sure how this helps with external validation directly :) What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :) Does that help you at all? I'm happy to contribute it if it would. Hmm, I guess my explanation was a little bit dense. Let me put this into context of the things I am trying to solve. We recently deployed a website that is backed by a webdav repository. Obviously this introduces some network overhead and particular parts of the site can get slow to generate. Especially those generated using TraversableGenerators. Traditional caching requires the objects on behalf of which it caches to provide a so called Validity object in order to determine whether its cached objects are still valid upon subsequent request. Most sources will try to determine this by providing a last modifed timestamp the cache can compare. Since retrieving the last modification time is an expensive operation in the case of a webdav source determining the validity of a cached response would be almost as expensive as generating a new response, requiring a webdav propfind for each source that is a member of the generated pipeline. Instead we employ a different strategy altoghether. We just tell the Cache that the source is always valid using a special Validity object. Cache invalidation will be accomplished by an external event triggering the removal of all pipelines the Source is associated with. This means though that a subsequent request will be slow since nothing is cached anymore. 
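A toy model of the event-based strategy just described, assuming a simple in-memory map rather than Cocoon's real Cache and Validity classes (the name `EventCacheSketch` and its methods are hypothetical): entries are never checked against the backing store, and an external event simply evicts everything registered under it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, non-thread-safe model of event-driven cache invalidation. */
final class EventCacheSketch {
    private final Map<String, String> entries = new HashMap<>();            // cache key -> response
    private final Map<String, Set<String>> keysByEvent = new HashMap<>();   // event -> dependent keys

    /** Stores a response and records which event should evict it. */
    void put(String key, String response, String event) {
        entries.put(key, response);
        keysByEvent.computeIfAbsent(event, e -> new HashSet<>()).add(key);
    }

    /** No validity check at all: anything still present is treated as valid. */
    String get(String key) {
        return entries.get(key);
    }

    /** External event: drops every entry registered under it and returns how many were removed. */
    int processEvent(String event) {
        Set<String> keys = keysByEvent.remove(event);
        if (keys == null) {
            return 0;
        }
        int removed = 0;
        for (String key : keys) {
            if (entries.remove(key) != null) {
                removed++;
            }
        }
        return removed;
    }
}
```

The lookup never touches the repository, which is the point of the approach; the cost is that the first request after `processEvent` finds nothing in the cache.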
Perhaps even more importantly, since the pipelines can be huge objects the generation of which potentially requires many network calls it is much better to cache objects at the most atomic level: the source. Hence my interest in CachedSource. What I am proposing is to extend the capability of CachedSource so that an external event (say, someone saving a document in the webdav repository) will trigger the retrieval of a fresh one. But in the background, away from the critical path (asynchronous). Hope that explains it better. Unico
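For readers following along, the proposal above (keep serving the cached source and let an external event fetch a fresh copy in the background) might look roughly like the sketch below. It is only an illustration in modern Java, not the CachedSource code from the scratchpad: the class name `EventRefreshedSourceCache`, the `fetch` function standing in for the slow WebDAV read, and the `onSourceChanged` hook are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

/** Hypothetical sketch, not the Cocoon CachedSource: caches content per source URI. */
final class EventRefreshedSourceCache {
    private final Map<String, String> contentByUri = new ConcurrentHashMap<>();
    private final Function<String, String> fetch; // stands in for the slow repository read
    private final Executor background;            // where refreshes run, off the request path

    EventRefreshedSourceCache(Function<String, String> fetch, Executor background) {
        this.fetch = fetch;
        this.background = background;
    }

    /** Request path: returns the cached copy; only the first access fetches synchronously. */
    String get(String uri) {
        return contentByUri.computeIfAbsent(uri, fetch);
    }

    /** Event path: the old copy stays in the cache until the fresh one replaces it. */
    CompletableFuture<Void> onSourceChanged(String uri) {
        return CompletableFuture.runAsync(
                () -> contentByUri.put(uri, fetch.apply(uri)), background);
    }
}
```

A repository listener would call `onSourceChanged(uri)` when a document is saved; requests arriving in the meantime still get the previous copy from `get(uri)` instead of waiting for the repository.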
Re: Event caching and CachedSource
Unico Hommes wrote:

> Geoff Howard wrote:
>
> > Unico Hommes wrote:
> >
> > > Hi gang :-)
> > >
> > > A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.
> >
> > Below sounds interesting and good but I haven't understood how event cache is related. AFAICS the only difference between eventcache and the other validity types is that for the others an invalid response is found in cache, but not used because it is found invalid after retrieval, but the event cache removes the entry at invalidation time since it knows it will never be useful. Both cases mean that the next person to request that resource will have to wait for the full generation. Maybe because I've only glanced at the refresher stuff?
>
> I guess you are right that at the Cache level nothing really changes. I overlooked that fact. I will do some more research on what is required to accomplish that in the case of the Refresher, but my idea was that the cached response would be served until a newly generated one could replace the stale one. Since the Refresher talks to the Cache directly, given the correct Validity strategy it can exercise full control over it.

So, stale entries are served until they can be regenerated? I've looked for this in the past (someone called it the "I'm Sorry" pattern :) ) and at the time thought it might be better implemented by a pluggable strategy at the pipeline execution level. Currently we have:

- Assemble Pipeline
- Gather key from Pipeline
- Check cache for key
- If object for key found, check its validity
- If valid, serve the cached response
- Else, execute pipeline and serve it.

The cache point pipeline and the non-caching pipeline are implementations of different strategies, but they are accomplished by inheritance instead of by composing a Strategy. I haven't ever thought it through carefully, but it seems like making those last 5 steps (as a group) a pluggable strategy would allow things like this "I'm Sorry" pattern, as well as more powerful concepts like Stefano's proposed adaptive cache. Just raw thoughts at this point...

> > Bottom line for me at the moment is: do you foresee a need to modify the eventcache API to accommodate this need? I'm getting ready to start a discussion on changing the eventcache unstable status -- should I hold off?
>
> I don't think my current work will influence the eventcache API directly. Although I am not sure if the eventcache stuff can be considered stable enough. I still have some doubts about the ease of use of parts of it, especially the way events are associated with cached objects. But let's discuss that separately.

Ah, good. Ok, I'll pick up on another thread.

Geoff
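As a rough illustration of the "pluggable strategy" idea in the message above, the lookup, validity check and generation steps could be grouped behind one interface, with the "I'm Sorry" behaviour as one implementation. This is a sketch under assumptions, not Cocoon's pipeline API: `CachedResponse`, `CacheLookupStrategy` and both implementations are hypothetical names, and the pipeline is reduced to a `Supplier<String>`.

```java
import java.util.Map;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

/** Hypothetical cache entry: a generated response plus a validity flag. */
final class CachedResponse {
    final String body;
    volatile boolean valid = true;
    CachedResponse(String body) { this.body = body; }
}

/** Hypothetical strategy covering the lookup, validity check and (re)generation steps. */
interface CacheLookupStrategy {
    String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline);
}

/** The listed behaviour: an invalid or missing entry is regenerated within the request. */
final class RegenerateInRequest implements CacheLookupStrategy {
    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit != null && hit.valid) {
            return hit.body;
        }
        String fresh = pipeline.get();
        cache.put(key, new CachedResponse(fresh));
        return fresh;
    }
}

/** "I'm Sorry" variant: a stale entry is served now and regenerated in the background. */
final class ServeStaleWhileRegenerating implements CacheLookupStrategy {
    private final Executor background;
    ServeStaleWhileRegenerating(Executor background) { this.background = background; }

    @Override
    public String serve(String key, Map<String, CachedResponse> cache, Supplier<String> pipeline) {
        CachedResponse hit = cache.get(key);
        if (hit == null) { // nothing to fall back on: generate within the request
            String fresh = pipeline.get();
            cache.put(key, new CachedResponse(fresh));
            return fresh;
        }
        if (!hit.valid) {  // stale: hand regeneration to the executor, do not wait for it
            background.execute(() -> cache.put(key, new CachedResponse(pipeline.get())));
        }
        return hit.body;
    }
}
```

The sketch does not guard against several concurrent requests each scheduling a regeneration of the same stale entry; a real implementation would probably want to de-duplicate those.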
RE: Event caching and CachedSource
Unico Hommes wrote:

> > BTW, how does CachedSource accomplish something different from the
> > caching point pipeline (which seems to accomplish more, though I've
> > never used it).
>
> I never used it either. So I really don't know. Perhaps
> someone else could comment on this?

The CachedSource caches a source :) whereas the caching point pipeline caches part of a pipeline. They could be used in combination but have different purposes.

The caching point pipeline can cache the beginning of a pipeline up to the caching point, but this only works if all components in that part of the pipeline support caching; if not, nothing is cached.

Now, imagine that you have a database source that fetches content from a slow database (or CMS). The usual caching algorithm tries to find out whether the source read by the generator has changed since the last call. In the case of the database source this is not possible, so the pipeline is never cached. With the cached source, the content fetched from the database is cached, which reduces the requests to the back-end system, and the generator can use the cached source to test whether the source has changed, allowing the pipeline (or a part of it) to be cached as well.

HTH
Carsten
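To make the database example above a bit more concrete, here is a small sketch of a source wrapper that caches its delegate's content and supplies a timestamp of its own, so a caller can run a cheap "has it changed?" check. It assumes a time-based expiry and invents a tiny `ContentSource` interface for the purpose; it is not the Excalibur/Cocoon Source API or the actual CachedSource implementation.

```java
import java.util.function.LongSupplier;

/** Hypothetical minimal source abstraction; not the Excalibur/Cocoon Source interface. */
interface ContentSource {
    String read();
    long lastModified(); // in this sketch, 0 means "unknown"
}

/** Wrapper that caches its delegate's content and reports when it was last fetched. */
final class CachingSourceWrapper implements ContentSource {
    private final ContentSource delegate;
    private final long expiresMillis;
    private final LongSupplier clock; // injectable so the expiry can be tested
    private String content;
    private long fetchedAt;

    CachingSourceWrapper(ContentSource delegate, long expiresMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.expiresMillis = expiresMillis;
        this.clock = clock;
    }

    /** Hits the delegate only when nothing is cached yet or the cached copy has expired. */
    private synchronized void refreshIfExpired() {
        long now = clock.getAsLong();
        if (content == null || now - fetchedAt >= expiresMillis) {
            content = delegate.read();
            fetchedAt = now;
        }
    }

    @Override
    public synchronized String read() {
        refreshIfExpired();
        return content;
    }

    /** Cheap validity check for callers: changes only when the cached copy was replaced. */
    @Override
    public synchronized long lastModified() {
        refreshIfExpired();
        return fetchedAt;
    }
}
```

Wrapping a slow source as `new CachingSourceWrapper(dbSource, 60_000L, System::currentTimeMillis)` would give a generator a usable timestamp even though the underlying source cannot provide one.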
Re: Event caching and CachedSource
Geoff Howard wrote:

> Unico Hommes wrote:
>
> > Hi gang :-)
> >
> > A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.
>
> Below sounds interesting and good but I haven't understood how event cache is related. AFAICS the only difference between eventcache and the other validity types is that for the others an invalid response is found in cache, but not used because it is found invalid after retrieval, but the event cache removes the entry at invalidation time since it knows it will never be useful. Both cases mean that the next person to request that resource will have to wait for the full generation. Maybe because I've only glanced at the refresher stuff?

I guess you are right that at the Cache level nothing really changes. I overlooked that fact. I will do some more research on what is required to accomplish that in the case of the Refresher, but my idea was that the cached response would be served until a newly generated one could replace the stale one. Since the Refresher talks to the Cache directly, given the correct Validity strategy it can exercise full control over it.

> Bottom line for me at the moment is: do you foresee a need to modify the eventcache API to accommodate this need? I'm getting ready to start a discussion on changing the eventcache unstable status -- should I hold off?

I don't think my current work will influence the eventcache API directly. Although I am not sure if the eventcache stuff can be considered stable enough. I still have some doubts about the ease of use of parts of it, especially the way events are associated with cached objects. But let's discuss that separately.

> > I am looking at the excellent CachedSource stuff that is in the scratchpad area ATM and am wondering how it fits together with the eventcache stuff. One thing I am looking into right now is to write an EventAware Refresher implementation.
> >
> > For those unfamiliar with CachedSource, it is a Source wrapper that can cache its delegate. Refreshing can be done either synchronously or asynchronously but currently only based upon a specified time-out. What I'd like to do is generalize this a bit in order to add the ability to externally trigger invalidation.
> >
> > For this however I think a modification to the Refresher interface is needed.
>
> BTW, how does CachedSource accomplish something different from the caching point pipeline (which seems to accomplish more, though I've never used it).

I never used it either. So I really don't know. Perhaps someone else could comment on this?

Cheers,
Unico
Re: Event caching and CachedSource
Carsten Ziegeler wrote:

> Unico Hommes wrote:
>
> > Hi gang :-)
> >
> > A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.
> >
> > I am looking at the excellent CachedSource stuff that is in the scratchpad area ATM and am wondering how it fits together with the eventcache stuff. One thing I am looking into right now is to write an EventAware Refresher implementation.
> >
> > For those unfamiliar with CachedSource, it is a Source wrapper that can cache its delegate. Refreshing can be done either synchronously or asynchronously but currently only based upon a specified time-out. What I'd like to do is generalize this a bit in order to add the ability to externally trigger invalidation.
> >
> > For this however I think a modification to the Refresher interface is needed.
> >
> > Instead of:
> >
> > Refresher {
> >   refresh(key,uri,timeout);
> >   periodicallyRefresh(key,uri,timeout);
> > }
> >
> > I'd like to remove timeout semantics from the interface:
> >
> > Refresher {
> >   refresh(key,uri,params);
> > }
> >
> > I don't think there is currently a reason for there being two separate methods. So I think we can safely combine them into one. But I guess I am looking at Carsten for confirmation... :-)
>
> Although you actually don't need my confirmation as it's not my but *our* source, here it is :)

OK, thanks. Just trying to exclude the possibility of overlooking something and allowing you the opportunity to comment on any changes beforehand.

> I think this makes sense and I think we should also move this out of the scratchpad afterwards as well.

OK, agreed. But where should it go?

Unico
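For illustration, the single-method `Refresher` agreed on above could carry both the existing time-based behaviour and the proposed event-aware one, with each implementation reading its own entries from `params`. The following is a sketch only: the parameter types, the `"interval"` and `"event"` parameter names, the `reload` callback and the `fire` method are assumptions made for this example and are not taken from the scratchpad code.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** The proposed single-method interface; the parameter types are assumptions. */
interface Refresher {
    void refresh(String key, String uri, Map<String, String> params);
}

/** Time-based variant: reads a hypothetical "interval" parameter (in seconds). */
final class DelayRefresher implements Refresher {
    private final ScheduledExecutorService scheduler;
    private final BiConsumer<String, String> reload; // (key, uri): re-read uri into the cache

    DelayRefresher(ScheduledExecutorService scheduler, BiConsumer<String, String> reload) {
        this.scheduler = scheduler;
        this.reload = reload;
    }

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        long seconds = Long.parseLong(params.getOrDefault("interval", "60"));
        scheduler.scheduleWithFixedDelay(
                () -> reload.accept(key, uri), seconds, seconds, TimeUnit.SECONDS);
    }
}

/** Event-based variant: reads a hypothetical "event" parameter and waits for fire(event). */
final class EventAwareRefresher implements Refresher {
    private final Map<String, Map<String, String>> urisByEvent = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> reload;

    EventAwareRefresher(BiConsumer<String, String> reload) {
        this.reload = reload;
    }

    @Override
    public void refresh(String key, String uri, Map<String, String> params) {
        String event = params.get("event");
        if (event == null) {
            throw new IllegalArgumentException("missing 'event' parameter for " + uri);
        }
        urisByEvent.computeIfAbsent(event, e -> new ConcurrentHashMap<>()).put(key, uri);
    }

    /** Called by whatever delivers external events, e.g. a repository change notification. */
    void fire(String event) {
        urisByEvent.getOrDefault(event, Collections.emptyMap()).forEach(reload);
    }
}
```

Keeping the timeout out of the interface means the event-aware implementation never has to decide what an expiry value would mean for it; it simply ignores parameters it does not know.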
Re: Event caching and CachedSource
Unico Hommes wrote:

> Hi gang :-)
>
> A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.

Below sounds interesting and good but I haven't understood how event cache is related. AFAICS the only difference between eventcache and the other validity types is that for the others an invalid response is found in cache, but not used because it is found invalid after retrieval, but the event cache removes the entry at invalidation time since it knows it will never be useful. Both cases mean that the next person to request that resource will have to wait for the full generation. Maybe because I've only glanced at the refresher stuff?

Bottom line for me at the moment is: do you foresee a need to modify the eventcache API to accommodate this need? I'm getting ready to start a discussion on changing the eventcache unstable status -- should I hold off?

> I am looking at the excellent CachedSource stuff that is in the scratchpad area ATM and am wondering how it fits together with the eventcache stuff. One thing I am looking into right now is to write an EventAware Refresher implementation.
>
> For those unfamiliar with CachedSource, it is a Source wrapper that can cache its delegate. Refreshing can be done either synchronously or asynchronously but currently only based upon a specified time-out. What I'd like to do is generalize this a bit in order to add the ability to externally trigger invalidation.
>
> For this however I think a modification to the Refresher interface is needed.

BTW, how does CachedSource accomplish something different from the caching point pipeline (which seems to accomplish more, though I've never used it).

Geoff
RE: Event caching and CachedSource
Unico Hommes wrote:

> Hi gang :-)
>
> A drawback I have been running into lately with the eventcache
> mechanism is that it lacks the ability to remove heavy
> processing from the critical path. An event will simply
> remove a set of cached pipelines from the cache completely,
> making the subsequent request for such a pipeline potentially
> very slow. In applications where isolation is not a
> requirement this is an unnecessary drawback.
>
> I am looking at the excellent CachedSource stuff that is in
> the scratchpad area ATM and am wondering how it fits together
> with the eventcache stuff. One thing I am looking into right
> now is to write an EventAware Refresher implementation.
>
> For those unfamiliar with CachedSource, it is a Source
> wrapper that can cache its delegate. Refreshing can be done
> either synchronously or asynchronously but currently only
> based upon a specified time-out. What I'd like to do is
> generalize this a bit in order to add the ability to
> externally trigger invalidation.
>
> For this however I think a modification to the Refresher
> interface is needed.
>
> Instead of:
>
> Refresher {
>   refresh(key,uri,timeout);
>   periodicallyRefresh(key,uri,timeout);
> }
>
> I'd like to remove timeout semantics from the interface:
>
> Refresher {
>   refresh(key,uri,params);
> }
>
> I don't think there is currently a reason for there being two
> separate methods. So I think we can safely combine them
> into one. But I guess I am looking at Carsten for confirmation... :-)

Although you actually don't need my confirmation as it's not my but *our* source, here it is :)

I think this makes sense and I think we should also move this out of the scratchpad afterwards as well.

Carsten
RE: Event caching and CachedSource
Hiya,

I'm probably wrong here, but my understanding of the RefresherImpl is that the "timeout" is used to cache the page on a timed basis a la cron (although that could be what you mean). I'm not entirely sure how this helps with external validation directly :)

What I've been playing around with in this class is a "refreshInFuture" method which does a one-time-only refresh in x seconds (probably 1 minimum to be safe from expiry problems :)

Does that help you at all? I'm happy to contribute it if it would.

Corin

-----Original Message-----
From: Unico Hommes [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 3 March 2004 12:44 a.m.
To: [EMAIL PROTECTED]
Subject: Event caching and CachedSource

Hi gang :-)

A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in the scratchpad area ATM and am wondering how it fits together with the eventcache stuff. One thing I am looking into right now is to write an EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that can cache its delegate. Refreshing can be done either synchronously or asynchronously but currently only based upon a specified time-out. What I'd like to do is generalize this a bit in order to add the ability to externally trigger invalidation.

For this however I think a modification to the Refresher interface is needed.

Instead of:

Refresher {
  refresh(key,uri,timeout);
  periodicallyRefresh(key,uri,timeout);
}

I'd like to remove timeout semantics from the interface:

Refresher {
  refresh(key,uri,params);
}

I don't think there is currently a reason for there being two separate methods. So I think we can safely combine them into one. But I guess I am looking at Carsten for confirmation... :-)

Cheers,
Unico
Event caching and CachedSource
Hi gang :-)

A drawback I have been running into lately with the eventcache mechanism is that it lacks the ability to remove heavy processing from the critical path. An event will simply remove a set of cached pipelines from the cache completely, making the subsequent request for such a pipeline potentially very slow. In applications where isolation is not a requirement this is an unnecessary drawback.

I am looking at the excellent CachedSource stuff that is in the scratchpad area ATM and am wondering how it fits together with the eventcache stuff. One thing I am looking into right now is to write an EventAware Refresher implementation.

For those unfamiliar with CachedSource, it is a Source wrapper that can cache its delegate. Refreshing can be done either synchronously or asynchronously but currently only based upon a specified time-out. What I'd like to do is generalize this a bit in order to add the ability to externally trigger invalidation.

For this however I think a modification to the Refresher interface is needed.

Instead of:

Refresher {
  refresh(key,uri,timeout);
  periodicallyRefresh(key,uri,timeout);
}

I'd like to remove timeout semantics from the interface:

Refresher {
  refresh(key,uri,params);
}

I don't think there is currently a reason for there being two separate methods. So I think we can safely combine them into one. But I guess I am looking at Carsten for confirmation... :-)

Cheers,
Unico