JB - I can shed some light on this kind of system, and give you some suggestions about building one. Just about every dynamic, scalable site employs some form of caching, and the following is a model I have found useful in increasing the scalability of my sites.
The idea behind caching is to pre-generate as much content as possible to reduce overhead on your Web server. One of the biggest hits we take as developers is database access, so caching data will probably be a big part of your strategy. There are a number of issues to think through before building a system like this, however: deciding what data to cache, indexing your data store, handling inserts / updates / deletes, handling record locking, detecting whether the cached data is present, and error handling for when something goes wrong.

Now, it's great that you want to use cached queries to store data and QoQs to access it, but I have found that approach to be a little slower than dumping the same information into structures. For instance, in one application I have a list of values specific to congressional districts, and I use a structure indexed by congressional district name to hold all the data for each area, including nested structures and string and numeric values. Structure lookups are fast compared to QoQ, at least in CF5. Whenever a user requests information for a particular district, the data is pulled directly out of that structure. Whenever a user adds, updates, or deletes information for a district, I update the database and then the key in the structure they changed. The process looks like this:

1) Run the CFQuery with the insert / update / delete
2) Use a readonly CFLock to copy the structure out of the application scope
3) Perform the same change on the copy of the structure
4) Use an exclusive CFLock to copy the new structure back into the application scope

By handling changes to the cached data this way, you keep the database and the server in sync without having to run a select statement. This method, however, also illustrates the source of a lot of problems. Locking shared data can be really troublesome in several situations:

1) Dirty reads.
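A minimal sketch of that read-copy-write sequence in CFML (the datasource name, the application.districtData structure, and the form fields are my own placeholders, not from any real app):

```cfm
<!--- 1) Write the change to the database first --->
<cfquery name="updateDistrict" datasource="myDSN">
    UPDATE districts
    SET    population = <cfqueryparam value="#form.population#" cfsqltype="cf_sql_integer">
    WHERE  district_name = <cfqueryparam value="#form.district#" cfsqltype="cf_sql_varchar">
</cfquery>

<!--- 2) Copy the cached structure out of the application scope.
      duplicate() makes a deep copy, so we never modify the shared copy in place --->
<cflock scope="application" type="readonly" timeout="10">
    <cfset localCopy = duplicate(application.districtData)>
</cflock>

<!--- 3) Apply the same change to the local copy --->
<cfset localCopy[form.district].population = form.population>

<!--- 4) Swap the updated structure back into the application scope --->
<cflock scope="application" type="exclusive" timeout="10">
    <cfset application.districtData = localCopy>
</cflock>
```

The point of doing the work on a duplicate is that the exclusive lock is held only for the final assignment, not for the whole update.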
Between the time the structure is copied out of the application scope and the time it is copied back in, another user can copy the same (not-yet-updated) data and then write their version back in, wiping out the first user's changes.

2) When you have a lot of users, the high volume of requests can make updates take a long time, or users can hit errors. Each exclusive lock means only one request has access to that data while all the others wait. Usually this is easier to work around than the next problem.

3) In a clustered environment, there is no way to share application scopes between servers, so you need a mechanism to alert each server that it is time to update its cached data.

There are a number of ways around these problems. I maintain a lot of metadata about each data structure, including when it was last updated, down to the millisecond. In the process above, I add a check of the structure's last-updated time to decide whether an update is legitimate. When a dirty read has occurred, an alternate process kicks in:

1) Exclusively lock the data structure to be changed
2) Change it directly
3) Release the lock

This works, for the most part, without a serious performance hit (on our Red Hat servers, the difference is +/- 3ms). There have been times under heavy load, though, when this process led to more locks than CF could cope with. After that happened several times, we decided to move to a clustered server approach.

The problem with data caching on clustered servers is that the servers do not share data scopes. They can use a common set of session variables, but the ideal would be for them to share the application scope (since that's where all the data is). I get around this by passing data back and forth through the database: I have a separate database dedicated exclusively to 'state' reporting.
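The last-updated check and the direct-update fallback could be sketched like this (application.districtMeta and its lastUpdated key are hypothetical metadata names of my own):

```cfm
<!--- Copy the cached data out and note when it was last updated --->
<cflock scope="application" type="readonly" timeout="10">
    <cfset localCopy = duplicate(application.districtData)>
    <cfset copiedAt  = application.districtMeta.lastUpdated>
</cflock>

<!--- Apply the change to the local copy --->
<cfset localCopy[form.district].population = form.population>

<cflock scope="application" type="exclusive" timeout="10">
    <cfif application.districtMeta.lastUpdated EQ copiedAt>
        <!--- Nobody else wrote in the meantime: safe to swap in the whole copy --->
        <cfset application.districtData = localCopy>
    <cfelse>
        <!--- Someone else updated the cache first: change just the one key
              directly under the exclusive lock instead of overwriting --->
        <cfset application.districtData[form.district].population = form.population>
    </cfif>
    <!--- Stamp the metadata with millisecond resolution --->
    <cfset application.districtMeta.lastUpdated = getTickCount()>
</cflock>
```

The comparison happens inside the exclusive lock so no third request can slip in between the check and the write.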
Each time I update a data structure that needs to be mirrored on another server, I record the server, the structure, the key, and the time last updated. On each page request, each server runs a quick select statement (keyed on its internal IP) to check whether it needs to update any data. When a server does need to update, it follows a process similar to the one used for record updates:

1) Use a readonly CFLock to copy the affected key of the structure out of the application scope
2) Perform the change on the copy of that key
3) Use an exclusive CFLock to copy the new key back into the application scope

The performance hit depends on the particular data structure, and the goal is to make the process invisible to the user. The benchmark I use is 5 +/- 3ms per record; if I cannot get it to take less time than that, I go another route.

Despite all this, there are still cracks in the system. Occasionally a really nasty dirty read happens, I hear about it, and it drives me up a wall trying to understand what went wrong.

One thing I do to keep the data current is refresh one data structure every ten minutes. A scheduled task loops through a list of all the data structures on each server and refreshes the next one each time it runs. The refresh takes an average of 45 seconds, so, obviously, the data needs to be handled carefully:

1) Run the query and construct the new data structure
2) Lock the application scope
3) Copy in the new structure and update all the appropriate metadata about it
4) Release the lock

This strategy really cuts down on dirty reads. We track them specifically using error pages, and the error rate is less than .01% for any hour in a 24-hour period (most days far less than that). It spiked as high as 3% when we moved to clustered servers, but has not gone above .01% in the last 9 months.
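A sketch of the per-request cluster check, with an invented cache_updates table and a hypothetical buildDistrictStruct() helper standing in for whatever rebuilds a key from a query (how you determine the server's own internal IP will vary; CGI.LOCAL_ADDR is one option under IIS):

```cfm
<!--- Ask the 'state' database whether this server has stale keys --->
<cfquery name="qPending" datasource="stateDSN">
    SELECT structure_name, key_name
    FROM   cache_updates
    WHERE  server_ip = <cfqueryparam value="#CGI.LOCAL_ADDR#" cfsqltype="cf_sql_varchar">
</cfquery>

<cfloop query="qPending">
    <!--- Rebuild just the stale key from the source database --->
    <cfquery name="qKey" datasource="myDSN">
        SELECT * FROM districts
        WHERE district_name = <cfqueryparam value="#qPending.key_name#" cfsqltype="cf_sql_varchar">
    </cfquery>

    <cflock scope="application" type="exclusive" timeout="10">
        <cfset application[qPending.structure_name][qPending.key_name] = buildDistrictStruct(qKey)>
    </cflock>

    <!--- Clear the flag so this key is not re-applied on the next request --->
    <cfquery datasource="stateDSN">
        DELETE FROM cache_updates
        WHERE server_ip = <cfqueryparam value="#CGI.LOCAL_ADDR#" cfsqltype="cf_sql_varchar">
          AND structure_name = <cfqueryparam value="#qPending.structure_name#" cfsqltype="cf_sql_varchar">
          AND key_name = <cfqueryparam value="#qPending.key_name#" cfsqltype="cf_sql_varchar">
    </cfquery>
</cfloop>
```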
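The ten-minute refresh follows the same build-then-swap pattern; the key is constructing the entire replacement structure before taking the lock, so the exclusive lock is held only for the swap, not for the 45 seconds of query work (structure, query, and column names here are placeholders):

```cfm
<!--- 1) Build the replacement structure outside any lock; this is the slow part --->
<cfquery name="qDistricts" datasource="myDSN">
    SELECT district_name, population, representative
    FROM   districts
</cfquery>

<cfset newData = structNew()>
<cfloop query="qDistricts">
    <cfset newData[qDistricts.district_name] = structNew()>
    <cfset newData[qDistricts.district_name].population = qDistricts.population>
    <cfset newData[qDistricts.district_name].representative = qDistricts.representative>
</cfloop>

<!--- 2-4) Swap it in and stamp the metadata while the lock is held only briefly --->
<cflock scope="application" type="exclusive" timeout="30">
    <cfset application.districtData = newData>
    <cfset application.districtMeta.lastUpdated = getTickCount()>
</cflock>
```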
There have also been a number of change-management issues. I generally use multiple actions for each data structure; for instance, the district structure has files for updating the whole structure, for updating a specific key in the structure, for updating a key of a key, and so on. Every once in a while I will make a change to the whole-structure action that is not reflected at the other levels, and chaos ensues on our testing server. It is important to build processes around a site like this to ensure that a change in how a data structure is handled is made in every action that affects that structure. Of course, this is why we use Visio.

This overly-long message started off with a discussion about caching in general. The other big thing I do is cache dynamic content, using several caching schemes based on user permissions. If anyone finds this useful, let me know and I might be willing to write that up as well.

M

-----Original Message-----
From: James Blaha [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 15, 2003 1:35 PM
To: CF-Talk
Subject: Re: I need some good old advice.

Jon,

Your comment has the meat I'm looking for! Thanks.

What exactly do you mean by: "schedule the cache update, and build in logic on each page request to test if the cache exists, if not, requery immediately"? Can you please give me some kind of example of the code the templates involved would contain and how you would use it?

What do you mean by: "If possible, abstract access to the cache as much as possible right now...since it is your datastore, you are bound to think of new ways to use it, or need to use it down the road."

FYI: The data in the table involved is affected on two sides. On one side, users on the web enter data that goes into the table. On the other side is a BackOffice where staff edit and query that data. Querying, updates and deletions only happen in the BackOffice.
Regards,
JB

jon hall wrote:

>If this is a high traffic site, or the query takes extraordinarily long
>to execute, keep in mind the people hitting the site while the query is
>recaching, and right after the server is restarted, etc. They could get
>errors, bad data, or long delays...
>
>I've got an app that does something similar now where we chose to
>schedule the cache update, and build in logic on each page request to
>test if the cache exists, if not, requery immediately. It works fairly
>well under load...other than a few possible CF5 bugs with cached
>queries under load that I can't control, but thankfully are rare.
>
>If possible, abstract access to the cache as much as possible right
>now...since it is your datastore, you are bound to think of new ways to
>use it, or need to use it down the road.