First of all, thanks for the suggestions, everyone! They're giving me a lot to chew on. I now realize (sound of hand smacking forehead) that the main problem is not the list of links and tracking users, but rather the inline Wiki links:

On Feb 4, 2005, at 8:58 AM, Malcolm J Harwood wrote:

What are you doing with the data once you have it? Is there any reason that it
needs to be 'live'?

Sort of -- imagine our Wiki scenario, but without delimiters (I think this is rather common in the .biz world). So if the "dinosaur" node contains:


"Some scientists suggest that dinosaurs may actually have evolved from birds."

It'll automagically link to the "birds" node. However, let's say the "scientist" node doesn't yet exist -- but when it does, we want it to link up. I wouldn't say it "needs to be live," but it would be nice to get that link happening sooner rather than later.
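
For concreteness, the delimiter-less linking pass could look roughly like this -- a minimal sketch, assuming node names sit in a hash of lowercased name => node ID (the hash, the helper name, and the /node/ URL scheme are placeholders, not the real code):

    sub link_nodes {
        my ($text, $nodes) = @_;   # $nodes: { 'birds' => 42, ... }
        return $text unless %$nodes;

        # Try longer names first, so multi-word titles win over substrings.
        my $pattern = join '|',
                      map  { quotemeta }
                      sort { length $b <=> length $a } keys %$nodes;

        # Case-insensitive, whole-word match; look up by lowercased name.
        $text =~ s{\b($pattern)\b}{
            '<a href="/node/' . $nodes->{lc $1} . '">' . $1 . '</a>'
        }gie;
        return $text;
    }

On the "dinosaur" text above, that would turn "birds" into a link the moment a "birds" entry shows up in the hash.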

The way the system works now, it is live. Every time a page is generated, the most recent node ID is stored along with the cached file. On the next view:

1. The system checks the current most-recent node ID against the one stored with the cache file.
2. If they match, nothing has changed, and the cached file is served.
3. If they differ, it looks through the nodes added since the file was cached and checks whether the original node's text contains any of those node names.
4. If it does, the page is regenerated, recached, and served.
5. Otherwise, it revalidates the cache file by storing the new most-recent node ID with the old cache file, and serves that.
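
In code, that request path looks something like the sketch below (all the helper names are made up -- this illustrates the steps above, not the actual system):

    sub serve_page {
        my ($node) = @_;
        my $cached_id  = cached_max_node_id($node);   # stored with the cache file
        my $current_id = current_max_node_id();       # newest node ID overall

        if ($cached_id != $current_id) {
            # Something was added since we cached; do any new names appear?
            my @new_names = node_names_added_since($cached_id);
            my $text      = node_source_text($node);

            if (grep { $text =~ /\b\Q$_\E\b/i } @new_names) {
                regenerate_and_cache($node);                   # picks up the new link
            }
            else {
                store_cached_max_node_id($node, $current_id);  # revalidate as-is
            }
        }
        return serve_cached_html($node);
    }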

The problem with this is that 99% of the time, the document won't contain any of the new node names, so mod_perl spends most of its time doing that comparison work only to serve up cached HTML anyway.

However, if you use a cron job log-analysis approach, every time a new node is added, you have to search through EVERY node's text to see if it needs a link to the new node. Imagine this with 1,000,000 two-page documents; the loop sketched below shows why that hurts.
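
That is, the batch version ends up as something like this (hypothetical names again) -- a full-text scan of every document per new node:

    sub on_node_added {
        my ($new_name) = @_;
        for my $id (all_node_ids()) {                 # ~1,000,000 of these
            my $text = node_source_text($id);
            regenerate_and_cache($id)
                if $text =~ /\b\Q$new_name\E\b/i;     # scan each document's text
        }
    }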

So maybe my system is as optimized as it's going to get?

- ben


