Hello,

I'm curious how the "pros" would approach an interesting system design problem I'm facing. I'm building a system that keeps track of users' movements through a collection of information (for the sake of argument, a Wiki). For example, if John moves from the "dinosaur" page to the "bird" page, the system logs it -- but only once a day per connection between nodes per user. That is, if Jane then travels from "dinosaur" to "bird," it will be logged, but if John then moves back from "bird" to "dinosaur," it won't be. The result is a log of every unique connection made by every user that day.
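
To make the rule concrete, here's roughly how the dedup looks in my head (the "connections" table, its columns, and log_connection are names I'm inventing for the example; I'm treating the connection as direction-insensitive, since John's return trip doesn't count):

    use DBI;

    # Invented schema for the example:
    #   CREATE TABLE connections (
    #     user_id  INT          NOT NULL,
    #     node_a   VARCHAR(255) NOT NULL,
    #     node_b   VARCHAR(255) NOT NULL,
    #     log_date DATE         NOT NULL,
    #     UNIQUE KEY uniq_conn (user_id, node_a, node_b, log_date)
    #   );

    sub log_connection {
        my ($dbh, $user_id, $from, $to) = @_;

        # Sort the pair so "dinosaur -> bird" and "bird -> dinosaur"
        # collapse into the same row for a given user and day.
        my ($lo, $hi) = sort ($from, $to);

        # INSERT IGNORE folds the "already logged today?" check and the
        # insert into one statement; the unique key rejects duplicates.
        $dbh->do(
            'INSERT IGNORE INTO connections (user_id, node_a, node_b, log_date)
             VALUES (?, ?, ?, CURDATE())',
            undef, $user_id, $lo, $hi,
        );
    }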

The question is, how would you do this with the least amount of strain on the server?

Currently, I'm using Squid to switch between thttpd (for non-"Wiki" files) and mod_perl, with the metadata in MySQL and the text data in flat files (don't worry, everything's write-once). The code that generates the "Wiki" pages is fairly fast in my testing, but it's not clear (and impossible to test ahead of time) how well it will scale as more nodes and users are added. As a defensive measure, I'm caching the HTML output of the mod_perl handler, but the cached files aren't served by thttpd, because the handler still needs to register where people are going. So every time a page is requested, the handler checks whether this user has made this connection in the past 24 hours, logs it if not, and then either serves the cached file or generates a new one (they go out of date sporadically).
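
In other words, the handler currently does something like this (mod_perl 1.x; the helper subs and the DSN are stand-ins for the real code, and log_connection is the sketch from above):

    package Wiki::Handler;
    use strict;
    use warnings;
    use Apache::Constants qw(OK SERVER_ERROR);
    use DBI;

    sub handler {
        my $r = shift;

        # Apache::DBI keeps this persistent per child in the real setup.
        my $dbh = DBI->connect('DBI:mysql:wiki', 'user', 'pass',
                               { RaiseError => 1 });

        my $user_id = lookup_user_from_cookie($r, $dbh);  # cookie -> MySQL lookup
        my ($from, $to) = current_transition($r);         # e.g. Referer and URI

        log_connection($dbh, $user_id, $from, $to);       # once-a-day dedup above

        my $cache_file = cache_path($to);
        if (-f $cache_file and not is_stale($cache_file)) {
            # Serve the pre-rendered HTML straight from disk.
            open my $fh, '<', $cache_file or return SERVER_ERROR;
            $r->send_http_header('text/html');
            $r->print(do { local $/; <$fh> });
        }
        else {
            # Regenerate, write the cache, and serve.
            my $html = render_page($to);
            write_cache($cache_file, $html);
            $r->send_http_header('text/html');
            $r->print($html);
        }
        return OK;
    }

    1;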

My initial thought on how to improve the system was to relieve mod_perl of having to serve the files, and instead write a Perl script that runs daily, analyzes the day's thttpd log files, and updates the database. However, certain factors (including the need to store user data in cookies, which have to be checked against MySQL) make this impossible.
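
(For what it's worth, the nightly pass would have looked something like the following, assuming the thttpd log is in combined format with a Referer field and reusing the hypothetical log_connection from above; the sticking point is that the cookie, and therefore the user, never makes it into the log line, only the client host does.)

    #!/usr/bin/perl
    # Hypothetical nightly pass over yesterday's thttpd log.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:wiki', 'user', 'pass', { RaiseError => 1 });

    while (my $line = <>) {
        # host ident user [time] "GET /path HTTP/1.0" status bytes "referer" "agent"
        my ($host, $path, $referer) =
            $line =~ /^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)[^"]*" \d+ \S+ "([^"]*)"/
            or next;

        my $from = node_from_url($referer) or next;   # placeholder helpers
        my $to   = node_from_url($path)    or next;

        # Without the cookie, the only identity available is $host,
        # which isn't the same thing as a user.
        log_connection($dbh, $host, $from, $to);
    }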

Am I on the right track with this?

- ben


