On Feb 5, 2005, at 5:38 PM, Perrin Harkins wrote:

It sounds like the problem is not so much that mod_perl is serving cached HTML, since that is easily improved with a reverse proxy server, but rather that your entire cache gets invalidated whenever anyone creates a new node, and mod_perl has to spend time regenerating pages that usually don't actually need to be regenerated.

That's not quite how it works. The entire cache IS invalidated when a new node is added, but invalidation doesn't mean regeneration. When you request one of the nodes, the handler checks which nodes are new and searches the cached node's text for those new node names. If there are no matches, it revalidates the cache file (without regenerating it) and serves it. Otherwise, it regenerates the node.


To reiterate, the node is NOT regenerated until it actually needs to be -- but it is analyzed on every view to see if this is the case.
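In rough pseudocode, the per-view check looks something like the sketch below. This is not the actual code -- the helper names (nodes_added_since, regenerate_cache, revalidate_cache, serve_cached_html) and the node structure are made up to illustrate the idea:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch only: regenerate the cached HTML only when a newly added
    # node's name actually appears in this node's text; otherwise just
    # revalidate the existing cache file and serve it.
    sub serve_node {
        my ($node) = @_;

        # Hypothetical helper: titles of nodes added since this node's
        # cache was last validated.
        my @new_titles = nodes_added_since($node->{cache_time});
        my $text       = $node->{text};

        my $needs_regen = 0;
        for my $title (@new_titles) {
            if ($text =~ /\Q$title\E/i) {   # cached text mentions the new node
                $needs_regen = 1;
                last;
            }
        }

        if ($needs_regen) {
            regenerate_cache($node);   # expensive: rebuild the HTML
        } else {
            revalidate_cache($node);   # cheap: mark the cache file as current
        }
        return serve_cached_html($node);
    }

So the common case is a string scan plus a timestamp bump, not a rebuild.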

The way I would do this is by adding full-text search capabilities on your data, using something like MySQL's full-text search columns, which allow you to index new documents on the fly rather than rebuilding the whole index. Then, when someone adds a new node called "Dinosaurs", you do a search for all nodes that contain the word "Dinosaurs" and invalidate their caches.
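As a rough illustration of what Perrin is suggesting -- assuming a nodes table with a FULLTEXT index on its text column and a cache keyed by node_id; the schema and the invalidate_cache helper are hypothetical:

    use strict;
    use warnings;
    use DBI;

    # Sketch only: when a node titled $new_title is created, invalidate
    # just the cache entries of nodes whose text mentions that title,
    # using the FULLTEXT index instead of scanning every document.
    sub invalidate_matching_nodes {
        my ($dbh, $new_title) = @_;

        my $ids = $dbh->selectcol_arrayref(
            'SELECT node_id FROM nodes WHERE MATCH(text) AGAINST (?)',
            undef, $new_title,
        );

        invalidate_cache($_) for @$ids;   # hypothetical cache-removal helper
    }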

But if you have 1,000,000 documents (or even 10,000), do you really want to search through every single document every time a node is added? Furthermore, do you really want every document loaded into the MySQL database?


My thinking is that if you have many documents, odds are only a small subset is being actively viewed, so it doesn't make sense to keep the unpopular ones constantly up to date...

- ben
