ben syverson wrote:
That's not how it works. The entire cache IS invalidated when a new node is added.

What I'm saying is that you only invalidate the entire cache right now because you have no way of telling which nodes are affected by the change. If you had a full-text index, you could efficiently determine which nodes are affected by a change and only invalidate them.


But when you request one of the nodes, it checks to see what the new nodes are. It then searches the node text for those new node names. If there are no matches, it revalidates the cache file (without regenerating it), and serves it. Otherwise, it regenerates the node.

Yes, I understood all of that. That's what I meant by "regenerates." I'm suggesting an approach that lets you skip the revalidation step, since the cache would be invalidated only for documents that actually contain matches.


But if you have 1,000,000 documents (or even 10,000), do you really want to search through every single document every time a node is added?

Have you ever used an inverted word index? This is what full-text search is usually based on. Searching a million documents efficiently should be no big deal, because you look up the new node's name in the index instead of scanning the text of every document. You also only have to do this as part of the job of creating a new node. You don't need to do it when serving files.
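
To make the idea concrete, here is a minimal in-memory sketch of an inverted word index in Python. It is only an illustration, not how MySQL or any particular search engine implements it; the names `InvertedIndex`, `add`, and `docs_containing` are made up for this example. For a multi-word node name it returns every document containing all of the words, which is a superset of the documents containing the exact phrase, so it is safe to use for invalidation.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical in-memory index mapping each word to the documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids

    def add(self, doc_id, text):
        """Index one document; called when a node is created or updated."""
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def docs_containing(self, phrase):
        """Return ids of documents that contain every word of the phrase.

        Word positions are not checked, so this is a candidate set,
        not an exact phrase match.
        """
        words = tokenize(phrase)
        if not words:
            return set()
        word_sets = [self.postings.get(word, set()) for word in words]
        return set.intersection(*word_sets)
```

When a node is added, you would index its text and call `docs_containing()` with its name. The cost depends on how many documents contain those words, not on the total number of documents.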


Furthermore, do you really want every document loaded into the MySQL database?

I suggested MySQL as an easy starting point, since it allows incremental updates to the text index. There are many things you could use, and some will have more compact storage than others.


My thinking is that if you have many documents, odds are only a small subset are being actively viewed, so it doesn't make sense to keep those unpopular documents constantly up-to-date...

You can use this approach for invalidation and still wait until the pages are requested to regenerate them.
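
Roughly, that combination could look like the sketch below. This is an assumption-laden illustration, not your actual code: `NodeCache`, `invalidate`, and `get` are hypothetical names, and `render` stands in for whatever generates a node's page. Adding a node only marks the affected pages as stale; nothing is regenerated until someone requests the page.

```python
class NodeCache:
    """Hypothetical page cache: invalidate eagerly, regenerate lazily."""

    def __init__(self, render):
        self.render = render  # assumed callable: node id -> page text
        self.pages = {}       # node id -> cached page text
        self.stale = set()    # node ids invalidated since their last render

    def invalidate(self, node_ids):
        """Mark only the affected nodes as stale; no rendering happens here."""
        self.stale.update(node_ids)

    def get(self, node_id):
        """Serve from cache, regenerating only if the page is stale or missing."""
        if node_id in self.stale or node_id not in self.pages:
            self.pages[node_id] = self.render(node_id)
            self.stale.discard(node_id)
        return self.pages[node_id]
```

Unpopular documents that are never requested stay marked stale and cost nothing further, while pages that were not affected by the new node are served without any revalidation check.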


If the system is running fast enough and not having scalability problems, there's no reason for you to get into making changes like what I'm describing. I thought you were concerned about the time wasted by revalidating unchanged documents, and this approach would eliminate that.

- Perrin
