Paul Rubin wrote:
If you are just trying to avoid too many files in a directory, another
option is to put files in subdirectories like:

import os, struct
# Derive a short directory name from the hash of the page name.
base = struct.pack('i', hash(page_name))
base = base.encode('base64').strip().strip('=')
filename = os.path.join(base, page_name)
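
For anyone trying this on a current Python: str.encode('base64') no longer exists in Python 3, and the built-in hash() of a string is salted per process by default, so it cannot name a directory that must be found again later. A minimal sketch of the same bucketing idea follows. It swaps in a SHA-1 hex prefix for the packed hash, and the function name bucketed_path and the width parameter are made up for illustration:

```python
import hashlib
import os


def bucketed_path(root, page_name, width=2):
    """Return root/<bucket>/<page_name>.

    The bucket is the first `width` hex characters of a SHA-1 digest of
    the page name, so it is stable across runs and contains only
    characters that are safe in a directory name.
    """
    digest = hashlib.sha1(page_name.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:width], page_name)
```

With width=2 there are at most 256 buckets, so the number of entries per directory drops by roughly that factor if page names hash evenly.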


Using subdirectories certainly keeps directory size down, and it's a
good idea for MoinMoin given the way MoinMoin uses the file system.
But for really big wikis, I think using the file system like that
isn't workable even with subdirectories.  Plus, there's the issue of
how to find backlinks and how to do full text search.

If the data has to live somewhere, and you need relatively random access to it (i.e., access to any page, not necessarily to a chunk of a page), then the filesystem does that pretty well, with lots of good features like caching. I can't really see a reason not to use the filesystem.


For backlink indexing, that's a relatively easy index to maintain manually, simply by scanning pages whenever they are modified. The result of that indexing can be efficiently put in yet another file (well, maybe one file per page).
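
A rough sketch of that kind of backlink index, just to show the shape of it. Everything here is an assumption for illustration: links are taken to look like [[PageName]] (real wiki markup differs), page names are taken to be safe to use as filenames, and update_backlinks, extract_links, and the one-JSON-file-per-page layout are made-up names and choices, not anything MoinMoin does:

```python
import json
import os
import re

# Assumption: links are written as [[PageName]] or [[PageName|label]].
LINK_RE = re.compile(r"\[\[([^\]|]+)")


def extract_links(text):
    """Return the set of page names that `text` links to."""
    return {name.strip() for name in LINK_RE.findall(text)}


def update_backlinks(index_dir, page_name, old_text, new_text):
    """Update the per-page backlink files after `page_name` is edited.

    Only targets whose link status changed are touched, so the cost of
    an edit depends on the size of that page, not the size of the wiki.
    """
    old_links = extract_links(old_text)
    new_links = extract_links(new_text)
    os.makedirs(index_dir, exist_ok=True)
    for target in old_links ^ new_links:
        # Assumption: target is usable as a filename as-is.
        path = os.path.join(index_dir, target + ".json")
        backlinks = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                backlinks = set(json.load(f))
        if target in new_links:
            backlinks.add(page_name)      # link was added
        else:
            backlinks.discard(page_name)  # link was removed
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(backlinks), f)
```

Looking up the backlinks for a page is then a single file read, which is the same kind of access the filesystem already handles well for the pages themselves.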

For full text search, you'll want already-existing code to do it for you. MySQL contains such code. But there is also plenty of software that does the same thing and works well directly on files in the filesystem.

A database would be important if you wanted to do arbitrary queries combining several sources of data. And that's certainly possible in a wiki, but that's not so much a scaling issue as a flexibility-in-reporting issue.

--
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org
--
http://mail.python.org/mailman/listinfo/python-list
