Re: General application architecture question

Ben van Staveren Tue, 25 Sep 2007 22:01:35 -0700


Any pearls of wisdom available so far?

Yes, why not integrate the actual retrieval into the site that people use to do the searches with? Or is that physically separate from the place you store documents at?

Of course, if this document server is then a separate Apache instance, all the application document links would have to be rewritten as
http://this_other_server/getdoc/doc-id
and that is quite some work. (I also kind of dislike the idea of the end-user browsers accessing the document server directly.)
So, having also followed some other threads on this list, I am wondering which other solution would be available, such as mod_rewrite or mod_proxy and the like in the "front" server, with the "document server" being located "behind" that one.
Any ideas or recommendations around this subject?
(Maybe also ideas about relative performance issues.)

Well, using mod_proxy to do some reverse proxying would work, but users would still be able to more or less 'browse' the document tree if they know where to look. No real way around that one ;)

As a third concern on the same subject:
One of the things that the document server must do in order to decode a "document-id" into a real path on the disk is to read a couple of relatively large index files, parse them and store them in memory for later reference (at the moment, it's in a Perl hash).
I would of course like to avoid having to do that for each request.
Ideally, I would like to have these files read and parsed once into some shareable table accessible by all Apache/mp2 children, and usable read-only by all concurrent request handlers. But these files also change from time to time (as new documents are added), so they must be re-read and re-parsed from time to time (when their last mod-time changes). This is of course easy in a single-threaded server, but I don't quite see how best to do that in an Apache/mp2 context.
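
The reload-on-change idea described above can be sketched per process. This is a minimal sketch and not mod_perl code: it is written in Python for brevity, the class name `ReloadingIndex` and the tab-separated index format are assumptions, and each server child would hold its own copy of the table rather than one truly shared between children.

```python
import os


class ReloadingIndex:
    """Cache a parsed index file; re-parse only when its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None   # mtime seen at the last successful parse
        self._table = {}     # doc-id -> path on disk

    def _parse(self):
        # Hypothetical format: one "doc-id<TAB>path" pair per line.
        table = {}
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if "\t" not in line:
                    continue  # skip blank or malformed lines
                doc_id, doc_path = line.split("\t", 1)
                table[doc_id] = doc_path
        return table

    def lookup(self, doc_id):
        # One stat() per lookup; the parse only runs when the file changed.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime != self._mtime:
            self._table = self._parse()
            self._mtime = mtime
        return self._table.get(doc_id)
```

Each lookup costs one stat() call, and the index is re-parsed only when the modification time differs from the one recorded at the last parse.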

I suggest setting up MySQL and storing that information in there -- depending on the type of documents you search through, you could potentially even put the documents in the database as well, although that's not really a 'good' way of doing it. So in the end, if you store those indexes in the database, you get the shared accessibility, and you can always use a cron job to update it.
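
The database approach suggested above might look roughly like this. It is only a sketch under stated assumptions: Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a server, and the table name `doc_index`, its columns, and the function names are all hypothetical.

```python
import sqlite3


def open_index(db_path):
    """Open the database and make sure the (hypothetical) index table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_index ("
        " doc_id TEXT PRIMARY KEY,"
        " path TEXT NOT NULL)"
    )
    return conn


def refresh_index(conn, entries):
    """Replace all rows in one transaction; this is the part a cron job would run."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM doc_index")
        conn.executemany(
            "INSERT INTO doc_index (doc_id, path) VALUES (?, ?)", entries
        )


def resolve(conn, doc_id):
    """Map a document id to its path on disk, or None if it is unknown."""
    row = conn.execute(
        "SELECT path FROM doc_index WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    return row[0] if row else None
```

With this layout the request handlers only ever call something like `resolve()`, so no child has to parse the index files itself, and the periodic update replaces the whole table in a single transaction.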


Just my 2 cents :)
