The least that can be said is that people on this list are eager to help. Thanks for the ideas, and keep them coming.

About this "gen(e)ral application architecture question", I would just like to narrow down the scope a bit, if that's all right with everyone.

I do already have the full-text indexing and search engine, and the application based on it, so I don't really need more choices there.
Also, the storage architecture for the original documents is fixed.
It is based on files stored individually on disk, spread out over a multi-level directory structure, not in a database. To put it another way: I am grateful for the various suggestions in those respects, but I cannot afford right now to change those parts of the system.

My question was thus, in my intention at least, centered on the link between the two. Once the user has found a document via the search engine, a web page displays its meta-data and text (both stored within the indexing and search engine) and, next to them, a URL link to the original document itself. The question is how to deliver this original document as efficiently as possible.

The URL link to the document contains an "abstract" path identifier (not a path), which must be translated into a real path on the document server, which itself is located either on the same host or on a different host. The translation cannot be calculated: it needs a table associating identifiers with the paths they represent. This table is, at its base, a simple flat text file located on the document server, and must remain so for the time being. The translation is currently performed by a separate single-process "document server", and it is this document server that I am thinking of replacing with a dedicated server based on Apache2/mp2.
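
To make the translation step concrete, here is a minimal sketch of the lookup, written in Python only for brevity (the same logic would apply in a mod_perl2 handler). The file format is an assumption made purely for illustration: one "identifier<TAB>path" pair per line, with blank lines and "#" comments ignored. The names load_table and resolve are hypothetical.

```python
def load_table(table_file):
    """Parse the flat text file into a dict mapping identifier -> real path.

    Assumed format (not taken from the real system): one
    "identifier<TAB>path" pair per line.
    """
    table = {}
    with open(table_file, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            ident, sep, path = line.partition("\t")
            if sep:  # ignore lines without a tab separator
                table[ident] = path
    return table


def resolve(table, ident):
    """Return the real path for an abstract identifier, or None if unknown."""
    return table.get(ident)
```

A request handler would call resolve() with the identifier taken from the URL and answer with a "not found" status when it returns None.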

The first part of the question was thus aimed at finding out whether rewriting the document server on top of Apache2/mod_perl2 seemed a good idea, rather than developing a new stand-alone forking or threaded document server myself.

The general gist of the responses seems to indicate that it is, or at least nobody so far has argued against it.

The second part of the question was to sound out what solutions exist, in a multi-process or multi-threaded document server based on AP2/mp2, for sharing the id/path translation table as much as possible between document request handlers, so as to avoid multiple reads and parses of this table, which originally is in a flat text file on disk and for the time being must remain so.
I think that part has now been "forked off" to the separate thread
"Re: Sharing data between many requests".
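
As one possible illustration of avoiding repeated reads and parses, the sketch below keeps the parsed table in memory within a process and re-parses the file only when its modification time changes. It is written in Python only for brevity, it is not a solution taken from that thread, and it only covers sharing between requests handled by the same process. The class name CachedTable and the "identifier<TAB>path" line format are assumptions.

```python
import os


class CachedTable:
    """Keep the parsed id/path table in memory; reload only if the file changed."""

    def __init__(self, table_file):
        self.table_file = table_file
        self._mtime = None   # modification time of the version held in memory
        self._table = {}
        self.loads = 0       # number of times the file was actually parsed

    def _parse(self):
        # Assumed format: one "identifier<TAB>path" pair per line.
        table = {}
        with open(self.table_file, encoding="utf-8") as fh:
            for line in fh:
                ident, sep, path = line.strip().partition("\t")
                if sep:
                    table[ident] = path
        return table

    def lookup(self, ident):
        # One stat() per request is much cheaper than re-reading the file.
        mtime = os.stat(self.table_file).st_mtime_ns
        if mtime != self._mtime:
            self._table = self._parse()
            self._mtime = mtime
            self.loads += 1
        return self._table.get(ident)
```

With this approach each server process still parses the file once, and once more after each change to it; the cost is a single stat() call per request.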

Thanks to all.


