Thank you, David, for your very interesting suggestions. As I said somewhere earlier in this thread, we are still at the exploratory stage (considering Lucene as a replacement for a commercial engine), so it will be some time before I can get my hands dirty, but you have certainly given me some good ideas. Answers to your questions are below.
-- Robert

On Thu, 17 Mar 2005, David Spencer wrote:

> Fun, interesting question - maybe you could elaborate on the
> requirements a bit.

We deliver on-line content -- journals, reference works and the like. Users can save their own queries and set an alert on any of the saved queries. As new documents get published (quite a few each day) they are matched against the saved queries, and emails are generated to let people know that such-and-such an article was matched by one of their saved queries.

> [ snipped ]
>
> - How complicated are the queries? Are they essentially a list of words
> ANDed together, or are they generalized queries against multiple fields
> with things like fuzzy term expansion and phrase matches allowed?

The queries can reach any level of complexity, as they are written by users. There are certainly multiple fields, wildcards, sounds-like, phrase matches, etc.

> - How big are the incoming documents?

While most of the incoming documents are relatively small (under 100 KB), some can be fairly large (500 KB and more).
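
To make the alerting flow described above concrete (each newly published document is run against every saved query, and one email is generated per match), here is a rough sketch in Python. It is only an illustration under stated assumptions: `SavedQuery`, `all_terms` and `alerts_for` are hypothetical names, and the matcher is a toy that ANDs words within one field. It is not Lucene and does not handle wildcards, sounds-like or phrase matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Document = Dict[str, str]  # field name -> text (hypothetical document shape)


@dataclass
class SavedQuery:
    owner_email: str
    name: str
    # Stand-in for a real query engine: any callable that accepts a document.
    matches: Callable[[Document], bool]


def all_terms(field: str, *terms: str) -> Callable[[Document], bool]:
    """Toy matcher: every term must occur as a whole word in the given field."""
    wanted = {t.lower() for t in terms}

    def matcher(doc: Document) -> bool:
        words = set(doc.get(field, "").lower().split())
        return wanted <= words

    return matcher


def alerts_for(doc: Document, saved: List[SavedQuery]) -> List[str]:
    """Return one alert message per saved query that matches the new document."""
    title = doc.get("title", "(untitled)")
    return [
        f"To {q.owner_email}: '{title}' matched your saved query '{q.name}'"
        for q in saved
        if q.matches(doc)
    ]


# Example data (made up for illustration).
saved_queries = [
    SavedQuery("alice@example.org", "lucene alerts", all_terms("body", "lucene", "alerts")),
    SavedQuery("bob@example.org", "chemistry", all_terms("title", "chemistry")),
]

new_doc = {"title": "Search notes", "body": "Using Lucene for alerts on new content"}
for message in alerts_for(new_doc, saved_queries):
    print(message)
```

Keeping each saved query behind a plain callable means the toy matcher could later be swapped for a real engine without changing the loop that generates the alerts.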