Thank you, David, for your very interesting suggestions. As I said
earlier in this thread somewhere, we are still at the exploratory
stage (considering Lucene as a replacement for a commercial engine)
so it will be some time before I can get my hands dirty, but you have
certainly given me some good ideas. Answers to your questions are
below.

-- Robert

On Thu, 17 Mar 2005, David Spencer wrote:

> Fun, interesting question - maybe you could elaborate on the
> requirements a bit.
>
We deliver on-line content -- journals, reference works and the like.
Users can save their own queries and set an alert on any of the saved
queries. As new documents get published (quite a few each day) they
are matched against the saved queries and emails are generated to let
people know that such-and-such an article was matched by one of their
saved queries.

> [ snipped ]
>
> - How complicated are the queries? Are they essentially a list of words
> ANDed together, or are they generalized queries against multiple fields
> with things like fuzzy term expansion and phrase matches allowed?
>
The queries can achieve any level of complexity, as they are written by
users. There are certainly multiple fields, wildcards, sounds-like, phrase
matches, etc. etc.

> - How big are the incoming documents?
>
While most of the incoming documents are relatively small (under 100 Kb),
some can be fairly large (500 Kb and more)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to