Robert Watkins wrote:

Thank you, David, for your very interesting suggestions.

You're welcome, it's a fun problem.

Unless there's a requirement to the contrary, I would try to execute the queries only, say, once a day, and avoid all the clever optimizations by default. This follows one of the first principles of optimization: "don't do it".

I'd also suggest you look at how you invoke the JVM that runs Lucene: a combination of the -server flag, a large -Xmx value (maximum heap size), a RAMDirectory, and maybe even -XX:CompileThreshold=10 should give you better-than-default performance. That last, lesser-known argument tells the JVM to compile methods to native code after fewer invocations.
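Since it is easy to pass a flag in one launch script and forget it in another, a quick way to confirm what the JVM actually received is to print its startup arguments and maximum heap. This is a minimal sketch using only the standard `java.lang.management` API; the class name `JvmFlagCheck` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports the options this JVM was started with, so you can
// confirm that flags such as -server, -Xmx and -XX:CompileThreshold=10 took effect.
class JvmFlagCheck {
    // JVM options from the command line (e.g. "-Xmx1024m"); excludes the main class and program args.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // Maximum heap the JVM will try to use, in megabytes; reflects -Xmx when it is set.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

Run it with the same flags you use for the Lucene process, e.g. `java -server -Xmx1024m -XX:CompileThreshold=10 JvmFlagCheck` (the 1024m value is only an example). The reported heap is usually close to, and can be slightly below, the -Xmx value. On current 64-bit HotSpot JVMs, -server is already the default, and -XX:CompileThreshold is ignored while tiered compilation (enabled by default since Java 8) is on.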

As I said somewhere earlier in this thread, we are still at the exploratory stage (considering Lucene as a replacement for a commercial engine), so it will be some time before I can get my hands dirty, but you have certainly given me some good ideas. Answers to your questions are below.

-- Robert

On Thu, 17 Mar 2005, David Spencer wrote:


Fun, interesting question. Maybe you could elaborate on the
requirements a bit.


We deliver online content: journals, reference works, and the like. Users can save their own queries and set an alert on any saved query. As new documents are published (quite a few each day), they are matched against the saved queries, and emails are generated to let users know that a given article matched one of their saved queries.
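The alerting workflow described here can be sketched as a batch job that runs every saved query against each newly published document and then groups the matches per user, so each user gets one email per run (for example, once a day). This is a minimal sketch under stated assumptions: the `AlertBatch`, `SavedQuery`, `Document` and `Alert` names are hypothetical, and each saved query is reduced to a plain predicate over the document text, whereas a real system would evaluate it with a search engine such as Lucene.

```java
import java.util.*;
import java.util.function.Predicate;

// Hypothetical model of saved-query alerting. Each saved query is simplified to a
// predicate over document text; a real system would run it through Lucene instead.
class AlertBatch {
    record SavedQuery(String userEmail, String description, Predicate<String> matcher) {}
    record Document(String id, String text) {}
    record Alert(String userEmail, String documentId, String queryDescription) {}

    // Runs every saved query against every new document in the batch and
    // returns one alert per (saved query, document) match.
    static List<Alert> run(List<SavedQuery> savedQueries, List<Document> newDocuments) {
        List<Alert> alerts = new ArrayList<>();
        for (Document doc : newDocuments) {
            for (SavedQuery q : savedQueries) {
                if (q.matcher().test(doc.text())) {
                    alerts.add(new Alert(q.userEmail(), doc.id(), q.description()));
                }
            }
        }
        return alerts;
    }

    // Groups alerts by user (sorted by address) so each user receives a single digest email.
    static Map<String, List<Alert>> byUser(List<Alert> alerts) {
        Map<String, List<Alert>> grouped = new TreeMap<>();
        for (Alert a : alerts) {
            grouped.computeIfAbsent(a.userEmail(), k -> new ArrayList<>()).add(a);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<SavedQuery> queries = List.of(
            new SavedQuery("alice@example.com", "mentions Lucene", t -> t.toLowerCase().contains("lucene")),
            new SavedQuery("bob@example.com", "mentions the JVM", t -> t.toLowerCase().contains("jvm")));
        List<Document> docs = List.of(
            new Document("doc-1", "Tuning the JVM for Lucene"),
            new Document("doc-2", "A survey of journal publishing"));
        byUser(run(queries, docs)).forEach((user, matched) ->
            System.out.println(user + " -> " + matched.size() + " match(es)"));
    }
}
```

Running the batch once a day, rather than per document, keeps the cost proportional to (number of saved queries) × (documents published that day) per run and lets each user's matches be collected into one message.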


[ snipped ]

- How complicated are the queries? Are they essentially a list of words
ANDed together, or are they generalized queries against multiple fields
with things like fuzzy term expansion and phrase matches allowed?


The queries can be arbitrarily complex, since users write them. They certainly involve multiple fields, wildcards, sounds-like matching, phrase matches, and so on.
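In Lucene, the query parser and query classes handle these clause types. Purely to illustrate why such queries need more than simple keyword lookup, here is a standard-library-only sketch of two of them, wildcard terms and phrases, evaluated against the text of a single field. The `ToyClauses` class and its tokenizer are hypothetical simplifications, not Lucene's analyzers, and sounds-like matching is left out.

```java
import java.util.*;
import java.util.regex.Pattern;

// Toy illustration only: evaluates wildcard and phrase clauses against one field's text.
// Lucene does this with its own analyzers and query classes.
class ToyClauses {
    // Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Converts a wildcard term ('*' = any run of characters, '?' = exactly one) to a regex.
    static Pattern wildcardToRegex(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toLowerCase().toCharArray()) {
            if (c == '*') sb.append(".*");
            else if (c == '?') sb.append('.');
            else sb.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(sb.toString());
    }

    // True if any token in the field matches the wildcard term.
    static boolean wildcardMatches(String fieldText, String term) {
        Pattern p = wildcardToRegex(term);
        for (String token : tokenize(fieldText)) {
            if (p.matcher(token).matches()) return true;
        }
        return false;
    }

    // True if the phrase's tokens occur consecutively and in order in the field.
    static boolean phraseMatches(String fieldText, String phrase) {
        List<String> words = tokenize(phrase);
        if (words.isEmpty()) return false;
        return Collections.indexOfSubList(tokenize(fieldText), words) >= 0;
    }
}
```

A wildcard clause has to be checked against every token (or, in a real index, expanded against the term dictionary), and a phrase clause needs token positions, which is why user-written queries of this kind are best left to the engine's own parser rather than hand-rolled matching.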


- How big are the incoming documents?


While most of the incoming documents are relatively small (under 100 KB), some can be fairly large (500 KB or more).

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


