Caveat:  I have not yet installed Lucene or begun to experiment with it.
I have scanned the FAQ, but don't see anything that addresses this
question.  Pardon the somewhat slow buildup to the question below, but I
want to set the context.

I am developing an application for 'text mining' adverse event reports in
the pharmaceutical industry.  The querying will be driven by
'dictionaries', 'thesauri',  'taxonomies' or 'ontologies' (pick your
favorite) of drug names, compounds, and medical conditions.  These thesauri
are quite large.  For example, our drug name thesaurus is on the order of
60,000+ terms.

I was planning on using Verity for my first approach to shallow text
mining, since Verity is our corporate-wide search engine technology and
it supports a number of relevant features (including 'topic sets' for
representing the taxonomies).  However, Verity imposes restrictions on the
size of topic sets that currently prohibit me from using it with our large
taxonomies, and it is not obvious that they will be able to fix this
problem in the timeframe I need.  Thus I am turning to other alternatives,
and Lucene appears to be one.

So given that context, my question is this:  Does anyone on this list have
experience attempting to use very large queries (potentially thousands or
tens of thousands of terms) in Lucene?  Does anyone know of design or
implementation details that would inhibit the use of such queries?  And
does anyone have a sense of what retrieval performance would be like with
such queries?
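
To make the question concrete, here is roughly the kind of query
construction I have in mind.  This is only a sketch based on my reading of
the API docs (I have not run it); the field name and the term list are
placeholders for our own schema and dictionary, and raising the clause
limit is just my guess at how one would accommodate tens of thousands of
terms:

    import java.util.List;

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class ThesaurusQueryBuilder {

        // Build one large disjunction ("match any of these terms") over a
        // thesaurus.  'field' and 'thesaurusTerms' stand in for our own
        // document schema and our ~60,000-entry drug-name dictionary.
        public static Query build(String field, List<String> thesaurusTerms) {
            // Lucene caps the number of clauses in a BooleanQuery by default;
            // raising the cap is my assumption of how to fit that many terms.
            BooleanQuery.setMaxClauseCount(100000);

            BooleanQuery query = new BooleanQuery();
            for (String t : thesaurusTerms) {
                query.add(new TermQuery(new Term(field, t)),
                          BooleanClause.Occur.SHOULD);
            }
            return query;
        }
    }

The resulting query would then be handed to an IndexSearcher as usual;
whether Lucene handles a disjunction of that size gracefully is exactly
what I am asking.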

--------------------------------------
Gary H. Merrill
Director and Principal Scientist, New Applications
Data Exploration Sciences
GlaxoSmithKline Inc.
(919) 483-8456




