Caveat: I have not yet installed Lucerne or begun to experiment with it yet. I have scanned the FAQ, but don't see anything that addresses this question. Pardon the somewhat slow buildup to the question below, but I want to set the context.
I am developing an application for 'text mining' adverse event reports in the pharmaceutical industry. The querying will be driven by 'dictionaries', 'thesauri', 'taxonomies' or 'ontologies' (pick your favorite) of drug names, compounds, and medical conditions. These thesauri are quite large. For example, our drug name thesaurus is on the order of 60,000+ terms. I was planning on using Verity to accomplish my first approach to shallow text mining since Verity is our corproate-wide search engine technology and it supports a number of relevant features (including 'topic sets' for representing the taxonomies). However, Verity imposes restrictions on the size of topic sets that currently prohibit me from using it with our large taxonomies. It is not obvious that they will be able to fix this problem in the timeframe I need. Thus I am turning to other alternatives, and Lucerne appears to be one. So given that context, my question is this: Does anyone on this list have experience attempting to use very large queries (potentially thousands or tens of thousands of terms) in Lucerne? Does anyone have any knowledge of design or implementation details that would inhibit the use of such queries? Does anyone have any idea of what the performance would be like in retrieving via such queries? -------------------------------------- Gary H. Merrill Director and Principal Scientist, New Applications Data Exploration Sciences GlaxoSmithKline Inc. (919) 483-8456 --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]