>> documents contain very long strings for chemical substances, users are
>> interested in certain parts of the string e.g. find all documents that
>> comprise "*foo*" be it "1-foo-bar" or "rab-oof-13-foonyl-naphthalene").

> So you're saying you want users to be able to search for "of-13" and 
> match that second one?  User's really are demanding that?

Yes and yes. Users range from Information Professionals to "naive" end
users.
If there's a string like "N-(t-Butyl)-N-(3,5-dinitrobenzoyl)-nitroxyl" users
can be expected to search for "dinitro", "3,5-dinitro", "nitrobenz" etc.

There are also sequences of amino acids or DNA that users might want to
match
partially. 

> But, keep in mind that WildcardQuery itself does support "*oo*" and it 
> would work as expected (although with the performance caveat if the 
> index is huge).  If you want QueryParser to support a leading wildcard 
> character, you would have to customize it yourself.

That's what I have been pondering about the whole morning and I'm going to
give
it a try. As far as I have tested, Lucene carries out these queries far
quicker 
than your average relational DB text retrieval tool :-)

Thanks a lot for your comments!

Best regards,

René

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to