[EMAIL PROTECTED] wrote:
I am working on a program to index and search chemical elements and compounds. Say I write an analyzer to pick out chemical terms, such as H2O. I noticed that I can specify a token's type. Can I construct a token as new Token("H2", start, end, "chem")?
My questions are:
How do I search for all the tokens of type "chem", such as H2O, O2, etc.? Is there any sample like this?
If this approach doesn't work, what's the best approach?
You can assign a type to the tokens, and then filter them according to their type, *but* the index does not keep this information, since it stores *terms* (field/value pairs).
Compare: http://lucene.apache.org/java/docs/api/org/apache/lucene/analysis/Token.html and http://lucene.apache.org/java/docs/api/org/apache/lucene/index/Term.html
Note, however, that each term's relative position (derived from the Token's positionIncrement, default = 1) is also stored in the index; this is what allows proximity searches.
So, what can you do?
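To make the position point concrete, here is a rough sketch in plain Java (no Lucene classes involved). It assumes that positions are accumulated from the increments and that a first token with increment 1 lands on position 0; the class and method names (PositionSketch, positions, distance) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionSketch {

    // Turn per-token position increments into absolute positions.
    // Assumption: counting starts at -1, so a first increment of 1 gives position 0.
    static List<Integer> positions(int[] increments) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int inc : increments) {
            position += inc;
            result.add(position);
        }
        return result;
    }

    // Distance between two tokens, which is what a proximity search reasons about.
    static int distance(List<Integer> positions, int first, int second) {
        return Math.abs(positions.get(second) - positions.get(first));
    }

    public static void main(String[] args) {
        // Tokens "H2O", "and", "O2", each with the default increment of 1.
        List<Integer> pos = positions(new int[] {1, 1, 1});
        System.out.println(pos);                 // prints [0, 1, 2]
        System.out.println(distance(pos, 0, 2)); // prints 2
    }
}
```

An increment of 0 would put a token on the same position as the previous one, which is how synonyms are usually stacked.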
1) Use a dedicated field "chem" where only chemical content is allowed (filter out every token whose type is not "chem").
2) Manipulate your termText, e.g. "chem_H2", and do the same for your queries.
3) Work on the query rather than on the index content: filter out what is not chemical.
There may be other solutions...
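A minimal sketch of options 1 and 2, in plain Java rather than against the Lucene API. TypedToken is a hypothetical stand-in for an analyzer token (it is not Lucene's Token class), and keepType / prefixType are illustrative names; in a real analyzer this logic would presumably live in a token filter.

```java
import java.util.ArrayList;
import java.util.List;

public class ChemTokenSketch {

    // Hypothetical stand-in for an analyzer token: text plus a type label.
    static final class TypedToken {
        final String text;
        final String type;

        TypedToken(String text, String type) {
            this.text = text;
            this.type = type;
        }
    }

    // Option 1: keep only tokens of the wanted type, e.g. to feed a dedicated "chem" field.
    static List<String> keepType(List<TypedToken> tokens, String wantedType) {
        List<String> kept = new ArrayList<>();
        for (TypedToken t : tokens) {
            if (wantedType.equals(t.type)) {
                kept.add(t.text);
            }
        }
        return kept;
    }

    // Option 2: encode the type in the term text itself, e.g. "chem_H2O".
    // The same prefix must then be applied to query terms.
    static List<String> prefixType(List<TypedToken> tokens, String wantedType) {
        List<String> terms = new ArrayList<>();
        for (TypedToken t : tokens) {
            terms.add(wantedType.equals(t.type) ? wantedType + "_" + t.text : t.text);
        }
        return terms;
    }

    public static void main(String[] args) {
        List<TypedToken> tokens = new ArrayList<>();
        tokens.add(new TypedToken("water", "word"));
        tokens.add(new TypedToken("H2O", "chem"));
        tokens.add(new TypedToken("O2", "chem"));

        System.out.println(keepType(tokens, "chem"));   // prints [H2O, O2]
        System.out.println(prefixType(tokens, "chem")); // prints [water, chem_H2O, chem_O2]
    }
}
```

With option 1 the type survives as the field name; with option 2 it survives as part of the term text. Either way the type ends up in something the index actually stores.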
Cheers,
p.b.