Luke Shannon wrote:
> Hello,
> Does anyone see a problem with the following approach?
No, there's no problem with it; in fact, it's what my "Wordnet Query
Expansion" sandbox module does.
The nice thing about Lucene is that you at least have the option of doing
things the other way: you can write a custom Analyzer that puts all
synonyms at the same token position, so they appear to be in the same
place in the token stream. Thinking about it, this Analyzer approach lets
users search for phrases that match a synonym. Using your example below,
the text "bright red engine" would be matched by either the phrase
"bright red" or "bright colour". Query expansion is trickier if you allow
phrases.
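A minimal, Lucene-free sketch of why same-position synonyms make phrase matching work: each token position holds a set of terms, which mimics the stream a synonym-injecting Analyzer produces (in Lucene that is done by emitting each synonym with a position increment of 0). The `SYNONYMS` table, the `analyze`/`phraseMatches` helpers and the whitespace tokenizer are hypothetical simplifications, not Lucene API.

```java
import java.util.*;

public class SameOffsetSynonyms {
    // Hypothetical synonym table, mirroring "red = colour, primary, stop" from the thread.
    static final Map<String, List<String>> SYNONYMS =
        Map.of("red", List.of("colour", "primary", "stop"));

    // Split on whitespace and put each word plus its synonyms at the same position.
    static List<Set<String>> analyze(String text) {
        List<Set<String>> positions = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            Set<String> terms = new HashSet<>();
            terms.add(word);
            terms.addAll(SYNONYMS.getOrDefault(word, List.of()));
            positions.add(terms);
        }
        return positions;
    }

    // A phrase matches if its words occur at consecutive positions.
    static boolean phraseMatches(List<Set<String>> doc, String phrase) {
        String[] words = phrase.toLowerCase().split("\\s+");
        for (int start = 0; start + words.length <= doc.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < words.length; i++) {
                if (!doc.get(start + i).contains(words[i])) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> doc = analyze("bright red engine");
        System.out.println(phraseMatches(doc, "bright red"));    // true
        System.out.println(phraseMatches(doc, "bright colour")); // true
        System.out.println(phraseMatches(doc, "red bright"));    // false
    }
}
```

Because the synonyms share a position with "red", a single phrase query matches without the query side having to enumerate every synonym combination in the phrase.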
> For synonyms, rather than putting them in the index, I put the original term
> and all the synonyms in the query.
> Every time I create a query, I check whether the term has any synonyms. If it
> does, I create a BooleanQuery OR'ing one Query object for each synonym.
> So if I have a synonym list:
> red = colour, primary, stop
> and someone wants to search the desc field for red, I would end up with
> something like:
> ( (desc:*red*) (desc:*colour*) (desc:*primary*) (desc:*stop*) ).
I don't like the substring (wildcard) terms, but if they're right for you,
OK. If you insist on loosening the match, I'd consider fuzzy terms instead
(desc:red~ etc.).
> Now the synonyms wouldn't be in the index; the Query would account for all
> the possible synonym terms.
> Luke
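A minimal sketch of the query-side expansion described above, in plain Java with no Lucene dependency: it builds a string in Lucene's query syntax with one clause per synonym, using exact terms by default or fuzzy terms (`~`) as the reply suggests, and deliberately leaves out the `*red*` substring wildcards. The `SYNONYMS` table and the `expand` method are hypothetical; in Lucene itself the same structure would be a BooleanQuery whose clauses are all optional.

```java
import java.util.*;

public class SynonymQueryExpansion {
    // Hypothetical synonym list, taken from the example in this thread.
    static final Map<String, List<String>> SYNONYMS =
        Map.of("red", List.of("colour", "primary", "stop"));

    // ORs the original term with each synonym. With no explicit operator,
    // QueryParser's default operator is OR, so every clause is optional.
    static String expand(String field, String term, boolean fuzzy) {
        List<String> terms = new ArrayList<>();
        terms.add(term);
        terms.addAll(SYNONYMS.getOrDefault(term, List.of()));
        StringJoiner clauses = new StringJoiner(" ", "(", ")");
        for (String t : terms) {
            clauses.add(field + ":" + t + (fuzzy ? "~" : ""));
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        // prints (desc:red desc:colour desc:primary desc:stop)
        System.out.println(expand("desc", "red", false));
        // prints (desc:red~ desc:colour~ desc:primary~ desc:stop~)
        System.out.println(expand("desc", "red", true));
    }
}
```

A term with no entry in the table simply expands to itself, e.g. `(desc:blue)`.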
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]