Re: Handling Synonyms

2005-02-21 Thread David Spencer
Luke Shannon wrote:
> Does anyone see a problem with the following approach?
No, no problem with it and it's in fact what my "Wordnet Query 
Expansion" sandbox module does.

The nice thing about Lucene is you at least have the option of doing 
things the other way - you can write a custom Analyzer that puts all 
synonyms at the same token offset so they appear to be in the same place 
in the token stream. Thinking about it... this approach, with the
Analyzer, lets a user search for phrases which would match a synonym, so,
using your example below, the text "bright red engine" would be matched
by either phrase "bright red" or "bright colour". Doing the query
expansion is trickier if you allow phrases.
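
To make the same-position idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. The Token class, its positionIncrement field, and the analyze/matchesPhrase helpers are hypothetical stand-ins, loosely modelled on the way Lucene tokens carry a position increment (an increment of 0 stacks a token on top of the previous one). It is an illustration under those assumptions, not the sandbox module's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynonymTokenSketch {
    // Hypothetical token: the term text plus how far it advances the position.
    static final class Token {
        final String text;
        final int positionIncrement;
        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emit each word with increment 1, then its synonyms with increment 0,
    // so the synonyms share the original word's position.
    static List<Token> analyze(String text, Map<String, List<String>> synonyms) {
        List<Token> out = new ArrayList<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            out.add(new Token(word, 1));
            for (String syn : synonyms.getOrDefault(word, Collections.<String>emptyList())) {
                out.add(new Token(syn, 0));
            }
        }
        return out;
    }

    // A phrase matches if its words occur at consecutive positions.
    static boolean matchesPhrase(List<Token> tokens, String phrase) {
        Map<Integer, Set<String>> termsAt = new HashMap<>();
        int pos = -1;
        for (Token t : tokens) {
            pos += t.positionIncrement;
            termsAt.computeIfAbsent(pos, k -> new HashSet<>()).add(t.text);
        }
        String[] words = phrase.toLowerCase().trim().split("\\s+");
        for (int start = 0; start <= pos; start++) {
            boolean ok = true;
            for (int i = 0; i < words.length && ok; i++) {
                Set<String> here = termsAt.get(start + i);
                ok = here != null && here.contains(words[i]);
            }
            if (ok) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("red", Arrays.asList("colour", "primary", "stop"));
        List<Token> tokens = analyze("bright red engine", synonyms);
        System.out.println(matchesPhrase(tokens, "bright red"));    // true
        System.out.println(matchesPhrase(tokens, "bright colour")); // true
        System.out.println(matchesPhrase(tokens, "bright engine")); // false
    }
}
```

Because "colour" sits at the same position as "red", the phrase check needs no special handling for synonyms; that is the part that gets awkward when the expansion is done on the query side instead.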

> For synonyms, rather than putting them in the index, I put the original term
> and all the synonyms in the query.
> Every time I create a query, I check if the term has any synonyms. If it
> does, I create a BooleanQuery OR'ing one Query object for each synonym.
> So if I have a synonym list:
> red = colour, primary, stop
> And someone wants to search the desc field for red, I would end up with
> something like:
> ( (desc:*red*) (desc:*colour*) (desc:*stop*) ).
I don't like that bit about substring terms, but if it's right for you 
ok - if you insist on loosening things I'd consider fuzzy terms 
(desc:red~ ...etc).

> Now the synonyms wouldn't be in the index; the Query would account for all
> the possible synonym terms.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Handling Synonyms

2005-02-21 Thread Luke Shannon

Does anyone see a problem with the following approach?

For synonyms, rather than putting them in the index, I put the original term
and all the synonyms in the query.

Every time I create a query, I check if the term has any synonyms. If it
does, I create a BooleanQuery OR'ing one Query object for each synonym.

So if I have a synonym list:

red = colour, primary, stop

And someone wants to search the desc field for red, I would end up with
something like:

( (desc:*red*) (desc:*colour*) (desc:*stop*) ).

Now the synonyms wouldn't be in the index; the Query would account for all
the possible synonym terms.
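
A rough sketch of that expansion step, in plain Java with no Lucene calls. The SynonymQueryExpander class and its methods are hypothetical names for illustration. It builds a query string in Lucene's query-parser syntax rather than constructing Query objects, and it assumes exact terms instead of the wildcard form shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SynonymQueryExpander {
    // Hypothetical in-memory synonym list: term -> its synonyms.
    private final Map<String, List<String>> synonyms = new HashMap<>();

    void addSynonyms(String term, String... alternatives) {
        synonyms.put(term, Arrays.asList(alternatives));
    }

    // OR the original term together with each of its synonyms.
    // A term with no synonyms is returned as a single clause.
    String expand(String field, String term) {
        List<String> clauses = new ArrayList<>();
        clauses.add(field + ":" + term);
        for (String syn : synonyms.getOrDefault(term, Collections.<String>emptyList())) {
            clauses.add(field + ":" + syn);
        }
        if (clauses.size() == 1) {
            return clauses.get(0);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }

    public static void main(String[] args) {
        SynonymQueryExpander expander = new SynonymQueryExpander();
        expander.addSynonyms("red", "colour", "primary", "stop");
        System.out.println(expander.expand("desc", "red"));
        // prints: (desc:red OR desc:colour OR desc:primary OR desc:stop)
    }
}
```

The resulting string could be handed to a query parser, or the same loop could add one clause per term to a BooleanQuery; which is more convenient depends on how the rest of the query is built.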

