The mock synonym code in Lucene in Action (LIA) works fine with
multi-word synonyms, but you have to pass each phrase to the synonym
engine as a single string. That means treating some phrases in the
input text as single tokens, and that's likely where you'll have to put
in the work.
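
To illustrate the point about passing a phrase as a single string, here is a minimal sketch of a map-backed synonym lookup in plain Java. It is not the LIA code: the class name PhraseSynonymLookup and its method are made up for illustration, and the entries are the rules from the question. Lookup is keyed on the whole token, so "czech republic" only resolves if the tokenizer hands it over as one token.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a mock synonym engine; not the LIA class.
class PhraseSynonymLookup {
    private static final Map<String, String[]> SYNONYMS = new HashMap<>();

    static {
        // Keys may contain spaces: a phrase is looked up as one string.
        SYNONYMS.put("czech republic", new String[] {"cze"});
        SYNONYMS.put("cze", new String[] {"czech republic"});
        SYNONYMS.put("organization", new String[] {"org", "organisation"});
    }

    // Returns the synonyms for a token, or an empty array if there are none.
    static String[] getSynonyms(String token) {
        String[] found = SYNONYMS.get(token.toLowerCase(Locale.ROOT));
        return found == null ? new String[0] : found;
    }
}
```

Calling getSynonyms("czech") finds nothing, while getSynonyms("czech republic") finds "cze", which is why the phrase has to survive tokenization intact.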

I think you will need to create a custom tokenizer to deal with the
phrases you'd like to keep together. I have done something similar (e.g.
United Kingdom has Britain, England and UK as synonyms), but in my case
I'm indexing only one- or two-word "documents", not huge blocks of text,
so the tokenizer was very simple to implement.
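
As a rough idea of what such a tokenizer does, here is a sketch in plain Java with no Lucene dependency. It assumes a small, fixed list of known phrases and does a greedy longest-match over whitespace-separated words; the class name, the phrase list and the two-word limit are all hypothetical. In a real analyzer the same logic would sit inside your own Tokenizer or TokenFilter subclass, and the details depend on your Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: keeps known phrases together as single tokens.
class PhraseTokenizerSketch {
    // Assumed phrase list: lowercase, words separated by a single space.
    private static final Set<String> PHRASES =
            new HashSet<>(Arrays.asList("czech republic", "united kingdom"));
    private static final int MAX_PHRASE_WORDS = 2;

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return tokens;
        }
        String[] words = normalized.split("\\s+");
        int i = 0;
        while (i < words.length) {
            String token = words[i];
            int consumed = 1;
            // Try the longest candidate phrase first, then shorter ones.
            for (int n = Math.min(MAX_PHRASE_WORDS, words.length - i); n > 1; n--) {
                String candidate =
                        String.join(" ", Arrays.copyOfRange(words, i, i + n));
                if (PHRASES.contains(candidate)) {
                    token = candidate;
                    consumed = n;
                    break;
                }
            }
            tokens.add(token);
            i += consumed;
        }
        return tokens;
    }
}
```

With this, "THE CZECH REPUBLIC ORGANIZATION" comes out as three tokens (the, czech republic, organization) rather than four, so the phrase can be handed to the synonym engine as one string. For large blocks of text you would probably want something smarter than a linear scan against a fixed list.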

Hope this is some help.

Colin


-----Original Message-----
From: Dragon Fly [mailto:[EMAIL PROTECTED]]
Sent: 21 April, 2006 13:49
To: java-user@lucene.apache.org
Subject: Synonyms ...

Hi,

What is the best way to implement the following?

Document 1 contains the following text:
  "THE CZECH REPUBLIC ORGANIZATION"

Document 2 contains the following text:
  "THE CZE ORGANISATION"

Synonym rules:
  (1) CZECH REPUBLIC --> CZE
  (2) CZE --> CZECH REPUBLIC
  (3) ORGANIZATION --> ORG, ORGANISATION

All of the following phrase searches must match BOTH documents:
  "CZECH REPUBLIC ORGANIZATION"
  "CZECH REPUBLIC ORGANISATION"
  "CZECH REPUBLIC ORG"
  "CZE ORGANIZATION"
  "CZE ORGANISATION"
  "CZE ORG"

I don't think the SynonymAnalyzer described in LIA would work because
some of my "synonyms" contain multiple words.  Thank you.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



