I have done the same as Luke but I needed lucene 1.9rc1 to accomplish it. I tried it with 1.4.3 but the queryparser could not handle it.
Andrew -----Original Message----- From: Luke Shannon <[EMAIL PROTECTED]> Sent: May 5, 2005 8:54 AM To: java-user@lucene.apache.org, Pablo Gomes Ludermir <[EMAIL PROTECTED]> Subject: Re: indexing synonyms / reducing the index size Hi Pablo; I handle synonyms in the Query rather than the Index. Whenever I build a query I check to see if there is a synonym for each word, or a replacement for the entire string the user is searching on. If there is (either or both cases) I include all the synonyms/replacement strings applicable plus the original word/string in the Query. This reduces index size (synonyms not in there), but it did result in some queries exceeding the default max clause count for the BooleanQuery. I ended up having to increase this. Luke ----- Original Message ----- From: "Pablo Gomes Ludermir" <[EMAIL PROTECTED]> To: "Lucene user list" <java-user@lucene.apache.org> Sent: Wednesday, May 04, 2005 5:38 PM Subject: indexing synonyms / reducing the index size Hello all, I know that we can expand a word to get its synonyms with Wordnet. I was wondering if we could reduce the index size by including a synonym instead of a word on the synonym list. For instance, if "screen" shows up, I would like to replace it by "monitor" (it is a stupid example, but it was the first thing that crossed my mind). Thus, instead of having both entries on the index, I would have only one. Thus, I would need to pre-process any queries, replacing the words by its synonyms as well. I was wondering if someone has done such a thing in an analyzer already and could give me a little help. My aim is to reduce the index as much as possible (I already have a stemmer and a stopword filter on the analyzer). Could anyone point other ways to reduce the number of terms of an index? The fact is that I would like to create "extra vectors" with my own weighting scheme, and it is a quite costly algorithm, so the less terms I have the better it performs. Regards, Pablo -- Pablo Gomes Ludermir [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]