This sounds like something I could use :) I'd say keep it out of the index, for the reasons a few people have already mentioned. I also think "Thesaurus" is an easier word for non-technical, non-IR people to understand.
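For what it's worth, here is a rough sketch of the kind of storage abstraction Eric asks for in the quoted message. Nothing here is Lucene API: the names `Thesaurus`, `InMemoryThesaurus` and `expansionsFor` are made up for illustration, and the in-memory map only stands in for whatever backing store (XML file, properties file, RDBMS) ends up being used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup interface (an assumption, not part of Lucene). */
interface Thesaurus {
    /** Returns the expansions for a term, or an empty set if none are known. */
    Set<String> expansionsFor(String term);
}

/** One possible backing store: a plain in-memory one-to-many map. */
final class InMemoryThesaurus implements Thesaurus {
    private final Map<String, Set<String>> entries = new HashMap<>();

    /** Registers one or more expansions for a term (stored lower-cased). */
    void add(String term, String... expansions) {
        Set<String> set = entries.computeIfAbsent(
                term.toLowerCase(Locale.ROOT), k -> new LinkedHashSet<>());
        Collections.addAll(set, expansions);
    }

    @Override
    public Set<String> expansionsFor(String term) {
        Set<String> found = entries.get(term.toLowerCase(Locale.ROOT));
        if (found == null) {
            return Collections.<String>emptySet();
        }
        // Read-only view so callers cannot modify the stored entries.
        return Collections.unmodifiableSet(found);
    }
}

final class ThesaurusDemo {
    public static void main(String[] args) {
        InMemoryThesaurus thesaurus = new InMemoryThesaurus();
        thesaurus.add("car", "automobile", "vehicle");
        // The thesaurus lives outside any index directory, so the same
        // instance could be consulted when querying several indices.
        System.out.println(thesaurus.expansionsFor("Car"));
    }
}
```

Since query code would only see the `Thesaurus` interface, an XML-backed or database-backed implementation could be swapped in later without touching the callers.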
Otis

--- Peter Carlson <[EMAIL PROTECTED]> wrote:
> Hi Eric,
>
> Thanks for the feedback. My intention was to abstract the source, but one of
> my questions was, does Lucene set a configuration file which will use this
> "Thesaurus" query, or will that have to be setup manually by the developer.
>
> Currently, Lucene does not provide a configuration file.
>
> As far as if the information is in the index directory. I was thinking this
> might be a nice place for this information to exist, then it doesn't add any
> other overhead to the system (i.e. No configuration file) and might be
> easier to support multiple sources since the index has already been
> abstracted. If you wanted to share the "Thesaurus" across many different
> indices you could "copy" or "merge" that index component into the data
> source. This could even be part of the build process for a file system.
>
> --Peter
>
> On 5/15/02 6:45 AM, "Eric D. Friedman" <[EMAIL PROTECTED]> wrote:
>
> > Whichever storage mechanism you choose, you should be sure to abstract its
> > interface so that people can make other choices. With that out of the way,
> > it doesn't matter too much whether you pick a properties file or an XML
> > file.
> >
> > That said, I wouldn't expect to find this data stored in the index
> > directory, since it's not part of the index and since users may want to
> > share the data across several indices. I would also lean toward the
> > XML file (for a file solution, that is -- an RDBMS should be supported
> > too), since that lends itself more naturally to describing one-to-many
> > relations than a properties file does.
> >
> > Personal opinion: "Thesaurus" is a more descriptive term than
> > "TermExpansion." To me, term expansion suggests some kind of text
> > globbing, whereas a thesaurus is a reference (a "lookup table") that
> > provides *semantic* expansions of the kind you describe.
> > Oracle's intermedia indexing engine has thesaurus features similar to what
> > you describe and calls them by that name.
>
> --
> To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
> For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>
