Hi. I'm trying to extract expressions from term position information: if two words frequently appear side by side, we can treat them as a single expression. For instance, if 'Object' and 'Oriented' appear side by side 9 times out of 10, we can define a new expression, 'Object_Oriented'. Does anyone know a statistical method for detecting such expressions?
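One common way to approach this is to score every adjacent word pair. The sketch below does two things: it computes the "9 times out of 10" ratio described above (how often the first word is followed by the second), and it computes Dunning's log-likelihood ratio (G²) over the 2x2 contingency table of the pair. G² is a standard collocation measure that does not overrate rare pairs the way a raw ratio can. This is a minimal sketch under stated assumptions. It assumes the text is already tokenized into a list of lowercased words in position order. The function names (`find_expressions`, `follow_ratio`, `log_likelihood`) and the `min_count`/`min_ratio` thresholds are hypothetical, chosen only for illustration. It is not a Lucene API.

```python
import math
from collections import Counter


def adjacent_pair_counts(tokens):
    """Count adjacent pairs, plus how often each word is the left or right element of a pair."""
    pairs = list(zip(tokens, tokens[1:]))
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return Counter(pairs), left, right, len(pairs)


def follow_ratio(pair, pair_counts, left):
    """Fraction of the times pair[0] starts a pair that it is followed by pair[1]."""
    return pair_counts[pair] / left[pair[0]]


def log_likelihood(pair, pair_counts, left, right, n):
    """Dunning's G^2 for the 2x2 table (w1 / not w1) x (w2 / not w2) over all n adjacent pairs."""
    k11 = pair_counts[pair]              # w1 followed by w2
    k12 = left[pair[0]] - k11            # w1 followed by something else
    k21 = right[pair[1]] - k11           # something else followed by w2
    k22 = n - k11 - k12 - k21            # neither
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    g2 = 0.0
    for k, r, c in ((k11, row1, col1), (k12, row1, col2),
                    (k21, row2, col1), (k22, row2, col2)):
        if k > 0:  # 0 * log(0) is taken as 0
            g2 += k * math.log(k * n / (r * c))
    return 2.0 * g2


def find_expressions(tokens, min_count=2, min_ratio=0.5):
    """Return (joined_expression, ratio, g2) tuples, highest G^2 first.
    min_count and min_ratio are illustrative thresholds, not recommended values."""
    pair_counts, left, right, n = adjacent_pair_counts(tokens)
    results = []
    for pair, count in pair_counts.items():
        if count < min_count:
            continue
        ratio = follow_ratio(pair, pair_counts, left)
        if ratio >= min_ratio:
            g2 = log_likelihood(pair, pair_counts, left, right, n)
            results.append(("_".join(pair), ratio, g2))
    results.sort(key=lambda t: t[2], reverse=True)
    return results


text = "object oriented design uses object oriented languages and object models"
for expr, ratio, g2 in find_expressions(text.split()):
    print(expr, round(ratio, 3), round(g2, 3))
```

On this toy input only "object_oriented" passes the thresholds, with a ratio of 2/3. On a real corpus you would rank candidates by G² and tune the cut-off empirically. Other association measures, such as pointwise mutual information or the chi-squared test, could be swapped into the same loop.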
Thanks.
Gilles Moyse

-----Original Message-----
From: Eric Jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 21 October 2003 09:24
To: Lucene Users List
Subject: Re: Lucene on Windows

> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one. This reduces the
> number of open files. For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance impact of using this setting is?

--
Eric Jain