Tips: 1) Don't send to three mailing lists when one will do; please continue this conversation on java-user only.
2) Most "suggest" tools work off an index of previous searches (not documents). Do you have a large set of searches? If not, making sensible suggestions based on document content can be much more compute-intensive. My assumption here is that you are having to work with document content.

3) You don't need to go to the expense of running a query and ranking and scoring documents. Look at the lower-level APIs terms() and termDocs(), and use them to find the matching terms.

4) Word suggestions ideally shouldn't be independent of each other: look at the completed words in the query string and use them to inform the selection of suggestions for the incomplete term being typed. The termDocs()/termPositions() APIs give you all the data you need to establish what docs/positions exist for the completed terms, and these can be cross-referenced with the list of docs/positions for the "alternative" terms under consideration. A high proximity between the completed terms' occurrences and a suggested term's occurrences makes it a strong candidate.

A fast way to do proximity tests might be to compare sorted arrays of numbers, where each number represents one term occurrence, using a function like:

    termspaceNumber = (docNumber * maxNumTermsPerDoc) + termPositionInDoc

You could then compare long[] completedTermOccurrences with long[] suggestedAlternativeTermOccurrences, looking for matches where the numbers differ by 1 or 2.

A faster (rougher) solution which ignores word proximity would be just to compare bitsets of doc ids, looking for high levels of overlap (intersection/union). You can use TermEnum.docFreq() to quickly rule out very rare words from your calculations.

Cheers,
Mark
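To make the termspace-number idea concrete, here is a minimal, self-contained sketch of the encoding and the sorted-array comparison. The class and method names (ProximityMatch, encode, proximityMatches) and the MAX_TERMS_PER_DOC bound are my own illustrative choices, not Lucene API; in real code the (doc, position) pairs would come from termPositions().

```java
import java.util.Arrays;

public class ProximityMatch {
    // Assumed upper bound on terms per document; positions very near a
    // document boundary could in theory collide with the next doc, so in
    // practice leave some headroom.
    static final long MAX_TERMS_PER_DOC = 10000;

    // Encode a (doc, position) pair as a single long so that proximity
    // checks become plain integer comparisons.
    static long encode(int docNumber, int positionInDoc) {
        return (long) docNumber * MAX_TERMS_PER_DOC + positionInDoc;
    }

    // Count candidate-term occurrences that fall within `window` positions
    // of a completed-term occurrence. Both arrays must be sorted ascending.
    // A classic merge-style walk: O(n + m) over the two arrays.
    static int proximityMatches(long[] completed, long[] candidate, int window) {
        int matches = 0;
        int i = 0, j = 0;
        while (i < completed.length && j < candidate.length) {
            long diff = candidate[j] - completed[i];
            if (diff > window) {
                i++;            // completed occurrence is too far behind
            } else if (diff < -window) {
                j++;            // candidate occurrence is too far behind
            } else {
                matches++;      // within `window` positions of each other
                j++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Completed term occurs at doc 0 pos 3 and doc 2 pos 7; the
        // candidate term is adjacent only in doc 0.
        long[] completed = { encode(0, 3), encode(2, 7) };
        long[] candidate = { encode(0, 4), encode(1, 4), encode(2, 50) };
        Arrays.sort(completed);
        Arrays.sort(candidate);
        System.out.println(proximityMatches(completed, candidate, 2)); // prints 1
    }
}
```

Candidates could then be ranked by their match count, highest first.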
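The rougher bitset comparison might look like the following sketch, using java.util.BitSet. The class and method names (DocOverlap, overlap) are hypothetical; the doc-id bitsets would be populated by walking termDocs() for each term.

```java
import java.util.BitSet;

public class DocOverlap {
    // Score a candidate suggestion by how often it co-occurs (per document)
    // with an already-completed query term: intersection / union of the
    // two terms' doc-id sets.
    static double overlap(BitSet completedTermDocs, BitSet candidateTermDocs) {
        BitSet intersection = (BitSet) completedTermDocs.clone();
        intersection.and(candidateTermDocs);
        BitSet union = (BitSet) completedTermDocs.clone();
        union.or(candidateTermDocs);
        return union.isEmpty() ? 0.0
                : (double) intersection.cardinality() / union.cardinality();
    }

    public static void main(String[] args) {
        BitSet completed = new BitSet(); // docs containing the completed term
        BitSet candidate = new BitSet(); // docs containing the candidate term
        completed.set(1); completed.set(3); completed.set(5);
        candidate.set(3); candidate.set(5); candidate.set(7);
        // 2 shared docs out of 4 distinct docs
        System.out.println(overlap(completed, candidate)); // prints 0.5
    }
}
```

This loses position information entirely, but cardinality() on bitsets is cheap, so it works as a first-pass filter before the finer proximity test.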