Hi, dudes, I finished an search suggestion module a few months ago.
The framework is as below: 1. Log all your search keywords or retrieve all your segmented terms which can be searched 2. Index all the keywords and or terms with N-Gram tech 3. Search this index with a same analyzer based on user input You can find a example in lucene contribute materials: SpellCheck or you can find the source code of our project here: http://code.google.com/p/askrosa/source/browse/trunk/RosaCrawler/src/autocomplete/AutoCompleter.java This code is not really up to date, the parameters in the EdgeNGramTokenFilter should be 2 and 5 or some values not too small and not too large. I recommend you read some thing about N-Gram(don't worry, it's very easy to understand) You can do any modifying to the code and redeploy. Hope you will enjoy the search tools you are building. On Wed, Dec 2, 2009 at 6:09 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com > wrote: > Hi, > > Have a look at http://www.sematext.com/products/autocomplete/index.html > > It handles Chinese and large volumes of data. > > Otis > -- > Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch > > > > ----- Original Message ---- > > From: fulin tang <tangfu...@gmail.com> > > To: java-user@lucene.apache.org > > Sent: Thu, November 26, 2009 9:10:41 PM > > Subject: Re: Need help regarding implementation of autosuggest using > jquery > > > > By the way , we search Chinese words, so Trie tree looks not perfect > > for us either > > > > > > 2009/11/27 fulin tang : > > > We have the same needs in our music search, and we found this is not a > > > good approach for performance reason . > > > > > > Did any one have experience of implement the autosuggestion in a heavy > > > product environment ? > > > Any suggestions ? > > > > > > > > > 2009/11/26 Anshum : > > >> Try this, > > >> Change the code as required: > > >> --------- > > >> > > >> > > >> import java.io.IOException; > > >> > > >> import org.apache.lucene.index.CorruptIndexException; > > >> import org.apache.lucene.index.IndexReader; > > >> import org.apache.lucene.index.Term; > > >> import org.apache.lucene.index.TermEnum; > > >> > > >> /** > > >> * @author anshum > > >> * > > >> */ > > >> public class GetTermsToSuggest { > > >> > > >> private static void getTerms(String inputText) { > > >> IndexReader reader = null; > > >> try { > > >> reader = IndexReader.open("/home/anshum/index/testindex"); > > >> String field = "fieldname"; > > >> field = field.intern(); > > >> TermEnum tenum = reader.terms(new Term("fieldname", "")); > > >> Boolean hasRun = false; > > >> try { > > >> do { > > >> final Term term = tenum.term(); > > >> if (term == null || term.field() != field) > > >> break; > > >> final String termText = term.text(); > > >> if (termText.startsWith(inputText)) { > > >> System.out.println(termText); > > >> hasRun = true; > > >> } else if (hasRun == true) > > >> break; > > >> } while (tenum.next()); > > >> tenum.close(); > > >> } catch (IOException e) { > > >> e.printStackTrace(); > > >> } > > >> } catch (CorruptIndexException e2) { > > >> e2.printStackTrace(); > > >> } catch (IOException e2) { > > >> e2.printStackTrace(); > > >> } > > >> > > >> } > > >> > > >> /** > > >> * @param args > > >> */ > > >> public static void main(String[] args) { > > >> GetTermsToSuggest.getTerms(args[0]); > > >> } > > >> } > > >> > > >> > > >> -- > > >> Anshum Gupta > > >> Naukri Labs! > > >> http://ai-cafe.blogspot.com > > >> > > >> The facts expressed here belong to everybody, the opinions to me. The > > >> distinction is yours to draw............ > > >> > > >> > > >> On Thu, Nov 26, 2009 at 3:19 PM, Uwe Schindler wrote: > > >> > > >>> You can fix this if you just create the initial term not with "", > instead > > >>> with your prefix: > > >>> TermEnum tenum = reader.terms(new Term(field,prefix)); > > >>> > > >>> And inside the while loop just break out, > > >>> > > >>> if (!termText.startsWith(prefix)) break; > > >>> > > >>> ----- > > >>> Uwe Schindler > > >>> H.-H.-Meier-Allee 63, D-28213 Bremen > > >>> http://www.thetaphi.de > > >>> eMail: u...@thetaphi.de > > >>> > > >>> > > >>> > -----Original Message----- > > >>> > From: DHIVYA M [mailto:dhivyakrishna...@yahoo.com] > > >>> > Sent: Thursday, November 26, 2009 10:39 AM > > >>> > To: java-user@lucene.apache.org > > >>> > Subject: RE: Need help regarding implementation of autosuggest > using > > >>> > jquery > > >>> > > > >>> > Sir, > > >>> > > > >>> > Your suggestion was fantastic. > > >>> > > > >>> > I tried the below mentioned code but it is showing me the entire > result > > >>> of > > >>> > indexed words starting from the letter that i give as input. > > >>> > Ex: > > >>> > if i give "fo" > > >>> > am getting all the indexes from the word starting with fo upto > words > > >>> > starting with z. > > >>> > i.e. it starts displaying from the word matching the search word > and ends > > >>> > up with the last word available in the index file. > > >>> > > > >>> > Kindly suggest me a solution for this problem > > >>> > > > >>> > Thanks in advance, > > >>> > Dhivya > > >>> > > > >>> > --- On Wed, 25/11/09, Uwe Schindler wrote: > > >>> > > > >>> > > > >>> > From: Uwe Schindler > > >>> > Subject: RE: Need help regarding implementation of autosuggest > using > > >>> > jquery > > >>> > To: java-user@lucene.apache.org > > >>> > Date: Wednesday, 25 November, 2009, 9:54 AM > > >>> > > > >>> > > > >>> > Hi Dhivya, > > >>> > > > >>> > you can iterate all terms in the index using a TermEnum, that can > be > > >>> > retrieved using IndexReader.terms(Term startTerm). > > >>> > > > >>> > If you are interested in all terms from a specific field, position > the > > >>> > TermEnum on the first possible term in this field ("") and iterate > until > > >>> > the > > >>> > field name changes. As terms in the TermEnum are first ordered by > field > > >>> > name > > >>> > then by term text (in UTF-16 order), the loop would look like this: > > >>> > > > >>> > IndexReader reader = ... > > >>> > String field = .... > > >>> > Field = field.intern(); // important for the while loop > > >>> > TermEnum tenum = reader.terms(new Term(field,"")); > > >>> > try { > > >>> > do { > > >>> > final Term term = tenum.term(); > > >>> > if (term==null || term.field()!=field) break; > > >>> > final String termText = term.text(); > > >>> > // do something with the termText > > >>> > } while (tenum.next()); > > >>> > } finally { > > >>> > tenum.close(); > > >>> > } > > >>> > > > >>> > > > >>> > ----- > > >>> > Uwe Schindler > > >>> > H.-H.-Meier-Allee 63, D-28213 Bremen > > >>> > http://www.thetaphi.de > > >>> > eMail: u...@thetaphi.de > > >>> > > > >>> > > > >>> > > -----Original Message----- > > >>> > > From: DHIVYA M [mailto:dhivyakrishna...@yahoo.com] > > >>> > > Sent: Wednesday, November 25, 2009 8:06 AM > > >>> > > To: java user > > >>> > > Subject: Need help regarding implementation of autosuggest using > jquery > > >>> > > > > >>> > > Hi all, > > >>> > > > > >>> > > Am using lucene 2.3.2 as a search engine in my e-paper site. So > that i > > >>> > > want the user to search the news. I achieved that objective but > now am > > >>> > > trying to implement autosuggest so that user can pick a choice > from the > > >>> > > drop down and no need of typing in the entire sentence or so. > > >>> > > > > >>> > > I have download Jquery for this purpose and am trying to > implement it. > > >>> > > The collections of data to refer for the suggestion is given in > an > > >>> > > arraylist or jus with in a string. > > >>> > > > > >>> > > But for my application, i need to populate the suggestions with > the > > >>> > > indexed words available in the index file created during indexing > > >>> > > operation. > > >>> > > > > >>> > > Can anyone give an idea to read the contents from the index file > and > > >>> > make > > >>> > > it available as suggestions? or anyother idea to achieve this > > >>> objective? > > >>> > > > > >>> > > Thanks in advance, > > >>> > > Dhivya > > >>> > > > > >>> > > > > >>> > > The INTERNET now has a personality. YOURS! See your Yahoo! > > >>> > Homepage. > > >>> > > http://in.yahoo.com/ > > >>> > > > >>> > > > >>> > > --------------------------------------------------------------------- > > >>> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > >>> > For additional commands, e-mail: java-user-h...@lucene.apache.org > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > The INTERNET now has a personality. YOURS! See your Yahoo! > > >>> Homepage. > > >>> > http://in.yahoo.com/ > > >>> > > >>> > > >>> --------------------------------------------------------------------- > > >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > >>> For additional commands, e-mail: java-user-h...@lucene.apache.org > > >>> > > >>> > > >> > > > > > > > > > > > > -- > > > 梦的开始挣扎于城市的边缘 > > > 心的远方执着在脚步的瞬间 > > > 我的宿命埋藏了寂寞的永远 > > > > > > > > > > > -- > > 梦的开始挣扎于城市的边缘 > > 心的远方执着在脚步的瞬间 > > 我的宿命埋藏了寂寞的永远 > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Weiwei Wang Alex Wang 王巍巍 Room 403, Mengmin Wei Building Computer Science Department Gulou Campus of Nanjing University Nanjing, P.R.China, 210093 Homepage: http://cs.nju.edu.cn/rl/weiweiwang