Hello List,
Lucene 3.0.1
Windows Vista Home Premium
I am currently trying to configure my IndexFiles.java file because I need
input text to be analyzed further than the default analyzer does. My
intention is to add the following to the code:
    IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
        new NGramTokenFilter(
            new LowerCaseFilter(
                new StandardFilter(
                    new StandardTokenizer(Version.LUCENE_CURRENT, null)))),
        true,
        IndexWriter.MaxFieldLength.LIMITED);
    System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
    indexDocs(writer, docDir);
    System.out.println("Optimizing...");
    writer.optimize();
    writer.close();
    Date end = new Date();
    System.out.println(end.getTime() - start.getTime() + " total milliseconds");
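To make the intent concrete, here is a rough plain-Java sketch of the output I am hoping the chain will produce for each piece of text. It uses no Lucene classes: NGramSketch, ngrams and analyze are names made up for illustration, splitting on whitespace is only a stand-in for StandardTokenizer, and the gram sizes and the ordering of the grams are my own assumptions, not necessarily what NGramTokenFilter does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; none of these names are Lucene classes.
final class NGramSketch {

    // Character n-grams of one term: shortest grams first, left to right.
    static List<String> ngrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= term.length(); start++) {
                grams.add(term.substring(start, start + size));
            }
        }
        return grams;
    }

    // Whitespace splitting stands in for StandardTokenizer (an assumption),
    // lower-casing stands in for LowerCaseFilter.
    static List<String> analyze(String text, int minGram, int maxGram) {
        List<String> out = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input yields no tokens
            }
            out.addAll(ngrams(token.toLowerCase(Locale.ROOT), minGram, maxGram));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Ab cd", 1, 2));
    }
}
```

With the input "Ab cd" and gram sizes 1 to 2, analyze returns [a, b, ab, c, d, cd]. That is the kind of term list I would like to end up in the index for each field.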
My problem lies with the IndexWriter constructor: it is unclear to me from
the javadocs how many analyzers or tokenizers I am permitted to pass as
parameters, and whether a filter chain like the one above can be passed
directly. Are there any existing resources that cover this, or can someone
help me out, please?
Any help would be greatly appreciated.
Lews Mc