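You should pass sw rather than analyzer to the IndexWriter constructor. As written, the ShingleAnalyzerWrapper is created but never used, so the index still contains only unigrams.

Steve
www.lucidworks.com

On Jun 11, 2014 2:24 AM, "Manjula Wijewickrema" <manjul...@gmail.com> wrote: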
> Hi,
>
> In my programme, I can index and search a document based on unigrams. I
> modified the code as follows to obtain the results based on bigrams.
> However, it did not give me the desired output.
>
> *****************
>
> public static void createIndex() throws CorruptIndexException,
>         LockObtainFailedException, IOException {
>
>     final String[] NEW_STOP_WORDS = {"a", "able", "about",
>         "actually", "after", "allow", "almost", "already", "also", "although",
>         "always", "am", "an", "and", "any", "anybody"}; //only a portion
>
>     SnowballAnalyzer analyzer = new SnowballAnalyzer("English",
>         NEW_STOP_WORDS);
>     Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
>
>     ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
>     sw.setOutputUnigrams(false);
>
>     IndexWriter w = new IndexWriter(INDEX_DIRECTORY, analyzer,
>         true, IndexWriter.MaxFieldLength.UNLIMITED);
>     File dir = new File(FILES_TO_INDEX_DIRECTORY);
>     File[] files = dir.listFiles();
>
>     for (File file : files) {
>         Document doc = new Document();
>         String text = "";
>         doc.add(new Field("contents", text, Field.Store.YES,
>             Field.Index.UN_TOKENIZED, Field.TermVector.YES));
>
>         Reader reader = new FileReader(file);
>         doc.add(new Field(FIELD_CONTENTS, reader));
>         w.addDocument(doc);
>     }
>     w.optimize();
>     w.close();
> }
>
> ****************
>
> Still the output is;
>
> {contents: /1, assist/1, fine/1, librari/1, librarian/1, main/1, manjula/3,
> name/1, sabaragamuwa/1, univers/1}
>
> *******************
>
> If anybody can, please help me to obtain the correct output.
>
> Thanks,
>
> Manjula.
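
To make the expected result concrete, here is a minimal, Lucene-free sketch of what a shingle size of 2 with unigram output turned off should roughly produce once sw is the analyzer the IndexWriter actually uses. The class name ShingleSketch and the method bigrams are hypothetical and exist only for illustration; the single-space separator is an assumption about the shingle filter's default, and the token order is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: mimics the effect of a
// two-token shingle filter with unigram output disabled.
final class ShingleSketch {

    // Joins each token with its successor. The single-space separator
    // is an assumption about the shingle filter's default.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stems taken from the term vector in the question; their order is invented.
        List<String> stems = Arrays.asList("main", "librari", "assist");
        System.out.println(bigrams(stems)); // prints [main librari, librari assist]
    }
}
```

With the wrapper in place, the term vector should therefore list two-word terms of this shape instead of single stems such as librari/1.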