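You should pass sw rather than analyzer to the IndexWriter constructor. As
written, the writer tokenizes with the plain SnowballAnalyzer, so the
ShingleAnalyzerWrapper is never used.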
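
For reference, here is a rough sketch of what bigram terms look like once the shingle wrapper is actually in effect. It is plain Java, not Lucene code: the class BigramSketch, the method bigrams, and the sample token order are made up for illustration, and Lucene's own output may differ in details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not Lucene's ShingleFilter.
class BigramSketch {

    // Joins each adjacent pair of tokens into one two-word term.
    // Single tokens are not emitted, mirroring setOutputUnigrams(false).
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stemmed tokens, chosen only for illustration.
        List<String> tokens = Arrays.asList("univers", "librari", "assist");
        System.out.println(bigrams(tokens)); // [univers librari, librari assist]
    }
}
```

If the term vector still lists single stemmed words after reindexing, the writer is most likely still being built with the unwrapped analyzer.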

Steve
www.lucidworks.com
On Jun 11, 2014 2:24 AM, "Manjula Wijewickrema" <manjul...@gmail.com> wrote:

> Hi,
>
> In my programme, I can index and search a document based on unigrams. I
> modified the code as follows to obtain the results based on bigrams.
> However, it did not give me the desired output.
>
> *****************
>
> public static void createIndex() throws CorruptIndexException,
>         LockObtainFailedException, IOException {
>
>     final String[] NEW_STOP_WORDS = {"a", "able", "about", "actually",
>         "after", "allow", "almost", "already", "also", "although",
>         "always", "am", "an", "and", "any", "anybody"};  // only a portion
>
>     SnowballAnalyzer analyzer = new SnowballAnalyzer("English", NEW_STOP_WORDS);
>     Directory directory = FSDirectory.getDirectory(INDEX_DIRECTORY);
>
>     ShingleAnalyzerWrapper sw = new ShingleAnalyzerWrapper(analyzer, 2);
>     sw.setOutputUnigrams(false);
>
>     IndexWriter w = new IndexWriter(INDEX_DIRECTORY, analyzer, true,
>         IndexWriter.MaxFieldLength.UNLIMITED);
>     File dir = new File(FILES_TO_INDEX_DIRECTORY);
>     File[] files = dir.listFiles();
>
>     for (File file : files) {
>         Document doc = new Document();
>         String text = "";
>         doc.add(new Field("contents", text, Field.Store.YES,
>             Field.Index.UN_TOKENIZED, Field.TermVector.YES));
>
>         Reader reader = new FileReader(file);
>         doc.add(new Field(FIELD_CONTENTS, reader));
>         w.addDocument(doc);
>     }
>     w.optimize();
>     w.close();
> }
>
>
> ****************
>
> Still the output is;
>
>
> {contents: /1, assist/1, fine/1, librari/1, librarian/1, main/1, manjula/3,
> name/1, sabaragamuwa/1, univers/1}
>
> *******************
>
>
> If anybody can, please help me to obtain the correct output.
>
>
> Thanks,
>
>
> Manjula.
>
