Re: Can't get case insensitive keyword analyzer to work

Jack Krupansky Tue, 12 Aug 2014 04:32:09 -0700

And unfiltered. So even if you use the keyword tokenizer, which only generates a single token, you still want token filtering, such as lower casing.


-- Jack Krupansky

-----Original Message-----
From: Christoph Kaser

Sent: Tuesday, August 12, 2014 3:07 AM
To: java-user@lucene.apache.org
Subject: Re: Can't get case insensitive keyword analyzer to work

Hello Milind,

if you don't set the field to be tokenized, no analyzer will be used and
the field's contents will be stored "as-is", i.e. case sensitive.
It's the analyzer's job to tokenize the input, so if you use an analyzer
that does not separate the input into several tokens (like the
KeywordAnalyzer), your input will remain "untokenized".
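
To make the mismatch concrete, here is a minimal sketch in plain Java. It is
a toy model, not Lucene's API: the class UntokenizedFieldDemo, its methods,
and LOWER_CASE_KEYWORD are hypothetical names made up for illustration. It
assumes, as described above, that an un-tokenized field bypasses the analyzer
at index time while the query text is still analyzed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of an index; NOT Lucene's API. All names here are hypothetical. */
public class UntokenizedFieldDemo {

    /** Stand-in for the analyzer: one keyword token, then a lower-case filter. */
    static final UnaryOperator<String> LOWER_CASE_KEYWORD =
            value -> value.toLowerCase(Locale.ROOT);

    /** Maps each indexed term to the id of the document that contains it. */
    private final Map<String, Integer> terms = new HashMap<>();

    /**
     * Adds a value to the toy index. When tokenized is false the analyzer is
     * skipped and the raw value becomes the indexed term (assumed behavior,
     * modeled on the explanation in this thread).
     */
    public void add(int docId, String value, boolean tokenized) {
        String term = tokenized ? LOWER_CASE_KEYWORD.apply(value) : value;
        terms.put(term, docId);
    }

    /** The query side always runs the analyzer; returns null when no term matches. */
    public Integer search(String queryText) {
        return terms.get(LOWER_CASE_KEYWORD.apply(queryText));
    }

    public static void main(String[] args) {
        UntokenizedFieldDemo untokenized = new UntokenizedFieldDemo();
        untokenized.add(0, "SN345-B21", false);
        // Indexed term is "SN345-B21", query term is "sn345-b21": no match.
        System.out.println("untokenized hit: " + untokenized.search("sn345-b21"));

        UntokenizedFieldDemo tokenized = new UntokenizedFieldDemo();
        tokenized.add(0, "SN345-B21", true);
        // Both sides are lower-cased, so the lookup finds document 0.
        System.out.println("tokenized hit: " + tokenized.search("sn345-b21"));
    }
}
```

In this model the value is still a single term either way; only the
lower-casing step depends on the tokenized flag.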

Regards
Christoph

On 12.08.2014 at 03:38, Milind wrote:

I found the problem.  But it makes no sense to me.

If I set the field type to be tokenized, it works.  But if I set it to not
be tokenized the search fails.  i.e. I have to pass in true to the method.
     theFieldType.setTokenized(storeTokenized);

I want the field to be stored as un-tokenized.  But it seems that I don't
need to do that.  The LowerCaseKeywordAnalyzer works if the field is
tokenized, but not if it's un-tokenized!

How can that be?


On Mon, Aug 11, 2014 at 1:49 PM, Milind <mili...@gmail.com> wrote:

It does look like the lowercase is working.

The following code

         Document theDoc = theIndexReader.document(0);
         System.out.println(theDoc.get("sn"));
         IndexableField theField = theDoc.getField("sn");
         TokenStream theTokenStream = theField.tokenStream(theAnalyzer);
         System.out.println(theTokenStream);

produces the following output
     SN345-B21
     LowerCaseFilter@5f70bea5 term=sn345-b21,bytes=[73 6e 33 34 35 2d 62 32 31],startOffset=0,endOffset=9

But the search does not work.  Anything obvious popping out for anyone?


On Sat, Aug 9, 2014 at 4:39 PM, Milind <mili...@gmail.com> wrote:

I looked at a couple of examples on how to get the keyword analyzer to be
case insensitive, but I think I missed something since it's not working for
me.

In the code below, I'm indexing text in upper case and searching in lower
case.  But I get back no hits.  Do I need to do something more while
indexing?

     private static class LowerCaseKeywordAnalyzer extends Analyzer
     {
         @Override
         protected TokenStreamComponents createComponents(String theFieldName,
                                                          Reader theReader)
         {
             KeywordTokenizer theTokenizer = new KeywordTokenizer(theReader);
             TokenStreamComponents theTokenStreamComponents =
                 new TokenStreamComponents(
                         theTokenizer,
                         new LowerCaseFilter(Version.LUCENE_46, theTokenizer));
             return theTokenStreamComponents;
         }
     }

     private static void addDocment(IndexWriter theWriter,
                                       String theFieldName,
                                       String theValue,
                                       boolean storeTokenized)
         throws Exception
     {
           Document theDocument = new Document();
           FieldType theFieldType = new FieldType();
           theFieldType.setStored(true);
           theFieldType.setIndexed(true);
           theFieldType.setTokenized(storeTokenized);
           theDocument.add(new Field(theFieldName, theValue, theFieldType));
           theWriter.addDocument(theDocument);
     }


     static void testLowerCaseKeywordAnalyzer()
         throws Exception
     {
         Version theVersion = Version.LUCENE_46;
         Directory theIndex = new RAMDirectory();

         Analyzer theAnalyzer = new LowerCaseKeywordAnalyzer();

         IndexWriterConfig theConfig =
             new IndexWriterConfig(theVersion, theAnalyzer);
         IndexWriter theWriter = new IndexWriter(theIndex, theConfig);
         addDocment(theWriter, "sn", "SN345-B21", false);
         addDocment(theWriter, "sn", "SN445-B21", false);
         theWriter.close();

         QueryParser theParser = new QueryParser(theVersion, "sn", theAnalyzer);
         Query theQuery = theParser.parse("sn:sn345-b21");
         IndexReader theIndexReader = DirectoryReader.open(theIndex);
         IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
         TopScoreDocCollector theCollector =
             TopScoreDocCollector.create(10, true);
         theSearcher.search(theQuery, theCollector);
         ScoreDoc[] theHits = theCollector.topDocs().scoreDocs;

         System.out.println("Number of results found: " + theHits.length);

     }

--
Regards
Milind



--
------------------------------------------------------------------------

Because individuality is the best standard

Dipl.-Inf. Christoph Kaser

IconParc GmbH
Sophienstraße 1
80333 München

iconparc.de

Tel: +49 - 89- 15 90 06 - 21
Fax: +49 - 89- 15 90 06 - 19

Management: Dipl.-Ing. Roland Brückner, Dipl.-Inf. Sven Angerer.
HRB 121830, Amtsgericht München


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
