Hi Uwe, Jack, Thank you very much for your answers. I will work on it.
Regards, Bianca 2014-08-07 14:04 GMT+01:00 Jack Krupansky <j...@basetechnology.com>: > Also, usually query-time analysis is done by a "query parser", so if you > aren't going through a quwery parser, you have to call the aalyzer > yourself. The stemming is very likely the culprit here. > > -- Jack Krupansky > > -----Original Message----- From: Uwe Schindler > Sent: Thursday, August 7, 2014 9:00 AM > To: java-user@lucene.apache.org > Subject: RE: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term > Frequency > > > Hi, > > if you create the term yourself, it is not going through the analyzer: > public int getTermFrequency(String term, String id) > (you create a BytesRef out of it). So you have to also let the term go > through the analyzer. The stemming analyzers change the terms, so you won't > find them without also stemming the term before you > > StandardAnalyzer does not do stemming, so terms (mostly) stay as they are. > But also for this analyzer, you theoretically has to pass the term through > the analyzer before you can do a term frequency lookup. Just think about > that the term was not lowercased, in that case you would also not find it > in the termdictionary. If it goes through the analyzer, you would find it > because the analyzer lowercases it. > > Uwe > > ----- > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > > > -----Original Message----- >> From: Bianca Pereira [mailto:aivykar...@gmail.com] >> Sent: Thursday, August 07, 2014 2:47 PM >> To: java-user >> Subject: Re: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term >> Frequency >> >> Hi Jack, >> >> Thank you very much. I just changed for the StandardAnalyzer and it is >> working as I would like. But there is something I still cannot understand. >> If I use the same analyzer for indexing and for searching, the same term >> should be parsed in the same way in both moments, shouldn't it? It is why >> I >> still don't understand why the EnglishAnalyzer was not working. Any idea >> on >> that? >> >> Best Regards, >> Bianca >> >> >> 2014-08-07 12:40 GMT+01:00 Jack Krupansky <j...@basetechnology.com>: >> >> > Generally, the standard analyzer will be a better choice, unless you >> > have some special need. >> > >> > A language-specific analyzer will include stemming. The English >> > analyzer includes the Porter stemmer. >> > >> > Generally, you need to apply a compatible analyzer to query terms to >> > match the index, or you need to manually filter your query terms. >> > Sounds like maybe a term got stemmed. >> > >> > -- Jack Krupansky >> > >> > -----Original Message----- From: Bianca Pereira >> > Sent: Thursday, August 7, 2014 7:28 AM >> > To: java-user@lucene.apache.org >> > Subject: EnglishAnalyzer vs WhiteSpaceAnalyzer in getting Term >> > Frequency >> > >> > >> > Hi, >> > >> > I am new in the list and I have been working on a problem for some >> > time already. I would like to know if someone has any idea of how I >> > can solve it. >> > >> > Given a term, I want to get the term frequency in a lucene document. >> > When I use the WhiteSpaceAnalyzer my code works properly but when I >> > use the EnglishAnalyzer it returns 0 as frequency for any term. >> > >> > In order to get the term appearing both as "term" or "term," in the >> > text the EnglishAnalyzer is the best one to be used (I suppose). >> > >> > Any help is more than welcome. >> > >> > Best Regards, >> > Bianca >> > >> > ---------------------------- >> > Here is my code: >> > >> > TO INDEX >> > >> > public class LuceneDescriptionIndexer implements Closeable { >> > >> > private IndexWriter descWriter; >> > >> > >> > public LuceneDescriptionIndexer(Directory luceneDirectory, Analyzer >> > analyzer) >> > >> > throws IOException { >> > >> > openIndex(luceneDirectory, analyzer); >> > >> > } >> > >> > private void openIndex(Directory directory, Analyzer analyzer) throws >> > IOException { >> > >> > IndexWriterConfig descIwc = new IndexWriterConfig(LuceneConfig. >> > INDEX_VERSION, analyzer); >> > >> > descWriter = new IndexWriter(directory, descIwc); >> > >> > } >> > >> > public void indexDocument(String id, String text) throws IOException { >> > >> > IndexableField idField = new StringField("id",id,Field.Store.YES); >> > >> > FieldType fieldType = new FieldType(); >> > >> > fieldType.setStoreTermVectors(true); >> > >> > fieldType.setStoreTermVectorPositions(true); >> > >> > fieldType.setIndexed(true); >> > >> > fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS); >> > >> > fieldType.setStored(true); >> > >> > >> > >> > Document doc = new Document(); >> > >> > doc.add(idField); >> > >> > doc.add(new Field("description", text, fieldType)); >> > >> > >> > >> > descWriter.addDocument(doc); >> > >> > } >> > >> > @Override >> > >> > public void close() throws IOException { >> > >> > descWriter.commit(); >> > >> > descWriter.close(); >> > >> > } >> > >> > } >> > >> > >> > TO QUERY >> > >> > public class LuceneTermStatistics implements TermKBStatistics { >> > >> > >> > private IndexReader luceneIndexReader; >> > >> > private Analyzer analyzer; >> > >> > private IndexSearcher searcher; >> > >> > >> > public LuceneTermStatistics(IndexReader reader, Analyzer analyzer) { >> > >> > this.luceneIndexReader = reader; >> > >> > this.analyzer = analyzer; >> > >> > this.searcher = new IndexSearcher(reader); >> > >> > } >> > >> > /** >> > >> > * Create an instance of LuceneTermStatistics from the Config options. >> > >> > */ >> > >> > public static LuceneTermStatistics configureInstance(String indexPath, >> > Analyzer analyzer) >> > >> > throws IOException { >> > >> > FSDirectory index = FSDirectory.open(new File(indexPath)); >> > >> > DirectoryReader indexReader = DirectoryReader.open(index); >> > >> > return new LuceneTermStatistics(indexReader, analyzer); >> > >> > } >> > >> > @Override >> > >> > public int getTermFrequency(String term, String id) >> > >> > throws Exception { >> > >> > int docId = getDocId(id); >> > >> > // Get the vector with the frequency for the term in all documents >> > >> > DocsEnum de = MultiFields.getTermDocsEnum( >> > >> > luceneIndexReader, MultiFields.getLiveDocs(luceneIndexReader), >> > "description", >> > >> > new BytesRef(term)); >> > >> > // Get the frequency for the document of interest >> > >> > if (de != null) { >> > >> > int docNo; >> > >> > while((docNo = de.nextDoc()) != DocsEnum.NO_MORE_DOCS) { >> > >> > if(docNo == docId) >> > >> > return de.freq(); >> > >> > } >> > >> > } >> > >> > return 0; >> > >> > } >> > >> > >> > private int getDocId(String id) throws IOException { >> > >> > BooleanQuery idQuery = new BooleanQuery(); >> > >> > idQuery.add(new TermQuery(new Term("id", id)), Occur.MUST); >> > >> > >> > TopScoreDocCollector collector = TopScoreDocCollector.create(1, >> > false); >> > >> > searcher.search(idQuery, collector); >> > >> > TopDocs topDocs = collector.topDocs(); >> > >> > if (topDocs.totalHits == 0) >> > >> > return -1; >> > >> > return topDocs.scoreDocs[0].doc; >> > >> > } >> > >> > } >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> > For additional commands, e-mail: java-user-h...@lucene.apache.org >> > >> > >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >