Ivan Dimitrov Vasilev created LUCENE-4490:
---------------------------------------------

             Summary: TermPositions misses some terms in some cases
                 Key: LUCENE-4490
                 URL: https://issues.apache.org/jira/browse/LUCENE-4490
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 3.6.1, 3.4
            Reporter: Ivan Dimitrov Vasilev


I have the following code:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermPositions;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    // The class name is illustrative; the original snippet showed only main().
    public class TermPositionsRepro {
        public static void main(String[] args) throws Exception {
            // Index a single document whose "name" field holds the text "a".
            RAMDirectory dir = new RAMDirectory();
            IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_34,
                    new StandardAnalyzer(Version.LUCENE_34));
            org.apache.lucene.index.IndexWriter iw =
                    new org.apache.lucene.index.IndexWriter(dir, iwc);
            Document doc = new Document();
            doc.add(new Field("name", "a", Field.Store.YES,
                    Field.Index.ANALYZED_NO_NORMS));
            iw.addDocument(doc);
            iw.close();

            // Look up the postings for the term <name:a>.
            IndexReader ir = IndexReader.open(dir);
            Term t = new Term("name", "a");
            TermPositions tp = ir.termPositions();
            tp.seek(t);
            boolean flag = false;
            while (tp.next()) {
                System.out.println(tp.doc());
                flag = true;
            }
            if (!flag) {
                System.out.println("Missing term");
            }
            // Print the stored document to show what was written.
            System.out.println(ir.document(0));
            tp.close();
            ir.close();
        }
    }

The output is:

    Missing term
    Document<stored,indexed,tokenized,omitNorms<name:a>>

So the document contains the term <name:a>, but TermPositions cannot find it.

When I replace the line:

    doc.add(new Field("name", "a", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));

with the line:

    doc.add(new Field("name", "b", Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));

and the line:

    Term t = new Term("name", "a");

with the line:

    Term t = new Term("name", "b");

everything is OK. The output is:

    0
    Document<stored,indexed,tokenized,omitNorms<name:b>>

I did some debugging and found that, while executing tp.seek(t), execution reaches line 68 in the constructor of SegmentTermEnum:

    size = input.readLong(); // read the size

For the term <name:b>, size was assigned 1, while for the term <name:a> it was assigned 0.

--
This message is automatically generated by JIRA.