[jira] [Created] (LUCENE-4637) Using StringField for while storing becomes a regular Field when the document is retrieved - Due to this the next search cannot find the document because the value of the Field got tokenized which was not desired when it was added

Rahul Parekh (JIRA) Wed, 19 Dec 2012 05:35:15 -0800

Rahul Parekh created LUCENE-4637:
------------------------------------

             Summary: Using StringField for while storing becomes a regular 
Field when the document is retrieved - Due to this the next search cannot find 
the document because the value of the Field got tokenized which was not desired 
when it was added
                 Key: LUCENE-4637
                 URL: https://issues.apache.org/jira/browse/LUCENE-4637
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 4.0
         Environment: Windows, JDK 1.6
            Reporter: Rahul Parekh



In this case, I don't want Lucene to tokenize the value of a field that is 
being added to a document in the index, so StringField is used. After a search 
on one of the field values, we get the Document object from the searcher, but 
the type of the field has become Field instead of StringField. So when I update 
the value of one of the fields in the document and then run another search 
using the same term, the second search fails to find the document.

In order to reproduce the case, please use the code below.


Sample class:
/////////////


package com.thegoldensource.demo;

import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

/**
 *
 * @author rparekh
 *
 */
public class SimpleTest {

        /**
         * @param args
         */
        public static void main(String[] args) {
                
                try
                {

                    Analyzer analyzer = new WhitespaceAnalyzer(Version.LUCENE_40);
        
                    // Store the index in memory:
                    Directory directory = new RAMDirectory();
                    // To store an index on disk, use this instead:
                    //Directory directory = FSDirectory.open("/tmp/testindex");
                    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_30, analyzer);
                    IndexWriter iwriter = new IndexWriter(directory, config);
                    Document doc = new Document();
                    String text = "This is the text";
        
                    doc.add(new StringField("id", "a", Store.NO));
                    doc.add(new StringField("content", text, Store.YES));
                    
                    iwriter.addDocument(doc);
                    iwriter.commit();
                    
                    
                    // Now search the index:
                    DirectoryReader ireader = DirectoryReader.open(directory);
                    IndexSearcher isearcher = new IndexSearcher(ireader);
                   
                    Term myTerm = new Term("id", "a");
                    TermQuery query = new TermQuery(myTerm); 
                    TopDocs docs = isearcher.search(query, 1);
                    int hits = docs.totalHits;
                    System.out.println("Hits : " + hits);
                    ScoreDoc[] documents = docs.scoreDocs;
                    Document d = null;
                    for(int i=0; i < documents.length; i++)
                    {
                        d = isearcher.doc(documents[i].doc);
                        IndexableField contentField = d.getField("content");
                        
                        if(contentField != null)
                        {
                                System.out.println("Content from doc : [" + contentField.stringValue() + "]");
                        }
                    }

                    // For updating the value of a field, remove the field and add it again.
                    d.removeField("content");
                    d.add(new StringField("content", "new content",Store.YES));
                    List<IndexableField> fields = d.getFields();
                
                    iwriter.updateDocument(myTerm, fields);
                    
                    iwriter.commit();
                    iwriter.close();
                    
                    // Search the document again
                    DirectoryReader newReader = DirectoryReader.open(directory);
                    IndexSearcher newSearcher = new IndexSearcher(newReader);
                    
                    TermQuery newTermQuery = new TermQuery(myTerm);
                    TopDocs newTopDocs = newSearcher.search(newTermQuery, 10);
                    int hits1 = newTopDocs.totalHits;
                    
                    // Number of hits should be 1 but it is 0 (zero). This is because
                    // the type of the fields in the retrieved document changes from
                    // the original StringField to Field, which is not correct.
                    System.out.println("Hits again : " + hits1 );
                    
                    if(hits1 > 0)
                    {
                            ScoreDoc[] documents1 = newTopDocs.scoreDocs;
                            
                            Document d1 = newSearcher.doc(documents1[0].doc);
                            
                            IndexableField newContent = d1.getField("content");
                            if(newContent != null)
                            {
                                System.out.println("New Content : [" + newContent.stringValue() + "]");
                            }
                    }
                    
                    ireader.close();
                    newReader.close();
                    directory.close();
                }
                catch(IOException e)
                {
                        System.err.print(e);
                }
                
        }

}

/////////////


Output:
Hits : 1
Content from doc : [This is the text]
Hits again : 0
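
To illustrate the tokenization effect described above, here is a minimal toy 
sketch in plain Java. It is not Lucene code and does not use any Lucene API: 
the ToyIndex class and its addExact, addTokenized and hits methods are 
hypothetical names introduced only for this illustration. It assumes a simple 
whitespace split as the tokenizer and an exact-match term lookup. It models 
only tokenized versus untokenized indexing, not stored versus unstored fields 
(note that the "id" field in the sample above is added with Store.NO).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only, NOT Lucene: a tiny inverted index mapping terms to document ids.
class ToyIndex {

    // term -> ids of the documents that contain that term
    private final Map<String, List<Integer>> postings =
            new HashMap<String, List<Integer>>();

    private void post(String term, int docId) {
        List<Integer> ids = postings.get(term);
        if (ids == null) {
            ids = new ArrayList<Integer>();
            postings.put(term, ids);
        }
        ids.add(docId);
    }

    // Index the whole value as one term (analogous to an untokenized field).
    public void addExact(int docId, String value) {
        post(value, docId);
    }

    // Index the value split on whitespace (analogous to a tokenized field).
    public void addTokenized(int docId, String value) {
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                post(token, docId);
            }
        }
    }

    // Exact term lookup: the query text itself is not analyzed.
    public int hits(String term) {
        List<Integer> ids = postings.get(term);
        return ids == null ? 0 : ids.size();
    }

    public static void main(String[] args) {
        ToyIndex exact = new ToyIndex();
        exact.addExact(1, "new content");

        ToyIndex tokenized = new ToyIndex();
        tokenized.addTokenized(1, "new content");

        System.out.println("Exact, whole value     : " + exact.hits("new content"));
        System.out.println("Tokenized, whole value : " + tokenized.hits("new content"));
        System.out.println("Tokenized, one token   : " + tokenized.hits("content"));
    }
}
```

In this toy model the whole-value lookup finds the document only when the 
value was indexed as a single term. Once the value is split into tokens, only 
the individual tokens can be matched by an exact lookup.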

