Hi,
Sorry for all the code; it got sent out accidentally.
The following code is part of the Benchmark utility in Lucene, specifically
SubmissionReport.java:
// Here, reader is the IndexReader.
Iterator itr = docMap.entrySet().iterator();
int totalNumDocuments = reader.numDocs();
ScoreDoc sd[] = td.scoreDocs;
String sep = " \t ";
DocNameExtractor docext = new DocNameExtractor(docNameField);
for (int i = 0; i < sd.length; i++)
{
    String docName = docext.docName(searcher, sd[i].doc);
    // ***** The Map of documents will help us get the docid
    int indexedDocID = docMap.get(docName);
    Fields fields = reader.getTermVectors(indexedDocID);
    Iterator<String> strItr = fields.iterator();
    // ********** The following while loop prints the field names, which
    // show only 2 of the 5 fields that I am looking for.
    while (strItr.hasNext())
    {
        String fieldName = strItr.next();
        System.out.println("next field " + fieldName);
    }
    Document DocList = reader.document(indexedDocID);
    List<IndexableField> field_list = DocList.getFields();
    // ****** The following for loop prints the five fields and their
    // related information.
    for (int j = 0; j < field_list.size(); j++)
    {
        System.out.println("list field is : " + field_list.get(j).name());
        IndexableFieldType IFT = field_list.get(j).fieldType();
        System.out.println(" Field storeTermVectorOffsets : " +
            IFT.storeTermVectorOffsets());
        System.out.println(" Field stored :" + IFT.stored());
    }
    // ***************************** //
}
/**** THE OUTPUT for this section of code is
fields size : 2
next field body
next field docname
list field is : docid
Field storeTermVectorOffsets : false
list field is : docname
Field storeTermVectorOffsets : false
list field is : docdate
Field storeTermVectorOffsets : false
list field is : doctitle
Field storeTermVectorOffsets : false
list field is : body
Field storeTermVectorOffsets : false
*******/
I hope this code comes out legible in the email.
Thank you.
Regards,
Sachin Kulkarni
On Tue, Aug 19, 2014 at 8:39 AM, Sachin Kulkarni <[email protected]>
wrote:
> Hi Kumaran,
>
>
>
> The following code is part of the Benchmark utility in Lucene,
> specifically SubmissionReport.java
>
>
> Iterator itr = docMap.entrySet().iterator();
> int totalNumDocuments = reader.numDocs();
> ScoreDoc sd[] = td.scoreDocs;
> String sep = " \t ";
> DocNameExtractor docext = new DocNameExtractor(docNameField);
> for (int i=0; i<sd.length; i++)
> {
> System.out.println("i = " + i);
> String docName = docext.docName(searcher,sd[i].doc);
> System.out.println("docName : " + docName + "\t map size " +
> docMap.size());
> // ***** The Map will help us get the docid and
> int indexedDocID = docMap.get(docName);
> System.out.println("indexed doc id : " + indexedDocID + "\t docname : "
> + docName);
> // ******** GET THE tf-idf data now ************ //
> Fields fields = reader.getTermVectors(indexedDocID);
> System.out.println("fields size : " + fields.size());
> // **** Print log output for testing **** //
> Iterator<String> strItr=fields.iterator();
> while(strItr.hasNext())
> {
> String fieldName = strItr.next();
> System.out.println("next field " + fieldName);
> }
> Document DocList= reader.document(indexedDocID);
> List<IndexableField> field_list = DocList.getFields();
> for(int j=0; j < field_list.size(); j++)
> {
> System.out.println ( "list field is : " + field_list.get(j).name() );
> IndexableFieldType IFT = field_list.get(j).fieldType();
> System.out.println(" Field storeTermVectorOffsets : " +
> IFT.storeTermVectorOffsets());
> //System.out.println(" Field stored :" + IFT.stored());
> //for (FieldInfo.IndexOptions c : IFT.indexOptions().values())
> // System.out.println(c);
> }
> // ***************************** //
>
>
> On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <
> [email protected]> wrote:
>
>> Hi Sachin Kulkarni,
>>
>> If possible, please share your code.
>>
>>
>> -
>> Kumaran R
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <[email protected]>
>> wrote:
>>
>> > Hi,
>> >
>> > I am using Lucene 4.6.0.
>> >
>> > I have been storing 5 fields for my documents in the index, namely body,
>> > title, docname, docdate and docid.
>> >
>> > But when I get the fields using IndexReader.getTermVectors(indexedDocID),
>> > I only get the docname and body fields and can retrieve the term vectors
>> > for those fields, but not the others.
>> >
>> > I checked whether all five fields are stored using
>> > IndexableFieldType.stored(), and all return true. I also checked that all
>> > the fields are indexed, and they are, but when I call getTermVectors I
>> > still receive only two fields back.
>> >
>> > Is there any other config setting that I am missing while indexing that is
>> > causing this behavior?
>> >
>> > Thanks to Kumaran and Ian for their answers to my previous questions, but I
>> > have not been able to figure out the above one yet.
>> >
>> > Thank you very much.
>> >
>> > Regards,
>> > Sachin
>> >
>>
>
>