When I run your code as is, except for using a RAMDirectory and setting
up an IndexWriter with StandardAnalyzer:
RAMDirectory dir = new RAMDirectory();
Analyzer anl = new StandardAnalyzer(Version.LUCENE_40);
IndexWriterConfig iwcfg = new IndexWriterConfig(Version.LUCENE_40, anl);
IndexWriter iw = new IndexWriter(dir, iwcfg);
...
iw.addDocument(doc);
iw.close();
it prints
doc 0 had 1 terms.
If I change the text to, e.g., "this is foobar gibberish", it says there
are 2 terms. So it looks OK to me. "this" and "is" are presumably in the
default list of stop words.
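
To illustrate why the counts come out as 1 and 2, here is a plain-Java
sketch. It does not call Lucene at all; it only mimics "lowercase,
split, drop stop words, count distinct terms". The stop-word set below
is an assumed subset written out by hand, and the names
StopWordTermCount and countDistinctTerms are made up for this example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only: this does NOT use Lucene's analyzers.
class StopWordTermCount {
    // Assumption: a small hand-written subset of common English stop words.
    static final Set<String> STOP_WORDS = new HashSet<String>(
        Arrays.asList("a", "an", "and", "is", "it", "of", "the", "this", "to"));

    // Counts the distinct terms left after lowercasing, splitting on
    // non-word characters, and dropping stop words.
    static int countDistinctTerms(String text) {
        Set<String> terms = new HashSet<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctTerms("this is foobar"));           // prints 1
        System.out.println(countDistinctTerms("this is foobar gibberish")); // prints 2
    }
}
```

If the real analyzer behaves roughly like this, a term vector with a
single term for "this is foobar" is what you would expect.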
Not relevant, but why are you using SlowCompositeReaderWrapper rather than just
IndexReader rdr = DirectoryReader.open(dir)? I get the same results either way.
--
Ian.
On Thu, Jan 17, 2013 at 5:52 AM, Jon Stewart
<[email protected]> wrote:
> Hello,
>
> I cannot extract document term vectors from an index, and have not
> turned up much in some determined googling. In short, when I call
> IndexReader.getTermVector(docID, field) or
> IndexReader.getTermVectors(docID) and then navigate down to the Terms
> for the specified field, I get a null result.
>
> // Indexing:
> String bodyText = "this is foobar";
> final FieldType BodyOptions = new FieldType();
> BodyOptions.setIndexed(true);
> BodyOptions.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> BodyOptions.setStored(true);
> BodyOptions.setStoreTermVectors(true);
> BodyOptions.setTokenized(true);
> Document doc = new Document();
> doc.add(new Field("body", bodyText, BodyOptions));
>
> When I examine docs in Luke, I can see the term vectors.
>
> // Retrieving (at a later time)
> DirectoryReader dirRdr = DirectoryReader.open(FSDirectory.open(new File(path)));
> SlowCompositeReaderWrapper rdr = new SlowCompositeReaderWrapper(dirRdr);
> for (int i = 0; i < rdr.maxDoc(); ++i) {
> int numTerms = 0;
> Terms terms = rdr.getTermVector(i, "body");
> if (terms != null) {
> TermsEnum term = terms.iterator(null);
> while (term.next() != null) {
> ++numTerms;
> }
> System.out.println("doc " + i + " had " + numTerms + " terms");
> }
> else {
> System.err.println("null term vector on doc " + i);
> }
> }
>
> On every doc, the Terms object I get back from getTermVector(i, "body") is
> null.
>
>
> Jon
> --
> Jon Stewart, Principal
> (646) 719-0317 | [email protected] | Arlington, VA
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>