Maybe I'm missing something here, but why not just boost the terms in the fields at query time?
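
If you do stay with hand-rolled cosine similarity over term vectors, here is a minimal sketch of the per-field weighting described in the message below. It is plain Java with no Lucene calls. The class and method names (WeightedCosine, weightedTfIdf, cosine), the sample terms, and the weight of 3.0 are all made up for illustration. It assumes the per-field term frequencies have already been read out of the index (for example from a TermFreqVector) and that IDF values, if any, are supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

public class WeightedCosine {

    /**
     * Builds one sparse vector per document from per-field term frequencies.
     * Each entry is fieldWeight * tf * idf. Missing weights and IDF values
     * default to 1.0. Keys are "field:term", so the same word in two
     * different fields is treated as two separate dimensions (an assumption).
     */
    static Map<String, Double> weightedTfIdf(Map<String, Map<String, Integer>> fieldTermFreqs,
                                             Map<String, Double> fieldWeights,
                                             Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : fieldTermFreqs.entrySet()) {
            double weight = fieldWeights.getOrDefault(field.getKey(), 1.0);
            for (Map.Entry<String, Integer> tf : field.getValue().entrySet()) {
                String key = field.getKey() + ":" + tf.getKey();
                double value = weight * tf.getValue() * idf.getOrDefault(key, 1.0);
                vec.merge(key, value, Double::sum);
            }
        }
        return vec;
    }

    /** Cosine similarity of two sparse vectors; returns 0.0 if either is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Two toy documents that share one content term and one taxonomy term.
        Map<String, Map<String, Integer>> doc1 = Map.of(
                "content", Map.of("loan", 2, "bank", 1),
                "taxoterms", Map.of("credit", 1));
        Map<String, Map<String, Integer>> doc2 = Map.of(
                "content", Map.of("bank", 1, "river", 2),
                "taxoterms", Map.of("credit", 1));

        Map<String, Double> noIdf = Map.of();               // every IDF defaults to 1.0
        Map<String, Double> flat = Map.of();                // every field weight defaults to 1.0
        Map<String, Double> boosted = Map.of("taxoterms", 3.0);

        double plain = cosine(weightedTfIdf(doc1, flat, noIdf), weightedTfIdf(doc2, flat, noIdf));
        double weighted = cosine(weightedTfIdf(doc1, boosted, noIdf), weightedTfIdf(doc2, boosted, noIdf));
        System.out.printf("unweighted=%.3f weighted=%.3f%n", plain, weighted);
    }
}
```

With this toy data the similarity rises from about 0.333 to about 0.714 once the shared taxonomy term is weighted, so yes, scaling the term frequencies does shift the result. Note that only the relative weights matter: multiplying every field by the same factor leaves the cosine unchanged.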

Best
Erick

On Fri, Apr 20, 2012 at 4:20 AM, Kasun Perera <[email protected]> wrote:
> I have documents that are marked up with Taxonomy and Ontology terms
> separately. When I calculate the document similarity, I want to give
> higher weights to those Taxonomy terms and Ontology terms.
>
> When I index the document, I have defined the Document content, Taxonomy
> and Ontology terms as Fields for each document like this in my program.
>
> Field ontologyTerm = new Field("fiboterms", fiboTermList[curDocNo],
>     Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
>
> Field taxonomyTerm = new Field("taxoterms", taxoTermList[curDocNo],
>     Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
>
> Field document = new Field(docNames[curDocNo], strRdElt,
>     Field.TermVector.YES);
>
> I'm using the Lucene index TermFreqVector functions to calculate TFIDF
> values and then calculate cosine similarity between two documents using
> the TFIDF values.
>
> To give weights to Ontology and Taxonomy terms when calculating the cosine
> similarity, what I can do is programmatically multiply the Taxonomy and
> Ontology term frequencies by a defined weight factor before calculating
> the TFIDF scores. Will this give higher weight to Taxonomy and Ontology
> terms in the document similarity calculation?
>
> Are there Lucene functions that can be used to give higher weights to
> certain fields when calculating TFIDF values using TermFreqVector? Can I
> just use the setBoost() function for this purpose, and if so, how?
>
> --
> Regards
>
> Kasun Perera
