do you mean get tf of the hited documents when doing search?
it's a difficult problem because only TermScorer has TermDocs and using tf
in score() function.
but this we can't know whether these doc is selected because we use a
priorityQueue in TopScoreDocCollector
public void collect(int doc) throws IOException {
float score = scorer.score();
// This collector cannot handle these scores:
assert score != Float.NEGATIVE_INFINITY;
assert !Float.isNaN(score);
totalHits++;
if (score <= pqTop.score) {
// Since docs are returned in-order (i.e., increasing doc Id), a
document
// with equal score to pqTop.score cannot compete since HitQueue
favors
// documents with lower doc Ids. Therefore reject those docs too.
return;
}
pqTop.doc = doc + docBase;
pqTop.score = score;
pqTop = (ScoreDoc) pq.updateTop();
}
Because the scorer in BooleanScorer2 is
ConjuctionScorer(DisjuctionScorer). it can not get tf.
If you want to do this. you have to modify many codes
2010/12/25 Ranjit Kumar <[email protected]>
> Hi,
>
> Merry Christmas!!
>
>
>
> In case of Boolean query like *’sql AND server’ *.
>
> I am using parser to get correct document containing both *sql* and *
> server*. Inside for loop in below code I get correct documented and to get
> frequency I need to sum frequency of ‘sql’ and ‘server’ individually with
> the help of *termDocs.read(). *
>
> As I am searching through millions of document. So, to calculate frequency
> it takes about *160 Second* for *80k* document.
>
>
>
> Is there any way to get frequency of ‘*Boolean query’ *directly without
> manipulation. As it takes lots of time. In case of single term and phrase
> query, I got frequency for 80k document within 10 Seconds.
>
>
>
> *QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field,
> analyzer);*
>
> *parser.setDefaultOperator(QueryParser.AND_OPERATOR);*
>
> *Query query = parser.parse(“sql AND server”);*
>
> *TopDocs docs = searcher.search(query, null, n);*
>
> *TermDocs termDocs = reader.termDocs();*
>
> * termDocs.seek(new Term(field,
> query.toString(field).split(" ")[0].toLowerCase()));*
>
> * int[] ints = new int[docs.totalHits];*
>
> * int[] ints1 = new int[docs.totalHits];*
>
> * termDocs.read(ints, ints1);*
>
> * List lsts = Arrays.asList(ArrayUtils.toObject(ints));
> *
>
> * List lsts1 =
> Arrays.asList(ArrayUtils.toObject(ints1));*
>
> * *
>
> * termDocs.seek(new Term(field,
> query.toString(field).split(" ")[1].toLowerCase()));*
>
> * int[] inta = new int[docs.totalHits];*
>
> * int[] inta1 = new int[docs.totalHits];*
>
> * termDocs.read(inta, inta1);*
>
> * List lsta = Arrays.asList(ArrayUtils.toObject(inta));
> *
>
> * List lsta1 =
> Arrays.asList(ArrayUtils.toObject(inta1));*
>
> * *
>
> * int totalFreq = 0;*
>
> * int docId = -1;*
>
> * String a = null;*
>
> * int contId = 0;*
>
> * String path=null;*
>
> *for (int i = 0; i < docs2.scoreDocs.length; i++) { *
>
> * docId = docs2.scoreDocs[i].doc;*
>
> * path=reader.document(docId).get("path");*
>
> * a = path.substring(path.lastIndexOf("\\") + 1,
> path.lastIndexOf("."));*
>
> * try {*
>
> * if(a.indexOf("e")>-1){*
>
> * contId =
> Integer.parseInt(a.substring(0, a.length()-1));*
>
> * }else{*
>
> * contId = Integer.parseInt(a);*
>
> * }*
>
> * } catch (Exception e) {*
>
> *// e.printStackTrace();*
>
> * }*
>
> * *
>
> * if ((lsts.indexOf(docId) > -1 &&
> lsta.indexOf(docId) > -1) || (lsta.indexOf(docId) > -1) &&
> lsts.indexOf(docId) > -1) {*
>
> * totalFreq = (Integer)
> lsts1.get(lsts.indexOf(docId)) + (Integer) lsta1.get(lsta.indexOf(docId));
> *
>
> * } else if (lsts.indexOf(docId) > -1) {*
>
> * totalFreq = (Integer)
> lsts1.get(lsts.indexOf(docId));*
>
> * } else if (lsta.indexOf(docId) > -1) {*
>
> * totalFreq = (Integer)
> lsta1.get(lsta.indexOf(docId));*
>
> * }*
>
> * *
>
> *
> w.write(contId+"\t"+ID+"\t"+totalFreq+"\t"+reader.document(docId).get("path")+"\n");
> *
>
> * }*
>
>
>
>
>
> Thanks & Regards,
>
> *Ranjit Kumar ***
>
> Associate Software Engineer
>
>
>
> [image: cid:[email protected]]
>
>
>
> *US:* +1 408.540.0001
>
> *UK:* +44 208.099.1660
>
> *India:* +91 124.474.8100 | +91 124.410.1350
>
> *FAX:* +1 408.516.9050
>
> http://www.otssolutions.com
>
>
> ===================================================================================================
> Private, Confidential and Privileged. This e-mail and any files and
> attachments transmitted with it are confidential and/or privileged. They are
> intended solely for the use of the intended recipient. The content of this
> e-mail and any file or attachment transmitted with it may have been changed
> or altered without the consent of the author. If you are not the intended
> recipient, please note that any review, dissemination, disclosure,
> alteration, printing, circulation or Transmission of this e-mail and/or any
> file or attachment transmitted with it, is prohibited and may be unlawful.
> If you have received this e-mail or any file or attachment transmitted with
> it in error please notify OTS Solutions at
> [email protected]===================================================================================================
>
>