Hi all,
I have started using the Lucene benchmark 4.0 module to create submission
report files with the following code:
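    // fr, qReader, searcher and loggertest are created earlier (not shown).
    BufferedReader br = new BufferedReader(fr);
    QualityQuery[] qqs = qReader.readQueries(br);
    // Build queries from the topic "title" field against the "body" index field.
    QualityQueryParser qqParser = new SimpleQQParser("title", "body");
    QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser,
            searcher, "docname");
    SubmissionReport submitLog = new SubmissionReport(loggertest, "test");
    // No judge and no quality log; only the submission report is written.
    QualityStats[] stats = qrun.execute(null, submitLog, null);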
My index was created with Lucene 3.6, and I use the LA Times topics 401-450.
With 3.6 there is no problem. With benchmark 4.0, however, results are
returned only for the first query, 401 ("foreign minorities, Germany").
Debugging into SimpleQQParser, I see that the generated boolean query is just
"body:foreign", without the other keywords. Stepping further, the problem
seems to come from QueryParserBase.newFieldQuery, which returns null for the
remaining keywords of that query and for all of the other queries. I patched
the code for my own ad hoc use, but I don't know how to fix it properly. Has
this happened to anyone else?
Second problem: on the same collection I get MAP = 0.17 with the default
similarity and MAP = 0.07 with the Lucene 4.0 BM25 similarity (b=0.75,
k1=1.2). I get MAP = 0.14 with a BM25 implementation based on
http://ipl.cs.aueb.gr/stougianni/bm25_2.html.
However, the literature reports a MAP of around 0.25 for this collection with
the BM25 scoring function. Has anyone evaluated the different similarities,
and could you share the results?
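For reference, the role of the two parameters quoted above (k1=1.2, b=0.75)
can be illustrated with a minimal, self-contained sketch of the textbook BM25
term score. This is not Lucene's BM25Similarity and not the implementation
linked above; the class and method names (Bm25Sketch, idf, termScore) and the
statistics passed in main are hypothetical, and the IDF variant with 1 added
inside the logarithm is only one common choice, assumed here.

```java
final class Bm25Sketch {

    // Assumed IDF variant: adding 1 inside the log keeps the value non-negative.
    static double idf(long docFreq, long numDocs) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Textbook BM25 contribution of a single query term to a document's score.
    // tf: term frequency in the document; docLen / avgDocLen: length normalization inputs.
    static double termScore(double tf, long docFreq, long numDocs,
                            double docLen, double avgDocLen,
                            double k1, double b) {
        // b controls how strongly document length is penalized; k1 controls tf saturation.
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docFreq, numDocs) * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;   // parameters as quoted in the message
        double b = 0.75;
        // Hypothetical statistics, for illustration only.
        double score = termScore(3, 50, 100000, 400, 500, k1, b);
        System.out.println("BM25 term score: " + score);
    }
}
```

A document's score for a query is the sum of termScore over the query terms.
Comparing such a reference computation against the scores each implementation
reports for a few documents is one way to see where they diverge.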
Best Regards,
ZP