wikipedia.alg in the benchmark module can only extract and index the current-pages dumps; it does not take revisions into account. Do you know of any way to index the revision history as well, or should I change EnwikiContentSource to handle the revisions?
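To make the question concrete, here is a rough sketch of the per-revision iteration I have in mind. It is plain Python outside the benchmark code, not a patch to EnwikiContentSource. The function names and the sample data are made up for illustration. It assumes the history dump uses the usual MediaWiki export layout, where one page element holds several revision elements; the namespace version in the sample is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (title, revision_id, timestamp, text) for every revision in a dump."""
    title = None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            # The title closes before the page's revisions start.
            title = elem.text
        elif name == "revision":
            # Only direct children, so the contributor's nested id is ignored.
            fields = {local(child.tag): child.text for child in elem}
            yield title, fields.get("id"), fields.get("timestamp"), fields.get("text") or ""
            elem.clear()  # drop the revision body; history dumps are very large
        elif name == "page":
            title = None
            elem.clear()


# Hypothetical two-revision sample, only to exercise the function.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.6/">
  <page>
    <title>Lucene</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <timestamp>2004-01-01T00:00:00Z</timestamp>
      <text>first version</text>
    </revision>
    <revision>
      <id>11</id>
      <timestamp>2005-06-01T00:00:00Z</timestamp>
      <text>second version</text>
    </revision>
  </page>
</mediawiki>"""

revisions = list(iter_revisions(io.BytesIO(SAMPLE.encode("utf-8"))))
for title, rev_id, timestamp, text in revisions:
    print(title, rev_id, timestamp, text)
```

Each tuple would become one indexed document, with the revision id and timestamp stored as extra fields next to the title and body.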
Although Wikipedia dumps are widely used, especially for research purposes, as far as I know there are no topics/qrels for them, except the set at http://www.mpi-inf.mpg.de/~kberberi/ecir2010/ for the 2001-2005 revision history dump, which is annotated based on temporal expressions. Do you know of any others?

By the way, I think that in wikipedia.alg the setting query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker should be replaced by one that uses EnwikiQueryMaker.

Thanks in advance,
Best regards

--
ZP