Hi all,

I have already sent this mail to Simon Willnauer, and he suggested that I post it here for discussion.

I am David Nemeskey, a PhD student at the Eotvos Lorand University, Budapest, Hungary. I am doing IR-related research, and we have considered using Lucene as our search engine. We were quite satisfied with its speed and ease of use. However, we would like to experiment with different ranking algorithms, and this is where the problems arise. Lucene only supports the vector space model (VSM), and unfortunately the ranking architecture seems to be tailored specifically to its needs.

I would be very interested in revamping the ranking component as a GSoC project. The following modifications should be doable in the allocated time frame:

- a new ranking class hierarchy that is generic enough to allow easy implementation of new weighting schemes (at least bag-of-words ones),
- the addition of state-of-the-art ranking methods, such as Okapi BM25, proximity models and DFR (divergence from randomness) models,
- a configuration option for selecting the ranking method, with the old method as the default.

I believe all users of Lucene would profit from such a project. It would provide the scientific community with an even more useful research aid, while regular users could benefit from better ranking results.

Please let me know your opinion about this proposal.

Thank you very much,
David Nemeskey
