Hi Isabel,

I used C# for this project. I am attaching the presentation from my last thesis review; it does not contain any results yet. The complete project is a single C# application. The indexed documents are searched for the query keywords, and the top 1000 documents are retrieved using a modified Lucene search. A Complement Naive Bayes classifier (coded up for this project) then runs on the retrieved documents as a post-processing step. For each TREC query, the search takes a few hundred milliseconds. Document retrieval and classification take roughly another 100 ms per document (HTML parsing, tokenising and classification), so post-processing all 1000 retrieved documents adds on the order of 100 seconds per query. The classifier models are loaded into memory when the application starts.
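For anyone unfamiliar with the classifier, here is a minimal Python sketch of textbook multinomial Complement Naive Bayes as described by Rennie et al. (2003). It is an illustration only, not the C# implementation used in the project, whose feature weighting and class labels are not described here. Each class is scored by how unlikely the document's terms are under the word distribution of all the other classes. This is the property that makes CNB less biased than plain Naive Bayes when classes have uneven amounts of training data. The class name ComplementNB, the alpha smoothing parameter, the toy corpus and the labels "ir"/"ml" are all hypothetical. The paper's TF-IDF transform and weight normalisation are also omitted.

```python
import math
from collections import Counter, defaultdict


class ComplementNB:
    """Minimal multinomial Complement Naive Bayes (after Rennie et al., 2003).

    Omits the paper's TF-IDF transform and weight normalisation.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing applied to every term

    def fit(self, docs, labels):
        # docs: list of token lists; labels: parallel list of class names
        class_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            class_counts[label].update(tokens)
        total = Counter()
        for counts in class_counts.values():
            total.update(counts)
        self.vocab = set(total)
        self.classes = sorted(class_counts)
        self.weights = {}
        for cls in self.classes:
            # Complement counts: occurrences of each term in every class except cls
            comp = {t: total[t] - class_counts[cls][t] for t in self.vocab}
            denom = sum(comp.values()) + self.alpha * len(self.vocab)
            self.weights[cls] = {
                t: math.log((comp[t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, tokens):
        tf = Counter(t for t in tokens if t in self.vocab)  # unseen terms are ignored
        # A document fits cls when its terms are unlikely under cls's complement,
        # so the score is the negated complement log-likelihood.
        scores = {
            cls: -sum(n * self.weights[cls][t] for t, n in tf.items())
            for cls in self.classes
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy corpus with two classes
    docs = [["lucene", "index", "search"], ["lucene", "query", "search"],
            ["naive", "bayes", "classifier"], ["bayes", "classifier", "training"]]
    labels = ["ir", "ir", "ml", "ml"]
    model = ComplementNB().fit(docs, labels)
    print(model.predict(["search", "index"]))  # ir
```

Because the per-class weights are computed once in fit, classifying a document is a single pass over its term counts. This matches the setup above, where the models are loaded into memory at start-up and most of the per-document cost is parsing and tokenising.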
Right now the classifiers run as a post-processing stage after document retrieval. If it were possible to run the classifier alongside Lucene, extracting sentences and adding them to a document field in real time, the system could work online and answer queries in real time. This is the main reason I am interested in working on this project. Please go through the presentation. I would gladly answer any questions except about results (:D which I cannot evaluate until the TREC runs take place in September 2008).

Robin