Hi Isabel,
I used C# for the project. I am attaching a presentation that I used in my
last thesis review; it does not contain any results at the moment. The
complete project is written in C# as a single application. The indexed
documents are searched for the keyword, and the top 1000 documents are
retrieved using a modified Lucene search. A Complement Naive Bayes
classifier (coded up for this project) is then run on the retrieved
documents as a post-processing step. For each TREC query, the search takes
a few hundred milliseconds. Document retrieval and classification take
another 100 ms per document (HTML parsing + tokenising + classification).
The classifier models are loaded into memory when the application starts.
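
For anyone unfamiliar with Complement Naive Bayes, here is a minimal sketch
of the general technique in Python. It is not the C# code from the project:
the function names, the "keep"/"drop" labels, and the smoothing value are
all made up for illustration. Each class is scored using term counts from
every *other* class, and the class with the lowest complement score wins.

```python
import math
from collections import Counter, defaultdict


def train_cnb(labeled_docs, alpha=1.0):
    """Train on (tokens, label) pairs; returns {label: {term: weight}}.

    alpha is an assumed Laplace smoothing constant, not a project setting.
    """
    per_class = defaultdict(Counter)
    total = Counter()
    for tokens, label in labeled_docs:
        per_class[label].update(tokens)
        total.update(tokens)

    vocab_size = len(total)
    total_count = sum(total.values())
    weights = {}
    for label, counts in per_class.items():
        # Term mass contributed by every class except this one.
        complement_total = total_count - sum(counts.values())
        denom = complement_total + alpha * vocab_size
        weights[label] = {
            term: math.log((total[term] - counts[term] + alpha) / denom)
            for term in total
        }
    return weights


def classify_cnb(weights, tokens):
    """Return the label whose complement fits the tokens worst."""
    freqs = Counter(tokens)

    def complement_score(label):
        w = weights[label]
        # Terms never seen in training are ignored.
        return sum(f * w[t] for t, f in freqs.items() if t in w)

    return min(weights, key=complement_score)


# Hypothetical toy data, only to show the call pattern.
training = [
    (["great", "love", "excellent"], "keep"),
    (["love", "hate", "awful"], "keep"),
    (["price", "specs", "release"], "drop"),
    (["release", "date", "specs"], "drop"),
]
model = train_cnb(training)
print(classify_cnb(model, ["love", "awful"]))
```

In the pipeline described above, the tokens would come from the HTML
parsing and tokenising step applied to each retrieved document.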

Right now the classifiers work in the post-processing stage, after document
retrieval. If it is possible to have the classifier run alongside Lucene,
spit out sentences, and add them to a field in real time, it would
essentially enable this system to be online and allow for real-time queries.
This is primarily why I am interested in working on this project. Please go
through the presentation. I would gladly answer any queries except about
results (:D which I cannot evaluate until the TREC runs take place in
September 2008).
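
To put the timings quoted earlier in perspective, here is a rough
back-of-the-envelope estimate in Python. It assumes all 1000 retrieved
documents are processed one after another, and it uses 300 ms as a stand-in
for "a few hundred milliseconds"; both are assumptions, not measurements.

```python
SEARCH_MS = 300    # assumed stand-in for "a few hundred milliseconds"
PER_DOC_MS = 100   # per-document retrieval + classification cost
TOP_K = 1000       # documents retrieved per query


def query_latency_ms(docs=TOP_K, search_ms=SEARCH_MS, per_doc_ms=PER_DOC_MS):
    """Sequential cost of one query: search plus per-document processing."""
    return search_ms + docs * per_doc_ms


print(query_latency_ms() / 1000, "seconds per query")
```

Under those assumptions the per-document work dominates the search time by
a wide margin, which is why moving the classifier out of the query path
matters for real-time use.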

Robin
