Hi team,
I have built a small Lucene index from the Wikipedia dump that helps compute a feature for the coreference module. It gives a big improvement in performance, and it is likely that more features can be drawn from this resource.

My question is how to go about including this resource. Wikipedia's Copyrights page says the text is available under the Creative Commons Attribution-ShareAlike 3.0 License, which seems fairly permissive, but I'm wondering if anyone has experience with including material under this license. Specifically, the resource is a Lucene index of 5000 Wikipedia articles, where each indexed document is a wiki entry consisting of the title and a slightly modified full text (wiki syntax stripped and foreign characters removed). Any knowledge on this subject would be appreciated.

Thanks,

--
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
617-919-1223
