Hi team,
I have built a small Lucene index from the Wikipedia dump that helps
with calculating a feature for the coreference module. It gives a big
improvement in performance, and it is likely that more features can be
incorporated from this resource.
My question is how to go about including this resource. The
Copyrights page says the text is available under the Creative Commons
Attribution-ShareAlike 3.0 License, which seems fairly permissive apart
from its attribution and share-alike requirements. But I'm wondering
whether anyone has experience with distributing a resource like this.
Specifically, the resource is a Lucene index of 5000 Wikipedia articles,
where each indexed document is a wiki entry with the title and slightly
modified full text (wiki syntax stripped and foreign characters removed).
Any knowledge on this subject would be appreciated.
Thanks,
--
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Boston Children's Hospital and Harvard Medical School
617-919-1223