http://www.anc.org/ ... but this suggests the data they collect is only for research and education.
On 8/8/2012 10:31 AM, Jason Baldridge wrote:
> Sorry if I missed something along the way -- who did the annotation of the
> Wikipedia data?
>
> BTW, the OANC will soon come out with their 3.0 release of MASC (the
> Manually Annotated Sub-Corpus), with about 800k tokens of English text
> (multiple domains, including twitter, blogs, transcribed spoken, and more)
> labeled with several different levels of analysis, including chunks (noun
> and verb), entities, tokens, POS tags, sentence boundaries, and logical
> forms.
>
> http://www.americannationalcorpus.org/MASC/Home.html
>
> On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <[email protected]> wrote:
>
>> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
>>
>>> Hi, here are some models trained on Wikipedia data. They have similar
>>> performance. Is this useful?
>>>
>> Yes, people who do not have access to our MUC based training
>> data can just use the wiki data instead and combine it with their data.
>>
>> Thanks for sharing.
>>
>> Now all we need is a way to get label corrections from the community :-)
>>
>> Jörn
