I have a link to a pre-release of the MASC data, but I am not sure I can
share it. I believe they plan to release a finalized version in September.

AFAIK, the MASC data is unencumbered -- Nancy Ide is very committed to
having truly open data and annotations. It would be great if the community
could give back to the OANC with further annotations, tools, and such -- some
of the annotation work being discussed here could be great for this.

On Wed, Aug 8, 2012 at 7:47 PM, James Kosin <james.ko...@gmail.com> wrote:

>
> http://www.anc.org/
>
> ... but, this suggests the data they collect is only for research and
> education.
>
> On 8/8/2012 10:31 AM, Jason Baldridge wrote:
> > Sorry if I missed something along the way -- who did the annotation of
> > the Wikipedia data?
> >
> > BTW, the OANC will soon come out with their 3.0 release of MASC (the
> > Manually Annotated Sub-Corpus), with about 800k tokens of English text
> > (multiple domains, including twitter, blogs, transcribed spoken, and more)
> > labeled with several different levels of analysis, including chunks (noun
> > and verb), entities, tokens, POS tags, sentence boundaries, and logical
> > forms.
> >
> > http://www.americannationalcorpus.org/MASC/Home.html
> >
> > On Wed, Aug 8, 2012 at 2:47 AM, Jörn Kottmann <kottm...@gmail.com> wrote:
> >
> >> On 08/08/2012 06:16 AM, Michael Schmitz wrote:
> >>
> >>> Hi, here are some models trained on Wikipedia data.  They have similar
> >>> performance.  Is this useful?
> >>>
> >> Yes, people who do not have access to our MUC based training
> >> data can just use the wiki data instead and combine it with their data.
> >>
> >> Thanks for sharing.
> >>
> >> Now all we need is a way to get label corrections from the community :-)
> >>
> >> Jörn
> >>
> >
> >
>
>


-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
