2014-11-06, Edward Garrett wrote:

> * the beauty of a general purpose tool is that it doesn't insist on
> specific tagsets or schemes or whatever. many linguists would object
> vociferously, and with good reason, to pos schemes imposed on them by
> committee.

I'd hardly call it a committee decision; it's more a consensus of
scientists at the moment. Working on improving it would be a good
thing. Working around it, not so much.

> quoting a very small piece from the google universal pos
> article:
> 
> "As a result, when combined with the original treebank data, this
> universal tagset and mapping produce a dataset consisting of common
> parts-of-speech for 22 different languages."
> 
> aren't there something like 7,000 languages spoken in the world
> today? that hardly makes this tagset universal...

Yeah, I'm not claiming to be a big fan of Google's universal POS tags. I'd
be more than happy to see more publications showing them wrong with
evidence. More than anything, I'm just tired of looking at and
mangling nonsense tagsets that take tons of effort to compare to, and
work with, anything else in the world, for no obvious reason.
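For illustration, comparing a treebank-specific tagset with anything else usually means writing a mapping table into some shared set of categories. A minimal Python sketch, assuming a small hand-picked excerpt of Penn Treebank tags mapped onto the universal tagset's categories (not the complete published mapping files); the fallback of unknown tags to `X` is a design choice of this sketch only.

```python
# Illustrative excerpt only: a few Penn Treebank tags mapped onto
# universal POS categories. This is NOT the complete published mapping.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "DT": "DET",
    "IN": "ADP", "PRP": "PRON", "CD": "NUM",
    "CC": "CONJ",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL):
    """Map (word, tag) pairs to universal tags.

    Tags missing from this excerpt fall back to "X" (an assumption
    of this sketch, so unmapped input is visible rather than dropped).
    """
    return [(word, mapping.get(tag, "X")) for word, tag in tagged]


if __name__ == "__main__":
    sentence = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD"), ("loudly", "RB")]
    print(to_universal(sentence))
```

Once every tagset has such a table, comparison happens on the shared categories instead of on each scheme's own strings.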

> CG seems to me like
> a great tool for LINGUISTS and not just computational linguists and
> language technologists.

I dislike the distinction :-/

The problem I have is that LINGUISTICS, to me, is not the science of
selecting between brackets and plusses and spaces and upper or lower
case and three or four letters when encoding your linguistic concepts
as tags. A properly designed tool for computational linguistics
could easily ensure a reasonable level of abstraction for concepts such as
noun or accusative, so that you don't need to bother LINGUISTS with
nonsense like whether the internal string representation of
accusative is acc or +Acc or \ ACCU or <acc>. Honestly, letting
(forcing) users to select internal string encodings for the concepts of
parsers has been an enormous burden on all of computational linguistics. It
can be solved by standard tagsets or by better tools (history shows that
it's hardly ever solved by expecting the end users to do a good job),
and dealing with it properly makes doing linguistics easier, because things
will actually be comparable within and between languages, and makes
computational linguistics easier, because things will actually be
interoperable.
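To make the abstraction-layer idea concrete, here is a minimal Python sketch, assuming a tool keeps linguistic concepts as first-class values and treats the various string spellings as mere surface forms. The `Case` enum, the `SURFACE_FORMS` table, the `normalise` function, and the nominative spellings are all hypothetical, not part of any existing CG tool.

```python
from enum import Enum


class Case(Enum):
    """Hypothetical abstract concepts, independent of any tag spelling."""
    NOMINATIVE = "nominative"
    ACCUSATIVE = "accusative"


# Hypothetical table: the surface strings different tagsets might use
# for the same abstract concept.
SURFACE_FORMS = {
    Case.ACCUSATIVE: {"acc", "+Acc", "<acc>", "ACCU"},
    Case.NOMINATIVE: {"nom", "+Nom", "<nom>", "NOMI"},
}

# Invert the table once so each lookup is a single dict access.
_LOOKUP = {form: concept
           for concept, forms in SURFACE_FORMS.items()
           for form in forms}


def normalise(tag):
    """Return the abstract concept for a surface tag string, or None if unknown."""
    return _LOOKUP.get(tag)


if __name__ == "__main__":
    print(normalise("+Acc"))                       # Case.ACCUSATIVE
    print(normalise("<acc>") is normalise("acc"))  # True
```

With something like this in place, rules and comparisons are written against `Case.ACCUSATIVE`, and only the input/output layer cares whether a given corpus spells it `acc` or `<acc>`.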

-- 
Flammie, computer scientist bachelor + linguist master = computational
linguist doctor, free software Finnish localiser,
and more! <http://www.iki.fi/flammie/>

-- 
You received this message because you are subscribed to the Google Groups 
"Constraint Grammar" group.
Visit this group at http://groups.google.com/group/constraint-grammar.
