On 2014-11-06, Edward Garrett wrote:
> * the beauty of a general purpose tool is that it doesn't insist on
> specific tagsets or schemes or whatever. many linguists would object
> vociferously, and with good reason, to pos schemes imposed on them by
> committee.

I'd hardly call it a committee decision; it's rather the current consensus among scientists. Working on improving it would be a good thing. Working around it, not so much.

> quoting a very small piece from the google universal pos
> article:
>
> "As a result, when combined with the original treebank data, this
> universal tagset and mapping produce a dataset consisting of common
> parts-of-speech for 22 different languages."
>
> aren't there something like 7,000 languages spoken in the world
> today? that hardly makes this tagset universal...

Yeah, I'm not claiming to be a big fan of the Google universal POS tags. I'd be more than happy to see more publications showing, with evidence, that they are wrong. More than anything, I'm just tired of looking at and mangling nonsense tagsets that take tons of effort to compare with, or to use alongside, anything else in the world, for no obvious reason.

> CG seems to me like
> a great tool for LINGUISTS and not just computational linguists and
> language technologists.

I dislike the distinction :-/ The problem I have is that LINGUISTICS, to me, is not the science of choosing between brackets, plusses, spaces, upper or lower case, and three or four letters when encoding your linguistic concepts as tags. A properly designed tool for computational linguistics could easily provide a reasonable level of abstraction for concepts such as noun or accusative, so that you don't need to bother LINGUISTS with nonsense like mangling internal string representations: whether accusative is acc or +Acc or \ ACCU or <acc>. Honestly, letting (forcing) users to choose the internal string encodings for a parser's concepts has been an enormous burden on all of computational linguistics.
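To make the abstraction point concrete, here is a minimal Python sketch. It is not taken from CG or any existing tool; the tagset names and mapping tables are hypothetical. Each tagset's string encodings are mapped once to shared abstract concepts, so comparisons are made on concepts rather than on strings like acc, +Acc or <acc>.

```python
from enum import Enum


class Case(Enum):
    """Abstract grammatical concepts, independent of any string encoding."""
    NOMINATIVE = "nominative"
    ACCUSATIVE = "accusative"


# Hypothetical tagsets and their string encodings, for illustration only.
TAGSET_ENCODINGS = {
    "tagset_a": {"nom": Case.NOMINATIVE, "acc": Case.ACCUSATIVE},
    "tagset_b": {"+Nom": Case.NOMINATIVE, "+Acc": Case.ACCUSATIVE},
    "tagset_c": {"<nom>": Case.NOMINATIVE, "<acc>": Case.ACCUSATIVE},
}


def to_concept(tagset: str, tag: str) -> Case:
    """Translate a tagset-specific string into its abstract concept.

    Raises KeyError for an unknown tagset or tag instead of guessing.
    """
    return TAGSET_ENCODINGS[tagset][tag]


def same_concept(tagset1: str, tag1: str, tagset2: str, tag2: str) -> bool:
    """Compare two tags, possibly from different tagsets, by meaning."""
    return to_concept(tagset1, tag1) is to_concept(tagset2, tag2)


if __name__ == "__main__":
    print(same_concept("tagset_a", "acc", "tagset_b", "+Acc"))   # True
    print(same_concept("tagset_b", "+Acc", "tagset_c", "<nom>"))  # False
```

In this design the linguist reasons about Case.ACCUSATIVE, and the choice of brackets, plusses or capitals lives in one mapping table per tagset, written once.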
It can be solved by standard tagsets or by better tools (history shows that it's hardly ever solved by expecting end users to do a good job). Dealing with it properly makes linguistics easier, because things will actually be comparable within and between languages, and makes computational linguistics easier, because things will actually be interoperable.

-- 
Flammie, computer scientist bachelor + linguist master = computational
linguist doctor, free software Finnish localiser, and more!
<http://www.iki.fi/flammie/>

-- 
You received this message because you are subscribed to the Google Groups "Constraint Grammar" group.
Visit this group at http://groups.google.com/group/constraint-grammar.
