One thing I think would be nice moving forward is to develop a robust set of models and test sets that cover at least two languages. I'm thinking Portuguese would be a good one in addition to English, since:
- several of us speak it (I'm a non-native speaker who lived in Brazil for a couple of years -- who else?)
- there are truly free annotated resources for it: http://www.linguateca.pt/
- it's pretty darn widely spoken in the world, both as a first and second language

Doing something like this would help push the annotation effort forward as well. E.g., committing to support a language means we need to get at least some annotations going for each level of analysis we want to support, and that will in turn spur development on the tool that Jorn has been putting together.

Jason

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
