This is very promising, and I would like to know more about it. Could it be added for Dutch, and is it controllable from the XML? If so, from the rules, the disambiguation, or somewhere else? These tags could help a lot in simplifying rules, but how are the chunk tags added?
Ruud

> Hi,
>
> I have added a Chunker interface that every language can implement. It
> works like a tagger, but it's not supposed to assign part-of-speech tags
> to single words, but chunk (phrase) tags. Typical chunks are noun chunks
> and verb chunks. Typical noun chunks look like this in English:
>
> a boy
> the young boy
> the clever young boy
> the wonder boy
>
> Why is this relevant? Because we currently have false alarms for
> sentences like "There are over 500 college and university chapters." LT
> only looks at "500 college" and the rule that matches a number followed
> by a singular noun will be triggered. Instead, the rule needs to match a
> number followed by a singular noun chunk. With a properly working
> chunker that's possible.
>
> What exists in git is just the interface, but I have an English chunker
> in a local branch that allows matching noun chunks like this:
>
> <token chunk="B-NP-singular"/>
> <token chunk="I-NP-singular" max="-1"/>
>
> NP means noun phrase, B means beginning, and I means inside. So
> B-NP-singular could match 'a' or 'the', while I-NP-singular with
> max="-1" could match 'young boy'. In other words, although chunks are
> larger entities that span several words, the chunk tags are assigned to
> each of the words inside a chunk.
>
> I'll keep you updated about my progress with making the English chunker
> work properly. Let me know if you have any questions/comments.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de
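
A minimal Python sketch of the B/I scheme described in the quoted message. It is not LanguageTool code: the function names, the "O" label for words outside any chunk, the "NP-plural" label, and the hand-written chunk boundaries are all assumptions made for illustration, and a real chunker might segment these sentences differently.

```python
from typing import List, Tuple

# (chunk label, words in that chunk); "O" marks words outside any chunk.
# Both the "O" convention and the "NP-plural" label are assumptions.
Chunk = Tuple[str, List[str]]


def tag_chunks(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Spread each chunk label over its words: B- on the first word, I- on the rest."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label == "O":
                tagged.append((word, "O"))
            else:
                prefix = "B" if i == 0 else "I"
                tagged.append((word, f"{prefix}-{label}"))
    return tagged


def number_before_singular_np(tagged: List[Tuple[str, str]]) -> bool:
    """True if a digit-only token is directly followed by the start of a singular noun chunk."""
    for (word, _), (_, next_tag) in zip(tagged, tagged[1:]):
        if word.isdigit() and next_tag == "B-NP-singular":
            return True
    return False


# Hand-written chunker output (hypothetical segmentation).
boy = tag_chunks([("NP-singular", ["the", "young", "boy"])])

chapters = tag_chunks([
    ("O", ["There", "are", "over"]),
    ("NP-plural", ["500", "college", "and", "university", "chapters"]),
])

suspicious = tag_chunks([
    ("O", ["I", "saw", "3"]),
    ("NP-singular", ["young", "boy"]),
])
```

With these inputs, "the young boy" gets one B-NP-singular tag followed by two I-NP-singular tags, the "500 college ..." sentence no longer matches because "college" sits inside a plural chunk, and only the last input matches the number-plus-singular-chunk check.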
_______________________________________________
Languagetool-devel mailing list
Languagetool-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/languagetool-devel