Thanks. Yes, I have done this already, and it works great. The strength of a hybrid POS tagger is that you can control exceptional tagging cases or correct output mistakes. It would be cool if the API had support for this.
Radu

On Mar 15, 2011, at 12:44 PM, Jörn Kottmann <[email protected]> wrote:

> On 3/13/11 12:54 AM, Radu Simionescu wrote:
>> Hello
>>
>> I am making a POS tagger for Romanian for my dissertation. I want to be
>> able to restrict the outcomes even more than just using a dictionary. I
>> want to use some rules for disambiguation, based on the context. This
>> would allow me to use a smaller corpus, and also to fix consistent output
>> mistakes.
>>
>> So I want to be able to give the POS tagger the possible set of outcomes
>> for each word of the input, separately. Since the training of a model
>> doesn't really use the POS dictionary, I figured I could do this by
>> making small modifications to the API, because the dictionary can change
>> from one sentence/word to the next. Please let me know if I am wrong.
>
> There is no out-of-the-box support for this, but I believe it should be
> easy to implement: all you need to do is write a custom sequence validator
> which does what you described above.
>
> Just have a look at the POSTaggerME class; you need to modify the
> constructor to give it a custom feature generator. We should open a JIRA
> issue and extend our API to pass in a sequence validator object.
>
> Jörn
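The idea of restricting the possible tags per word with a custom sequence validator can be sketched without OpenNLP itself. This is a hypothetical, self-contained illustration, not OpenNLP code: `TagValidator` is a local stand-in whose method shape is assumed to mirror OpenNLP's `SequenceValidator`, `AllowedTagsValidator` and `tag` are made-up names, the scores are invented, and a simple greedy decoder replaces the beam search the real tagger uses. It shows how a per-token set of allowed tags (from a dictionary or context rules) prunes the model's candidates before the best one is chosen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PerTokenTagRestriction {

  // Hypothetical stand-in; its method shape is assumed to mirror OpenNLP's SequenceValidator.
  interface TagValidator {
    boolean validSequence(int i, String[] tokens, String[] previousTags, String outcome);
  }

  // Allows only the tags listed for token i; a null entry means "no restriction".
  static final class AllowedTagsValidator implements TagValidator {
    private final List<Set<String>> allowedPerToken;

    AllowedTagsValidator(List<Set<String>> allowedPerToken) {
      this.allowedPerToken = allowedPerToken;
    }

    @Override
    public boolean validSequence(int i, String[] tokens, String[] previousTags, String outcome) {
      Set<String> allowed = allowedPerToken.get(i);
      return allowed == null || allowed.contains(outcome);
    }
  }

  // Greedy decoding (a simplification of beam search): for each token, keep the
  // highest-scoring candidate tag that the validator accepts.
  static String[] tag(String[] tokens, List<Map<String, Double>> scores, TagValidator validator) {
    String[] tags = new String[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      String[] previous = Arrays.copyOf(tags, i); // tags already chosen for tokens 0..i-1
      String best = null;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (Map.Entry<String, Double> candidate : scores.get(i).entrySet()) {
        boolean valid = validator.validSequence(i, tokens, previous, candidate.getKey());
        if (valid && candidate.getValue() > bestScore) {
          best = candidate.getKey();
          bestScore = candidate.getValue();
        }
      }
      tags[i] = best; // stays null if the validator rejected every candidate
    }
    return tags;
  }

  public static void main(String[] args) {
    String[] tokens = {"el", "merge", "acasă"};
    // Invented model scores; suppose the model wrongly prefers NOUN for "merge".
    List<Map<String, Double>> scores = List.of(
        Map.of("PRON", 0.9, "NOUN", 0.1),
        Map.of("NOUN", 0.6, "VERB", 0.4),
        Map.of("ADV", 0.7, "NOUN", 0.3));
    // A context rule says "merge" must be a verb here; the other tokens are unrestricted.
    List<Set<String>> allowed = Arrays.asList(null, Set.of("VERB"), null);

    String[] unrestricted = tag(tokens, scores, (i, t, p, o) -> true);
    String[] restricted = tag(tokens, scores, new AllowedTagsValidator(allowed));
    System.out.println(Arrays.toString(unrestricted)); // [PRON, NOUN, ADV]
    System.out.println(Arrays.toString(restricted));   // [PRON, VERB, ADV]
  }
}
```

Because the validator is consulted per position, the allowed-tag list can be rebuilt for every sentence before tagging, which matches the requirement that the restriction change from one sentence or word to the next.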
