Hi,

I have a few questions about abbreviation handling in the sentence detector.
I'd like to understand how it works and improve it if possible.

1) How does the sentence detector use the abbreviation dictionary? All train
methods in SentenceDetectorME take an abbreviation dictionary as an argument,
but they only save it to the model. The dictionary is not used to create
the context generator, but it should be, shouldn't it?

2) The command line trainer does not allow passing an abbreviation
dictionary. Maybe it should accept the name of a file that contains the
dictionary.

3) Maybe we should include tools to extract the abbreviation dictionary from
the training corpus. Optionally, this could also be run during training.
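
To make 3) more concrete, here is a rough sketch of the kind of extraction I
have in mind. It assumes the training corpus has one sentence per line with
whitespace-separated tokens, so a token that ends in a period but is not the
last token on its line is probably an abbreviation. The class name
AbbreviationExtractor and the heuristic itself are only illustrative; this is
not existing OpenNLP code and it does not use the OpenNLP API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of OpenNLP.
final class AbbreviationExtractor {

    // Assumption: each element of 'sentences' is exactly one sentence,
    // with tokens separated by whitespace.
    static Set<String> extract(List<String> sentences) {
        Set<String> abbreviations = new TreeSet<>();
        for (String sentence : sentences) {
            String[] tokens = sentence.trim().split("\\s+");
            // Skip the last token: its period most likely ends the sentence.
            for (int i = 0; i < tokens.length - 1; i++) {
                String token = tokens[i];
                // Keep tokens such as "Dr." or "a.m." that start with a
                // letter and end with a period in mid-sentence position.
                if (token.length() > 1
                        && token.endsWith(".")
                        && Character.isLetter(token.charAt(0))) {
                    abbreviations.add(token);
                }
            }
        }
        return abbreviations;
    }

    public static void main(String[] args) {
        List<String> corpus = Arrays.asList(
            "Dr. Smith arrived at 10 a.m. sharp.",
            "The meeting with Mr. Jones was short.");
        System.out.println(extract(corpus));
    }
}
```

A real tool would need more than this, for example a frequency threshold to
filter out noise, and it would miss abbreviations that only occur at the end
of a sentence.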


What do you think?
