On 1/19/12 2:05 PM, Riccardo Tasso wrote:
I'm working on NameFinder too. How can I determine the right parameters (iterations, cutoff and feature generation) for my use case? Is there any guideline?

No, we don't have any guidelines yet (contributions are welcome).

When I train a model I always take our defaults as a baseline and then modify the parameters to see how that changes the performance. When you are working with a training set that grows over time, I suggest starting again from the defaults once in a while and verifying that the modifications still give an improvement.
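
To make that concrete, here is a minimal sketch of the baseline-then-modify workflow, assuming a recent OpenNLP release (the exact TrainingParameters API differs a bit between versions):

    import opennlp.tools.util.TrainingParameters;

    // Start from the library defaults (maxent, 100 iterations, cutoff 5).
    TrainingParameters params = TrainingParameters.defaultParams();

    // Change one thing at a time, retrain, and compare against the baseline,
    // e.g. more iterations:
    params.put(TrainingParameters.ITERATIONS_PARAM, "300");

    // Pass 'params' to NameFinderME.train(...) when training the model.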

A few hints:
- Using more iterations for the maxent model helps, especially when your data set is small,
   e.g. try 300 to 500 instead of the default 100 (as in the snippet above).

- Depending on the domain and language, the feature generation should be adapted; try our XML feature generation (use the trunk version for this, there was a severe bug in 1.5.2). A sample descriptor is sketched after this list.

- Try the perceptron; it usually has a higher recall. Train it with a cutoff of 0 (see the parameter sketch after this list).

- Use our built-in evaluation to test how a model performs; it can output performance numbers
   and print out misclassified samples (a small example follows after this list).

- Look carefully at the misclassified samples; maybe there are patterns which do not really work
   with your model.

- Add training data containing cases that should work but currently do not.
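
To illustrate the XML feature generation mentioned above, here is a descriptor that roughly mirrors the default name finder features (element names as documented in the manual; adapt the generators and window sizes to your domain and language):

    <generators>
      <cache>
        <generators>
          <window prevLength="2" nextLength="2">
            <tokenclass/>
          </window>
          <window prevLength="2" nextLength="2">
            <token/>
          </window>
          <definition/>
          <prevmap/>
          <bigram/>
          <sentence begin="true" end="false"/>
        </generators>
      </cache>
    </generators>

Depending on the version, the descriptor is either passed to the command line trainer through its feature generator option or loaded with GeneratorFactory in the API.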
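
For the perceptron hint, a parameter sketch (again, constant names from TrainingParameters in recent releases):

    import opennlp.tools.util.TrainingParameters;

    TrainingParameters params = new TrainingParameters();
    params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON"); // instead of the default MAXENT
    params.put(TrainingParameters.CUTOFF_PARAM, "0");             // keep all features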
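
And a sketch of the built-in evaluation through the API (the TokenNameFinderEvaluator command line tool does the same and can also print the misclassified samples; the model file name is just a placeholder, and 'testSamples' is assumed to be an ObjectStream<NameSample> over your held-out data, built the same way as the training stream):

    import java.io.File;
    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderEvaluator;
    import opennlp.tools.namefind.TokenNameFinderModel;

    // Load the trained model and run it over the held-out data.
    TokenNameFinderModel model = new TokenNameFinderModel(new File("en-ner-custom.bin"));
    TokenNameFinderEvaluator evaluator = new TokenNameFinderEvaluator(new NameFinderME(model));
    evaluator.evaluate(testSamples);

    // Precision, recall and F-measure on the held-out data.
    System.out.println(evaluator.getFMeasure());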

Hope this helps,
Jörn
