On 1/19/12 2:05 PM, Riccardo Tasso wrote:
I'm working on NameFinder too. How can I determine the right
parameters (iterations, cutoff and feature generation) for my use
case? Is there any guideline?
No, we don't have any guides yet (any contributions are welcome).
When I train a model I always take our defaults as a baseline and then
modify the parameters to see how that changes the performance. When you
are working with a training set which grows over time, I suggest
starting again from the defaults once in a while and verifying that
your modifications still give an improvement.
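
For example, with a recent 1.5.x/trunk API that workflow looks roughly
like the sketch below (untested, just to show the idea; the file names,
language and entity type are made up, adapt them to your data):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.Collections;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;

public class NameFinderTrainingSketch {
  public static void main(String[] args) throws Exception {
    // training data in the name finder format, one sentence per line
    ObjectStream<String> lines = new PlainTextByLineStream(
        new FileInputStream("en-ner-person.train"), "UTF-8");
    ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

    // the defaults are maxent with 100 iterations and a cutoff of 5;
    // change one parameter at a time and re-evaluate
    TrainingParameters params = new TrainingParameters();
    params.put("Algorithm", "MAXENT");
    params.put("Iterations", "300"); // e.g. more iterations for a small data set
    params.put("Cutoff", "5");
    // perceptron variant:
    // params.put("Algorithm", "PERCEPTRON"); params.put("Cutoff", "0");

    // a null feature generator means the built-in default feature generation
    TokenNameFinderModel model = NameFinderME.train("en", "person", samples,
        params, (AdaptiveFeatureGenerator) null,
        Collections.<String, Object>emptyMap());
    samples.close();

    model.serialize(new FileOutputStream("en-ner-person.bin"));
  }
}
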
A few hints:
- Using more iterations for the maxent model helps, especially when your
data set is small, e.g. try 300 to 500 instead of the default 100.
- Feature generation should be adapted depending on your domain and
language; try our XML feature generation (for this use the trunk
version, there was a severe bug in 1.5.2). See the descriptor sketch
after this list.
- Try the perceptron, it usually has a higher recall; train it with a
cutoff of 0.
- Use our built-in evaluation to test how a model performs; it can
output performance numbers and print out misclassified samples (see the
evaluation sketch after this list).
- Look carefully at the misclassified samples, maybe there are patterns
which do not really work with your model.
- Add training data which contains cases which should work but do not.
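
On the XML feature generation: a descriptor that roughly mirrors the
built-in default feature generators looks like the following (the file
name pers.xml is just an example, and you should check the element
names against the current documentation):

<generators>
  <cache>
    <generators>
      <window prevLength="2" nextLength="2">
        <tokenclass/>
      </window>
      <window prevLength="2" nextLength="2">
        <token/>
      </window>
      <definition/>
      <prevmap/>
      <bigram/>
      <sentence begin="true" end="false"/>
    </generators>
  </cache>
</generators>

You can load it with GeneratorFactory.create(new
FileInputStream("pers.xml"), null) (the second argument provides
resources such as dictionaries, null should be fine if the descriptor
does not reference any) and pass the returned AdaptiveFeatureGenerator
to NameFinderME.train(...) instead of the null generator in the sketch
above. Start from something like this and then add or remove generators
for your domain and language.
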
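And here is a rough sketch of the built-in evaluation against a
held-out file (again, the file names are made up); the error listener
prints every misclassified sample while evaluating:

import java.io.FileInputStream;
import opennlp.tools.cmdline.namefind.NameEvaluationErrorListener;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderEvaluator;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;

public class NameFinderEvalSketch {
  public static void main(String[] args) throws Exception {
    TokenNameFinderModel model =
        new TokenNameFinderModel(new FileInputStream("en-ner-person.bin"));

    // held-out data in the same format as the training data
    ObjectStream<String> lines = new PlainTextByLineStream(
        new FileInputStream("en-ner-person.eval"), "UTF-8");
    ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

    // the listener prints misclassified samples to stderr
    TokenNameFinderEvaluator evaluator = new TokenNameFinderEvaluator(
        new NameFinderME(model), new NameEvaluationErrorListener());

    evaluator.evaluate(samples);
    samples.close();

    // precision, recall and F-measure
    System.out.println(evaluator.getFMeasure());
  }
}
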
Hope this helps,
Jörn