2011/6/20 Amal Elmah <[email protected]>:
> thanks for replying
>
> What I need to do is to make a new model that can extract the names of
> recipes from a specific cooking website.
> Could you please correct me if I am wrong:
>
> - First, I made a training file (training.txt). In this file I chose a lot of
> sentences that contain a recipe name, and I put each sentence on its own line,
> for example:
> <START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .
> ... etc
>
> - Then I use the command line training tool to generate the new model.
> - After that I will use this model in my application to process any new
> page from this cooking website.
> - The features will be extracted automatically by OpenNLP, so I do not need to
> specify them; I just need to provide as much training data as I can (this is
> what I understood).
>
> Are all my steps right?
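As far as I know, OpenNLP's name finder training data is one tokenized sentence per line, with `<START>` (or `<START:type>`) and `<END>` written as separate whitespace-delimited tokens around each name. A small helper such as the hypothetical `TrainingLineWriter` below (not part of OpenNLP) can generate those lines from already-tokenized sentences, which avoids hand-editing slips like a missing space after `<START>`. The `recipe` type label in the demo is also just an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of OpenNLP): writes one tokenized sentence as a
 * name finder training line, wrapping tokens [start, end) in <START> ... <END>.
 */
final class TrainingLineWriter {

    /** If type is null the plain "<START>" tag is used, else "<START:type>". */
    static String annotate(String[] tokens, int start, int end, String type) {
        if (start < 0 || end > tokens.length || start >= end) {
            throw new IllegalArgumentException("invalid name span " + start + ".." + end);
        }
        String open = (type == null) ? "<START>" : "<START:" + type + ">";
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i == start) {
                out.add(open);            // opening tag is its own token
            }
            out.add(tokens[i]);
            if (i == end - 1) {
                out.add("<END>");         // closing tag is its own token
            }
        }
        return String.join(" ", out);     // single spaces, one sentence per line
    }

    public static void main(String[] args) {
        String[] tokens =
            "Shortbread is an easy buttery biscuits as homemade Christmas presents .".split(" ");
        System.out.println(annotate(tokens, 0, 1, null));
        System.out.println(annotate(tokens, 0, 1, "recipe"));
    }
}
```

Running `main` prints `<START> Shortbread <END> is an easy buttery biscuits as homemade Christmas presents .` followed by the same line starting with `<START:recipe>`. The sentence must be tokenized the same way the application will tokenize new pages (here punctuation is already split off as its own token).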
Yes, but I am not sure that the name finder will be able to learn a good model for this problem.

> Do I need to do anything to make the results more accurate?

Probably more annotated data :) You could also build your own feature extractor that uses a list of well-known recipe names coming from a thesaurus (a.k.a. a gazetteer), but this would require a bit of programming with the OpenNLP API (AFAIK there is no such gazetteer feature extractor implemented in the source tree so far).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
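A gazetteer feature of the kind suggested in the reply could, roughly, mark each token covered by a known recipe name. The class below is a minimal, dependency-free sketch under that assumption: `RecipeGazetteerFeatures` and the `gaz=B` / `gaz=I` feature strings are hypothetical names, and `createFeatures` only mirrors the argument shape of OpenNLP's `AdaptiveFeatureGenerator.createFeatures(List<String>, String[], int, String[])` as I understand that interface. Wiring it into the name finder's feature generation is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical gazetteer feature extractor (plain Java, no OpenNLP dependency).
 * Emits "gaz=B" if a known recipe name starts at the token, "gaz=I" if the
 * token is inside (but not at the start of) a known name.
 */
final class RecipeGazetteerFeatures {
    private final Set<String> names = new HashSet<>();
    private int maxLen = 0; // longest name, in tokens

    RecipeGazetteerFeatures(List<String> recipeNames) {
        for (String name : recipeNames) {
            if (name.trim().isEmpty()) {
                continue; // skip blank entries
            }
            String[] toks = name.trim().toLowerCase(Locale.ROOT).split("\\s+");
            names.add(String.join(" ", toks));
            maxLen = Math.max(maxLen, toks.length);
        }
    }

    /** Argument shape assumed to match AdaptiveFeatureGenerator.createFeatures. */
    void createFeatures(List<String> features, String[] tokens, int index,
                        String[] previousOutcomes) {
        // Try every span of at most maxLen tokens that covers `index`.
        for (int start = Math.max(0, index - maxLen + 1); start <= index; start++) {
            int lastEnd = Math.min(tokens.length, start + maxLen);
            for (int end = index + 1; end <= lastEnd; end++) {
                if (names.contains(join(tokens, start, end))) {
                    features.add(start == index ? "gaz=B" : "gaz=I");
                    return; // one gazetteer feature per token in this sketch
                }
            }
        }
    }

    /** Lower-cased, space-joined tokens [start, end). */
    private static String join(String[] tokens, int start, int end) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < end; i++) {
            if (i > start) {
                sb.append(' ');
            }
            sb.append(tokens[i].toLowerCase(Locale.ROOT));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RecipeGazetteerFeatures gaz = new RecipeGazetteerFeatures(
            Arrays.asList("Shortbread", "Christmas pudding"));
        String[] tokens = "Shortbread and Christmas pudding .".split(" ");
        for (int i = 0; i < tokens.length; i++) {
            List<String> features = new ArrayList<>();
            gaz.createFeatures(features, tokens, i, new String[0]);
            System.out.println(tokens[i] + " " + features);
        }
    }
}
```

Matching here is exact and case-insensitive over whole token sequences, so "Christmas pudding" yields `gaz=B` on "Christmas" and `gaz=I` on "pudding", while "and" and "." get no feature. The point of such a feature is that the model still learns from context, so a name that is missing from the list can be found, and a listed name in an unrelated sentence is not forced to be tagged.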
