Hi,

Am I right in saying that I will also need to create and train my own HTML sentence detector in order to parse the HTML into chunks that can be tokenised?
Cheers

Paul Cowan

Cutting-Edge Solutions (Scotland)

http://thesoftwaresimpleton.blogspot.com/

On 17 December 2010 15:10, Jörn Kottmann <[email protected]> wrote:

> On 12/17/10 2:19 PM, James Kosin wrote:
>
>> I have the following questions that I would appreciate an answer for:
>>
>> 1. Can I have the different name finding tags in the same data?
>
> Yes, but that means you train a model which can detect each of these
> names. You should test both: multiple name types in one model,
> and separate models for each name type. You can use the built-in
> evaluation to validate your results.
>
>> 2. Does the <START:address> <END> make sense over multiple lines or
>> should I break this up further?
>
> No, not possible. Names spanning multiple sentences (a line is a
> sentence) are not supported.
>
>> 3. I want to use 200 or 300 different examples, do I need to create
>> separate files for each example or can I merge them all into 1 and if
>> it is only 1, do I need to mark up the start and end of a file?
>
> If you want to use the command line training tool they must be all in one
> file; if you use the API it's up to you to merge these different sources
> into one name sample stream.
>
> Jörn
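
For anyone preparing data along the lines of the answers quoted above, here is a rough sketch of how the separate example files could be merged into one training file for the command line tool. It is plain Python and does not call OpenNLP at all. The names check_line and merge_samples are made up for illustration. It assumes tags are whitespace-separated tokens, one sentence per line, and it rejects any line where a <START:...> is not closed by <END> on the same line, since names cannot span lines. Writing an empty line between files as a document boundary is my assumption about the training format, so please check it against the OpenNLP documentation before relying on it.

```python
import re
from pathlib import Path

# A start tag is either <START> or <START:type>, e.g. <START:address>.
START_TAG = re.compile(r"<START(?::[^>\s]+)?>")
END_TAG = "<END>"


def check_line(line):
    """Return True if every start tag on this line is closed on the same line.

    Tokens are assumed to be separated by whitespace. Nested spans and
    stray <END> tags are treated as errors.
    """
    inside_span = False
    for token in line.split():
        if START_TAG.fullmatch(token):
            if inside_span:
                return False  # a span opened inside another span
            inside_span = True
        elif token == END_TAG:
            if not inside_span:
                return False  # <END> without a matching start tag
            inside_span = False
    return not inside_span  # False if a span is still open at end of line


def merge_samples(paths, out_path):
    """Concatenate annotated example files into a single training file.

    Blank lines inside each input file are dropped, and one empty line is
    written after each file (assumed here to act as a document boundary).
    Returns the number of sentence lines written.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            lines = Path(path).read_text(encoding="utf-8").splitlines()
            for lineno, raw in enumerate(lines, start=1):
                line = raw.strip()
                if not line:
                    continue
                if not check_line(line):
                    raise ValueError(f"{path}:{lineno}: span crosses a line or is unbalanced")
                out.write(line + "\n")
                written += 1
            out.write("\n")  # assumed document separator
    return written
```

Calling merge_samples(["example1.txt", "example2.txt"], "train.txt") with hypothetical file names would give one file that can be passed to the trainer. An address that currently spans several lines would need to be put on a single line, or split into separate spans, before it passes the check.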
