Hi,

Am I right in saying that I will also need to create and train my own HTML
sentence detector in order to parse the HTML into chunks that can be
tokenised?

Cheers

Paul Cowan

Cutting-Edge Solutions (Scotland)

http://thesoftwaresimpleton.blogspot.com/



On 17 December 2010 15:10, Jörn Kottmann <[email protected]> wrote:

> On 12/17/10 2:19 PM, James Kosin wrote:
>
>> I have the following questions that I would appreciate an answer for:
>> >
>> >  1. Can I have the different name finding tags in the same data?
>>
>
> Yes, but that means you train one model which can detect each of these
> name types. You should test both options, multiple name types in one model
> and separate models for each name type, and use the built-in evaluation
> to validate your results.
>
>> >  2. Does the <START:address> <END> make sense over multiple lines, or
>> >     should I break this up further?
>>
> No, that is not possible. Names spanning multiple sentences (a line is a
> sentence) are not supported.
>
>
>> >  3. I want to use 200 or 300 different examples. Do I need to create a
>> >     separate file for each example, or can I merge them all into one? If
>> >     it is only one, do I need to mark up the start and end of each file?
>>
> If you want to use the command line training tool, they must all be in one
> file. If you use the API, it's up to you to merge these different sources
> into one name sample stream.
>
> Jörn
>
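
A minimal sketch of preparing the single training file described above, in Python with the standard library only. The helper names name_types, merge_examples and merge_files are hypothetical and not part of OpenNLP. Each example is checked so that every <START:type> ... <END> pair opens and closes on the same line (one line is one sentence), several name types may appear on one line, and the examples are concatenated into one file. Separating examples with an empty line is an assumption (that OpenNLP treats an empty line as a document boundary); please check it against the OpenNLP documentation for your version.

```python
import re
from pathlib import Path

# One whitespace-separated token such as <START:person>; the type is group 1.
# Typed tags only: this sketch does not handle an untyped <START>.
START_TAG = re.compile(r"<START:([^<>\s]+)>")


def name_types(line):
    """Return the name types annotated on one sentence line.

    Raises ValueError if a <START:type> is not closed by <END> on the same
    line, if <END> has no opening tag, if names are nested, or if a tag is
    glued to a word instead of being its own whitespace-separated token.
    """
    types, open_type = [], None
    for token in line.split():
        match = START_TAG.fullmatch(token)
        if match:
            if open_type is not None:
                raise ValueError(f"<START:{match.group(1)}> inside open <START:{open_type}>")
            open_type = match.group(1)
        elif token == "<END>":
            if open_type is None:
                raise ValueError("<END> without a matching <START:...>")
            types.append(open_type)
            open_type = None
        elif "<START" in token or "<END>" in token:
            raise ValueError(f"tag not separated by whitespace: {token!r}")
    if open_type is not None:
        # A name may not continue onto the next line (the next sentence).
        raise ValueError(f"<START:{open_type}> is not closed on the same line")
    return types


def merge_examples(texts):
    """Check each annotated example and merge them into one training text.

    Blank lines inside an example are dropped; non-empty examples are
    separated by one empty line (assumed to act as a document boundary).
    """
    blocks = []
    for number, text in enumerate(texts, start=1):
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for sentence, line in enumerate(lines, start=1):
            try:
                name_types(line)
            except ValueError as err:
                raise ValueError(f"example {number}, sentence {sentence}: {err}") from None
        if lines:
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def merge_files(paths, out_path):
    """Merge example files into the single file the command line tool expects (UTF-8 assumed)."""
    texts = [Path(p).read_text(encoding="utf-8") for p in paths]
    Path(out_path).write_text(merge_examples(texts), encoding="utf-8")


if __name__ == "__main__":
    examples = [
        "<START:person> Paul Cowan <END> works in <START:location> Scotland <END> .",
        "Send it to <START:address> 1 High Street <END> .\nThanks .",
    ]
    print(merge_examples(examples), end="")
```

With 200 or 300 annotated files on disk, something like merge_files(sorted(Path("examples").glob("*.txt")), "train.txt") would produce one file for the command line trainer (the directory and file names here are only illustrative). An address broken across two lines is reported as an error rather than silently producing a bad sample.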
