2011/4/10 James Kosin <[email protected]>:
>
> Hi Olivier,

Hi James,

> First, the page is for all the models, both English and others.
>
> Usually what happens is you take the raw text and parse it with the
> sentence detector model; this separates the text into sentences that
> are more easily parsed.
> Then the sentences are run through the tokenizer, which breaks each
> sentence up into tokens (small pieces), usually words, and separates
> punctuation from the words.
> Next, you run the name finder models over the tokenized text.  Most
> of the models take tokenized text and produce the required output.
> There is no model that is trained on any one set of data... at least
> I don't believe so.
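
Just to check my understanding, here is roughly how I wire that
pipeline up today (only a sketch, assuming the OpenNLP 1.5 API and the
pre-trained en-sent.bin / en-token.bin / en-ner-person.bin files from
the models page; the class name, file paths and sample sentence are my
own placeholders):

import java.io.FileInputStream;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

public class NerPipelineSketch {
    public static void main(String[] args) throws Exception {
        // Pre-trained models downloaded from the models page.
        SentenceDetectorME sentenceDetector = new SentenceDetectorME(
                new SentenceModel(new FileInputStream("en-sent.bin")));
        TokenizerME tokenizer = new TokenizerME(
                new TokenizerModel(new FileInputStream("en-token.bin")));
        NameFinderME nameFinder = new NameFinderME(
                new TokenNameFinderModel(new FileInputStream("en-ner-person.bin")));

        String raw = "Pierre Vinken, 61 years old, will join the board. "
                + "He lives in Amsterdam.";

        // 1) raw text -> sentences
        for (String sentence : sentenceDetector.sentDetect(raw)) {
            // 2) sentence -> tokens
            String[] tokens = tokenizer.tokenize(sentence);
            // 3) tokens -> name spans
            for (Span name : nameFinder.find(tokens)) {
                StringBuilder sb = new StringBuilder();
                for (int i = name.getStart(); i < name.getEnd(); i++) {
                    sb.append(tokens[i]).append(' ');
                }
                System.out.println(name.getType() + ": " + sb.toString().trim());
            }
        }
        // Reset document-level adaptive features once the document is done.
        nameFinder.clearAdaptiveData();
    }
}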

Ok then let me be more specific:

Have the en-ner-{person,location,organization}.bin models been trained
on the output of the SimpleTokenizer class or on TokenizerME +
en-token.bin?

What is your feeling with respect to SimpleTokenizer versus
TokenizerME in general (for European languages)?
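
Concretely, the swap I am asking about is between these two (again only
a sketch against the 1.5 API; both classes implement the Tokenizer
interface, and the class name and sample sentence are made up):

import java.io.FileInputStream;

import opennlp.tools.tokenize.SimpleTokenizer;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class TokenizerChoice {
    public static void main(String[] args) throws Exception {
        // Alternative A: rule-based, splits on character class changes,
        // needs no model file.
        Tokenizer simple = SimpleTokenizer.INSTANCE;

        // Alternative B: maxent tokenizer driven by en-token.bin.
        Tokenizer statistical = new TokenizerME(
                new TokenizerModel(new FileInputStream("en-token.bin")));

        String sentence = "Mr. Vinken works at Elsevier N.V. in Amsterdam.";
        // Whichever of the two produced the training tokens is the one
        // that should also feed the name finder at run time.
        System.out.println(java.util.Arrays.toString(simple.tokenize(sentence)));
        System.out.println(java.util.Arrays.toString(statistical.tokenize(sentence)));
    }
}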

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
