2011/1/24 Olivier Grisel <[email protected]>:
> 2011/1/24 Michael Migdol <[email protected]>:
>> Hi Everyone,
>
> Hi Michael and thanks for sharing your experiments,
>
>> So, a few questions:
>>
>> 1) Olivier stated results for english location entities were a recall of
>> 0.64.  Does this mean that, in general, detecting only 3 of 5
>> countries mentioned in an article is about what one would expect?
>> There were actually 12 mentions in the article for the 5 distinct
>> countries  (it found Sweden twice), so the recall for this simple test
>> was actually more like 42%.  And obviously, a single article is not a
>> sufficient sample size to judge with.   I know, my next task should be to
>> run the OpenNLP evaluator on a separate dataset, right?
>
> The evaluation I did was using a model trained on more than 100k
> sentences. Maybe the recall is even worse than mine because you used a
> much smaller training set. I realised that I haven't uploaded my
> results for English on S3. I will do so and let you know when it's
> done.

Here it is:

  http://pignlproc.s3.amazonaws.com/corpus/en/opennlp_location/part-r-00000

The output is chunked: to get the following chunks replace the
trailing file name with part-r-00001, part-r-00002, and so on.
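
For convenience, the chunk naming above can be sketched in a few lines of
Python. This is only an illustration: the helper name chunk_urls and the
n_chunks parameter are mine, and the actual number of chunks is not stated
here, so the caller has to supply it (or stop at the first missing file).

```python
# Base URL taken from the link above; only the trailing file name changes.
BASE = "http://pignlproc.s3.amazonaws.com/corpus/en/opennlp_location/"


def chunk_urls(n_chunks):
    """Return the URLs of the first n_chunks output chunks.

    n_chunks is an assumption supplied by the caller: the real number of
    chunks is not given in this message.
    """
    # Hadoop-style reducer outputs are zero-padded to five digits.
    return [f"{BASE}part-r-{i:05d}" for i in range(n_chunks)]


if __name__ == "__main__":
    for url in chunk_urls(3):
        print(url)
```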

Indeed, looking at the output, China is never annotated while most
other countries are. This is likely because the country article in
Wikipedia / DBpedia is named "People's Republic of China", while
"China" is the article for the civilization. Furthermore, pignlproc
does not yet resolve Wikipedia / DBpedia redirects, such as those
available from
http://downloads.dbpedia.org/3.5.1/en/redirects_en.nt.bz2 .

So I think it would be really worth implementing the additional left
outer JOIN / COGROUP on the redirect data. If you manage to do so,
please send me a patch :)
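
To make the idea concrete, here is a rough Python sketch of what the left
outer join on redirects would do. It is not pignlproc code and not a Pig
script: the function names, the example.org URIs and the sample triple are
all made up for illustration, and I am assuming each line of the redirect
dump is an N-Triples statement whose subject is the redirecting page and
whose object is the redirect target.

```python
import re

# Matches an N-Triples line whose subject, predicate and object are all URIs.
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def load_redirects(lines):
    """Build a {source_uri: target_uri} map from N-Triples lines.

    Assumption: every URI-to-URI triple in the input is a redirect; the
    predicate is deliberately not checked.
    """
    redirects = {}
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match:
            source, _predicate, target = match.groups()
            redirects[source] = target
    return redirects


def resolve(uri, redirects, max_hops=5):
    """Follow redirects from uri, guarding against cycles and long chains."""
    seen = set()
    while uri in redirects and uri not in seen and len(seen) < max_hops:
        seen.add(uri)
        uri = redirects[uri]
    return uri


def join_links_with_redirects(links, redirects):
    """Left outer join: links without a redirect keep their original target."""
    return [(text, resolve(uri, redirects)) for text, uri in links]


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real dump.
    sample = [
        "<http://example.org/resource/PRC> <http://example.org/redirect> "
        "<http://example.org/resource/People%27s_Republic_of_China> .",
    ]
    redirects = load_redirects(sample)
    links = [
        ("PRC", "http://example.org/resource/PRC"),
        ("Sweden", "http://example.org/resource/Sweden"),
    ]
    for text, uri in join_links_with_redirects(links, redirects):
        print(text, "->", uri)
```

In Pig the same thing would be a left outer JOIN (or a COGROUP) of the link
relation with the redirect relation on the link target. The real dump is
bz2-compressed, so with Python 3 the lines could probably be read with
bz2.open(path, "rt") before being passed to load_redirects.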

Also, here are the English models I trained for my post:

  http://pignlproc.s3.amazonaws.com/models/opennlp/en-ner-location.bin
  http://pignlproc.s3.amazonaws.com/models/opennlp/en-ner-person.bin
  http://pignlproc.s3.amazonaws.com/models/opennlp/en-ner-organization.bin

Best,

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
