rhead created OPENNLP-702:
-----------------------------
Summary: DictionaryNameFinder Not Finding Longest Match When Name Ends in a Number
Key: OPENNLP-702
URL: https://issues.apache.org/jira/browse/OPENNLP-702
Project: OpenNLP
Issue Type: Bug
Components: Name Finder, Tokenizer
Environment: Darwin Kernel Version 12.5.0
Reporter: rhead
Here's my dictionary:
<?xml version="1.0" encoding="UTF-8"?>
<dictionary case_sensitive="false">
<entry>
<token>vitamin</token>
<token>b12</token>
</entry>
<entry>
<token>vitamin</token>
<token>b</token>
</entry>
<entry>
<token>john</token>
<token>doe</token>
</entry>
<entry>
<token>john</token>
<token>d</token>
</entry>
</dictionary>
When run on this sentence using a DictionaryNameFinder:
My name is john doe, aka john d. I like vitamin b12.
The following names are found: john doe, john d, vitamin b
As you can see, when the second token ends in a number, the longest match
(vitamin b12) is discarded in favor of the shorter one (vitamin b).
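For reference, here is a minimal, self-contained sketch of the longest-match behavior expected for the dictionary above. It is not OpenNLP's implementation and does not diagnose the cause of the bug. The class name LongestMatchSketch, the helper findLongest, and the tokenization of the sentence are all assumptions made for illustration, since the report does not show which tokenizer was used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class LongestMatchSketch {

    // Hypothetical helper (not part of OpenNLP): scans the tokens left to right
    // and, at each position, reports the longest dictionary entry starting there.
    // Matching is case-insensitive, mirroring case_sensitive="false".
    static List<String> findLongest(Set<List<String>> entries, String[] tokens) {
        int maxLen = 0;
        for (List<String> entry : entries) {
            maxLen = Math.max(maxLen, entry.size());
        }

        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            // Try the longest candidate first so "vitamin b12" wins over "vitamin b".
            for (int len = Math.min(maxLen, tokens.length - i); len >= 1; len--) {
                List<String> candidate = new ArrayList<>();
                for (int k = i; k < i + len; k++) {
                    candidate.add(tokens[k].toLowerCase(Locale.ROOT));
                }
                if (entries.contains(candidate)) {
                    matched = len;
                    found.add(String.join(" ", candidate));
                    break;
                }
            }
            // Skip past a match, or advance by one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }

    public static void main(String[] args) {
        // The four entries from the dictionary in this report.
        Set<List<String>> entries = new HashSet<>(Arrays.asList(
                Arrays.asList("vitamin", "b12"),
                Arrays.asList("vitamin", "b"),
                Arrays.asList("john", "doe"),
                Arrays.asList("john", "d")));

        // Assumed tokenization: "b12" is kept as a single token.
        String[] tokens = {"My", "name", "is", "john", "doe", ",", "aka",
                "john", "d", ".", "I", "like", "vitamin", "b12", "."};

        System.out.println(findLongest(entries, tokens));
        // prints [john doe, john d, vitamin b12]
    }
}
```

Note that under this sketch the result depends on the tokens passed in: if "b12" were split into "b" and "12" before lookup, the entry "vitamin b" would be the only possible match at that position. Whether that is what happens in this case is not established here.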
(Originally from:
http://mail-archives.apache.org/mod_mbox/opennlp-users/201406.mbox/%3C1402268906.31205.YahooMailNeo%40web121102.mail.ne1.yahoo.com%3E)