[ https://issues.apache.org/jira/browse/OPENNLP-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Kosin reassigned OPENNLP-471:
-----------------------------------

    Assignee: James Kosin

> DictionaryNameFinder has HASHing issues
> ---------------------------------------
>
>                 Key: OPENNLP-471
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-471
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Name Finder
>            Reporter: James Kosin
>            Assignee: James Kosin
>              Labels: dictionary, namefinder
>             Fix For: tools-1.5.3
>
>
> The DictionaryNameFinder fails to find multi-token names when the find()
> method searches the dictionary one token at a time and the dictionary has
> no single-token (or shorter) entry that matches first.
> For example, a dictionary containing {"folic", "acid"} but no entry for
> {"folic"} causes find() to skip the longer match entirely.
> Thanks to Jim for pushing on this, and to my debugging skills for finding it.
> Three possibilities come to mind:
> 1) One I don't really like: try longer matches whenever shorter ones don't
> match. Unfortunately, this turns the search into a larger problem and
> quickly becomes a race to see who can wait longer.
> 2) Return a possible match that may need exploring, or add a look-ahead
> system that can say: we don't match "folic", but if "acid" follows "folic"
> there is a match for that in the dictionary.
> 3) Leave the code as it is and add the shorter terms to the dictionary,
> perhaps marked as not-a-valid entry so we know a longer match is needed.
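
Options 2 and 3 above can be combined in a small sketch. This is not the OpenNLP implementation or the fix that went into tools-1.5.3; the class PrefixDictionarySketch and its add() and find() methods are hypothetical names used only for illustration. The idea is to record every proper prefix of a multi-token entry as "not a valid entry, but keep looking", so that a miss on "folic" alone no longer stops the search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not part of the OpenNLP API. */
public class PrefixDictionarySketch {

    // Complete dictionary entries, each stored as a list of tokens.
    private final Set<List<String>> entries = new HashSet<>();

    // Proper prefixes of entries: not matches themselves, but a signal
    // that a longer match may still follow (the "look-ahead" marker).
    private final Set<List<String>> prefixes = new HashSet<>();

    public void add(String... tokens) {
        List<String> entry = new ArrayList<>(Arrays.asList(tokens));
        entries.add(entry);
        for (int len = 1; len < entry.size(); len++) {
            prefixes.add(new ArrayList<>(entry.subList(0, len)));
        }
    }

    /**
     * Returns [start, end) token index pairs for the longest
     * non-overlapping matches, scanning left to right.
     */
    public List<int[]> find(String[] tokens) {
        List<int[]> spans = new ArrayList<>();
        int start = 0;
        while (start < tokens.length) {
            int bestEnd = -1;
            List<String> candidate = new ArrayList<>();
            for (int end = start; end < tokens.length; end++) {
                candidate.add(tokens[end]);
                if (entries.contains(candidate)) {
                    bestEnd = end + 1; // remember the longest match so far
                }
                if (!prefixes.contains(candidate)) {
                    break; // no longer entry starts with these tokens
                }
            }
            if (bestEnd > 0) {
                spans.add(new int[] {start, bestEnd});
                start = bestEnd;
            } else {
                start++;
            }
        }
        return spans;
    }
}
```

With only {"folic", "acid"} added, calling find() on the tokens "take", "folic", "acid", "daily" yields the single span [1, 3). The prefix set bounds the look-ahead, so the search stops as soon as no longer entry is possible, which avoids the open-ended waiting described in option 1. The cost is extra memory for the prefixes.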