Jorn,
1) The last commit should have said non-static and not static.
2) The UIMA Annotators don't seem to use the dictionary's case
sensitivity setting; so, I've set the serializer to pass true for the
case sensitivity flag and to discard the return result. Otherwise, we
would really want the UIMA StringDictionary to be an actual Dictionary
from OpenNLP tools instead.
3) The POSDictionary is an interesting situation. There are actually
multiple issues:
    public String[] getTags(String word) {
        if (isCaseSensitive) {
            return dictionary.get(word);
        }
        else {
            return dictionary.get(word.toLowerCase());
        }
    }
This section of code is broken for the following reasons: (a) it
depends completely on how the dictionary was originally built; if it was
built case sensitive, then you can't use the above to implement a case
insensitive lookup, or vice versa. (b) it will never find the correct
item otherwise.
The original code was only set up for the caseSensitive flag to always
be true.
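
To make the problem concrete, here is a minimal sketch of one possible
direction: apply the same key normalization when an entry is stored and
when it is looked up. This is not the current POSDictionary code. The
class name CaseAwareTagDictionary and its put() method are hypothetical
and only for illustration; how to merge entries such as "The" and "the"
in case insensitive mode is left open.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch, not the OpenNLP POSDictionary API.
final class CaseAwareTagDictionary {
    private final Map<String, String[]> entries = new HashMap<>();
    private final boolean caseSensitive;

    CaseAwareTagDictionary(boolean caseSensitive) {
        this.caseSensitive = caseSensitive;
    }

    // The same normalization is used for both insert and lookup,
    // so the stored keys always match the keys we search for.
    private String key(String word) {
        return caseSensitive ? word : word.toLowerCase(Locale.ROOT);
    }

    // In case insensitive mode a later entry replaces an earlier one
    // that differs only by case; merging tags is an open design question.
    void put(String word, String... tags) {
        entries.put(key(word), tags.clone());
    }

    // Returns the tags for the word, or null if the word is unknown.
    String[] getTags(String word) {
        String[] tags = entries.get(key(word));
        return tags == null ? null : tags.clone();
    }
}

final class CaseAwareTagDictionaryDemo {
    public static void main(String[] args) {
        // Failure mode from the snippet above: keys were stored with
        // their original case, but the lookup lowercases the word.
        Map<String, String[]> raw = new HashMap<>();
        raw.put("The", new String[] {"DT"});
        System.out.println(raw.get("The".toLowerCase(Locale.ROOT)) == null);

        // With normalization on both sides the lookup succeeds.
        CaseAwareTagDictionary insensitive = new CaseAwareTagDictionary(false);
        insensitive.put("The", "DT");
        System.out.println(Arrays.toString(insensitive.getTags("THE")));
    }
}
```

The trade-off is that the flag has to be known when the dictionary is
built (or loaded), which is the same constraint described in (a).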
We need to look over the POSDictionary and determine how we want this to
work and outline a plan. Can I get a vote on any ideas?
Thanks,
James
On 8/8/2011 9:54 PM, James Kosin (JIRA) wrote:
[
https://issues.apache.org/jira/browse/OPENNLP-239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081362#comment-13081362
]
James Kosin commented on OPENNLP-239:
-------------------------------------
Jorn,
Sorry, I've been busy... The default was because I didn't know the
default state for many of the models. Most of the time it is based on
how they are created. I can fix that easily; so that it gets set to
true if not present.
The static was required because the function is static and doesn't have
access to the non-static members. I agree it was a nasty compromise.
The other way to go would be to add the serializing to the Dictionary
object itself... but, I don't know what problems there would be with
growing the Dictionary class too large... lastly, we could make the
serializing non-static, meaning we would need to create a
DictionarySerializer instance to use it.
James
I'm also not familiar with the
Case Sensitive Flag & Custom Tag Dictionary
--------------------------------------------
Key: OPENNLP-239
URL: https://issues.apache.org/jira/browse/OPENNLP-239
Project: OpenNLP
Issue Type: New Feature
Components: Parser
Affects Versions: tools-1.5.1-incubating
Reporter: mark meiklejohn
Assignee: James Kosin
Fix For: tools-1.5.2-incubating
Unable to set case sensitive flag as per TreebankParser 1.3.1 or use a custom
tag dictionary