[
https://issues.apache.org/jira/browse/UIMA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891188#action_12891188
]
Marshall Schor edited comment on UIMA-1840 at 7/23/10 11:20 AM:
----------------------------------------------------------------
After a bit of discussion (Eddie and Marshall), we think the problem is in
several parts.
The containsType method in the ResultSpecification interface has a Javadoc
which says that if the language passed is x-unspecified or null, any language
(in the result spec) will match.
This does not match the implementation. If the language passed as a
parameter is x-unspecified or null, then the only result-specification
language which matches is x-unspecified. For example, "en" in the result
spec won't match x-unspecified (containsType / containsFeature will
return false).
This behavior seems appropriate for the use case where an annotator declares it
uses or outputs type X for language "en", and type Y for language
"x-unspecified". In this case, if a CAS has a language x-unspecified, it seems
appropriate that the test containsType in the annotator code should return
false for type X and true for type Y.
Note that if the result-specification has a language x-unspecified for some
type Y, then containsType returns true for type Y regardless of the language
passed. Thus the containsType method is not symmetric: swapping the
result-spec language and the language passed as the argument can change the
result.
So, *fix # 1 is to correct the javadoc for containsType (and containsFeature)*
in the interface.
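The matching behavior described above can be sketched as a small standalone
model. This is not the UIMA implementation: the class LanguageMatchSketch and
its methods are hypothetical names used only for illustration, and the
handling of a specific query against a specific entry (e.g. "en-us" against
"en") is an assumption that this comment does not state.

```java
import java.util.Locale;

/** Hypothetical model of the matching rule described above; not UIMA code. */
final class LanguageMatchSketch {
    static final String UNSPECIFIED = "x-unspecified";

    /** Treats null as x-unspecified and lower-cases the language tag. */
    static String normalize(String lang) {
        return (lang == null) ? UNSPECIFIED : lang.toLowerCase(Locale.ROOT);
    }

    /**
     * Returns true if a result-spec entry declared for specLang matches a
     * containsType-style query made with queryLang.
     */
    static boolean matches(String specLang, String queryLang) {
        String spec = normalize(specLang);
        String query = normalize(queryLang);
        if (spec.equals(UNSPECIFIED)) {
            return true;  // an x-unspecified entry matches any query language
        }
        if (query.equals(UNSPECIFIED)) {
            return false; // an x-unspecified query matches only x-unspecified entries
        }
        // Assumption: a specific query matches an equal entry, or an entry
        // for its base language (query "en-us" against entry "en").
        return query.equals(spec) || query.startsWith(spec + "-");
    }

    public static void main(String[] args) {
        System.out.println(matches("en", "x-unspecified"));  // entry "en", query x-unspecified
        System.out.println(matches("x-unspecified", "en"));  // entry x-unspecified, query "en"
    }
}
```

Swapping the two arguments changes the answer, which is the asymmetry the
corrected Javadoc would need to describe.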
The method computeAnalysisComponentResultSpec in PrimitiveAnalysisEngine_impl
intersects the result spec computed by the aggregate with the primitive's
Capabilities. The intent is to not have anything in the resultSpec which is
not specified in the primitive's output capabilities.
This intersection includes the call to containsType and containsFeature,
comparing the aggregate's spec with this component's spec. For the language
argument, it uses the language from the primitive annotator's output capability
specs.
Here, the logic for handling x-unspecified should be different from the logic
when a CAS's language is being tested:
* CAS's language = x-unspecified, result-spec: some language e.g. "en":
result = false (per above logic).
* Primitive lang output capability = x-unspecified, result-spec: some language
e.g. "en": result should be true.
In this case, the primitive says that no matter what language (if any) is being
specified, it outputs the types and/or features.
So, in this case, we have to use a new kind of test, not the containsType /
containsFeature test. *Fix # 2 is to write the correct intersection test and
use it here*. This test should work as follows, for each type and/or feature:
-1. Prim lang output capability = x-unspecified, result-spec should change its
language (just for this primitive) to x-unspecified.-
1. Prim lang output capability = x-unspecified, result-spec should keep its
language (which by definition is x-unspecified or a subset of that) (see 2
comments below).
1a. (Added after 2 comments below) (change result spec to more restrictive
language).
* Prim lang output capability = en, result spec x-unspecified: result spec for
that primitive should be switched from x-unspecified to en (the more
restrictive).
2. Prim lang output capability = xx-yy, result-spec lang = xx, change its
language to xx-yy (just for this primitive). Otherwise, we have the case
where a primitive marks some output type TP as en-us (only output for US
English), embedded in an aggregate which says it outputs TP for "en" (perhaps
by routing to several different primitives), and a CAS comes with "en-gb". The
primitive processing this CAS should not produce type TP, in this case. But if
the result-spec language were allowed to remain "en", it would.
-3. Prim lang output capability = xx, result-spec lang = xx-yy. In this case,
keep the result-spec xx-yy. This is for the case where the aggregate says to
output type T for en-us, and although the prim says it outputs type T for en,
that's not needed by the aggregate unless it's en-us. But for this to work,
the aggregate has to widen the language specs via a union of input
specifications among all of its delegates for the same type, in case some
"flow" could happen that would have this use case make sense. I don't think
this is currently done. So for now, we will "widen" the result spec to the
containing language xx, at the cost of having some perhaps unneeded
computation.-
3. Following Adam's suggestion below - to always use the most restrictive
language:
* Prim lang output capability = xx, result-spec lang = xx-yy: switch the result
spec lang to xx-yy
It would be good to have another pair of eyes check this :-)
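The rules that are not struck out above (1, 1a, 2 and 3) all amount to
"keep the more restrictive of the two languages". A minimal sketch of that
rule follows. The class LanguageIntersectionSketch and its methods are
hypothetical, not part of UIMA, and returning an empty result for unrelated
languages (e.g. "en" vs "fr") is an assumption, since the rules above do not
cover that case.

```java
import java.util.Locale;
import java.util.Optional;

/** Hypothetical sketch of the "most restrictive language" rule; not UIMA code. */
final class LanguageIntersectionSketch {
    static final String UNSPECIFIED = "x-unspecified";

    /** Treats null as x-unspecified and lower-cases the language tag. */
    static String normalize(String lang) {
        return (lang == null) ? UNSPECIFIED : lang.toLowerCase(Locale.ROOT);
    }

    /** True if narrow equals wide, is a sub-language of it, or wide is x-unspecified. */
    static boolean isWithin(String narrow, String wide) {
        return wide.equals(UNSPECIFIED)
                || narrow.equals(wide)
                || narrow.startsWith(wide + "-");
    }

    /**
     * Returns the language the result spec should carry for this primitive,
     * given the primitive's output-capability language and the language in
     * the aggregate's result spec.
     */
    static Optional<String> intersect(String primLang, String specLang) {
        String prim = normalize(primLang);
        String spec = normalize(specLang);
        if (isWithin(spec, prim)) {
            return Optional.of(spec); // rules 1 and 3: result spec is already the narrower one
        }
        if (isWithin(prim, spec)) {
            return Optional.of(prim); // rules 1a and 2: narrow the result spec to the primitive's language
        }
        return Optional.empty();      // assumption: unrelated languages give no entry
    }

    public static void main(String[] args) {
        System.out.println(intersect("x-unspecified", "en")); // rule 1
        System.out.println(intersect("en", "x-unspecified")); // rule 1a
        System.out.println(intersect("en-us", "en"));         // rule 2
        System.out.println(intersect("en", "en-us"));         // rule 3
    }
}
```

With this rule, the primitive marked en-us inside an aggregate declaring "en"
ends up with an en-us entry, so a CAS with "en-gb" would not ask it for type
TP.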
> Result Specification behavior incorrect for aggregates
> ------------------------------------------------------
>
> Key: UIMA-1840
> URL: https://issues.apache.org/jira/browse/UIMA-1840
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Reporter: Eddie Epstein
>
> For a scenario using default result specifications, if an annotator with
> language "x-unspecified" is included in an aggregate with a different
> language, say "en", any containsType method calls from the annotator will
> return false.
> This behavior is incorrect given that the annotator has declared that it will
> work with any language.