[ https://issues.apache.org/jira/browse/UIMA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891188#action_12891188 ]
Marshall Schor commented on UIMA-1840:
--------------------------------------

After a bit of discussion (Eddie and Marshall), we think the problem is in
several parts.

The containsType method in the ResultSpecification interface has a Javadoc
which says that if the language passed is x-unspecified or null, any language
(in the result spec) will match. This is untrue, according to the
implementation. If the language passed as a parameter is x-unspecified or
null, then the only result-specification language which matches is
x-unspecified. In particular, "en" in the result spec won't match
x-unspecified (containsType / containsFeature will return false).

This behavior seems appropriate for the use case where an annotator declares
it uses or outputs type X for language "en", and type Y for language
"x-unspecified". In this case, if a CAS has language x-unspecified, it seems
appropriate that the containsType test in the annotator code should return
false for type X and true for type Y.

Note that if the result-specification has language x-unspecified for some
type Y, then containsType returns true for type Y regardless of the language
passed. Thus the containsType method is not symmetric with respect to the
ordering of the language arguments.

So, *fix # 1 is to correct the javadoc for containsType (and
containsFeature)* in the interface.

The method computeAnalysisComponentResultSpec in PrimitiveAnalysisEngine_impl
does an intersection of the result spec coming from the aggregate's
computation of this, with the primitive's Capabilities. The intent is to not
have anything in the resultSpec which is not specified in the primitive's
output capabilities. This intersection includes calls to containsType and
containsFeature, comparing the aggregate's spec with this component's spec.
For the language argument, it uses the language from the primitive
annotator's output capability specs.
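
To make the asymmetry of containsType concrete, here is a minimal sketch of
the matching behavior described above. It is not the UIMA implementation: the
class ResultSpecSketch, its addType method, and the fallback from a
sub-language such as "en-us" to its base language "en" are assumptions made
only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result specification; NOT the UIMA class.
class ResultSpecSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Type name -> languages for which that type is requested.
    private final Map<String, Set<String>> typeToLanguages = new HashMap<>();

    void addType(String type, String language) {
        typeToLanguages.computeIfAbsent(type, k -> new HashSet<>())
                       .add(normalize(language));
    }

    boolean containsType(String type, String language) {
        Set<String> specLanguages = typeToLanguages.get(type);
        if (specLanguages == null) {
            return false; // type is not in the result spec at all
        }
        // x-unspecified in the result spec matches any passed language.
        if (specLanguages.contains(UNSPECIFIED)) {
            return true;
        }
        String lang = normalize(language);
        // x-unspecified (or null) as the argument matches only
        // x-unspecified in the spec, which was ruled out just above.
        if (lang.equals(UNSPECIFIED)) {
            return false;
        }
        if (specLanguages.contains(lang)) {
            return true;
        }
        // Assumption: "en-us" falls back to its base language "en".
        int dash = lang.indexOf('-');
        return dash > 0 && specLanguages.contains(lang.substring(0, dash));
    }

    private static String normalize(String language) {
        return language == null ? UNSPECIFIED
                                : language.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ResultSpecSketch spec = new ResultSpecSketch();
        spec.addType("X", "en");
        spec.addType("Y", UNSPECIFIED);
        System.out.println(spec.containsType("X", UNSPECIFIED)); // false
        System.out.println(spec.containsType("Y", UNSPECIFIED)); // true
        System.out.println(spec.containsType("Y", "en"));        // true
    }
}
```

With type X registered for "en" and type Y for "x-unspecified", a CAS whose
language is x-unspecified gets false for X and true for Y, while Y matches
for every language passed.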

Here, the logic for handling x-unspecified should be different from the logic
when a CAS's language is being tested:

  CAS's language = x-unspecified, result-spec: some language, e.g. "en";
  result = false (per above logic).

  Primitive lang output capability = x-unspecified, result-spec: some
  language, e.g. "en"; result should be true.

In the second case, the primitive says that no matter what language (if any)
is being specified, it outputs the types and/or features. So, in this case,
we have to use a new kind of test, not the containsType / containsFeature
test.

*Fix # 2 is to write the correct intersection test and use it here*. This
test should work as follows, for each type and/or feature:

1. Prim lang output capability = x-unspecified: the result-spec should change
   its language (just for this primitive) to x-unspecified.

2. Prim lang output capability = xx-yy, result-spec lang = xx: change its
   language to xx-yy (just for this primitive). Otherwise, we have the case
   where a primitive marks some output type TP as en-us (only output for US
   English), embedded in an aggregate which says it outputs TP for "en"
   (perhaps by routing to several different primitives), and a CAS comes
   with "en-gb". The primitive processing this CAS should not produce type
   TP in this case. But if the result-spec language was allowed to remain
   "en", it would.

3. Prim lang output capability = xx, result-spec lang = xx-yy. In this case,
   keep the result-spec xx-yy. This is for the case where the aggregate says
   to output type T for en-us, and although the prim says it outputs type T
   for en, that's not needed by the aggregate unless it's en-us. But for
   this to work, the aggregate has to widen the language specs via a union
   of input specifications among all of its delegates for the same type, in
   case some "flow" could happen that would have this use case make sense.
   I don't think this is currently done.
   So for now, we will "widen" the result spec to the containing language
   xx, at the cost of having some perhaps unneeded computation.

It would be good to have another pair of eyes check this :-)

> Result Specification behavior incorrect for aggregates
> ------------------------------------------------------
>
>                 Key: UIMA-1840
>                 URL: https://issues.apache.org/jira/browse/UIMA-1840
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Eddie Epstein
>
> For a scenario using default result specifications, if an annotator with
> language "x-unspecified" is included in an aggregate with a different
> language, say "en", any containsType method calls from the annotator will
> return false.
> This behavior is incorrect given that the annotator has declared that it will
> work with any language.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.