[jira] Issue Comment Edited: (UIMA-1840) Result Specification behavior incorrect for aggregates

Marshall Schor (JIRA) Fri, 23 Jul 2010 08:21:17 -0700

    [ 
https://issues.apache.org/jira/browse/UIMA-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891188#action_12891188
 ]


Marshall Schor edited comment on UIMA-1840 at 7/23/10 11:20 AM:
----------------------------------------------------------------

After a bit of discussion (Eddie and Marshall), we think the problem is in 
several parts.

The containsType method in the ResultSpecification interface has a Javadoc 
which says that if the language passed is x-unspecified or null, any language 
(in the result spec) will match.

This is untrue, according to the implementation.  If the language passed as a 
parameter is x-unspecified or null, then the only result-specification 
language which matches is x-unspecified.  For example, "en" in the result spec 
won't match x-unspecified (containsType / containsFeature will return false).

This behavior seems appropriate for the use case where an annotator declares it 
uses or outputs type X for language "en", and type Y for language 
"x-unspecified".  In this case, if a CAS has a language x-unspecified, it seems 
appropriate that the test containsType in the annotator code should return 
false for type X and true for type Y.

Note that if the result-specification has a language x-unspecified for some 
type Y, then containsType returns true for type Y regardless of the language.  
Thus the containsType method is not symmetric with respect to the ordering of 
the language arguments.
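
To make the asymmetry concrete, here is a minimal, self-contained sketch of 
the matching behavior described above.  It is a toy model, not the UIMA 
implementation: the class ContainsTypeSketch and its addType method are 
hypothetical names, and the fallback from "en-us" to "en" is an assumption 
inferred from the en-gb example later in this comment.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy model of the language matching described in this comment.
// NOT the real ResultSpecification; all names here are hypothetical.
final class ContainsTypeSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // type name -> languages for which the result spec lists that type
    private final Map<String, Set<String>> languagesByType = new HashMap<>();

    void addType(String typeName, String language) {
        languagesByType
            .computeIfAbsent(typeName, k -> new HashSet<>())
            .add(normalize(language));
    }

    boolean containsType(String typeName, String language) {
        Set<String> specLanguages = languagesByType.get(typeName);
        if (specLanguages == null) {
            return false;
        }
        // A spec entry of x-unspecified matches any queried language.
        if (specLanguages.contains(UNSPECIFIED)) {
            return true;
        }
        String query = normalize(language);
        // A queried x-unspecified (or null) matches only a spec entry of
        // x-unspecified, which was already ruled out above.
        if (query.equals(UNSPECIFIED)) {
            return false;
        }
        if (specLanguages.contains(query)) {
            return true;
        }
        // Assumption: "en-us" falls back to its base language "en".
        int dash = query.indexOf('-');
        return dash > 0 && specLanguages.contains(query.substring(0, dash));
    }

    private static String normalize(String language) {
        return language == null ? UNSPECIFIED : language.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ContainsTypeSketch spec = new ContainsTypeSketch();
        spec.addType("X", "en");
        spec.addType("Y", "x-unspecified");
        System.out.println(spec.containsType("X", "x-unspecified")); // false
        System.out.println(spec.containsType("Y", "x-unspecified")); // true
        System.out.println(spec.containsType("Y", "en"));            // true
    }
}
```

Swapping the roles of "en" and x-unspecified between the spec and the query 
flips the answer, which is the asymmetry noted above.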

So, *fix # 1 is to correct the javadoc for containsType (and containsFeature)* 
in the interface.

The method computeAnalysisComponentResultSpec in PrimitiveAnalysisEngine_impl 
intersects the result spec computed by the aggregate with the primitive's 
Capabilities.  The intent is that the resultSpec contains nothing which is not 
specified in the primitive's output capabilities.  

This intersection includes the call to containsType and containsFeature, 
comparing the aggregate's spec with this component's spec.  For the language 
argument, it uses the language from the primitive annotator's output capability 
specs.

Here, the logic for handling x-unspecified should be different from the logic 
when a CAS's language is being tested:

CAS's language = x-unspecified, result-spec language = some language, e.g. 
"en": result = false (per the logic above).

Primitive lang output capability = x-unspecified, result-spec language = some 
language, e.g. "en": result should be true.  

In this case, the primitive says that no matter what language (if any) is being 
specified, it outputs the types and/or features. 

So, in this case, we have to use a new kind of test, not the containsType / 
containsFeature test.  *Fix # 2 is to write the correct intersection test and 
use it here*.  This test should work as follows, for each type and/or feature:

-1. Prim lang output capability = x-unspecified, result-spec should change its 
language (just for this primitive) to x-unspecified.-

1. Prim lang output capability = x-unspecified, result-spec should keep its 
language (which by definition is x-unspecified or a subset of that) (see 2 
comments below).

1a. (Added after 2 comments below) (change result spec to more restrictive 
language).
* Prim lang output capability = en, result spec x-unspecified: result spec for 
that primitive should be switched from x-unspecified to en (the more 
restrictive).

2. Prim lang output capability = xx-yy, result-spec lang = xx, change its 
language to xx-yy (just for this primitive).   Otherwise, we have the case 
where a primitive marks some output type TP as en-us (only output for US 
English), embedded in an aggregate which says it outputs TP for "en" (perhaps 
by routing to several different primitives), and a CAS comes with "en-gb".  The 
primitive processing this CAS should not produce type TP, in this case.  But if 
the result-spec language were allowed to remain "en", it would.   

-3. Prim lang output capability = xx, result-spec lang = xx-yy.  In this case, 
keep the result-spec xx-yy.  This is for the case where the aggregate says to 
output type T for en-us, and although the prim says it outputs type T for en, 
that's not needed by the aggregate unless it's en-us.  But for this to work, 
the aggregate has to widen the language specs via a union of input 
specifications among all of its delegates for the same type, in case some 
"flow" could happen that would have this use case make sense.  I don't think 
this is currently done.  So for now, we will "widen" the result spec to the 
containing language xx, at the cost of having some perhaps unneeded 
computation.-

3. Following Adam's suggestion below - to always use the most restrictive 
language:
* Prim lang output capability = xx, result-spec lang = xx-yy: switch the result 
spec lang to xx-yy
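
The rules that remain after the edits (1, 1a, 2 and 3) all reduce to "keep 
the more restrictive of the two languages".  A minimal sketch of that test 
follows.  The class LanguageIntersectionSketch and the method mostRestrictive 
are hypothetical names, not UIMA API, and returning an empty result for 
unrelated languages (for example "en" vs. "fr") is an assumption that this 
comment does not spell out.

```java
import java.util.Locale;
import java.util.Optional;

// Sketch of the proposed intersection test; names are hypothetical.
final class LanguageIntersectionSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Returns the language the result spec should carry for this primitive,
    // or empty if the two languages do not overlap (an assumption).
    static Optional<String> mostRestrictive(String primLang, String specLang) {
        String prim = normalize(primLang);
        String spec = normalize(specLang);
        if (prim.equals(UNSPECIFIED)) {
            return Optional.of(spec);   // rule 1: keep the result-spec language
        }
        if (spec.equals(UNSPECIFIED)) {
            return Optional.of(prim);   // rule 1a: narrow to the primitive's language
        }
        if (prim.equals(spec)) {
            return Optional.of(prim);   // identical languages
        }
        if (prim.startsWith(spec + "-")) {
            return Optional.of(prim);   // rule 2: prim xx-yy, spec xx -> xx-yy
        }
        if (spec.startsWith(prim + "-")) {
            return Optional.of(spec);   // rule 3: prim xx, spec xx-yy -> xx-yy
        }
        return Optional.empty();        // unrelated languages
    }

    private static String normalize(String language) {
        return language == null ? UNSPECIFIED : language.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(mostRestrictive("x-unspecified", "en")); // Optional[en]
        System.out.println(mostRestrictive("en", "x-unspecified")); // Optional[en]
        System.out.println(mostRestrictive("en-us", "en"));         // Optional[en-us]
        System.out.println(mostRestrictive("en", "en-us"));         // Optional[en-us]
    }
}
```

With rule 2 applied, a primitive declaring en-us inside an aggregate declaring 
"en" ends up with en-us in its result spec, so a CAS tagged en-gb no longer 
triggers output of type TP.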

It would be good to have another pair of eyes check this :-) 




      was (Author: schor):
    After a bit of discussion (Eddie and Marshall), we think the problem is in 
several parts.

The containsType method in the ResultSpecification interface has a Javadoc 
which says that if the language passed is x-unspecified or null, any language 
(in the result spec) will match.

This is untrue, according to the implementation.  In particular, if the 
language passed as a parameter is x-unspecified, or null, then the only 
result-specification language which matches is x-unspecified.  In particular, 
"en" in the result spec won't match with x-unspecified (containsType / 
containsFeature will return false).

This behavior seems appropriate for the use case where an annotator declares it 
uses or outputs type X for language "en", and type Y for language 
"x-unspecified".  In this case, if a CAS has a language x-unspecified, it seems 
appropriate that the test containsType in the annotator code should return 
false for type X and true for type Y.

Note that if the result-specification has a language x-unspecified for some 
type Y, then containsType returns true for type Y regardless of the language.  
Thus the containsType method is not symmetric with respect to the ordering of 
the language arguments.

So, *fix # 1 is to correct the javadoc for containsType (and containsFeature)* 
in the interface.

The method computeAnalysisComponentResultSpec in PrimitiveAnalysisEngine_impl 
does an intersection of the result spec coming from the aggregate's computation 
of this, with the primitive's Capabilities.  The intent is to not have anything 
in the resultSpec which is not specified in the primitive's output 
capabilities.  

This intersection includes the call to containsType and containsFeature, 
comparing the aggregate's spec with this component's spec.  For the language 
argument, it uses the language from the primitive annotator's output capability 
specs.

Here, the logic for handling x-unspecified should be different from the logic 
when a CAS's language is being tested:

CAS's language = x-unspecified, result-spec: some language e.g. "en", result = 
false (per above logic).

Primitive lang output capability = x-unspecified, result-spec, some language 
e.g. "en", result should be true.  

In this case, the primitive says that no matter what language (if any) is being 
specified, it outputs the types and/or features. 

So, in this case, we have to use a new kind of test, not the containsType / 
containsFeature test.  *Fix # 2 is to write the correct intersection test and 
use it here*.  This test should work as follows, for each type and/or feature:

-1. Prim lang output capability = x-unspecified, result-spec should change its 
language (just for this primitive) to x-unspecified.-

1. Prim lang output capability = x-unspecified, result-spec should keep its 
language (which by definition is x-unspecified or a subset of that) (see 2 
comments below).

2. Prim lang output capability = xx-yy, result-spec lang = xx, change its 
language to xx-yy (just for this primitive).   Otherwise, we have the case 
where a primitive marks some output type TP as en-us (only output for US 
English), embedded in an aggregate which says it outputs TP for "en" (perhaps 
by routing to several different primitives), and a CAS comes with "en-gb".  The 
primitive processing this CAS should not produce type TP, in this case.  But if 
the result-spec language was allowed to remain "en", it would.   

3. Prim lang output capability = xx, result-spec lang = xx-yy.  In this case, 
keep the result-spec xx-yy.  This is for the case where the aggregate says to 
output type T for en-us, and although the prim says it outputs type T for en, 
that's not needed by the aggregate unless it's en-us.  But for this to work, 
the aggregate has to widen the language specs via a union of input 
specifications among all of its delegates for the same type, in case some 
"flow" could happen that would have this use case make sense.  I don't think 
this is currently done.  So for now, we will "widen" the result spec to the 
containing language xx, at the cost of having some perhaps unneeded computation.

It would be good to have another pair of eyes check this :-) 



  
> Result Specification behavior incorrect for aggregates
> ------------------------------------------------------
>
>                 Key: UIMA-1840
>                 URL: https://issues.apache.org/jira/browse/UIMA-1840
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Eddie Epstein
>
> For a scenario using default result specifications, if an annotator with 
> language "x-unspecified" is included in an aggregate with a different 
> language, say "en", any containsType method calls from the annotator will 
> return false. 
> This behavior is incorrect given that the annotator has declared that it will 
> work with any language.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
