[ 
https://issues.apache.org/jira/browse/UIMA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor resolved UIMA-4111.
----------------------------------
    Resolution: Fixed

Also changed the internal index used for Sofa FSs in the base view from a Set 
to a Bag.  This should be marginally smaller and faster, and it avoids having 
the new code create a default bag index for Sofa FSs in the base view.

> Change how default bag indices are created
> ------------------------------------------
>
>                 Key: UIMA-4111
>                 URL: https://issues.apache.org/jira/browse/UIMA-4111
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>             Fix For: 2.7.0SDK
>
>
> UIMA-173 added the concept of a universal default bag index, which would be 
> created for a type if no other index was defined for that type.  That Jira 
> has a link to the motivation, where it is clear that this was intended to 
> simplify how UIMA works and allow all feature structures that were added 
> via addToIndexes() to be retrieved. 
> UIMA-297 corrected some anomalies in the original implementation.
> This Jira is to correct the edge cases that happen when there are only Set 
> indices defined for a type.  A Set index does not add the 2nd or subsequent 
> FSs whose key values compare equal under the comparator defined for the Set, 
> so the original motivation of the default bag index is thwarted in this 
> case.  This has caused several edge case issues; for example, a special note 
> about this surprising behavior had to be included in the UIMA documentation. 
> More recently, another edge case has been discovered, when an annotator 
> contained in an aggregate having sufficient index definitions to ensure a 
> non-set index for type T is remoted, and that remote service has only a Set 
> index for type T.  Assume that the client has added-to-indices 100 instances 
> of type T, the CAS is serialized to the remote, the remote deserializes the 
> CAS and does 100 add-to-indices, of which perhaps 50 succeed, and the other 
> 50 are no-ops (due to the Set equivalence).  Now when the remote CAS is 
> returned, only 50 will appear in the index back at the client.  This goes 
> against the principle in UIMA where we try to have remoting of components 
> not affect the semantics, where possible.  This is also quite a surprising 
> effect, which won't be expected by most users.  This is also an "unstable" 
> effect, in that, if a pipeline "assembler" (knowing little about the 
> "internals" of the components) were to add a component to the remote which 
> included a non-set index for type T, it would start behaving differently, not 
> losing any indexed items. 
> The converse would also be true: If the remote had no indices defined for 
> type T, then add-to-indices for type T would be recorded in lazily created 
> default bag indices, and those events would be sent back to the client. If an 
> assembler were to now add a component which contained only a set definition 
> for type T, this behavior would suddenly start dropping FSs that were 
> excluded due to the Set comparator. 
> For all these reasons (discovered in discussions with Edward Epstein and Adam 
> Lally), and because of the original intent of this default bag index 
> (discovered by reading the mail archives pointed to by the above two Jiras 
> which describe in some detail the motivations for this), this Jira changes 
> the logic so that the default bag index is created whenever some 
> add-to-indices event would otherwise not record an addition (e.g., if there 
> were no indices, or if there were only Set indices and the FS matched 
> elements already in the Sets).
> This change will affect documentation, so update that too.  In particular, 
> the NOTE in this section 
> http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.reading_results_previous_annotators
>  will no longer apply.
> The behavior of getAllIndexedFS(type) will change - it will no longer have an 
> exception for the special case where only Set indices were defined for the 
> type.
> Because it seems extremely unlikely that the previous behavior was being 
> depended upon, there is no global flag to restore the previous behavior.
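
The remoting edge case described in the issue (100 add-to-indices calls, of which only 50 are recorded) can be illustrated with a small toy model. This is only a sketch of the rule as stated in the description, not the UIMA implementation and not the UIMA API: the class DefaultBagSketch, the Fs and SetOnlyRepository types, and the roundTrip helper are all hypothetical names invented for illustration, and the choice of 50 distinct key values is an assumption made to mirror the numbers in the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy model only: none of these names come from UIMA.
class DefaultBagSketch {

    /** Hypothetical stand-in for a feature structure of type T with one key feature. */
    static final class Fs {
        final int id;
        final int key;
        Fs(int id, int key) { this.id = id; this.key = key; }
    }

    /** Toy index repository for a type T that has only a Set index defined. */
    static final class SetOnlyRepository {
        // Set index: an FS whose key compares equal to an existing entry is not added.
        private final TreeSet<Fs> setIndex =
            new TreeSet<>(Comparator.comparingInt((Fs fs) -> fs.key));
        // Default bag index, created lazily (assumed rule, per the issue text).
        private List<Fs> defaultBag = null;
        private final boolean bagWhenAddWouldBeLost;

        SetOnlyRepository(boolean bagWhenAddWouldBeLost) {
            this.bagWhenAddWouldBeLost = bagWhenAddWouldBeLost;
        }

        void addToIndexes(Fs fs) {
            boolean recorded = setIndex.add(fs); // false for an equal-keyed duplicate
            if (!recorded && bagWhenAddWouldBeLost) {
                if (defaultBag == null) {
                    defaultBag = new ArrayList<>();
                }
                defaultBag.add(fs); // keep the FS retrievable instead of dropping it
            }
        }

        /** Number of FSs that a "get all indexed FSs" style query could return. */
        int retrievableCount() {
            return setIndex.size() + (defaultBag == null ? 0 : defaultBag.size());
        }
    }

    /** Simulates the remote re-indexing 100 deserialized FSs with 50 distinct keys. */
    static int roundTrip(boolean newRule) {
        SetOnlyRepository remote = new SetOnlyRepository(newRule);
        for (int i = 0; i < 100; i++) {
            remote.addToIndexes(new Fs(i, i % 50));
        }
        return remote.retrievableCount();
    }

    public static void main(String[] args) {
        System.out.println("previous behavior: " + roundTrip(false)); // prints 50
        System.out.println("changed behavior:  " + roundTrip(true));  // prints 100
    }
}
```

In this sketch the fallback bag holds only the FSs that the Set rejected; the real framework may organize its indices differently, so treat the model as a way to reason about the counts rather than as a description of the internals.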



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
