[ https://issues.apache.org/jira/browse/UIMA-4111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marshall Schor resolved UIMA-4111.
----------------------------------
Resolution: Fixed
Also changed the type of the internal index used for Sofa FSs in the base view
to be a Bag, rather than a Set. This should be marginally smaller and faster,
and it keeps the new code from creating a default bag index for these FSs in
the base view.
> Change how default bag indices are created
> ------------------------------------------
>
> Key: UIMA-4111
> URL: https://issues.apache.org/jira/browse/UIMA-4111
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Reporter: Marshall Schor
> Assignee: Marshall Schor
> Fix For: 2.7.0SDK
>
>
> UIMA-173 added the concept of a universal default bag index, which would be
> created for a type if no other index was defined for that type. That Jira has
> a link to the motivation, where it is clear that this was intended to
> simplify how UIMA works and allow all feature structures that were added to
> the indexes (via addToIndexes()) to be retrieved.
> UIMA-297 corrected some anomalies in the original implementation.
> This Jira is to correct the edge cases that happen when there are only Set
> indices defined for a type. A Set index does not add the 2nd or subsequent
> FSs whose key values match an existing entry under the Set's comparator, so
> the original motivation of the default bag index is thwarted in this case.
> This has caused several edge case issues; for example, a special note about
> this surprising behavior had to be included in the UIMA documentation.
> More recently, another edge case has been discovered. It arises when an
> annotator is remoted out of an aggregate whose index definitions are
> sufficient to ensure a non-Set index for type T, and the remote service has
> only a Set index for type T. Assume that the client has added-to-indices 100 instances
> of type T, the CAS is serialized to the remote, the remote deserializes the
> CAS and does 100 add-to-indices, of which perhaps 50 succeed, and the other
> 50 are no-ops (due to the Set equivalence). Now when the remote CAS is
> returned, only 50 will appear in the index back at the client. This goes
> against the principle in UIMA where we try to have remoting of components
> not affect the semantics, where possible. This is also quite a surprising
> effect, which won't be expected by most users. This is also an "unstable"
> effect, in that, if a pipeline "assembler" (knowing little about the
> "internals" of the components) were to add a component to the remote which
> included a non-set index for type T, it would start behaving differently, not
> losing any indexed items.
> The converse would also be true: If the remote had no indices defined for
> type T, then add-to-indices for type T would be recorded in lazily created
> default bag indices, and those events would be sent back to the client. If an
> assembler were to now add a component which contained only a set definition
> for type T, this behavior would suddenly start dropping FSs that were
> excluded due to the Set comparator.
> For all these reasons (discovered in discussions with Edward Epstein and Adam
> Lally), and because of the original intent of this default bag index
> (discovered by reading the mail archives pointed to by the above two Jiras
> which describe in some detail the motivations for this), this Jira changes
> the logic of when the default bag index is created to create it whenever the
> situation is that some add-to-indices event would not record an addition
> (e.g., if there were no indices, or only Set indices, and the FS matched
> elements already in the Sets).
> This change will affect documentation, so update that too. In particular,
> the NOTE in this section
> http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.reading_results_previous_annotators
> will no longer apply.
> The behavior of getAllIndexedFS(type) will change - it will no longer have an
> exception for the special case where only Set indices were defined for the
> type.
> Because it seems extremely unlikely that the previous behavior was being
> depended upon, there is no global flag to restore the previous behavior.
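
To make the 100-versus-50 scenario in the description above concrete, here is a minimal sketch in plain Java. It does not use the UIMA API. The class names (DefaultBagSketch, Fs, Repo), the newRule flag, and the choice to store only the otherwise-unrecorded FSs in the default bag are all assumptions made for illustration; this is one possible reading of the rule, not the actual framework implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Toy model only: none of these names come from the UIMA code base.
class DefaultBagSketch {

    /** Hypothetical stand-in for a feature structure with a single key feature. */
    static final class Fs {
        final int key;
        Fs(int key) { this.key = key; }
    }

    /** Toy index repository for a single type T. */
    static final class Repo {
        private final boolean hasSetIndex; // is a Set index defined for T?
        private final boolean newRule;     // assumed switch: old vs. new default-bag rule
        // Set index: keeps at most one FS per key value, as decided by the comparator.
        private final TreeSet<Fs> setIndex =
                new TreeSet<>(Comparator.comparingInt((Fs f) -> f.key));
        private List<Fs> defaultBag;       // created lazily, on first use

        Repo(boolean hasSetIndex, boolean newRule) {
            this.hasSetIndex = hasSetIndex;
            this.newRule = newRule;
        }

        void addToIndexes(Fs fs) {
            // TreeSet.add returns false when an equal-keyed FS is already present.
            boolean recorded = hasSetIndex && setIndex.add(fs);
            // Old rule: default bag only when no index at all is defined for T.
            // New rule: default bag whenever this add was not recorded anywhere.
            boolean useDefaultBag = newRule ? !recorded : !hasSetIndex;
            if (useDefaultBag) {
                if (defaultBag == null) {
                    defaultBag = new ArrayList<>();
                }
                defaultBag.add(fs);
            }
        }

        /** Number of FSs that could be retrieved from the indexes for T. */
        int indexedCount() {
            return setIndex.size() + (defaultBag == null ? 0 : defaultBag.size());
        }
    }

    /** Adds 'total' FSs whose keys cycle through 'distinctKeys' values. */
    static int simulate(boolean hasSetIndex, boolean newRule, int total, int distinctKeys) {
        Repo repo = new Repo(hasSetIndex, newRule);
        for (int i = 0; i < total; i++) {
            repo.addToIndexes(new Fs(i % distinctKeys));
        }
        return repo.indexedCount();
    }

    public static void main(String[] args) {
        System.out.println("Set only, old rule: " + simulate(true, false, 100, 50));  // 50
        System.out.println("Set only, new rule: " + simulate(true, true, 100, 50));   // 100
        System.out.println("No index, new rule: " + simulate(false, true, 100, 50));  // 100
    }
}
```

In this toy model, the old rule loses half of the added FSs when only a Set index exists, while the new rule keeps all of them retrievable regardless of which indices happen to be defined.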