Marshall Schor created UIMA-5168:
------------------------------------
Summary: uv3 vs backporting most things to uv2?
Key: UIMA-5168
URL: https://issues.apache.org/jira/browse/UIMA-5168
Project: UIMA
Issue Type: Question
Components: Core Java Framework
Reporter: Marshall Schor
The overview in the uv3 docs summarizes the "features" / benefits of uv3.
Looking at these, I was surprised to realize that most of them could be
back-ported into version 2.
Because of this, there is a choice in moving forward: either stick with the
current v2 data representation models ("sticking"), or switch to the new v3
ones (for Java). In the subsequent discussion, "sticking" refers to a currently
non-existent v2 where the v3 improvements (except for changing how Feature
Structures are stored) are backported.
The two benefits lost in sticking are:
* garbage collection of unreferenced Feature Structures.
* larger limits on the number of Feature Structures per CAS (approximately an
order of magnitude). This is because in v2, all of the slots for all Feature
Structures and for int and float arrays are kept in one int array, which has a
limit of approximately 2 billion words.
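To make the two lost benefits concrete, here is a minimal sketch of a v2-style layout in which every Feature Structure is a run of words in one shared int array and is referred to by an int handle. This is a toy model, not the actual UIMA implementation: the class name IntHeapSketch, the type code 7, and the one-word type header are assumptions made for illustration only.

```java
import java.util.Arrays;

/** Toy model (hypothetical, not UIMA code): all Feature Structures share one int[]. */
public class IntHeapSketch {
    private int[] heap = new int[16];
    private int next = 1;   // address 0 is reserved to mean "null"

    /** Allocates one type-code word plus nSlots feature words; returns the int handle. */
    public int create(int typeCode, int nSlots) {
        int addr = next;
        int needed = addr + 1 + nSlots;
        if (needed > heap.length) {
            // Grow the single backing array; a Java array cannot exceed
            // Integer.MAX_VALUE elements, which is where the ~2 billion word cap comes from.
            heap = Arrays.copyOf(heap, Math.max(needed, heap.length * 2));
        }
        heap[addr] = typeCode;
        next = needed;
        return addr;
    }

    public void setInt(int addr, int featOffset, int value) {
        heap[addr + 1 + featOffset] = value;
    }

    public int getInt(int addr, int featOffset) {
        return heap[addr + 1 + featOffset];
    }

    /** Words consumed so far; nothing is ever reclaimed in this model. */
    public int wordsUsed() {
        return next;
    }

    /** Upper bound on the number of FSs if each one needs wordsPerFs words. */
    public static long maxFeatureStructures(int wordsPerFs) {
        return (long) Integer.MAX_VALUE / wordsPerFs;
    }

    public static void main(String[] args) {
        IntHeapSketch cas = new IntHeapSketch();
        int token = cas.create(7, 2);   // hypothetical type with begin/end slots
        cas.setInt(token, 0, 10);
        cas.setInt(token, 1, 15);
        System.out.println("begin=" + cas.getInt(token, 0) + " end=" + cas.getInt(token, 1));
        System.out.println("max FSs at 3 words each: " + maxFeatureStructures(3));
    }
}
```

In this model a handle is only an index, so the JVM cannot tell when a Feature Structure is no longer referenced and its words stay allocated. Storing each Feature Structure as an ordinary Java object would remove both the shared-array cap and the reclamation problem.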
Benefits in sticking include:
* (perhaps) better backwards compatibility
* a smaller memory footprint if JCas is not being used (imagine UIMA running on
a smartphone)
* (maybe) better performance in some cases, including serialization
Regarding performance differences: v3 may be more performant in many cases
because of not needing to switch from low-level int handles to JCas object
references. But it may be less performant in some operations involving
serialization, because of the overhead of emulating the way v2 does
serialization. New native-to-v3 serialization forms that are not backward
compatible could be added to v3 to overcome this.
The things that could be backported to v2 include:
* redesigning the JCas cover classes for higher performance (eliminating the
xxx_Type classes, putting an extra field in the xxx cover class instead).
** note: a JCas class migration would be needed for this, similar to the one
for v3.
* redesigning much of the supporting infrastructure to improve performance by
increasing locality of reference.
* supporting arbitrary Java Objects, and backporting the implementation of
FSArrayList and IntegerArrayList
* integrating with Java 8 - including the new select framework
* eliminating problems with ConcurrentModificationException while iterating
over UIMA indexes
* reusing Type Systems
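For readers unfamiliar with the ConcurrentModificationException item above, the following sketch reproduces the problem with plain Java collections and shows one general way to avoid it, namely iterating over a snapshot. It does not use any UIMA API and is not meant to describe how either v3 or a backport would implement index iteration; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Plain-Java illustration (hypothetical names): modifying a collection while iterating it. */
public class IterationSketch {

    /** Removes even values during iteration; returns true if iteration threw a CME. */
    public static boolean failsWhenModified(List<Integer> index) {
        try {
            for (Integer value : index) {
                if (value % 2 == 0) {
                    index.remove(value);   // remove(Object), because value is an Integer
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A fail-fast iterator detects the structural change on its next step.
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println("ArrayList failed: " + failsWhenModified(plain));

        // A copy-on-write list iterates over a snapshot, so removal is safe.
        List<Integer> snapshot = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3, 4));
        System.out.println("CopyOnWriteArrayList failed: " + failsWhenModified(snapshot));
        System.out.println("remaining: " + snapshot);
    }
}
```

The snapshot approach trades extra copying on modification for iterators that never fail, which is one reason it tends to suit collections that are read far more often than they are changed.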
Comparing v3 versus v2+backport, what do people think of the balance between
pro/con? Should we focus on a v2+backport direction instead of v3?