Marshall Schor created UIMA-5168:
------------------------------------

             Summary: uv3 vs backporting most things to uv2?
                 Key: UIMA-5168
                 URL: https://issues.apache.org/jira/browse/UIMA-5168
             Project: UIMA
          Issue Type: Question
          Components: Core Java Framework
            Reporter: Marshall Schor


The overview in the uv3 docs has a summary of the "features" / benefits of uv3.  
Looking at these, I was surprised to realize that most of them could be 
back-ported into version 2.  

Because of this, there is a choice in moving forward: either stick with the 
current v2 data representation models ("sticking"), or switch to the new v3 
ones (for Java).  In the discussion below, "sticking" refers to a currently 
non-existent v2 into which the v3 improvements (except for the change in how 
Feature Structures are stored) are backported.

The two benefits lost in sticking are: 
* garbage collection of unreferenced Feature Structures.
* larger limits on the number of Feature Structures per CAS (by approximately 
an order of magnitude).  In v2, all of the slots for all Feature Structures, 
as well as the int and float arrays, are kept in one int array, which has a 
limit of approximately 2 billion words.
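
To make the two lost benefits concrete, here is a minimal sketch of a 
v2-style store where every Feature Structure lives in one shared int array 
and is addressed by an int handle.  This is not the actual v2 code: the class 
IntHeapSketch, its method names, and the slot layout (one type-code slot 
followed by the feature slots) are assumptions made for illustration only.

```java
import java.util.Arrays;

// Hypothetical toy model, not the UIMA implementation.
final class IntHeapSketch {
    private int[] heap = new int[16];
    private int next = 1; // address 0 is reserved to mean "null"

    /** Appends one Feature Structure and returns its int handle (address). */
    int create(int typeCode, int featureSlots) {
        long end = (long) next + 1 + featureSlots;
        if (end > Integer.MAX_VALUE) {
            // A Java int[] cannot be indexed beyond roughly 2^31 entries.
            throw new IllegalStateException("int heap exhausted");
        }
        if (end > heap.length) {
            long newLen = heap.length;
            while (newLen < end) {
                newLen *= 2;
            }
            heap = Arrays.copyOf(heap, (int) Math.min(newLen, Integer.MAX_VALUE));
        }
        int addr = next;
        heap[addr] = typeCode; // assumed layout: type code first
        next = (int) end;      // space is only ever appended, never reclaimed
        return addr;
    }

    int getType(int addr) {
        return heap[addr];
    }

    int getFeature(int addr, int offset) {
        return heap[addr + 1 + offset];
    }

    void setFeature(int addr, int offset, int value) {
        heap[addr + 1 + offset] = value;
    }

    /** Upper bound on Feature Structures if each one occupies slotsPerFs ints. */
    static long maxFeatureStructures(int slotsPerFs) {
        return Integer.MAX_VALUE / slotsPerFs;
    }

    public static void main(String[] args) {
        IntHeapSketch h = new IntHeapSketch();
        int fs = h.create(7, 2);
        h.setFeature(fs, 0, 123);
        System.out.println("handle=" + fs + " feature0=" + h.getFeature(fs, 0));
        System.out.println("max FSs at 4 slots each: " + maxFeatureStructures(4));
    }
}
```

In this toy model, slots belonging to Feature Structures that are no longer 
referenced stay allocated, because the Java garbage collector sees only one 
large array, and the total slot count is capped by the maximum int array 
index.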

Benefits in sticking include:
* (perhaps) better backwards compatibility
* a smaller memory footprint if JCas is not being used (imagine UIMA running on 
a smartphone)
* (maybe) better performance in some cases, including serialization

Regarding performance differences:  v3 may be more performant in many cases 
because of not needing to switch from low-level int handles to JCas object 
references.  But it may be less performant in some operations involving 
serialization, because of the overhead to emulate/model the way v2 does 
serialization.  New native-to-v3 serialization forms that are not backward 
compatible could be added to v3 to overcome this.

The things that could be backported to v2 include:
* redesigning the JCas cover classes for higher performance (eliminating the 
xxx_Type classes, putting an extra field in the xxx cover class instead).
** note: a JCas class migration would be needed for this, similar to the one 
for v3.
* redesigning much of the supporting infrastructure to improve performance by 
increasing locality of reference.
* supporting arbitrary Java Objects, and backporting the implementation of 
FSArrayList and IntegerArrayList
* integrating with Java 8 - including the new select framework
* eliminating problems with ConcurrentModificationException while iterating 
over UIMA indexes
* reusing Type Systems

Comparing v3 with v2+backport, what do people think of the balance of pros 
and cons?  Should we focus on a v2+backport direction instead of v3?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
