Marshall Schor created UIMA-5168:
------------------------------------

             Summary: uv3 vs backporting most things to uv2?
                 Key: UIMA-5168
                 URL: https://issues.apache.org/jira/browse/UIMA-5168
             Project: UIMA
          Issue Type: Question
          Components: Core Java Framework
            Reporter: Marshall Schor

The overview in the uv3 docs has a summary of the "features" / benefits of uv3. Looking at these, I was surprised to realize that most of them could be back-ported into version 2.

Because of this, there is a choice in moving forward: either stick to the current v2 data representation models (sticking), or switch to the new v3 ones (for Java). In the discussion below, "sticking" refers to a currently non-existent v2 into which the v3 improvements (except for changing how Feature Structures are stored) are backported.

The two benefits lost in sticking are:
* garbage collection of unreferenced Feature Structures.
* larger limits on the number of Feature Structures per CAS (approximately an order of magnitude). This is because in v2, all of the slots for all Feature Structures, as well as int and float arrays, are kept in one int array, which has a limit of approximately 2 billion words.

Benefits in sticking include:
* (perhaps) better backwards compatibility
* a smaller memory footprint if JCas is not being used (imagine UIMA running on a smartphone)
* (maybe) better performance in some cases, including serialization

Regarding performance differences: v3 may be faster in many cases because it does not need to switch from low-level int handles to JCas object references. But it may be slower in some operations involving serialization, because of the overhead of emulating/modeling the way v2 does serialization. New native-to-v3 serialization forms that are not backward compatible could be added to v3 to overcome this.

The things that could be backported to v2 include:
* redesigning the JCas cover classes for higher performance (eliminating the xxx_Type classes, putting an extra field in the xxx cover class instead).
** note: a JCas class migration would be needed for this, similar to the one for v3.
* redesigning much of the supporting infrastructure to improve performance by increasing locality of reference.
* supporting arbitrary Java Objects, and backporting the implementation of FSArrayList and IntegerArrayList
* integrating with Java 8, including the new select framework
* eliminating problems with ConcurrentModificationException while iterating over UIMA indexes
* reusing Type Systems

Comparing v3 versus v2+backport, what do people think of the balance between pro/con? Should we focus on a v2+backport direction instead of v3?
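
To make the v2 storage limit mentioned above more concrete, here is a rough Java sketch of the general idea of keeping every Feature Structure's slots in one int array and referring to each Feature Structure by an int handle. This is a toy model written for illustration only, not the actual UIMA v2 heap code: the class name ToyIntHeap, its methods, the one-word type-code layout, and the type codes used in main are all assumptions made up for this example.

```java
import java.util.Arrays;

/** Toy model only: NOT the actual UIMA v2 heap implementation. */
public class ToyIntHeap {
    // A Java int[] can hold at most about 2^31 elements (slightly fewer in practice).
    static final long MAX_WORDS = Integer.MAX_VALUE - 8L;

    private int[] heap;
    private int next = 1; // address 0 is reserved as the "null" handle

    public ToyIntHeap(int initialCapacity) {
        heap = new int[Math.max(2, initialCapacity)];
    }

    /** Reserves one type-code word plus slotCount slots and returns the int handle. */
    public int allocate(int typeCode, int slotCount) {
        long needed = (long) next + 1 + slotCount;
        if (needed > MAX_WORDS) {
            throw new IllegalStateException("heap full: one int[] cannot grow any further");
        }
        if (needed > heap.length) {
            long grown = Math.max(needed, 2L * heap.length);
            heap = Arrays.copyOf(heap, (int) Math.min(grown, MAX_WORDS));
        }
        int handle = next;
        heap[handle] = typeCode;
        next = (int) needed;
        return handle;
    }

    public int typeCode(int handle) { return heap[handle]; }

    public int getSlot(int handle, int slot) { return heap[handle + 1 + slot]; }

    public void setSlot(int handle, int slot, int value) { heap[handle + 1 + slot] = value; }

    // Floats share the same int array by storing their raw bit pattern.
    public float getFloatSlot(int handle, int slot) {
        return Float.intBitsToFloat(getSlot(handle, slot));
    }

    public void setFloatSlot(int handle, int slot, float value) {
        setSlot(handle, slot, Float.floatToIntBits(value));
    }

    public int wordsUsed() { return next; }

    /** Rough upper bound on how many Feature Structures of a given size fit in one array. */
    public static long maxFeatureStructures(int slotsPerFs) {
        return MAX_WORDS / (1L + slotsPerFs);
    }

    public static void main(String[] args) {
        ToyIntHeap heap = new ToyIntHeap(16);
        int token = heap.allocate(7, 2); // hypothetical type code 7 with slots (begin, end)
        heap.setSlot(token, 0, 10);
        heap.setSlot(token, 1, 15);
        System.out.println("handle=" + token + " begin=" + heap.getSlot(token, 0)
                + " end=" + heap.getSlot(token, 1));
        System.out.println("upper bound for 3-word FSs: " + maxFeatureStructures(2));
    }
}
```

In this toy model nothing ever reclaims the words of a Feature Structure that is no longer referenced, and the total is capped by the maximum length of a single Java array. Those are the two points listed above as lost when sticking with the v2 representation. Storing each Feature Structure as an ordinary Java object would instead leave both reclamation and capacity to the JVM.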