+0

I do not have a strong opinion here. As far as I know, we do not use (many) empty StringArrays (or other empty arrays) but rather null values for such features.
Personally, I would prefer sharing, as I see no real need for 0-length arrays used as markers.

Peter

On 14.09.2017 at 16:36, Marshall Schor wrote:
> I was mistaken about Java in one detail: for things like Integer(17), there
> are two ways to create it: new Integer(17), or Integer.valueOf(17). The first
> call does create a fresh, not == to any other Integer object, while the 2nd
> call will reuse an existing Integer object for 17 (if it exists). Users are
> encouraged to switch to Integer.valueOf(xxx) for efficiency in the Javadocs.
>
> I'm now slightly leaning against doing this change for UIMA, because of the
> edge cases where the user could have depended on object un-equality for
> 0-length arrays and lists.
>
> Users could "manually" achieve the same result using the shared instance
> values, and (for xmi serialization) marking any features that contain these
> values as "multi-references-allowed" so the deserialization would share them.
> This could become a suggested "best practice" for those who use 0-length
> arrays and empty lists.
>
> Not doing this would make two Jiras a "won't fix":
> https://issues.apache.org/jira/browse/UIMA-5564
> https://issues.apache.org/jira/browse/UIMA-5566
>
> What do others think?
>
> -Marshall
>
> On 9/13/2017 8:22 AM, Marshall Schor wrote:
>> I posted a Jira for a proposed change in how 0-length UIMA arrays and lists
>> are managed. These are immutable objects, and (theoretically) one instance
>> (per CAS) could be shared.
>>
>> In the current implementation, this is managed explicitly by the user - they
>> can use a bunch of new APIs to get shared instances.
>>
>> I'm thinking a better way is to make this automatically the case, and remove
>> the new bunch of APIs (a smaller API set is always a good thing, for
>> essentially the same functionality, IMHO).
>> The implementation would change so that the calls that create "new" 0-length
>> arrays/lists would instead of creating a new one, only do that if none
>> already existed, and if one already did, it would return that one.
>>
>> This follows Java's general direction for immutable objects, like Strings
>> and Integer values, which can be shared.
>>
>> For cases where people wanted/needed a CAS value "marker" that was tiny, but
>> unique (like you get with Java's new Object()), we would keep
>> "new TOP(aCas)" as something that generated unique instances. What do others
>> think?
>>
>> I've seen large-scale implementations of UIMA pipelines with lots of
>> defaulted 0-length arrays in them; this has the potential to improve
>> space/time performance a reasonable amount for these.
>>
>> -Marshall
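For illustration only, here is a minimal Java sketch of the "create only if none exists, otherwise return the existing one" behavior proposed above. The names `EmptyArraySharingSketch`, `SketchCas`, `CasArray`, `createArray` and `newUniqueMarker` are hypothetical and are not UIMA's API. The sketch assumes one shared 0-length instance per CAS and per element type, while non-empty arrays and the marker analogue of `new TOP(aCas)` are always fresh.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed behavior; none of these names are UIMA API.
public class EmptyArraySharingSketch {

    /** Hypothetical stand-in for a CAS-owned array value. */
    static final class CasArray {
        final Class<?> elementType;
        final Object[] values;

        CasArray(Class<?> elementType, int length) {
            this.elementType = elementType;
            this.values = new Object[length];
        }

        int size() {
            return values.length;
        }
    }

    /** Hypothetical stand-in for a CAS; each instance keeps its own shared empties. */
    static final class SketchCas {
        private final Map<Class<?>, CasArray> sharedEmpty = new HashMap<>();

        /** "Creates" an array; a 0-length request returns this CAS's shared instance. */
        CasArray createArray(Class<?> elementType, int length) {
            if (length == 0) {
                // Only create the empty instance if none exists yet for this element type.
                return sharedEmpty.computeIfAbsent(elementType, t -> new CasArray(t, 0));
            }
            return new CasArray(elementType, length); // non-empty arrays stay unique
        }

        /** Analogue of "new TOP(aCas)": a tiny value that is always unique. */
        Object newUniqueMarker() {
            return new Object();
        }
    }

    public static void main(String[] args) {
        SketchCas cas1 = new SketchCas();
        SketchCas cas2 = new SketchCas();

        CasArray a = cas1.createArray(String.class, 0);
        CasArray b = cas1.createArray(String.class, 0);
        System.out.println(a == b);                                  // true: shared within a CAS
        System.out.println(a == cas1.createArray(Integer.class, 0)); // false: one per element type
        System.out.println(a == cas2.createArray(String.class, 0));  // false: sharing is per CAS
        System.out.println(cas1.createArray(String.class, 3)
                == cas1.createArray(String.class, 3));               // false: non-empty arrays are fresh
        System.out.println(cas1.newUniqueMarker() == cas1.newUniqueMarker()); // false
    }
}
```

The first comparison printing `true` is exactly the edge case raised in the later reply: code that relied on two separately created 0-length arrays being `!=` would change behavior under automatic sharing.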

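The Java detail from the 14.09 reply can be checked with a small sketch (the class name `IntegerIdentitySketch` and the helper `freshBoxed` are hypothetical). `Integer.valueOf` returns a cached instance for small values (its Javadoc guarantees caching for -128 to 127), while the `Integer(int)` constructor, deprecated since Java 9, always allocates a new object.

```java
// Hypothetical class name; demonstrates Integer identity, not UIMA code.
public class IntegerIdentitySketch {

    /** Always allocates a new Integer (the constructor is deprecated since Java 9). */
    @SuppressWarnings({"deprecation", "removal"})
    static Integer freshBoxed(int value) {
        return new Integer(value);
    }

    public static void main(String[] args) {
        Integer shared1 = Integer.valueOf(17); // -128..127 is always cached
        Integer shared2 = Integer.valueOf(17);
        Integer fresh = freshBoxed(17);

        System.out.println(shared1 == shared2);    // true: same cached instance
        System.out.println(shared1 == fresh);      // false: distinct object
        System.out.println(shared1.equals(fresh)); // true: equal values
    }
}
```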