The fact that we can switch to field-based DataObjects with minimal effort and without sacrificing a single thing in the Cayenne design is a *very* big deal! Thanks, John, for bringing the possibility to everyone's attention, and Nikita, for the working code and benchmarks.
I am going to try this out on a real app some time next week. Very exciting! :)

Andrus

> On Jul 5, 2017, at 5:19 PM, Nikita Timofeev <ntimof...@objectstyle.com> wrote:
>
> Hi all,
>
> I've run some additional benchmarks for field-based classes inspired
> by John, and they were so promising that I've moved on
> to the implementation.
>
> So here is a pull request for you to review [1].
> Here [2] you can see what the new generated classes will look like.
>
> For me there are no visible downsides to this solution, e.g. both
> memory usage and speed are improved.
> All tests are clean, and the only minor incompatibility out there
> is in the HOLLOW state, which no longer resets the object's values [3]
> (though this can be implemented as well, I'm just
> not sure this is really needed).
>
> P.S. Here are some raw numbers from my benchmarks.
> I'm giving absolute numbers, but really only their relation is important.
> Results for the old version are on the left, for the new version on the right.
>
> Memory usage:
> ==============
> 1. 10,000 small objects
> (int, Date and String ~ 20 chars)
> >>> 6 MB vs 2.5 MB <<<
>
> 2. 10,000 objects with big values
> (int, Date and String ~ 1K chars)
> Actually, for the same classes (same number of fields),
> there will be just a constant difference,
> so this is just to get an idea of what to expect in different cases.
> >>> 24.5 MB vs 21 MB <<<
>
> Performance:
> ==============
> (numbers are in millions of ops per sec, measured with a JMH benchmark)
> 1. Getter:
> >>> 107 vs 177 <<<
>
> 2. Setter:
> Not so impressive, as the Cayenne stack took most of the
> time here to process the graph diff, but still the new methods are better.
> >>> 12.5 vs 14.5 <<<
>
> 3. readPropertyDirectly:
> >>> 152 vs 248 <<<
>
> 4. writePropertyDirectly:
> This is a map.put() vs switch(String) battle,
> and the map is definitely losing it :)
> >>> 126 vs 582 <<<
>
> [1] https://github.com/apache/cayenne/pull/235
> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
>
> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <johnth...@gmail.com> wrote:
>> I was surprised by the difference in memory too, but this is a small diff
>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
>> methods) so there isn't anything else going on. My unverified assumption
>> about HashMap is that it doubles in size each time it resizes, so entities
>> with more fields could cause more waste. For example, an entity with 65
>> fields would have 63 empty array slots (ignoring fill factor). So the
>> exact savings may vary.
>>
>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <robert.zeig...@roxanemy.com>
>> wrote:
>>
>>> I’m also a little surprised at the 1/2-ing… what were the values being
>>> stored? I suppose in theory, many values are relatively “small”,
>>> memory-wise, so having the overhead of also storing the key could ~double
>>> the memory use, but if you’re storing large values, I wouldn’t expect the
>>> utilization to drop as dramatically. What were your data values (type and
>>> length distribution for strings)?
>>>
>>> Thanks!
>>>
>>> Robert
>>>
>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <blackn...@gmail.com> wrote:
>>>>
>>>> Hi John,
>>>>
>>>> I'm a little surprised that map-based storage is over 2x worse in memory
>>>> consumption. I'm wondering if there is more going on here than storage of
>>>> the property values. Would it be simple enough to adapt your test case to
>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
>>>> and difference is that way?
>>>>
>>>> I personally was thinking the big improvement from using fields directly is
>>>> the speed improvement. I didn't think the memory consumption difference
>>>> would be that dramatic.
>>>>
>>>> Thanks,
>>>>
>>>> mrg
>>>>
>>>>
>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <johnth...@gmail.com> wrote:
>>>>
>>>>> I did some experimenting recently to see if changes to the way data is
>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
>>>>>
>>>>> I chose to use separate fields for each property instead of a HashMap
>>>>> (which is what CayenneDataObject uses). The results were very affirming.
>>>>> For my test of loading 10,000 objects from every table in my database I got
>>>>> it to use about *half the memory* of the default class (from 921 MB
>>>>> down to 431 MB).
>>>>>
>>>>> I know there has been some discussion already about addressing this topic
>>>>> for the next major release, so I thought I'd throw in some observations /
>>>>> questions here.
>>>>>
>>>>> For my implementation I subclassed CayenneDataObject because in previous
>>>>> experience I found implementing a replacement to be much more difficult and
>>>>> subject to more bugs due to the less frequently used code path that
>>>>> PersistentObject and its descriptors take you down. My apps rely on
>>>>> things that are sort of specific to CayenneDataObject like Validating.
>>>>>
>>>>> So one question is how we should be addressing the need that people may
>>>>> have to create their own data classes. Right now I believe the recommended
>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
>>>>> viable solution without wholesale copying most of CayenneDataObject into
>>>>> your subclass. I'd rather see a fuller base class (in addition to keeping
>>>>> PersistentObject around) that includes all of CayenneDataObject except the
>>>>> property storage (HashMap).
>>>>>
>>>>> For my implementation I had to modify CayenneDataObject, but only slightly
>>>>> to avoid creating the HashMap which I wasn't using. However, because the class
>>>>> isn't really intended for customization this map is referenced in multiple
>>>>> methods that can't easily be overridden to change the way things are
>>>>> stored.
>>>>>
>>>>> Another approach might be to ask why anyone should need to customize the
>>>>> way data is stored in the objects if we can just use the best solution
>>>>> possible in the first place? I can't imagine a more efficient
>>>>> representation than fields. However, fields present difficulties for the
>>>>> use case where you aren't generating unique classes for your model but just
>>>>> rely on the generic class. In theory this could be addressed via runtime
>>>>> code generation or something else, but that would be quite a change.
>>>>>
>>>>> So I'm looking forward to discussing this and toward the future.
>>>>>
>>>>> John
>
> --
> Best regards,
> Nikita Timofeev
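
For readers following the map-versus-fields comparison in this thread, here is a minimal Java sketch of the two storage styles. The class names MapBackedArtist, FieldBackedArtist and StorageSketch, and the two properties, are hypothetical and chosen only for illustration; this is not the code Cayenne generates (link [2] above shows that). It only illustrates the idea of a per-object HashMap versus one field per property with a switch over the property name.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical map-based object: every property value lives in a per-object HashMap,
// so each instance also pays for the map, its table array and its entry objects.
class MapBackedArtist {
    private final Map<String, Object> values = new HashMap<>();

    Object readPropertyDirectly(String name) {
        return values.get(name);
    }

    void writePropertyDirectly(String name, Object value) {
        values.put(name, value);
    }
}

// Hypothetical field-based object: one plain Java field per property.
// Generic access by name goes through a switch over the property name.
class FieldBackedArtist {
    private String artistName;
    private Date dateOfBirth;

    Object readPropertyDirectly(String name) {
        if (name == null) {
            return null;
        }
        switch (name) {
            case "artistName":
                return artistName;
            case "dateOfBirth":
                return dateOfBirth;
            default:
                // Assumption of this sketch: unknown properties read as null.
                return null;
        }
    }

    void writePropertyDirectly(String name, Object value) {
        if (name == null) {
            return;
        }
        switch (name) {
            case "artistName":
                artistName = (String) value;
                break;
            case "dateOfBirth":
                dateOfBirth = (Date) value;
                break;
            default:
                // Assumption of this sketch: unknown properties are rejected.
                throw new IllegalArgumentException("Unknown property: " + name);
        }
    }
}

public class StorageSketch {
    public static void main(String[] args) {
        MapBackedArtist mapBacked = new MapBackedArtist();
        mapBacked.writePropertyDirectly("artistName", "Picasso");

        FieldBackedArtist fieldBacked = new FieldBackedArtist();
        fieldBacked.writePropertyDirectly("artistName", "Picasso");

        // Both styles expose the same name-based API to the caller.
        System.out.println(mapBacked.readPropertyDirectly("artistName"));
        System.out.println(fieldBacked.readPropertyDirectly("artistName"));
    }
}
```

The point of the sketch is that callers see the same name-based API in both cases, while the field-based variant needs no map, table array or entry objects per instance.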