Kudos on the improvements, and to the original developers (Andrus et al.) for a fantastic design. These days, I’ve been doing a lot more Python coding than Java and I use SQLAlchemy pretty extensively. It’s nice… but I still miss Cayenne’s simplicity and ease of use (SQLAlchemy uses a transaction model more akin to Hibernate’s, though not as egregious).
Best,

Robert

> On Jul 6, 2017, at 7:27 AM, Andrus Adamchik <and...@objectstyle.org> wrote:
>
> The fact that we can switch to field-based DataObjects with minimal effort
> and without sacrificing a single thing in the Cayenne design is a *very* big
> deal! Thanks John for bringing the possibility to everyone's attention, and
> Nikita - for the working code and benchmarks.
>
> I am going to try this out on a real app some time next week. Very exciting!
> :)
>
> Andrus
>
>
>> On Jul 5, 2017, at 5:19 PM, Nikita Timofeev <ntimof...@objectstyle.com>
>> wrote:
>>
>> Hi all,
>>
>> I've run some additional benchmarks for field-based classes inspired
>> by John, and they were so promising that I've moved on
>> to the implementation.
>>
>> So here is a pull request for you to review [1].
>> Here [2] you can see what the new generated classes will look like.
>>
>> For me there are no visible downsides to this solution, e.g. both
>> memory usage and speed are improved.
>> All tests are clean and the only minor incompatibility out there
>> is that the HOLLOW state no longer resets an object's values [3]
>> (though this can be implemented as well, I'm just
>> not sure this is really needed).
>>
>> P.S. Here are some raw numbers from my benchmarks.
>> I'm giving absolute numbers, but really only their relation is important.
>> Results for the old version are on the left, for the new version on the right.
>>
>> Memory usage:
>> ==============
>> 1. 10,000 small objects
>> (int, Date and String ~ 20 chars)
>> >>> 6Mb vs 2.5Mb <<<
>>
>> 2. 10,000 objects with big values
>> (int, Date and String ~ 1K chars)
>> Actually, for the same classes (same number of fields)
>> there will be just a constant difference,
>> so this is just to give an idea of what to expect in different cases.
>> >>> 24.5Mb vs 21Mb <<<
>>
>> Performance:
>> ==============
>> (numbers are in millions of ops per sec, measured with a JMH benchmark)
>> 1. Getter:
>> >>> 107 vs 177 <<<
>>
>> 2. Setter:
>> Not so impressive, as the Cayenne stack took most of the
>> time here to process the graph diff, but still the new methods are better.
>> >>> 12.5 vs 14.5 <<<
>>
>> 3. readPropertyDirectly:
>> >>> 152 vs 248 <<<
>>
>> 4. writePropertyDirectly:
>> This is a map.put() vs switch(String) battle,
>> and the map is definitely losing it :)
>> >>> 126 vs 582 <<<
>>
>> [1] https://github.com/apache/cayenne/pull/235
>> [2]
>> https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
>> [3]
>> https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
>>
>> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <johnth...@gmail.com> wrote:
>>> I was surprised by the difference in memory too, but this is a small diff
>>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
>>> methods) so there isn't anything else going on. My unverified assumption
>>> about HashMap is that it doubles in size each time it resizes, so entities
>>> with more fields could cause more waste. For example, an entity with 65
>>> fields would have 63 empty array slots (ignoring fill factor). So the
>>> exact savings may vary.
>>>
>>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <robert.zeig...@roxanemy.com>
>>> wrote:
>>>
>>>> I’m also a little surprised at the 1/2-ing… what were the values being
>>>> stored? I suppose in theory, many values are relatively “small”,
>>>> memory-wise, so having the overhead of also storing the key could ~double
>>>> the memory use, but if you’re storing large values, I wouldn’t expect the
>>>> utilization to drop as dramatically. What were your data values (type and
>>>> length distribution for strings)?
>>>>
>>>> Thanks!
>>>>
>>>> Robert
>>>>
>>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <blackn...@gmail.com> wrote:
>>>>>
>>>>> Hi John,
>>>>>
>>>>> I'm a little surprised that map-based storage is over 2x worse in memory
>>>>> consumption. I'm wondering if there is more going on here than storage of
>>>>> the property values. Would it be simple enough to adapt your test case to
>>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
>>>>> and difference is that way?
>>>>>
>>>>> I personally was thinking the big improvement from using fields directly is
>>>>> the speed improvement. I didn't think the memory consumption difference
>>>>> would be that dramatic.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> mrg
>>>>>
>>>>>
>>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <johnth...@gmail.com> wrote:
>>>>>
>>>>>> I did some experimenting recently to see if changes to the way data is
>>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
>>>>>>
>>>>>> I chose to use separate fields for each property instead of a HashMap
>>>>>> (which is what CayenneDataObject uses). The results were very affirming.
>>>>>> For my test of loading 10,000 objects from every table in my database I got
>>>>>> it to use about *half the memory* of the default class (from 921 MB
>>>>>> down to 431 MB).
>>>>>>
>>>>>> I know there has been some discussion already about addressing this topic
>>>>>> for the next major release, so I thought I'd throw in some observations /
>>>>>> questions here.
>>>>>>
>>>>>> For my implementation I subclassed CayenneDataObject because in previous
>>>>>> experience I found implementing a replacement to be much more difficult and
>>>>>> subject to more bugs due to the less frequently used code path that
>>>>>> PersistentObject and its descriptors take you down. My apps rely on
>>>>>> things that are sort of specific to CayenneDataObject, like Validating.
>>>>>>
>>>>>> So one question is how we should be addressing the need that people may
>>>>>> have to create their own data classes. Right now I believe the recommended
>>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
>>>>>> viable solution without wholesale copying most of CayenneDataObject into
>>>>>> your subclass. I'd rather see a fuller base class (in addition to keeping
>>>>>> PersistentObject around) that includes all of CayenneDataObject except the
>>>>>> property storage (HashMap).
>>>>>>
>>>>>> For my implementation I had to modify CayenneDataObject, but only slightly,
>>>>>> to avoid creating the HashMap which I wasn't using. However, because the
>>>>>> class isn't really intended for customization, this map is referenced in
>>>>>> multiple methods that can't easily be overridden to change the way things
>>>>>> are stored.
>>>>>>
>>>>>> Another approach might be to ask why anyone should need to customize the
>>>>>> way data is stored in the objects if we can just use the best solution
>>>>>> possible in the first place? I can't imagine a more efficient
>>>>>> representation than fields. However, fields present difficulties for the
>>>>>> use case where you aren't generating unique classes for your model but just
>>>>>> rely on the generic class. In theory this could be addressed via runtime
>>>>>> code generation or something else, but that would be quite a change.
>>>>>>
>>>>>> So I'm looking forward to discussing this and toward the future.
>>>>>>
>>>>>> John
>>>>>>
>>>>
>>>>
>>
>>
>>
>> --
>> Best regards,
>> Nikita Timofeev
>
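
For anyone reading the thread later, the two storage strategies being compared can be sketched roughly as follows. This is only a simplified illustration, not the code generated by the pull request [1] (see [2] for the real generated class). The names StorageSketch, MapBackedArtist, FieldBackedArtist and defaultTableSize, and the two example properties, are made up for this sketch. The helper models a java.util.HashMap created with default settings (initial capacity 16, load factor 0.75) whose table doubles on each resize, which is the behaviour John's 65-field example relies on.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these classes are part of Cayenne.
class StorageSketch {

    /** Map-based storage: every property value sits in a HashMap keyed by name. */
    static class MapBackedArtist {
        private final Map<String, Object> values = new HashMap<>();

        Object readPropertyDirectly(String name) {
            return values.get(name);
        }

        void writePropertyDirectly(String name, Object value) {
            values.put(name, value);
        }
    }

    /** Field-based storage: one field per property, dispatched via switch(String). */
    static class FieldBackedArtist {
        private String artistName;
        private Date dateOfBirth;

        Object readPropertyDirectly(String name) {
            if (name == null) {
                return null;
            }
            switch (name) {
                case "artistName":
                    return artistName;
                case "dateOfBirth":
                    return dateOfBirth;
                default:
                    return null; // unknown property in this sketch
            }
        }

        void writePropertyDirectly(String name, Object value) {
            if (name == null) {
                return;
            }
            switch (name) {
                case "artistName":
                    artistName = (String) value;
                    break;
                case "dateOfBirth":
                    dateOfBirth = (Date) value;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    /**
     * Bucket-array length of a default HashMap after inserting the given number
     * of entries one by one: starts at 16 and doubles whenever the entry count
     * exceeds 0.75 * capacity.
     */
    static int defaultTableSize(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75) {
            capacity <<= 1;
        }
        return capacity;
    }

    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Example Artist");
        System.out.println(artist.readPropertyDirectly("artistName"));

        int slots = defaultTableSize(65);
        System.out.println(slots + " slots, " + (slots - 65) + " unused");
    }
}
```

Under these assumptions a 65-property entity ends up with a 128-slot table, which matches the 63 empty slots John estimated. On top of the table, each map entry also carries its own node object and key reference, which the field-based variant does not need.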