Re: Cayenne object storage / memory usage

Andrus Adamchik Thu, 06 Jul 2017 05:27:24 -0700

The fact that we can switch to field-based DataObjects with minimal effort and 
without sacrificing a single thing in the Cayenne design is a *very* big deal! 
Thanks John for bringing the possibility to everyone's attention, and Nikita - 
for the working code and benchmarks.


I am going to try this out on a real app some time next week. Very exciting! :)

Andrus


> On Jul 5, 2017, at 5:19 PM, Nikita Timofeev <[email protected]> wrote:
> 
> Hi all,
> 
> I've run some additional benchmarks for field-based classes, inspired
> by John, and they were so promising that I've moved on
> to the implementation.
> 
> So here is a pull request for you to review [1].
> Here [2] you can see what new generated classes will look like.
> 
> For me there are no visible downsides to this solution, e.g. both
> memory usage and speed are improved.
> All tests are clean and the only minor incompatibility out there
> is that the HOLLOW state no longer resets the object's values [3]
> (though this can be implemented as well, I'm just
> not sure this is really needed).
> 
> P.S. here are some raw numbers from my benchmarks.
> I'm giving absolute numbers, but really only their ratio is important.
> Results for the old version are on the left, for the new version on the right.
> 
> Memory usage:
> ==============
> 1. 10,000 small objects
> (int, Date and String ~ 20 chars)
> >>> 6 MB vs 2.5 MB <<<
> 
> 2. 10,000 objects with big values
> (int, Date and String ~ 1K chars)
> Actually, for the same classes (same number of fields)
> the difference is just a constant,
> so this is only to give an idea of what to expect in different cases.
> >>> 24.5 MB vs 21 MB <<<
> 
> Performance:
> ==============
> (numbers are in millions of ops per second, measured with a JMH benchmark)
> 1. Getter:
> >>> 107 vs 177 <<<
> 
> 2. Setter:
> Not so impressive, as the Cayenne stack takes most of the
> time here to process the graph diff, but the new methods are still better.
> >>> 12.5 vs 14.5 <<<
> 
> 3. readPropertyDirectly:
> >>> 152 vs 248 <<<
> 
> 4. writePropertyDirectly:
> This is a map.put() vs switch(String) battle,
> and the map is definitely losing it :)
> >>> 126 vs 582 <<<
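
To make the map.put() vs switch(String) comparison concrete, here is a minimal sketch of the two storage styles. It is an illustration only: the class names (StorageSketch, MapBackedArtist, FieldBackedArtist) and the two properties are hypothetical and are not taken from Cayenne or from the pull request; the actual generated code is the one linked as [2].

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; these are NOT Cayenne's generated classes.
class StorageSketch {

    // Map-based storage: every property value lives in a HashMap entry.
    static class MapBackedArtist {
        private final Map<String, Object> values = new HashMap<>();

        Object readPropertyDirectly(String name) {
            return values.get(name);
        }

        void writePropertyDirectly(String name, Object value) {
            values.put(name, value);
        }
    }

    // Field-based storage: one field per property, selected by switch(String).
    static class FieldBackedArtist {
        private String artistName;
        private Date dateOfBirth;

        Object readPropertyDirectly(String name) {
            if (name == null) {
                return null; // switch on a null String would throw
            }
            switch (name) {
                case "artistName": return artistName;
                case "dateOfBirth": return dateOfBirth;
                default: return null; // unknown property (assumed behavior)
            }
        }

        void writePropertyDirectly(String name, Object value) {
            if (name == null) {
                return;
            }
            switch (name) {
                case "artistName": artistName = (String) value; break;
                case "dateOfBirth": dateOfBirth = (Date) value; break;
                default:
                    throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    public static void main(String[] args) {
        MapBackedArtist mapBacked = new MapBackedArtist();
        mapBacked.writePropertyDirectly("artistName", "Picasso");

        FieldBackedArtist fieldBacked = new FieldBackedArtist();
        fieldBacked.writePropertyDirectly("artistName", "Picasso");

        System.out.println(mapBacked.readPropertyDirectly("artistName"));
        System.out.println(fieldBacked.readPropertyDirectly("artistName"));
    }
}
```

In the field-based variant there is no per-object HashMap and no entry object per property, which is where both the memory and the speed difference would be expected to come from.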
> 
> [1] https://github.com/apache/cayenne/pull/235
> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
> 
> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <[email protected]> wrote:
>> I was surprised by the difference in memory too, but this is a small diff
>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
>> methods) so there isn't anything else going on.  My unverified assumption
>> about HashMap is that it doubles in size each time it resizes, so entities
>> with more fields could cause more waste. For example an entity with 65
>> fields would have 63 empty array slots (ignoring fill factor).  So the
>> exact savings may vary.
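
The 65-field arithmetic above can be checked with a small sketch. It assumes the map is created with java.util.HashMap's defaults (16 initial slots, load factor 0.75, table doubling on resize); whether CayenneDataObject actually creates its map that way is not confirmed in this thread. The class and method names are made up for the example.

```java
// Hypothetical helper; models java.util.HashMap's default growth policy.
class HashMapSlots {

    // Table length after inserting `entries` distinct keys (entries >= 1)
    // into a HashMap created with the no-arg constructor: start at 16 slots
    // and double whenever the size exceeds 0.75 * capacity.
    static int tableCapacityFor(int entries) {
        int capacity = 16;
        while (entries > capacity * 0.75) {
            capacity *= 2;
        }
        return capacity;
    }

    // Array slots that hold no entry, assuming no hash collisions.
    static int emptySlots(int entries) {
        return tableCapacityFor(entries) - entries;
    }

    public static void main(String[] args) {
        for (int fields : new int[] {3, 12, 13, 65}) {
            System.out.println(fields + " fields: table of "
                    + tableCapacityFor(fields) + ", "
                    + emptySlots(fields) + " empty slots");
        }
    }
}
```

Under these assumptions a 65-field entity ends up with a 128-slot table and 63 empty slots, the same figure as the estimate that ignores the fill factor. A 3-field entity still gets the default 16-slot table.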
>> 
>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <[email protected]>
>> wrote:
>> 
>>> I’m also a little surprised at the 1/2-ing… what were the values being
>>> stored? I suppose in theory, many values are relatively “small”,
>>> memory-wise, so having the overhead of also storing the key could ~double
>>> the memory use, but if you’re storing large values, I wouldn’t expect the
>>> utilization to drop as dramatically. What were your data values (type and
>>> length distribution for strings)?
>>> 
>>> Thanks!
>>> 
>>> Robert
>>> 
>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <[email protected]> wrote:
>>>> 
>>>> Hi John,
>>>> 
>>>> I'm a little surprised that map-based storage is over 2x worse in memory
>>>> consumption.  I'm wondering if there is more going on here than storage of
>>>> the property values.  Would it be simple enough to adapt your test case to
>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
>>>> and difference is that way?
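
A comparison along those lines could be sketched as below. This is a rough illustration under stated assumptions, not the test case used in this thread: the Pojo class, the property names and the 20-character strings are invented for the example, and heap readings taken via Runtime after System.gc() are only approximate, so a proper measurement would need a profiler or a tool that reports object sizes.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: list of POJOs vs list of maps holding the same data.
class FootprintSketch {

    static class Pojo {
        final int id;
        final Date date;
        final String name;

        Pojo(int id, Date date, String name) {
            this.id = id;
            this.date = date;
            this.name = name;
        }
    }

    static List<Pojo> buildPojos(int n) {
        List<Pojo> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // 20-character string, similar to the "small objects" case
            result.add(new Pojo(i, new Date(i), String.format("artist-name-%08d", i)));
        }
        return result;
    }

    static List<Map<String, Object>> buildMaps(int n) {
        List<Map<String, Object>> result = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", i);
            row.put("date", new Date(i));
            row.put("name", String.format("artist-name-%08d", i));
            result.add(row);
        }
        return result;
    }

    // Approximate used heap; System.gc() is only a hint to the JVM.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        int n = 10_000;
        long start = usedHeap();
        List<Pojo> pojos = buildPojos(n);
        long afterPojos = usedHeap();
        List<Map<String, Object>> maps = buildMaps(n);
        long afterMaps = usedHeap();

        System.out.printf("POJOs: ~%d KB, maps: ~%d KB%n",
                (afterPojos - start) / 1024, (afterMaps - afterPojos) / 1024);
        // Use both lists after measuring so they stay reachable until here.
        System.out.println(pojos.size() + maps.size());
    }
}
```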
>>>> 
>>>> I personally was thinking the big improvement for using fields directly is
>>>> the speed improvement.  I didn't think the memory consumption difference
>>>> would be that dramatic.
>>>> 
>>>> Thanks,
>>>> 
>>>> mrg
>>>> 
>>>> 
>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <[email protected]> wrote:
>>>> 
>>>>> I did some experimenting recently to see if changes to the way data is
>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
>>>>> 
>>>>> I chose to use separate fields for each property instead of a HashMap
>>>>> (which is what CayenneDataObject uses).  The results were very affirming.
>>>>> For my test of loading 10,000 objects from every table in my database I got
>>>>> it to use about *half the memory* of the default class (from 921 MB
>>>>> down to 431 MB).
>>>>> 
>>>>> I know there has been some discussion already about addressing this topic
>>>>> for the next major release, so I thought I'd throw in some observations /
>>>>> questions here.
>>>>> 
>>>>> For my implementation I subclassed CayenneDataObject because in previous
>>>>> experience I found implementing a replacement to be much more difficult and
>>>>> subject to more bugs due to the less frequently used code path that
>>>>> PersistentObject and its descriptors take you down.  My apps rely on
>>>>> things that are sort of specific to CayenneDataObject like Validating.
>>>>> 
>>>>> So one question is how we should be addressing the need that people may
>>>>> have to create their own data classes. Right now I believe the recommended
>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
>>>>> viable solution without wholesale copying most of CayenneDataObject into
>>>>> your subclass.  I'd rather see a fuller base class (in addition to keeping
>>>>> PersistentObject around) that includes all of CayenneDataObject except the
>>>>> property storage (HashMap).
>>>>> 
>>>>> For my implementation I had to modify CayenneDataObject, but only slightly,
>>>>> to avoid creating the HashMap which I wasn't using. However, because the
>>>>> class isn't really intended for customization, this map is referenced in
>>>>> multiple methods that can't easily be overridden to change the way things
>>>>> are stored.
>>>>> 
>>>>> Another approach might be to ask why anyone should need to customize the
>>>>> way data is stored in the objects if we can just use the best solution
>>>>> possible in the first place?  I can't imagine a more efficient
>>>>> representation than fields.  However, fields present difficulties for the
>>>>> use case where you aren't generating unique classes for your model but just
>>>>> rely on the generic class.  In theory this could be addressed via runtime
>>>>> code generation or something else, but that would be quite a change.
>>>>> 
>>>>> So I'm looking forward to discussing this and toward the future.
>>>>> 
>>>>> John
>>>>> 
>>> 
>>> 
> 
> 
> 
> -- 
> Best regards,
> Nikita Timofeev
