Re: Cayenne object storage / memory usage

Andrus Adamchik Wed, 05 Jul 2017 10:22:22 -0700

>  I'm wondering if you
> inadvertently switched old vs new in the performance section?  (Since the
> new, on the right, is always slower.)


The benchmark is measured in millions of ops per second, so a bigger value is
better/faster (kind of like RPM in a car).

Andrus
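
A minimal sketch of how to read the quoted numbers, assuming they are throughputs in millions of ops per second as stated above: the speedup is new divided by old, and the average time per call is the reciprocal of the throughput. The class and method names (ThroughputComparison, speedup, nanosPerOp) are made up for illustration; the figures are copied from the benchmark results quoted further down.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputComparison {

    /** Speedup of new over old when both values are throughputs (higher is better). */
    static double speedup(double oldOpsPerSec, double newOpsPerSec) {
        return newOpsPerSec / oldOpsPerSec;
    }

    /** Average nanoseconds per operation for a throughput in millions of ops/sec. */
    static double nanosPerOp(double millionOpsPerSec) {
        // 1e9 ns per second divided by (millionOpsPerSec * 1e6) ops per second
        return 1_000.0 / millionOpsPerSec;
    }

    public static void main(String[] args) {
        // {old, new} in millions of ops per second, as quoted in the thread
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("getter", new double[] {107, 177});
        results.put("setter", new double[] {12.5, 14.5});
        results.put("readPropertyDirectly", new double[] {152, 248});
        results.put("writePropertyDirectly", new double[] {126, 582});

        for (Map.Entry<String, double[]> e : results.entrySet()) {
            double oldValue = e.getValue()[0];
            double newValue = e.getValue()[1];
            System.out.printf("%-22s speedup %.2fx, %.2f ns/op -> %.2f ns/op%n",
                    e.getKey(), speedup(oldValue, newValue),
                    nanosPerOp(oldValue), nanosPerOp(newValue));
        }
    }
}
```

Every ratio comes out above 1, which is another way of saying the new version on the right is the faster one.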

> On Jul 5, 2017, at 7:31 PM, Michael Gentry <blackn...@gmail.com> wrote:
> 
> Hi Nikita,
> 
> I saw the pull request and was taking a glance at it, so thanks for
> following up with an e-mail.
> 
> The memory improvement looks quite nice, but I'm wondering if you
> inadvertently switched old vs new in the performance section?  (Since the
> new, on the right, is always slower.)
> 
> Thanks,
> 
> mrg
> 
> 
> On Wed, Jul 5, 2017 at 10:19 AM, Nikita Timofeev <ntimof...@objectstyle.com>
> wrote:
> 
>> Hi all,
>> 
>> I've run some additional benchmarks for field-based classes inspired
>> by John and they were so promising that I've moved on
>> to the implementation.
>> 
>> So here is a pull request for you to review [1].
>> Here [2] you can see what new generated classes will look like.
>> 
>> For me there are no visible downsides to this solution: both
>> memory usage and speed are improved.
>> All tests are clean and the only minor incompatibility
>> is that the HOLLOW state no longer resets the object's values [3]
>> (though this can be implemented as well, I'm just
>> not sure this is really needed).
>> 
>> P.S. Here are some raw numbers from my benchmarks.
>> I'm giving absolute numbers, but really only their ratios are important.
>> Results for the old version are on the left, for the new version on the right.
>> 
>> Memory usage:
>> ==============
>> 1. 10,000 small objects
>> (int, Date and String ~ 20 chars)
>> >>> 6 MB vs 2.5 MB <<<
>> 
>> 2. 10,000 objects with big values
>> (int, Date and String ~ 1K chars)
>> Actually, for the same classes (same number of fields),
>> there will be just a constant difference,
>> so this is just to give an idea of what to expect in different cases.
>> >>> 24.5 MB vs 21 MB <<<
>> 
>> Performance:
>> ==============
>> (numbers are in millions of ops per sec, measured with a JMH benchmark)
>> 1. Getter:
>> >>> 107 vs 177 <<<
>> 
>> 2. Setter:
>> Not so impressive, as the Cayenne stack took most of the
>> time here to process the graph diff, but the new methods are still better.
>> >>> 12.5 vs 14.5 <<<
>> 
>> 3. readPropertyDirectly:
>> >>> 152 vs 248 <<<
>> 
>> 4. writePropertyDirectly:
>> This is a map.put() vs switch(String) battle,
>> and the map is definitely losing it :)
>> >>> 126 vs 582 <<<
>> 
>> [1] https://github.com/apache/cayenne/pull/235
>> [2] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/testdo/testmap/auto/_Artist.java
>> [3] https://github.com/stariy95/cayenne/blob/544aae0866e8fb1712f07f00794ea3263a4c95b5/cayenne-server/src/test/java/org/apache/cayenne/access/DataContextExtrasIT.java#L144
>> 
>> On Wed, Jun 21, 2017 at 10:20 PM, John Huss <johnth...@gmail.com> wrote:
>>> I was surprised by the difference in memory too, but this is a small diff
>>> (apart from the newly generated readPropertyDirectly/writePropertyDirectly
>>> methods) so there isn't anything else going on.  My unverified assumption
>>> about HashMap is that it doubles in size each time it resizes, so entities
>>> with more fields could cause more waste. For example, an entity with 65
>>> fields would have 63 empty array slots (ignoring fill factor).  So the
>>> exact savings may vary.
>>> 
>>> On Sat, Jun 17, 2017 at 1:01 AM Robert Zeigler <robert.zeig...@roxanemy.com>
>>> wrote:
>>> 
>>>> I’m also a little surprised at the 1/2-ing… what were the values being
>>>> stored? I suppose in theory, many values are relatively “small”,
>>>> memory-wise, so having the overhead of also storing the key could ~double
>>>> the memory use, but if you’re storing large values, I wouldn’t expect the
>>>> utilization to drop as dramatically. What were your data values (type and
>>>> length distribution for strings)?
>>>> 
>>>> Thanks!
>>>> 
>>>> Robert
>>>> 
>>>>> On Jun 10, 2017, at 6:49 AM, Michael Gentry <blackn...@gmail.com> wrote:
>>>>> 
>>>>> Hi John,
>>>>> 
>>>>> I'm a little surprised that map-based storage is over 2x worse in memory
>>>>> consumption.  I'm wondering if there is more going on here than storage of
>>>>> the property values.  Would it be simple enough to adapt your test case to
>>>>> compare a list of POJOs vs a list of maps and see what the memory footprint
>>>>> and difference is that way?
>>>>> 
>>>>> I personally was thinking the big improvement for using fields directly is
>>>>> the speed improvement.  I didn't think the memory consumption difference
>>>>> would be that dramatic.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> mrg
>>>>> 
>>>>> 
>>>>> On Fri, Jun 9, 2017 at 10:55 AM, John Huss <johnth...@gmail.com> wrote:
>>>>> 
>>>>>> I did some experimenting recently to see if changes to the way data is
>>>>>> stored in Cayenne objects could reduce the amount of memory they consume.
>>>>>> 
>>>>>> I chose to use separate fields for each property instead of a HashMap
>>>>>> (which is what CayenneDataObject uses).  The results were very affirming.
>>>>>> For my test of loading 10,000 objects from every table in my database I got
>>>>>> it to use about *half the memory* of the default class (from 921 MB
>>>>>> down to 431 MB).
>>>>>> 
>>>>>> I know there has been some discussion already about addressing this topic
>>>>>> for the next major release, so I thought I'd throw in some observations /
>>>>>> questions here.
>>>>>> 
>>>>>> For my implementation I subclassed CayenneDataObject because in previous
>>>>>> experience I found implementing a replacement to be much more difficult and
>>>>>> subject to more bugs due to the less frequently used code path that
>>>>>> PersistentObject and its descriptors take you down.  My apps rely on
>>>>>> things that are sort of specific to CayenneDataObject like Validating.
>>>>>> 
>>>>>> So one question is how we should be addressing the need that people may
>>>>>> have to create their own data classes. Right now I believe the recommended
>>>>>> path is to subclass PersistentObject, but I'm not convinced that that is a
>>>>>> viable solution without wholesale copying most of CayenneDataObject into
>>>>>> your subclass.  I'd rather see a fuller base class (in addition to keeping
>>>>>> PersistentObject around) that includes all of CayenneDataObject except the
>>>>>> property storage (HashMap).
>>>>>> 
>>>>>> For my implementation I had to modify CayenneDataObject, but only slightly
>>>>>> to avoid creating the HashMap which I wasn't using. However, because the
>>>>>> class isn't really intended for customization this map is referenced in
>>>>>> multiple methods that can't easily be overridden to change the way things
>>>>>> are stored.
>>>>>> 
>>>>>> Another approach might be to ask why anyone should need to customize the
>>>>>> way data is stored in the objects if we can just use the best solution
>>>>>> possible in the first place?  I can't imagine a more efficient
>>>>>> representation than fields.  However, fields present difficulties for the
>>>>>> use case where you aren't generating unique classes for your model but just
>>>>>> rely on the generic class.  In theory this could be addressed via runtime
>>>>>> code generation or something else, but that would be quite a change.
>>>>>> 
>>>>>> So I'm looking forward to discussing this and toward the future.
>>>>>> 
>>>>>> John
>>>>>> 
>>>> 
>>>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> Nikita Timofeev
>>
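
For readers who have not opened the pull request, here is a rough sketch of the two storage strategies discussed in this thread. It is not the code Cayenne generates: the class names (StorageSketch, MapBackedArtist, FieldBackedArtist), the two properties, and the emptySlots helper are hypothetical and only illustrate the "map.put() vs switch(String)" contrast and the "65 fields, 63 empty slots" arithmetic. The helper deliberately ignores the load factor and HashMap's default initial capacity, as the example in the thread does.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

public class StorageSketch {

    /** Map-based storage: every property value lives in a per-object HashMap. */
    static class MapBackedArtist {
        private final Map<String, Object> values = new HashMap<>();

        Object readPropertyDirectly(String name) {
            return values.get(name);
        }

        void writePropertyDirectly(String name, Object value) {
            values.put(name, value);
        }
    }

    /** Field-based storage: one field per property, dispatched with switch(String). */
    static class FieldBackedArtist {
        private String artistName;
        private Date dateOfBirth;

        Object readPropertyDirectly(String name) {
            if (name == null) {
                return null; // switch on a null String would throw
            }
            switch (name) {
                case "artistName": return artistName;
                case "dateOfBirth": return dateOfBirth;
                default: return null;
            }
        }

        void writePropertyDirectly(String name, Object value) {
            if (name == null) {
                return;
            }
            switch (name) {
                case "artistName": artistName = (String) value; break;
                case "dateOfBirth": dateOfBirth = (Date) value; break;
                default: throw new IllegalArgumentException("Unknown property: " + name);
            }
        }
    }

    /**
     * Unused slots in a power-of-two table just large enough for fieldCount
     * entries. Simplified: ignores load factor and the default capacity of 16.
     */
    static int emptySlots(int fieldCount) {
        int capacity = 1;
        while (capacity < fieldCount) {
            capacity <<= 1;
        }
        return capacity - fieldCount;
    }

    public static void main(String[] args) {
        FieldBackedArtist artist = new FieldBackedArtist();
        artist.writePropertyDirectly("artistName", "Example Artist");
        System.out.println(artist.readPropertyDirectly("artistName"));
        System.out.println(emptySlots(65));
    }
}
```

The field-based variant has no per-object map, entry objects, or hash table array to pay for, which is consistent with the memory savings reported above. The price is that each class needs its own generated accessors, which is the difficulty John raises for the generic class.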
