Sounds good to me; it's only a matter of convention.

Jacques

On 22/03/2015 11:31, Adrian Crum wrote:
I was thinking a maxSize setting of -1 means the cache is disabled. That would 
be an easy change to make.

Having a separate enable/disable property will take a lot of rewriting.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 10:21 AM, Jacques Le Roux wrote:
Maybe use 1 in the meantime? ;)

Jacques

On 22/03/2015 11:17, Adrian Crum wrote:
Oops, my bad. A maxSize setting of zero means there is no limit.

I spent some time looking through the UtilCache code, and I don't see
any way to disable a cache. Generally speaking, any setting of zero
means there is no limit for that setting.

It would be nice to have an enabled/disabled setting.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 9:38 AM, Adrian Crum wrote:
Interesting. The current cache code ignores the maxSize setting.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 7:38 AM, Adrian Crum wrote:
I don't see an enable/disable setting, but

default.maxSize=0 in cache.properties

should do it.
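For context, the relevant fragment of cache.properties looks roughly like this (the default.* keys match the stock file as far as I recall; the per-cache override at the end is a hypothetical example of the naming pattern, not a real key):

```properties
# Global defaults for all UtilCache instances.
# In the current code a maxSize of 0 means "no limit", not "disabled".
default.maxSize=0
default.expireTime=0
default.useSoftReference=true

# Hypothetical per-cache override: individual caches can be tuned by name.
entity.someEntityCache.maxSize=10000
```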


Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/22/2015 3:16 AM, Christian Carlow wrote:
Is there a convenient setting for disabling cache completely as David
mentioned he did?

On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
I agree with Adrian that caching should be a sysadmin choice.

I would also caution that measuring cache performance during
testing is
not a very useful activity. Testing tends to test one use case
once and
move on to the next.
In production, users tend to do the same thing over and over.
Testing might fill a shopping cart a few times and do a lot of other
administrative functions as many times . In real life, shopping carts
are filled much more frequently than catalog updates (one hopes).
Using
performance numbers from functional testing will be misleading.

The other message that I get from David's discussion is that caching built by professional caching experts (database developers, as he mentioned) worked better than caching systems built by application developers. It is likely that Ehcache and the databases' built-in caching functions will outperform caching systems built by OFBiz developers, will handle the main cases better, and will handle edge cases properly. They will probably integrate better and be easier to configure at run-time or during deployment. They will also be easier for the system administrator to tune.

I understand that Adrian needs to fix this quickly. I suppose that caching could be eliminated to solve the problem while a better solution is implemented.

Do we know what it would take to add enough Ehcache to make the system perform adequately and meet current requirements?

Ron


On 21/03/2015 6:22 AM, Adrian Crum wrote:
I will try to say it again, but differently.

If I am a developer, I am not aware of the subtleties of caching various entities. Entity cache settings will be determined during staging. So, I write my code as if everything will be cached - leaving the door open for a sysadmin to configure caching during staging.

During staging, a sysadmin can start off with caching disabled, and then switch on caching for various entities while performance tests are being run. After some time, the sysadmin will have cache settings that provide optimal throughput. Does that mean ALL entities are cached? No, only the ones that need to be.

The point I'm trying to make is this: the decision to cache or not should be made by a sysadmin, not by a developer.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/21/2015 10:08 AM, Scott Gray wrote:
My preference is to make ALL Delegator calls use the cache.

Perhaps I misunderstood the above sentence? I responded because I don't think caching everything is a good idea.

On 21 Mar 2015 20:41, "Adrian Crum" <adrian.c...@sandglass-software.com> wrote:

Thanks for the info David! I agree 100% with everything you said.

There may be some misunderstanding about my advice. I suggested that caching should be configured in the settings file; I did not suggest that everything should be cached all the time.

Like you said, JMeter tests can reveal what needs to be cached, and a sysadmin can fine-tune performance by tweaking the cache settings. The problem I mentioned is this: a sysadmin can't improve performance by caching a particular entity if a developer has hard-coded it not to be cached.

Btw, I removed the complicated condition checking in the condition cache because it didn't work. Not only was the system spending a lot of time evaluating long lists of values (each value having a potentially long list of conditions), at the end of the evaluation the result was always a cache miss.



Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/20/2015 9:22 PM, David E. Jones wrote:


Stepping back a little, some history and theory of the entity cache might be helpful.

The original intent of the entity cache was a simple way to keep frequently used values/records closer to the code that uses them, ie in the application server. One real-world example of this is the goal of being able to render ecommerce catalog and product pages without hitting the database.

Over time the entity caching was made more complex to handle more caching scenarios, but it was still left to the developer to determine if caching is appropriate for the code they are writing.

In theory, is it possible to write an entity cache that can be used 100% of the time? IMO the answer is NO. This is almost possible for single-record caching, with the cache ultimately becoming an in-memory relational database running on the app server (with full transaction support, etc)... but for list caching it totally kills the whole concept. The current entity cache keeps lists of results keyed by the query condition used to get those results, and this is very different from what a database does, and makes things rather messy and inefficient outside simple use cases.
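To make the list-caching problem concrete, here is a minimal, self-contained sketch (illustrative only, not the OFBiz UtilCache code) of a result cache keyed by the query condition: two overlapping queries produce two independent entries holding the same record, which is exactly the redundancy described above.

```java
import java.util.*;

// Minimal sketch (not OFBiz code) of a list cache keyed by the query
// condition: each distinct condition string gets its own cached result
// list, so overlapping queries are cached redundantly.
public class ConditionListCache {
    private final Map<String, List<Map<String, Object>>> cache = new HashMap<>();

    public List<Map<String, Object>> get(String condition) {
        return cache.get(condition); // null means a cache miss
    }

    public void put(String condition, List<Map<String, Object>> results) {
        cache.put(condition, results);
    }

    public int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        ConditionListCache cache = new ConditionListCache();
        Map<String, Object> widget = new HashMap<>();
        widget.put("productId", "WG-1001");
        widget.put("categoryId", "WIDGETS");

        // The same record ends up cached under two different conditions.
        cache.put("categoryId=WIDGETS", Collections.singletonList(widget));
        cache.put("categoryId=WIDGETS AND price<10", Collections.singletonList(widget));
        System.out.println("entries: " + cache.size()); // 2 entries, 1 record
    }
}
```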

On top of these big functional issues (which are deal killers IMO), there is also the performance issue. The point, or intent at least, of the entity cache is to improve performance. As the cache gets more complex the performance will suffer, and because of the whole concept of caching results by queries, the performance will be WORSE than the DB performance for the same queries in most cases. Databases are quite fast and efficient, and we'll never be able to reproduce their ability to scale and search in something like an in-memory entity cache, especially not considering the massive redundancy and overhead of caching lists of values by condition.

As an example of this in the real world: on a large OFBiz project I worked on that finished last year, we went into production with the entity cache turned OFF, completely DISABLED. Why? When doing load testing, on a whim one of the guys decided to try it without the entity cache enabled, and the body of JMeter tests that exercised a few dozen of the most common user paths through the system actually ran FASTER. The database (MySQL in this case) was hit over the network, but responded quickly enough to make things work quite well for the various find queries, and FAR faster for updates, especially creates. This project was one of the higher-volume projects I'm aware of for OFBiz, at peaks handling sustained processing of around 10 orders per second (36,000 per hour), with some short-term peaks much higher, closer to 20-30 orders per second... and longer-term peaks hitting over 200k orders in one day (North America only, daytime, around a 12-hour window).

I found this to be curious, so I looked into it a bit more, and the main performance culprit was updates, ESPECIALLY creates, on any entity that has an active list cache. Auto-clearing that cache requires running the condition for each cache entry against the record to see if it matches, and if it does then the entry is cleared. This could be made more efficient by expanding the reverse-index concept to index all values of fields in conditions... though that would be fairly complex to implement because of the wide variety of conditions that CAN be performed on fields, and even more so when they are combined with other logic... especially NOTs and ORs. This could potentially increase performance, but would again add yet more complexity and overhead.
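The auto-clearing cost described above can be sketched as follows (again illustrative, not the actual OFBiz code): on every create, each cached condition is evaluated against the new record, so the work grows linearly with the number of cached entries.

```java
import java.util.*;
import java.util.function.Predicate;

// Illustrative sketch (not the OFBiz implementation) of auto-clearing a
// condition-keyed list cache: when a record is created, every cached
// condition must be evaluated against it, and matching entries cleared.
// The cost grows linearly with the number of cached conditions.
public class AutoClearingListCache {
    private final Map<Predicate<Map<String, Object>>, List<Map<String, Object>>> cache =
            new LinkedHashMap<>();

    public void put(Predicate<Map<String, Object>> condition, List<Map<String, Object>> results) {
        cache.put(condition, results);
    }

    // Returns how many cached lists had to be invalidated by this create.
    public int onCreate(Map<String, Object> newRecord) {
        int cleared = 0;
        Iterator<Predicate<Map<String, Object>>> it = cache.keySet().iterator();
        while (it.hasNext()) {
            if (it.next().test(newRecord)) { // evaluate the condition on the new record
                it.remove();
                cleared++;
            }
        }
        return cleared;
    }

    public int size() { return cache.size(); }

    public static void main(String[] args) {
        AutoClearingListCache cache = new AutoClearingListCache();
        cache.put(r -> "WIDGETS".equals(r.get("categoryId")), new ArrayList<>());
        cache.put(r -> "GADGETS".equals(r.get("categoryId")), new ArrayList<>());

        Map<String, Object> created = new HashMap<>();
        created.put("categoryId", "WIDGETS");
        System.out.println("cleared: " + cache.onCreate(created)); // clears 1 of 2
    }
}
```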

To turn this dilemma into a nightmare, consider caching view-entities. In general, as systems scale, if you ever have to iterate over stuff your performance is going to get hit REALLY hard compared to indexed and other sub-linear operations.

The main lesson from the story: caching, especially list caching, should ONLY be done in limited cases where the ratio of reads to writes is VERY high, and more particularly the ratio of reads to creates. When considering whether to use a cache, this should be weighed carefully, because records are sometimes updated from places developers are unaware of, sometimes at surprising volumes. For example, it might seem great (and help a lot in dev and lower-scale testing) to cache inventory information for viewing on a category screen, but always go to the DB to avoid stale data on a product detail screen and when adding to the cart. The problem is that with high order volumes the inventory data is pretty much constantly being updated, so the caches are constantly... SLOWLY... being cleared as InventoryDetail records are created for reservations and issuances.

To turn this nightmare into a deal killer, consider multiple application servers and the need for either a (SLOW) distributed cache or (SLOW) distributed cache clearing. These have to go over the network anyway, so you might as well go to the database!

In the case above, where we decided to NOT use the entity cache at all, the tests were run on one really beefy server, showing that disabling the cache was faster. When we ran it in a cluster of just 2 servers with direct DCC (the best-case scenario for a distributed cache) we not only saw a big performance hit, but also got various run-time errors from stale data.

I really don't see how anyone could back the concept of caching all finds by default... you don't even have to imagine edge cases, just consider the problems ALREADY being faced with more limited caching and how often the entity cache simply isn't a good solution.

As for improving the entity caching in OFBiz, there are some concepts in Moqui that might be useful:

1. add a cache attribute to the entity definition with true, false, and never options; true and false being defaults that can be overridden by code, and never being an absolute (OFBiz does have this option IIRC); this would default to false, true being a useful setting for common things like Enumeration, StatusItem, etc, etc

2. add general support in the entity engine find methods for a "for update" parameter, and if true don't cache (and pass this on to the DB to lock the record(s) being queried), also making the value mutable

3. a write-through per-transaction cache; you can do some really cool stuff with this, avoiding most database hits during a transaction until the end, when the changes are dumped to the DB; the Moqui implementation of this concept even looks for cached records that any find condition would require to get results and does the query in-memory, not having to go to the database at all... and for other queries it augments the results with values in the cache
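As a rough illustration of point 1 above, a Moqui-style entity definition with such a cache attribute might look like the following (the entity and field names here are illustrative; in OFBiz the nearest existing knob I know of is the never-cache entity attribute):

```xml
<!-- Illustrative only: a cache hint on the entity definition itself,
     overridable by code unless set to "never". -->
<entity entity-name="Enumeration" package-name="org.example.basic" cache="true">
    <field name="enumId" type="id" is-pk="true"/>
    <field name="description" type="text-medium"/>
</entity>
```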

The whole concept of a write-through cache that is limited to the scope of a single transaction shows some of the issues you would run into even if trying to make the entity cache transactional. Especially with more complex finds it just falls apart. The current Moqui implementation handles quite a bit, but there are various things that I've run into testing it with real-world business services that are either a REAL pain to handle (so I haven't yet, but it is conceptually possible) or that I simply can't think of any good way to handle... and for those you simply can't use the write-through cache.
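The per-transaction write-through idea in point 3 can be sketched in a few lines (a toy model, not Moqui's TransactionCache): writes are buffered for the life of one transaction, reads check the buffer first, and everything is flushed to the database only at commit.

```java
import java.util.*;

// Rough sketch, not Moqui's TransactionCache: buffer creates/updates in
// memory for the life of one transaction, serve reads from the buffer
// first, and dump all changes to the database only at commit.
public class TxWriteThroughCache {
    private final Map<String, Map<String, Object>> pendingWrites = new LinkedHashMap<>();
    private final Map<String, Map<String, Object>> database; // stands in for the real DB

    public TxWriteThroughCache(Map<String, Map<String, Object>> database) {
        this.database = database;
    }

    public void createOrUpdate(String key, Map<String, Object> value) {
        pendingWrites.put(key, value); // no DB hit yet
    }

    public Map<String, Object> findOne(String key) {
        Map<String, Object> buffered = pendingWrites.get(key);
        return buffered != null ? buffered : database.get(key);
    }

    public void commit() {
        database.putAll(pendingWrites); // single flush at transaction end
        pendingWrites.clear();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> db = new HashMap<>();
        TxWriteThroughCache tx = new TxWriteThroughCache(db);
        tx.createOrUpdate("ORD-1", new HashMap<>(Collections.singletonMap("status", "CREATED")));
        System.out.println(tx.findOne("ORD-1").get("status")); // visible before commit
        System.out.println(db.containsKey("ORD-1"));           // false: DB untouched so far
        tx.commit();
        System.out.println(db.containsKey("ORD-1"));           // true after commit
    }
}
```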

There are some notes in the code for this, and some code/comments to more thoroughly communicate this concept, in this class in Moqui:

https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
I should also say that my motivation to handle every edge case even for this write-through cache is limited... yes, there is room for improvement handling more scenarios, but how big will the performance increase ACTUALLY be for them? The efforts on this so far have been based on profiling results and making sure there is a significant difference (which there is for many services in Mantle Business Artifacts, though I haven't even come close to testing all of them this way).

The same concept would apply to a read-only entity cache... some things might be possible to support, but would NOT improve performance, making them a moot point.

I don't know if I've written enough to convince everyone listening that even attempting a universal read-only entity cache is a useless idea... I'm sure some will still like the idea. If anyone gets into it and wants to try it out in their own branch of OFBiz, great... knock yourself out (probably literally...). But PLEASE no one ever commit something like this to the primary branch in the repo... not EVER.

The whole idea that the OFBiz entity cache had a more limited ability to handle different scenarios in the past than it does now is not an argument of any sort supporting the idea of taking the entity cache to the ultimate possible end... which theoretically isn't even that far from where it is now.

To apply a more useful standard, the arguments should be for a _useful_ objective, which means increasing performance. I guarantee an always-used find cache will NOT increase performance; it will kill it dead and cause infinite concurrency headaches in the process.

-David




On 19 Mar 2015, at 10:46, Adrian Crum <adrian.c...@sandglass-software.com> wrote:

The translation to English is not good, but I think I understand what you are saying.

The entity values in the cache MUST be immutable - because multiple threads share the values. To do otherwise would require complicated synchronization code in GenericValue (which would cause blocking and hurt performance).

When I first started working on the entity cache issues, it appeared to me that mutable entity values may have been in the original design (to enable a write-through cache). That is my guess - I am not sure. At some time, the entity values in the cache were made immutable, but the change was incomplete - some cached entity values were immutable and others were not. That is one of the things I fixed - I made sure ALL entity values coming from the cache are immutable.

One way we can eliminate the additional complication of cloning immutable entity values is to wrap the List in a custom Iterator implementation that automatically clones elements as they are retrieved from the List. The drawback is the performance hit - because you would be cloning values that might not get modified. I think it is more efficient to clone an entity value only when you intend to modify it.
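The clone-before-modify rule can be illustrated with a toy value class (not the real GenericValue): values are frozen when they enter the cache and shared between threads, and a caller who wants to modify one clones it to get a mutable copy.

```java
import java.util.*;

// Small illustration (not OFBiz's GenericValue) of the immutability rule
// described above: values handed out by the cache are frozen and shared
// between threads; a caller who wants to modify one must clone it first.
public class CachedValue {
    private final Map<String, Object> fields = new HashMap<>();
    private boolean immutable = false;

    public void set(String name, Object value) {
        if (immutable) {
            throw new UnsupportedOperationException("cached value is immutable; clone it first");
        }
        fields.put(name, value);
    }

    public Object get(String name) { return fields.get(name); }

    public void freeze() { immutable = true; } // called when the value enters the cache

    public CachedValue cloneMutable() {
        CachedValue copy = new CachedValue();
        copy.fields.putAll(this.fields);       // the copy stays mutable
        return copy;
    }

    public static void main(String[] args) {
        CachedValue cached = new CachedValue();
        cached.set("statusId", "ORDER_CREATED");
        cached.freeze();

        CachedValue mutable = cached.cloneMutable(); // clone only when modifying
        mutable.set("statusId", "ORDER_APPROVED");
        System.out.println(cached.get("statusId"));  // shared cached copy unchanged
    }
}
```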

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 3/19/2015 4:19 PM, Nicolas Malin wrote:

On 18/03/2015 13:16, Adrian Crum wrote:

If you code Delegator calls to avoid the cache, then there is no way for a sysadmin to configure the caching behavior - that bit of code will ALWAYS make a database call.

If you make all Delegator calls use the cache, then there is an additional complication that will add a bit more code: the GenericValue instances retrieved from the cache are immutable - if you want to modify them, then you will have to clone them. So, this approach can produce an additional line of code.


I don't see any logical reason why we need to keep a GenericValue that came from the cache immutable. In the bigger picture, a developer would only give caching information when he wants to force use of the cache during his process - just as OFBiz manages transactions, timezone, locale, auto-matching and others by default. The entity engine would then work with the sysadmin's cache tuning.

As an example, delegator.find("Party", "partyId", partyId) would use the default parameter from cache.properties, and what happens on a store of a cached GenericValue would be the delegator's problem. I see a simple test like this:

if (genericValue came from the cache) {
      if (the value has already been modified) {
         get it from the database
         update the value
      }
      else refuse (or not, I have a doubt :) )
}
store


Nicolas









