Re: Entity Caching

Christian Carlow Sat, 21 Mar 2015 20:17:52 -0700

Is there a convenient setting for disabling cache completely as David
mentioned he did?


On Sat, 2015-03-21 at 21:39 -0400, Ron Wheeler wrote:
> I agree with Adrian that caching should be a sysadmin choice.
> 
> I would also caution that measuring cache performance during testing is 
> not a very useful activity. Testing tends to test one use case once and 
> move on to the next.
> In production, users tend to do the same thing over and over.
> Testing might fill a shopping cart a few times and do a lot of other 
> administrative functions as many times. In real life, shopping carts 
> are filled much more frequently than catalog updates (one hopes). Using 
> performance numbers from functional testing will be misleading.
> 
> The other message that I get from David's discussion is that caching
> built by professional caching experts (database developers, as he 
> mentioned) worked better than caching systems built by application 
> developers.
> It is likely that ehcache and the database built-in caching functions 
> will outperform caching systems built by OFBiz developers and will 
> handle the main cases better and will handle edge cases properly. They 
> will probably integrate better and be easier to configure at run-time or 
> during deployment. They will also be easier to tune by the system 
> administrator.
> 
> I understand that Adrian needs to fix this quickly. I suppose that 
> caching could be eliminated to solve the problem while a better solution 
> is implemented.
> 
> Do we know what it will take to add enough ehcache to make the system 
> perform adequately to meet current requirements?
> 
> Ron
> 
> 
> On 21/03/2015 6:22 AM, Adrian Crum wrote:
> > I will try to say it again, but differently.
> >
> > If I am a developer, I am not aware of the subtleties of caching 
> > various entities. Entity cache settings will be determined during 
> > staging. So, I write my code as if everything will be cached - leaving 
> > the door open for a sysadmin to configure caching during staging.
> >
> > During staging, a sysadmin can start off with caching disabled, and 
> > then switch on caching for various entities while performance tests 
> > are being run. After some time, the sysadmin will have cache settings 
> > that provide optimal throughput. Does that mean ALL entities are 
> > cached? No, only the ones that need to be.
> >
> > The point I'm trying to make is this: The decision to cache or not 
> > should be made by a sysadmin, not by a developer.
> >
> > Adrian Crum
> > Sandglass Software
> > www.sandglass-software.com
> >
> > On 3/21/2015 10:08 AM, Scott Gray wrote:
> >>> My preference is to make ALL Delegator calls use the cache.
> >>
> >> Perhaps I misunderstood the above sentence? I responded because I don't
> >> think caching everything is a good idea
> >>
> >> On 21 Mar 2015 20:41, "Adrian Crum" <adrian.c...@sandglass-software.com>
> >> wrote:
> >>>
> >>> Thanks for the info David! I agree 100% with everything you said.
> >>>
> >>> There may be some misunderstanding about my advice. I suggested that
> >> caching should be configured in the settings file, I did not suggest 
> >> that
> >> everything should be cached all the time.
> >>>
> >>> Like you said, JMeter tests can reveal what needs to be cached, and a
> >> sysadmin can fine-tune performance by tweaking the cache settings. The
> >> problem I mentioned is this: A sysadmin can't improve performance by
> >> caching a particular entity if a developer has hard-coded it not to be
> >> cached.
> >>>
> >>> Btw, I removed the complicated condition checking in the condition 
> >>> cache
> >> because it didn't work. Not only was the system spending a lot of time
> >> evaluating long lists of values (each value having a potentially long 
> >> list
> >> of conditions), at the end of the evaluation the result was always a 
> >> cache
> >> miss.
> >>>
> >>>
> >>>
> >>> Adrian Crum
> >>> Sandglass Software
> >>> www.sandglass-software.com
> >>>
> >>> On 3/20/2015 9:22 PM, David E. Jones wrote:
> >>>>
> >>>>
> >>>> Stepping back a little, some history and theory of the entity cache
> >> might be helpful.
> >>>>
> >>>> The original intent of the entity cache was a simple way to keep
> >> frequently used values/records closer to the code that uses them, ie 
> >> in the
> >> application server. One real world example of this is the goal to be 
> >> able
> >> to render ecommerce catalog and product pages without hitting the 
> >> database.
> >>>>
> >>>> Over time the entity caching was made more complex to handle more
> >> caching scenarios, but still left to the developer to determine if 
> >> caching
> >> is appropriate for the code they are writing.
> >>>>
> >>>> In theory is it possible to write an entity cache that can be used 
> >>>> 100%
> >> of the time? IMO the answer is NO. This is almost possible for single
> >> record caching, with the cache ultimately becoming an in-memory 
> >> relational
> >> database running on the app server (with full transaction support, 
> >> etc)...
> >> but for List caching it totally kills the whole concept. The current 
> >> entity
> >> cache keeps lists of results by the query condition used to get those
> >> results and this is very different from what a database does, and makes
> >> things rather messy and inefficient outside simple use cases.
> >>>>
> >>>> On top of these big functional issues (which are deal killers IMO),
> >> there is also the performance issue. The point, or intent at least, 
> >> of the
> >> entity cache is to improve performance. As the cache gets more 
> >> complex the
> >> performance will suffer, and because of the whole concept of caching
> >> results by queries the performance will be WORSE than the DB performance
> >> for the same queries in most cases. Databases are quite fast and 
> >> efficient,
> >> and we'll never be able to reproduce their ability to scale and 
> >> search in
> >> something like an in-memory entity cache, especially not considering the
> >> massive redundancy and overhead of caching lists of values by condition.
> >>>>
> >>>> As an example of this in the real world: on a large OFBiz project I
> >> worked on that finished last year we went into production with the 
> >> entity
> >> cache turned OFF, completely DISABLED. Why? When doing load testing on a
> >> whim one of the guys decided to try it without the entity cache enabled,
> >> and the body of JMeter tests that exercised a few dozen of the most 
> >> common
> >> user paths through the system actually ran FASTER. The database 
> >> (MySQL in
> >> this case) was hit over the network, but responded quickly enough to 
> >> make
> >> things work quite well for the various find queries, and FAR faster for
> >> updates, especially creates. This project was one of the higher volume
> >> projects I'm aware of for OFBiz, at peaks handling sustained 
> >> processing of
> >> around 10 orders per second (36,000 per hour), with some short term 
> >> peaks
> >> much higher, closer to 20-30 orders per second... and longer term peaks
> >> hitting over 200k orders in one day (North America only, daytime, 
> >> around a
> >> 12 hour window).
> >>>>
> >>>> I found this to be curious so looked into it a bit more and the main
> >> performance culprit was updates, ESPECIALLY creates on any entity 
> >> that has
> >> an active list cache. Auto-clearing that cache requires running the
> >> condition for each cache entry on the record to see if it matches, 
> >> and if
> >> it does then it is cleared. This could be made more efficient by 
> >> expanding
> >> the reverse index concept to index all values of fields in conditions...
> >> though that would be fairly complex to implement because of the wide
> >> variety of conditions that CAN be performed on fields, and even 
> >> more so when
> >> they are combined with other logic... especially NOTs and ORs. This 
> >> could
> >> potentially increase performance, but would again add yet more 
> >> complexity
> >> and overhead.
> >>>>
> >>>> To turn this dilemma into a nightmare, consider caching view-entities.
> >> In general as systems scale if you ever have to iterate over stuff your
> >> performance is going to get hit REALLY hard compared to indexed and 
> >> other
> >> less than n operations.
> >>>>
> >>>> The main lesson from the story: caching, especially list caching, 
> >>>> should
> >> ONLY be done in limited cases when the ratio of reads to writes is VERY
> >> high, and more particularly the ratio of reads to creates. When 
> >> considering
> >> whether to use a cache this should be considered carefully, because 
> >> records
> >> are sometimes updated from places that developers are unaware of, 
> >> sometimes at
> >> surprising volumes. For example, it might seem great (and help a lot 
> >> in dev
> >> and lower scale testing) to cache inventory information for viewing on a
> >> category screen, but always go to the DB to avoid stale data on a 
> >> product
> >> detail screen and when adding to cart. The problem is that with high 
> >> order
> >> volumes the inventory data is pretty much constantly being updated, 
> >> so the
> >> caches are constantly... SLOWLY... being cleared as InventoryDetail 
> >> records
> >> are created for reservations and issuances.
> >>>>
> >>>> To turn this nightmare into a deal killer, consider multiple 
> >>>> application
> >> servers and the need for either a (SLOW) distributed cache or (SLOW)
> >> distributed cache clearing. These have to go over the network anyway, so
> >> might as well go to the database!
> >>>>
> >>>> In the case above where we decided to NOT use the entity cache at all
> >> the tests were run on one really beefy server showing that disabling the
> >> cache was faster. When we ran it in a cluster of just 2 servers with 
> >> direct
> >> DCC (the best case scenario for a distributed cache) we not only saw 
> >> a big
> >> performance hit, but also got various run-time errors from stale data.
> >>>>
> >>>> I really don't see how anyone could back the concept of caching all 
> >>>> finds by
> >> default... you don't even have to imagine edge cases, just consider the
> >> problems ALREADY being faced with more limited caching and how often the
> >> entity cache simply isn't a good solution.
> >>>>
> >>>> As for improving the entity caching in OFBiz, there are some 
> >>>> concepts in
> >> Moqui that might be useful:
> >>>>
> >>>> 1. add a cache attribute to the entity definition with true, false, 
> >>>> and
> >> never options; true and false being defaults that can be overridden by
> >> code, and never being an absolute (OFBiz does have this option IIRC); 
> >> this
> >> would default to false, true being a useful setting for common things 
> >> like
> >> Enumeration, StatusItem, etc, etc
> >>>>
> >>>> 2. add general support in the entity engine find methods for a "for
> >> update" parameter, and if true don't cache (and pass this on to the 
> >> DB to
> >> lock the record(s) being queried), also making the value mutable
> >>>>
> >>>> 3. a write-through per-transaction cache; you can do some really cool
> >> stuff with this, avoiding most database hits during a transaction 
> >> until the
> >> end when the changes are dumped to the DB; the Moqui implementation 
> >> of this
> >> concept even looks for cached records that any find condition would 
> >> require
> >> to get results and does the query in-memory, not having to go to the
> >> database at all... and for other queries augments the results with 
> >> values
> >> in the cache
> >>>>
> >>>> The whole concept of a write-through cache that is limited to the 
> >>>> scope
> >> of a single transaction shows some of the issues you would run into 
> >> even if
> >> trying to make the entity cache transactional. Especially with more 
> >> complex
> >> finds it just falls apart. The current Moqui implementation handles 
> >> quite a
> >> bit, but there are various things that I've run into testing it with
> >> real-world business services that are either a REAL pain to handle (so I
> >> haven't yet, but it is conceptually possible) or that I simply can't 
> >> think
> >> of any good way to handle... and for those you simply can't use the
> >> write-through cache.
> >>>>
> >>>> There are some notes in the code for this, and some code/comments to
> >> more thoroughly communicate this concept, in this class in Moqui:
> >>>>
> >>>>
> >> https://github.com/moqui/moqui/blob/master/framework/src/main/groovy/org/moqui/impl/context/TransactionCache.groovy
> >>  
> >>
> >>>>
> >>>> I should also say that my motivation to handle every edge case even 
> >>>> for
> >> this write-through cache is limited... yes there is room for improvement
> >> handling more scenarios, but how big will the performance increase 
> >> ACTUALLY
> >> be for them? The efforts on this so far have been based on profiling
> >> results and making sure there is a significant difference (which 
> >> there is
> >> for many services in Mantle Business Artifacts, though I haven't even 
> >> come
> >> close to testing all of them this way).
> >>>>
> >>>> The same concept would apply to a read-only entity cache... some 
> >>>> things
> >> might be possible to support, but would NOT improve performance 
> >> making them
> >> a moot point.
> >>>>
> >>>> I don't know if I've written enough to convince everyone listening 
> >>>> that
> >> even attempting a universal read-only entity cache is a useless 
> >> idea... I'm
> >> sure some will still like the idea. If anyone gets into it and wants 
> >> to try
> >> it out in their own branch of OFBiz, great... knock yourself out 
> >> (probably
> >> literally...). But PLEASE no one ever commit something like this to the
> >> primary branch in the repo... not EVER.
> >>>>
> >>>> The whole idea that the OFBiz entity cache has had more limited 
> >>>> ability
> >> to handle different scenarios in the past than it does now is not an
> >> argument of any sort supporting the idea of taking the entity cache 
> >> to the
> >> ultimate possible end... which theoretically isn't even that far from 
> >> where
> >> it is now.
> >>>>
> >>>> To apply a more useful standard the arguments should be for a _useful_
> >> objective, which means increasing performance. I guarantee an always 
> >> used
> >> find cache will NOT increase performance, it will kill it dead and cause
> >> infinite concurrency headaches in the process.
> >>>>
> >>>> -David
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> On 19 Mar 2015, at 10:46, Adrian Crum <
> >> adrian.c...@sandglass-software.com> wrote:
> >>>>>
> >>>>> The translation to English is not good, but I think I understand what
> >> you are saying.
> >>>>>
> >>>>> The entity values in the cache MUST be immutable - because multiple
> >> threads share the values. To do otherwise would require complicated
> >> synchronization code in GenericValue (which would cause blocking and 
> >> hurt
> >> performance).
> >>>>>
> >>>>> When I first started working on the entity cache issues, it appeared
> >> to me that mutable entity values may have been in the original design 
> >> (to
> >> enable a write-through cache). That is my guess - I am not sure. At some
> >> time, the entity values in the cache were made immutable, but the change
> >> was incomplete - some cached entity values were immutable and others 
> >> were
> >> not. That is one of the things I fixed - I made sure ALL entity values
> >> coming from the cache are immutable.
> >>>>>
> >>>>> One way we can eliminate the additional complication of cloning
> >> immutable entity values is to wrap the List in a custom Iterator
> >> implementation that automatically clones elements as they are retrieved
> >> from the List. The drawback is the performance hit - because you 
> >> would be
> >> cloning values that might not get modified. I think it is more 
> >> efficient to
> >> clone an entity value only when you intend to modify it.
> >>>>>
> >>>>> Adrian Crum
> >>>>> Sandglass Software
> >>>>> www.sandglass-software.com
> >>>>>
> >>>>> On 3/19/2015 4:19 PM, Nicolas Malin wrote:
> >>>>>>
> >>>>>> Le 18/03/2015 13:16, Adrian Crum a écrit :
> >>>>>>>
> >>>>>>> If you code Delegator calls to avoid the cache, then there is no 
> >>>>>>> way
> >>>>>>> for a sysadmin to configure the caching behavior - that bit of code
> >>>>>>> will ALWAYS make a database call.
> >>>>>>>
> >>>>>>> If you make all Delegator calls use the cache, then there is an
> >>>>>>> additional complication that will add a bit more code: the
> >>>>>>> GenericValue instances retrieved from the cache are immutable - 
> >>>>>>> if you
> >>>>>>> want to modify them, then you will have to clone them. So, this
> >>>>>>> approach can produce an additional line of code.
> >>>>>>
> >>>>>>
> >>>>>> I don't see any logical reason why we need to keep a GenericValue 
> >>>>>> came
> >>>>>> from cache as immutable. In large vision, a developer give 
> >>>>>> information
> >>>>>> on cache or not only he want force the cache using during his 
> >>>>>> process.
> >>>>>> As OFBiz manage by default transaction, timezone, locale, 
> >>>>>> auto-matching
> >>>>>> or others.
> >>>>>> The entity engine would be works with admin sys cache tuning.
> >>>>>>
> >>>>>> As example delegator.find("Party", "partyId", partyId) use the 
> >>>>>> default
> >>>>>> parameter from cache.properties and after the store on a cached
> >>>>>> GenericValue is a delegator's problem. I see a simple test like 
> >>>>>> that :
> >>>>>> if (genericValue came from cache) {
> >>>>>>      if (value is already done) {
> >>>>>>         getFromDataBase
> >>>>>>         update Value
> >>>>>>      }
> >>>>>>      else refuse (or not I have a doubt :) )
> >>>>>> }
> >>>>>> store
> >>>>>>
> >>>>>>
> >>>>>> Nicolas
> >>>>
> >>>>
> >>
> >
> 
>
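
For readers following the thread, David's point about list caches and creates can be illustrated with a rough sketch. It assumes a list cache keyed by query condition, where every create must run each cached condition against the new record to decide which entries to clear. The class ListCacheSketch and all of its members are hypothetical names made up for this illustration; this is not OFBiz code and it does not reproduce the real entity cache.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch, not OFBiz code: a list cache keyed by query condition.
class ListCacheSketch {
    // One cached query: the condition that produced it plus the cached rows.
    static final class Entry {
        final Predicate<Map<String, Object>> condition;
        final List<Map<String, Object>> rows;

        Entry(Predicate<Map<String, Object>> condition, List<Map<String, Object>> rows) {
            this.condition = condition;
            this.rows = rows;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();
    // Counts how many conditions were evaluated, to make the cost visible.
    long conditionEvaluations = 0;

    void put(String key, Predicate<Map<String, Object>> condition,
             List<Map<String, Object>> rows) {
        entries.put(key, new Entry(condition, new ArrayList<>(rows)));
    }

    boolean contains(String key) {
        return entries.containsKey(key);
    }

    // Called on every create: each cached condition is run against the new
    // record, and every entry whose condition matches is dropped.
    int clearMatching(Map<String, Object> newRecord) {
        int cleared = 0;
        Iterator<Entry> it = entries.values().iterator();
        while (it.hasNext()) {
            Entry entry = it.next();
            conditionEvaluations++;
            if (entry.condition.test(newRecord)) {
                it.remove();
                cleared++;
            }
        }
        return cleared;
    }

    public static void main(String[] args) {
        ListCacheSketch cache = new ListCacheSketch();
        // 1000 cached "find by productId" result lists.
        for (int i = 0; i < 1000; i++) {
            final String productId = "P" + i;
            cache.put("byProduct:" + productId,
                    rec -> productId.equals(rec.get("productId")),
                    new ArrayList<>());
        }
        Map<String, Object> created = Map.of("productId", "P42", "quantity", 1);
        int cleared = cache.clearMatching(created);
        System.out.println("cleared=" + cleared
                + " evaluations=" + cache.conditionEvaluations);
    }
}
```

In this toy run a single create evaluates all 1000 cached conditions in order to clear one entry, which is the linear scan David describes. Indexing field values, as he suggests, would avoid the scan for simple equality conditions but gets complicated once NOTs and ORs are involved.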

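Adrian's remark that values coming from the cache are immutable, and must be cloned before modification, can be sketched along the same lines. ValueSketch below is a hypothetical stand-in invented for this note; it is not the real GenericValue API, and the exception type and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cached entity value; not the real GenericValue.
class ValueSketch {
    private final Map<String, Object> fields;
    private final boolean mutable;

    private ValueSketch(Map<String, Object> fields, boolean mutable) {
        this.fields = new HashMap<>(fields); // defensive copy
        this.mutable = mutable;
    }

    // A value as the cache would hand it out: shared between threads, read-only.
    static ValueSketch cached(Map<String, Object> fields) {
        return new ValueSketch(fields, false);
    }

    Object get(String name) {
        return fields.get(name);
    }

    void set(String name, Object value) {
        if (!mutable) {
            throw new IllegalStateException("cached value is read-only; copy it first");
        }
        fields.put(name, value);
    }

    // The "additional line of code": an independent copy that may be modified.
    ValueSketch mutableCopy() {
        return new ValueSketch(fields, true);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("partyId", "10000");
        ValueSketch party = ValueSketch.cached(row);
        try {
            party.set("statusId", "PARTY_DISABLED");
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
        ValueSketch copy = party.mutableCopy();
        copy.set("statusId", "PARTY_DISABLED");
        System.out.println("copy statusId=" + copy.get("statusId")
                + ", cached statusId=" + party.get("statusId"));
    }
}
```

Copying only at the point of modification matches Adrian's preference: readers of the cached value pay nothing, and only code that intends to write pays for the copy.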