Re: [ovirt-devel] Caching of data from the database done properly

2016-07-12 Thread Roman Mohr
Hi Yevgeny,

On Mon, Jul 11, 2016 at 7:59 PM, Yevgeny Zaspitsky  wrote:
>
>
> On Tue, Jul 5, 2016 at 7:14 AM, Roman Mohr  wrote:
>>
>> On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr  wrote:
>> > Hi Everyone,
>> >
>> > I wanted to discuss a practice which seems to be pretty common in the
>> > engine which I find very limiting, dangerous and for some things it
>> > can even be a blocker.
>> >
>> > There are several places in the engine where we are using maps as
>> > cache in singletons to avoid reloading data from the database. Two
>> > prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
>> >
>> > While it looks tempting to just use a map as cache, add some locks
>> > around it and create an injectable singleton, this has some drawbacks:
>> >
>> > 1) We have an authoritative source for our data and it offers
>> > transactions to take care of inconsistencies or parallel updates.
>> > Doing all that in a service again duplicates this.
>> > 2) Caching on the service layer is definitely not a good idea. It can
>> > introduce unwanted side effects when someone invokes the DAOs
>> > directly.
>> > 3) The point is more about the question if a cache is really needed:
>> > Do I just want that cache because I find it convenient to do a
>> > #getMacPoolForCluster(Guid clusterId) in a loop instead of just
>> > loading it once before the loop, or do my usage requirements really
>> > force me to use a cache?
>> >
>> > If you really need a cache, consider the following:
>> >
>> > 1) Do the caching on the DAO layer. This guarantees the best
>> > consistency across the data.
>> > 2) Yes this means either locking in the DAOs or a transactional cache.
>> > But before you complain, think about what in [1] and [2] is done. We
>> > do exactly that there, so the complexity is already introduced anyway.
>> > 3) Since we are working with transactions, a custom cache should NEVER
>> > cache writes (really just talking about our use case here). This makes
>> > checks for existing IDs before adding an entity or similar checks
>> > unnecessary, don't duplicate constraint checks like in [2].
>> > 4) There should always be a way to disable the cache (even if it is
>> > just for testing).
>> > 5) If I can't convince you to move the cache to the DAO layer, still
>> > add a way to disable the cache.
>> >
>>
>> I forgot to mention one thing: There are of course cases where
>> something is loaded on startup. Mostly things which can have multiple
>> sources.
>> For instance, it is pretty common for the application configuration
>> itself, or, like in the scheduler, for the scheduling policies, where
>> some are Java only and some come from other sources. It is still good
>> to load these once on startup.
>>
>> But for normal business entities, accessing parts of them through
>> services and other parts through DAOs is not the best thing to do
>> (if constructing the whole business entity out of multiple DAOs is
>> complex, Repositories can help, but the cache should still be in the
>> DAO layer).
>
>
> I do not agree that the caching should be on the DAO layer - that might lead
> to getting an entity that is built of parts that are not coherent with one
> another if the different DAO caches are not in sync.

I can't agree here.
That is what transactions are for. A second-level cache normally
follows transactions. You have interceptors to detect rollbacks and
commits.
If you don't have JTA in place there is normally a small window
where you can read stale data in different transactions (which is fine
in most cases). It has nothing to do with where the cache is.
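
The commit/rollback interceptor idea can be sketched roughly as below. This is a hypothetical, minimal Java sketch (the class name `TxAwareCache` and the thread-local "transaction" are made up for illustration; a real second-level cache such as Infinispan's does this with proper transaction synchronization):

```java
// Minimal sketch of a transaction-aware cache: writes are staged per
// "transaction" and only become visible to others on commit; a rollback
// simply drops the staged data. Illustrative only, not oVirt code.
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TxAwareCache<K, V> {
    private final Map<K, V> committed = new ConcurrentHashMap<>();
    // staged changes of the current "transaction" (thread-local as a stand-in)
    private final ThreadLocal<Map<K, V>> staged = ThreadLocal.withInitial(HashMap::new);

    public V get(K key) {
        V v = staged.get().get(key);        // a transaction sees its own writes
        return v != null ? v : committed.get(key);
    }

    public void put(K key, V value) {
        staged.get().put(key, value);       // invisible to other transactions
    }

    public void commit() {                  // what a commit interceptor would call
        committed.putAll(staged.get());
        staged.get().clear();
    }

    public void rollback() {                // what a rollback interceptor would call
        staged.get().clear();               // staged data is simply dropped
    }
}
```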

It is much easier to stay in sync since there is no way to bypass the cache.

> I'd put the cache on the Repositories (non-existent currently) or a higher
> layer, just above the transaction boundaries, so the cache would contain
> service call results rather than raw data.

What does that mean, above the transaction boundaries?
Yes, the point of a second-level cache is to cache across transaction
boundaries, and you also have that when you place the caches in the DAO
layer.

You would further make it very hard to track whether you are allowed
to manipulate data through DAOs, Repositories or Services when you
don't place the basic cache inside the DAOs, since you might always
bypass the cache by accident.

For higher-layer caches in singletons it is also almost a prerequisite
to have the basic cache in the DAO layer, because you can then also
listen for cache changes of dependent entities inside the singleton
(all cache implementations I know have listeners) and invalidate or
update derived caches. This, in combination with the possibility to
disable the cache completely on all layers, makes the cache completely
transparent on every layer, which makes it very easy to write sane
code when using all the different services, DAOs, Repositories, and so on.
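
The listener idea could look roughly like this. The listener API here is invented for the sketch (`ListenableDaoCache`, `DerivedCacheService` are not real classes); real caches like ehcache, Infinispan, or Guava expose equivalent change/eviction hooks:

```java
// Sketch: a DAO-level cache that notifies listeners on every change, so a
// higher-level singleton can evict its derived values. Names are illustrative.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class ListenableDaoCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<Consumer<K>> listeners = new CopyOnWriteArrayList<>();

    public void addChangeListener(Consumer<K> listener) {
        listeners.add(listener);
    }

    public void put(K key, V value) {
        entries.put(key, value);
        listeners.forEach(l -> l.accept(key));  // tell derived caches the key changed
    }

    public V get(K key) {
        return entries.get(key);
    }
}

// A singleton-style service keeping a derived value per key; any change in the
// base cache evicts the corresponding derived entry.
class DerivedCacheService {
    final Map<String, Integer> derived = new ConcurrentHashMap<>();

    DerivedCacheService(ListenableDaoCache<String, String> base) {
        base.addChangeListener(derived::remove);
    }
}
```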

> Then the cache would prevent the application from accessing the DB
> connection pool for a connection.

The cache inside the DAO is hit before you even acquire a DB connection.

Re: [ovirt-devel] Caching of data from the database done properly

2016-07-11 Thread Eldad Marciano
What about JPA? I think in 3.5 it was raised to start using it.
In the past (3.3) we found a big performance issue around the host
monitoring area (which was eventually fixed by a map).

Roy,
can you tell where JPA stands today?

Regards,
-Eldad

> On 11 July 2016, at 20:59, Yevgeny Zaspitsky wrote:
> 
> 
> 
>> On Tue, Jul 5, 2016 at 7:14 AM, Roman Mohr  wrote:
>> On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr  wrote:
>> > Hi Everyone,
>> >
>> > I wanted to discuss a practice which seems to be pretty common in the
>> > engine which I find very limiting, dangerous and for some things it
>> > can even be a blocker.
>> >
>> > There are several places in the engine where we are using maps as
>> > cache in singletons to avoid reloading data from the database. Two
>> > prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
>> >
>> > While it looks tempting to just use a map as cache, add some locks
>> > around it and create an injectable singleton, this has some drawbacks:
>> >
>> > 1) We have an authoritative source for our data and it offers
>> > transactions to take care of inconsistencies or parallel updates.
>> > Doing all that in a service again duplicates this.
>> > 2) Caching on the service layer is definitely not a good idea. It can
>> > introduce unwanted side effects when someone invokes the DAOs
>> > directly.
>> > 3) The point is more about the question if a cache is really needed:
>> > Do I just want that cache because I find it convenient to do a
>> > #getMacPoolForCluster(Guid clusterId) in a loop instead of just
>> > loading it once before the loop, or do my usage requirements really
>> > force me to use a cache?
>> >
>> > If you really need a cache, consider the following:
>> >
>> > 1) Do the caching on the DAO layer. This guarantees the best
>> > consistency across the data.
>> > 2) Yes this means either locking in the DAOs or a transactional cache.
>> > But before you complain, think about what in [1] and [2] is done. We
>> > do exactly that there, so the complexity is already introduced anyway.
>> > 3) Since we are working with transactions, a custom cache should NEVER
>> > cache writes (really just talking about our use case here). This makes
>> > checks for existing IDs before adding an entity or similar checks
>> > unnecessary, don't duplicate constraint checks like in [2].
>> > 4) There should always be a way to disable the cache (even if it is
>> > just for testing).
>> > 5) If I can't convince you to move the cache to the DAO layer, still
>> > add a way to disable the cache.
>> >
>> 
>> I forgot to mention one thing: There are of course cases where
>> something is loaded on startup. Mostly things which can have multiple
>> sources.
>> For instance, it is pretty common for the application configuration
>> itself, or, like in the scheduler, for the scheduling policies, where
>> some are Java only and some come from other sources. It is still good
>> to load these once on startup.
>>
>> But for normal business entities, accessing parts of them through
>> services and other parts through DAOs is not the best thing to do
>> (if constructing the whole business entity out of multiple DAOs is
>> complex, Repositories can help, but the cache should still be in the
>> DAO layer).
> 
> I do not agree that the caching should be on the DAO layer - that might lead
> to getting an entity that is built of parts that are not coherent with one
> another if the different DAO caches are not in sync.
> I'd put the cache on the Repositories (non-existent currently) or a higher
> layer, just above the transaction boundaries, so the cache would contain
> service call results rather than raw data. Then the cache would prevent
> the application from accessing the DB connection pool for a connection. Yes,
> different service caches might have the same entities duplicated in memory,
> but I do not care about that until it is proven to be a problem, and if it
> were, I'd go on to improving the cache - making it more capable.
> 
>> I hope you get what I mean.
>> 
>> > For as long as there is no general caching solution with something
>> > like ehcache or infinispan, in my eyes such small things matter a lot
>> > to keep a project maintainable.
>> >
>> > Those are some of the best practices I have seen around caching
>> > database data. It would be great if we could agree on something like
>> > that. Maybe there is already an agreement on something and I am just
>> > not aware of it.
>> >
>> > Looking forward to hearing your feedback.
>> >
>> > Roman
>> >
>> > [1] 
>> > https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/quota/QuotaManager.java
>> > [2] 
>> > https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/macpool/MacPoolPerCluster.java
>> ___
>> Devel mailing list
>> Devel@ovirt.org
>> http://lists.ovirt.org/mailman/listinfo/devel

Re: [ovirt-devel] Caching of data from the database done properly

2016-07-11 Thread Yevgeny Zaspitsky
On Tue, Jul 5, 2016 at 7:14 AM, Roman Mohr  wrote:

> On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr  wrote:
> > Hi Everyone,
> >
> > I wanted to discuss a practice which seems to be pretty common in the
> > engine which I find very limiting, dangerous and for some things it
> > can even be a blocker.
> >
> > There are several places in the engine where we are using maps as
> > cache in singletons to avoid reloading data from the database. Two
> > prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
> >
> > While it looks tempting to just use a map as cache, add some locks
> > around it and create an injectable singleton, this has some drawbacks:
> >
> > 1) We have an authoritative source for our data and it offers
> > transactions to take care of inconsistencies or parallel updates.
> > Doing all that in a service again duplicates this.
> > 2) Caching on the service layer is definitely not a good idea. It can
> > introduce unwanted side effects when someone invokes the DAOs
> > directly.
> > 3) The point is more about the question if a cache is really needed:
> > Do I just want that cache because I find it convenient to do a
> > #getMacPoolForCluster(Guid clusterId) in a loop instead of just
> > loading it once before the loop, or do my usage requirements really
> > force me to use a cache?
> >
> > If you really need a cache, consider the following:
> >
> > 1) Do the caching on the DAO layer. This guarantees the best
> > consistency across the data.
> > 2) Yes this means either locking in the DAOs or a transactional cache.
> > But before you complain, think about what in [1] and [2] is done. We
> > do exactly that there, so the complexity is already introduced anyway.
> > 3) Since we are working with transactions, a custom cache should NEVER
> > cache writes (really just talking about our use case here). This makes
> > checks for existing IDs before adding an entity or similar checks
> > unnecessary, don't duplicate constraint checks like in [2].
> > 4) There should always be a way to disable the cache (even if it is
> > just for testing).
> > 5) If I can't convince you to move the cache to the DAO layer, still
> > add a way to disable the cache.
> >
>
> I forgot to mention one thing: There are of course cases where
> something is loaded on startup. Mostly things which can have multiple
> sources.
> For instance, it is pretty common for the application configuration
> itself, or, like in the scheduler, for the scheduling policies, where
> some are Java only and some come from other sources. It is still good
> to load these once on startup.
>
> But for normal business entities, accessing parts of them through
> services and other parts through DAOs is not the best thing to do
> (if constructing the whole business entity out of multiple DAOs is
> complex, Repositories can help, but the cache should still be in the
> DAO layer).
>

I do not agree that the caching should be on the DAO layer - that might
lead to getting an entity that is built of parts that are not coherent
with one another if the different DAO caches are not in sync.
I'd put the cache on the Repositories (non-existent currently) or a higher
layer, just above the transaction boundaries, so the cache would contain
service call results rather than raw data. Then the cache would prevent
the application from accessing the DB connection pool for a connection.
Yes, different service caches might have the same entities duplicated in
memory, but I do not care about that until it is proven to be a problem,
and if it were, I'd go on to improving the cache - making it more capable.

I hope you get what I mean.
>
> > For as long as there is no general caching solution with something
> > like ehcache or infinispan, in my eyes such small things matter a lot
> > to keep a project maintainable.
> >
> > Those are some of the best practices I have seen around caching
> > database data. It would be great if we could agree on something like
> > that. Maybe there is already an agreement on something and I am just
> > not aware of it.
> >
> > Looking forward to hearing your feedback.
> >
> > Roman
> >
> > [1]
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/quota/QuotaManager.java
> > [2]
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/macpool/MacPoolPerCluster.java

Re: [ovirt-devel] Caching of data from the database done properly

2016-07-09 Thread Roman Mohr
Hi Martin,

Great feedback. Thanks for the clarifications.

On Thu, Jul 7, 2016 at 3:25 PM, Martin Mucha  wrote:
> Hi,
>
> some of the information in the mail is not exactly true. Namely, MacPoolPerCluster
> *does not do caching*; it does not even have DB layer structures it could
> cache. So how it works is: the pool has a configuration upon which it initializes
> itself. After that, it looks into the db for all used MACs, which currently
> happens to be querying all MAC addresses of all VmNics. So this is
> initialized from data in the DB, but it does not cache it. Clients [of the pool]
> ask the pool for a MAC address, which is then used somewhere without pool
> supervision. I don't want to question this design, and I'm not saying that it
> wouldn't be possible to move its logic to the db layer, but once, long, long
> ago someone decided this should be done in bll, and so it is on the bll layer.
>

I had another look at the MacPoolPerCluster source. You are right. It is
caching some calculations and not database data. I agree that this
should not be in the dao layer or the database. Sorry for the wrong
accusations regarding the MacPoolPerCluster class.

> I understand that these might come up as a problem in arquillian testing, but
> that's to be resolved, since not all singletons are for caching. And even if
> they are, the testing framework should be able to cope with such common beans; we
> shouldn't limit ourselves to not using singletons. Therefore, I wouldn't invest
> in changing these 'caches', but in allowing more complex setups in our
> testing. If that's not possible, then a 'reset' method is the second best solution —
> we have to use a write lock as suggested in the cr, and then it should be fine.

For the context: [1]
Having singletons is fine from my perspective too. It is just about
caching data from the database. Spring offers @DirtiesContext (a
little nicer than with Arquillian, where, as far as I have seen, you
would create different test cases and do new @Deployments). But I
prefer to reset these rare singletons explicitly in a base class for
every test. Otherwise it is always very hard to track down possible
side effects in the class because you did not set up a new context.

For me tests are first-class citizens of an application, so having a
way to reinitialize singletons directly is what I prefer. When it is
about caching data from the database it is normally not needed, since
you can just disable the database cache during the tests.
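
The "reset explicitly in a base class" pattern could look roughly like this (a hypothetical sketch; `MacPoolSingleton` and the setup hook are made up, and a real test base class would call the reset from a `@Before` method):

```java
// Sketch of resetting a stateful singleton from a test base class so that no
// cached state leaks between tests. Names are illustrative, not oVirt code.
import java.util.HashSet;
import java.util.Set;

public class CacheResetExample {

    // a singleton with cached state and an explicit reset for tests
    static class MacPoolSingleton {
        private static final MacPoolSingleton INSTANCE = new MacPoolSingleton();
        private final Set<String> usedMacs = new HashSet<>();

        static MacPoolSingleton get() { return INSTANCE; }
        void use(String mac) { usedMacs.add(mac); }
        boolean isUsed(String mac) { return usedMacs.contains(mac); }
        void reset() { usedMacs.clear(); }      // the test-only escape hatch
    }

    // what a test base class would do before every single test
    static void setUpBeforeEachTest() {
        MacPoolSingleton.get().reset();         // no state leaks between tests
    }
}
```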

>
> About drawbacks:
> ad 1) yes, this comes as an extra problem; one has to deal with tx on one's
> own, that's true. This wasn't part of the original solution, but it should be
> fixed already.

As long as it is really just the last resort I am fine with it.

> ad 2) No. Caching done correctly is done closest to the consumer. I assume
> you can similarly ruin the hibernate l2 cache by accessing data through a
> different channel. But that's kind of common to all caches — if you bypass
> them, you'll corrupt them. So do not bypass them, or in this case, use them
> as they were designed. As has been said, you ask the pool for a mac, or inform
> it that you're going to use your own, and then use it. It means that it's
> designed to actually bypass it on all writes. Therefore if someone writes
> code using a MAC without prior notification to the pool about such an action,
> it would be a problem. To avoid this problem there has to be a bigger refactor —
> the pool would have to persist MAC addresses somehow, and not vmNicDao, or if
> moved entirely to the db layer, there would have to be a trigger on the vmnic
> table or something like that...

I missed the calculation part in the MacPoolPerCluster, so this is ok;
most of my comments do not apply there now. Of course you can cache
wherever you have to in order to meet the requirements. Still, the best
thing you can have is that the loaded entities, when cached in higher
layers, get evicted too when you change something in the lower layers
(e.g. the DAO). This is the normal expectation and what the hibernate
level-2 cache is all about. Since we don't use any of these fancy caches
which do all the heavy lifting for us, it would be even more complex for
us to get caching on higher layers right.

> ad 3) It was requested to have at least tens of millions of macs in a pool.
> Forget about the loop; initializing this structure for a given clusterId is
> not acceptable even once per request. Loading that structure is quite cheap
> (now), but not so cheap as to allow what you ask for. Moving the whole thing
> to the db layer would probably have been beneficial (when it was originally
> implemented), but it's worthless doing it now.
>

It is really just about caching from the database. Moving logic to the
dao layer or the database is definitely what I had in mind.

> About suggestions: neither of them applies to MacPoolPerCluster — point (3)
> for example: since the pool is not a simple cache of a db structure, and does
> not have corresponding data in the db layer, it cannot cache writes and it
> does not even do any writes at all...

Re: [ovirt-devel] Caching of data from the database done properly

2016-07-07 Thread Martin Mucha
Hi,

some of the information in the mail is not exactly true. Namely, MacPoolPerCluster
*does not do caching*; it does not even have DB layer structures it could
cache. So how it works is: the pool has a configuration upon which it initializes
itself. After that, it looks into the db for all used MACs, which currently happens
to be querying all MAC addresses of all VmNics. So this is initialized from
data in the DB, but it does not cache it. Clients [of the pool] ask the pool for a
MAC address, which is then used somewhere without pool supervision. I don't want to
question this design, and I'm not saying that it wouldn't be possible to move
its logic to the db layer, but once, long, long ago someone decided this should
be done in bll, and so it is on the bll layer.
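
The pool behaviour described here could be sketched as below. This is a deliberately naive toy (the class name and linear scan are made up; the real MacPoolPerCluster manages ranges for tens of millions of MACs far more efficiently):

```java
// Toy sketch of the pool idea: seed the used set from the MACs already in the
// DB (via vmNicDao), then hand out free ones; the caller persists the MAC
// without pool supervision. Illustrative only.
import java.util.HashSet;
import java.util.Set;

public class MacPoolSketch {
    private final Set<Long> used = new HashSet<>();
    private final long rangeStart;
    private final long rangeEnd;

    public MacPoolSketch(long rangeStart, long rangeEnd, Set<Long> usedFromDb) {
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
        this.used.addAll(usedFromDb);   // seeded from the DB, not a cache of rows
    }

    // hand out the next free MAC; naive linear scan for clarity
    public long allocate() {
        for (long mac = rangeStart; mac <= rangeEnd; mac++) {
            if (used.add(mac)) {
                return mac;
            }
        }
        throw new IllegalStateException("pool exhausted");
    }

    // callers must inform the pool when they bring their own MAC
    public void registerExternal(long mac) {
        used.add(mac);
    }
}
```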

I understand that these might come up as a problem in arquillian testing, but
that's to be resolved, since not all singletons are for caching. And even if
they are, the testing framework should be able to cope with such common beans; we
shouldn't limit ourselves to not using singletons. Therefore, I wouldn't invest
in changing these 'caches', but in allowing more complex setups in our
testing. If that's not possible, then a 'reset' method is the second best solution — we
have to use a write lock as suggested in the cr, and then it should be fine.
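
The reset-under-a-write-lock suggestion could be sketched like this (illustrative names only): readers share the read lock, while the test-only reset takes the write lock so it cannot race with an operation in flight.

```java
// Sketch: a pool-like singleton whose reset is guarded by the write lock of a
// ReentrantReadWriteLock, as suggested in the code review. Names are invented.
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedPool {
    private final Set<String> usedMacs = new HashSet<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public boolean isUsed(String mac) {
        lock.readLock().lock();              // concurrent readers are allowed
        try {
            return usedMacs.contains(mac);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void use(String mac) {
        lock.writeLock().lock();
        try {
            usedMacs.add(mac);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public void reset() {                    // exclusive: blocks all readers/writers
        lock.writeLock().lock();
        try {
            usedMacs.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```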

About drawbacks:
ad 1) yes, this comes as an extra problem; one has to deal with tx on one's own,
that's true. This wasn't part of the original solution, but it should be fixed
already.
ad 2) No. Caching done correctly is done closest to the consumer. I assume you
can similarly ruin the hibernate l2 cache by accessing data through a different
channel. But that's kind of common to all caches — if you bypass them, you'll
corrupt them. So do not bypass them, or in this case, use them as they were
designed. As has been said, you ask the pool for a mac, or inform it that you're
going to use your own, and then use it. It means that it's designed to actually
bypass it on all writes. Therefore if someone writes code using a MAC without
prior notification to the pool about such an action, it would be a problem. To
avoid this problem there has to be a bigger refactor — the pool would have to
persist MAC addresses somehow, and not vmNicDao, or if moved entirely to the db
layer, there would have to be a trigger on the vmnic table or something like
that...
ad 3) It was requested to have at least tens of millions of macs in a pool.
Forget about the loop; initializing this structure for a given clusterId is not
acceptable even once per request. Loading that structure is quite cheap (now),
but not so cheap as to allow what you ask for. Moving the whole thing to the db
layer would probably have been beneficial (when it was originally implemented),
but it's worthless doing it now.

About suggestions: neither of them applies to MacPoolPerCluster — point (3) for
example: since the pool is not a simple cache of a db structure, and does not
have corresponding data in the db layer, it cannot cache writes and it does not
even do any writes at all...

——
Surely I can imagine a better implementation of this, but it'd require bigger
changes whose benefits aren't worth the cost of the change. (I hope that) we
have to deal with it. This was the original design, and since the testing
framework (with changed caches or without) should deal with singletons etc.
anyway, it's not worth changing it. If there aren't any better options (I don't
know), reinitializing the bean can be used (and I apologize for blocking that).
I'd avoid bigger rewrites in this area.

M.


- Original Message -
> On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr  wrote:
> > Hi Everyone,
> >
> > I wanted to discuss a practice which seems to be pretty common in the
> > engine which I find very limiting, dangerous and for some things it
> > can even be a blocker.
> >
> > There are several places in the engine where we are using maps as
> > cache in singletons to avoid reloading data from the database. Two
> > prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
> >
> > While it looks tempting to just use a map as cache, add some locks
> > around it and create an injectable singleton, this has some drawbacks:
> >
> > 1) We have an authoritative source for our data and it offers
> > transactions to take care of inconsistencies or parallel updates.
> > Doing all that in a service again duplicates this.
> > 2) Caching on the service layer is definitely not a good idea. It can
> > introduce unwanted side effects when someone invokes the DAOs
> > directly.
> > 3) The point is more about the question if a cache is really needed:
> > Do I just want that cache because I find it convenient to do a
> > #getMacPoolForCluster(Guid clusterId) in a loop instead of just
> > loading it once before the loop, or do my usage requirements really
> > force me to use a cache?
> >
> > If you really need a cache, consider the following:
> >
> > 1) Do the caching on the DAO layer. This guarantees the best
> > consistency across the data.
> > 2) Yes this means either locking in the DAOs or a transactional cache.

Re: [ovirt-devel] Caching of data from the database done properly

2016-07-04 Thread Roman Mohr
On Mon, Jul 4, 2016 at 11:58 PM, Roman Mohr  wrote:
> Hi Everyone,
>
> I wanted to discuss a practice which seems to be pretty common in the
> engine which I find very limiting, dangerous and for some things it
> can even be a blocker.
>
> There are several places in the engine where we are using maps as
> cache in singletons to avoid reloading data from the database. Two
> prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].
>
> While it looks tempting to just use a map as cache, add some locks
> around it and create an injectable singleton, this has some drawbacks:
>
> 1) We have an authoritative source for our data and it offers
> transactions to take care of inconsistencies or parallel updates.
> Doing all that in a service again duplicates this.
> 2) Caching on the service layer is definitely not a good idea. It can
> introduce unwanted side effects when someone invokes the DAOs
> directly.
> 3) The point is more about the question if a cache is really needed:
> Do I just want that cache because I find it convenient to do a
> #getMacPoolForCluster(Guid clusterId) in a loop instead of just
> loading it once before the loop, or do my usage requirements really
> force me to use a cache?
>
> If you really need a cache, consider the following:
>
> 1) Do the caching on the DAO layer. This guarantees the best
> consistency across the data.
> 2) Yes this means either locking in the DAOs or a transactional cache.
> But before you complain, think about what in [1] and [2] is done. We
> do exactly that there, so the complexity is already introduced anyway.
> 3) Since we are working with transactions, a custom cache should NEVER
> cache writes (really just talking about our use case here). This makes
> checks for existing IDs before adding an entity or similar checks
> unnecessary, don't duplicate constraint checks like in [2].
> 4) There should always be a way to disable the cache (even if it is
> just for testing).
> 5) If I can't convince you to move the cache to the DAO layer, still
> add a way to disable the cache.
>

I forgot to mention one thing: There are of course cases where
something is loaded on startup. Mostly things which can have multiple
sources.
For instance, it is pretty common for the application configuration
itself, or, like in the scheduler, for the scheduling policies, where
some are Java only and some come from other sources. It is still good
to load these once on startup.

But for normal business entities, accessing parts of them through
services and other parts through DAOs is not the best thing to do
(if constructing the whole business entity out of multiple DAOs is
complex, Repositories can help, but the cache should still be in the
DAO layer).
I hope you get what I mean.

> For as long as there is no general caching solution with something
> like ehcache or infinispan, in my eyes such small things matter a lot
> to keep a project maintainable.
>
> Those are some of the best practices I have seen around caching
> database data. It would be great if we could agree on something like
> that. Maybe there is already an agreement on something and I am just
> not aware of it.
>
> Looking forward to hearing your feedback.
>
> Roman
>
> [1] 
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/quota/QuotaManager.java
> [2] 
> https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/macpool/MacPoolPerCluster.java


[ovirt-devel] Caching of data from the database done properly

2016-07-04 Thread Roman Mohr
Hi Everyone,

I wanted to discuss a practice which seems to be pretty common in the
engine which I find very limiting, dangerous and for some things it
can even be a blocker.

There are several places in the engine where we are using maps as
cache in singletons to avoid reloading data from the database. Two
prominent ones are the QuotaManager[1] and the MacPoolPerCluster[2].

While it looks tempting to just use a map as cache, add some locks
around it and create an injectable singleton, this has some drawbacks:

1) We have an authoritative source for our data and it offers
transactions to take care of inconsistencies or parallel updates.
Doing all that in a service again duplicates this.
2) Caching on the service layer is definitely not a good idea. It can
introduce unwanted side effects when someone invokes the DAOs
directly.
3) The point is more about the question if a cache is really needed:
Do I just want that cache because I find it convenient to do a
#getMacPoolForCluster(Guid clusterId) in a loop instead of just
loading it once before the loop, or do my usage requirements really
force me to use a cache?

If you really need a cache, consider the following:

1) Do the caching on the DAO layer. This guarantees the best
consistency across the data.
2) Yes this means either locking in the DAOs or a transactional cache.
But before you complain, think about what in [1] and [2] is done. We
do exactly that there, so the complexity is already introduced anyway.
3) Since we are working with transactions, a custom cache should NEVER
cache writes (really just talking about our use case here). This makes
checks for existing IDs before adding an entity or similar checks
unnecessary, don't duplicate constraint checks like in [2].
4) There should always be a way to disable the cache (even if it is
just for testing).
5) If I can't convince you to move the cache to the DAO layer, still
add a way to disable the cache.
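
Points (1), (3) and (4) together could be sketched roughly as below. This is a hedged illustration, not an oVirt implementation: `CachingDao` and the config switch are invented names, and the loader stands in for the real database read.

```java
// Sketch of a DAO whose read path goes through a cache that can always be
// disabled (point 4), and whose write path never writes to the cache, only
// invalidates (point 3), leaving constraint checks to the database.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CachingDao<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> dbLoader;   // stand-in for the real DB read
    private volatile boolean cacheEnabled;   // point (4): can always be disabled

    public CachingDao(Function<K, V> dbLoader, boolean cacheEnabled) {
        this.dbLoader = dbLoader;
        this.cacheEnabled = cacheEnabled;
    }

    public V get(K id) {
        if (!cacheEnabled) {
            return dbLoader.apply(id);               // straight to the DB
        }
        return cache.computeIfAbsent(id, dbLoader);  // cached read
    }

    public void save(K id, V value) {
        // point (3): the cache entry is invalidated, never written, so
        // constraint checks stay in the database where they belong
        cache.remove(id);
        // ... perform the actual INSERT/UPDATE here ...
    }

    public void setCacheEnabled(boolean enabled) {
        cacheEnabled = enabled;
        if (!enabled) {
            cache.clear();
        }
    }
}
```

With the switch off, every read hits the loader again, which is exactly what makes such a cache transparent in tests.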

For as long as there is no general caching solution with something
like ehcache or infinispan, in my eyes such small things matter a lot
to keep a project maintainable.

Those are some of the best practices I have seen around caching
database data. It would be great if we could agree on something like
that. Maybe there is already an agreement on something and I am just
not aware of it.

Looking forward to hearing your feedback.

Roman

[1] 
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/quota/QuotaManager.java
[2] 
https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/network/macpool/MacPoolPerCluster.java