Re: [hibernate-dev] Consistency guarantees of second level cache

2015-09-10 Thread Radim Vansa
On 09/09/2015 06:16 PM, Steve Ebersole wrote:
> Some comments inline and then a general discussion at the end...
>
> On Wed, Sep 9, 2015 at 10:32 AM Radim Vansa wrote:
>
> Thanks for correcting the terms, I'll try to use 'isolation'.
>
> TX2 reading B = 1 is not READ_UNCOMMITTED - value B = 1 was committed
> long ago (it's the initial value). It's reading A = 2 that can be
> considered read uncommitted (not isolated enough), but as the cache has
> nothing to do with that entry, we can't really prevent it - it's already
> in the DB. So it's really rather a stale read of B. If my terms are
> wrong, I apologize.
>
>
> I said that "TX2 reading "B->1" before TX1 commits is a question of 
> isolation and preventing READ_UNCOMMITTED access to the data".  In 
> other words TX2 reading "B->1" in your "timeline" is in fact an 
> example of the cache preventing READ_UNCOMMITTED access to the data.  
> So we are saying the same thing there.  But after that is where we 
> deviate.
>
> The issue with "isolation" is that it is always relative to "the truth 
> of the system".  This is a historical problem between Hibernate and 
> every manifestation of caching that has come from JBoss ;)  In this 
> usage (second level caching) the cache IS NOT the truth of the system; 
> the database is.
>
> So interestingly the data here *is* stale when looked at from the 
> perspective of the database (again the truth of the system).  And that 
> is fundamentally a problem.

I 100% agree that the database is the source of truth, and that the data 
is stale. My question is whether it is a problem (something we need to 
avoid by default), or whether something similar can be exhibited in 
session caching. Okay, I see that it is a problem. By the way, the 
current implementation does not suffer from that; I am rather exploring 
further optimizations.

Therefore, relaxing this should go to the nonstrict read-write mode.

>
>
> "as close together as possible" is not enough - either you allow
> certain
> situation to happen (although you might try to minimize how often), or
> you guarantee that it does not happen. So, do I understand it
> correctly
> that 2LC should check ' hibernate.connection.isolation' and behave
> accordingly?
>
>
> Again, the problem is that you are registering your own syncs to do 
> things.  I understand that getting these 2 "events" as close together 
> as possible is just minimizing the risk.  But that's an important 
> minimization.  Yes, you still need to decide what to do when something 
> does occur between them.  But minimizing those cases (by 
> shrinking the gap) is important.
>
> And in regards to 'hibernate.connection.isolation'.. uh, no, 
> absolutely not.  I never said that.  Not even sure how you got there..

I've diverted a bit here - I have just looked up how the isolation 
level is set. If you set it to READ_UNCOMMITTED, there's no need to have 
the cache in sync with the DB, since you're getting non-isolated results 
anyway. But I'd rather not abuse this configuration option.
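
For reference, a minimal sketch of how that option is typically set - the 
value is one of the java.sql.Connection isolation constants, shown here 
purely as an example rather than a recommendation:

    # hibernate.properties
    # 1 = READ_UNCOMMITTED, 2 = READ_COMMITTED, 4 = REPEATABLE_READ, 8 = SERIALIZABLE
    hibernate.connection.isolation=2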

>
> In 2LC code I am sometimes registering synchronizations but always
> through
>
> SessionImplementor.getTransactionCoordinator()
> .getLocalSynchronizations().registerSynchronization(...)
>
> - I hope this is the right way and not asking for trouble. I usually
> just need to do something when I know whether the DB has written the
> data or not - Hibernate calls the *AccessStrategy methods well
> enough in
> the beforeCompletion part (or I should rather say during flush()) but
> sometimes I need to delegate some work to the afterCompletion part.
>
>
> Well let's look at the example you gave in detail.  And for reference, 
> this is outlined in the EntityRegionAccessStrategy javadocs.
>
> So for an UPDATE to an entity we'd get the following call sequence:
>
> 1) Session1 transaction begins
> 2) Session1 flush is triggered; we deem that both A and B have 
> changed and need to be written to the database.
> 2.a) SQL UPDATE issued for A
> 2.b) EntityRegionAccessStrategy#update called for A
> 2.c) SQL UPDATE issued for B
> 2.d) EntityRegionAccessStrategy#update called for B
> 3) Session1 transaction commits[1]
> 3.a) "before completion callbacks" (for this discussion, there are none)
> 3.b) JDBC transaction committed
> 3.c) "after completion callbacks"
> 3.c.1) EntityRegionAccessStrategy#afterUpdate called for A
> 3.c.2) EntityRegionAccessStrategy#afterUpdate called for B
>
> And again, that is the flow outlined in EntityRegionAccessStrategy:
> UPDATES : {@link #lockItem} -> {@link #update} -> {@link 
> #afterUpdate}
>
> So I am still not sure why you need to register a Synchronization.  
> You already get callbacks for "after completion".  Perhaps you meant 
> that there are times you need to do something during "before completion"?

No, I need to do work in afterCompletion, but I need 

Re: [hibernate-dev] Consistency guarantees of second level cache

2015-09-10 Thread Steve Ebersole
Yes, and maybe in general you allow users to pick between such strict and
nonstrict modes.
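
For illustration, this is roughly how that per-entity choice already looks 
with the existing mapping annotations (a sketch - the entity and its fields 
are made up):

    import javax.persistence.Cacheable;
    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cacheable
    // NONSTRICT_READ_WRITE is the relaxed mode discussed here;
    // READ_WRITE / TRANSACTIONAL are the stricter alternatives.
    @Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
    public class Person {
        @Id
        private Long id;
        private String name;
    }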


On Thu, Sep 10, 2015 at 2:47 AM Radim Vansa  wrote:

> On 09/09/2015 06:16 PM, Steve Ebersole wrote:
> > Some comments inline and then a general discussion at the end...
> >
> > On Wed, Sep 9, 2015 at 10:32 AM Radim Vansa wrote:
> >
> > Thanks for correcting the terms, I'll try to use 'isolation'.
> >
> > TX2 reading B = 1 is not READ_UNCOMMITTED - value B = 1 was committed
> > long ago (it's the initial value). It's reading A = 2 that can be
> > considered read uncommitted (not isolated enough), but as the cache has
> > nothing to do with that entry, we can't really prevent it - it's already
> > in the DB. So it's really rather a stale read of B. If my terms are
> > wrong, I apologize.
> >
> >
> > I said that "TX2 reading "B->1" before TX1 commits is a question of
> > isolation and preventing READ_UNCOMMITTED access to the data".  In
> > other words TX2 reading "B->1" in your "timeline" is in fact an
> > example of the cache preventing READ_UNCOMMITTED access to the data.
> > So we are saying the same thing there.  But after that is where we
> > deviate.
> >
> > The issue with "isolation" is that it is always relative to "the truth
> > of the system".  This is a historical problem between Hibernate and
> > every manifestation of caching that has come from JBoss ;)  In this
> > usage (second level caching) the cache IS NOT the truth of the system;
> > the database is.
> >
> > So interestingly the data here *is* stale when looked at from the
> > perspective of the database (again the truth of the system).  And that
> > is fundamentally a problem.
>
> I 100% agree that the database is the source of truth, and that the data
> is stale. My question is whether it is a problem (something we need to
> avoid by default), or whether something similar can be exhibited in
> session caching. Okay, I see that it is a problem. By the way, the
> current implementation does not suffer from that; I am rather exploring
> further optimizations.
>
> Therefore, relaxing this should go to the nonstrict read-write mode.
>
> >
> >
> > "as close together as possible" is not enough - either you allow
> > certain
> > situation to happen (although you might try to minimize how often),
> or
> > you guarantee that it does not happen. So, do I understand it
> > correctly
> > that 2LC should check ' hibernate.connection.isolation' and behave
> > accordingly?
> >
> >
> > Again, the problem is that you are registering your own syncs to do
> > things.  I understand that getting these 2 "events" as close together
> > as possible is just minimizing the risk.  But that's an important
> > minimization.  Yes, you still need to decide what to do when something
> > does occur between them.  But minimizing those cases (by
> > shrinking the gap) is important.
> >
> > And in regards to 'hibernate.connection.isolation'.. uh, no,
> > absolutely not.  I never said that.  Not even sure how you got there..
>
> I've diverted a bit here - I have just looked up how the isolation
> level is set. If you set it to READ_UNCOMMITTED, there's no need to have
> the cache in sync with the DB, since you're getting non-isolated results
> anyway. But I'd rather not abuse this configuration option.
>
> >
> > In 2LC code I am sometimes registering synchronizations but always
> > through
> >
> > SessionImplementor.getTransactionCoordinator()
> > .getLocalSynchronizations().registerSynchronization(...)
> >
> > - I hope this is the right way and not asking for trouble. I usually
> > just need to do something when I know whether the DB has written the
> > data or not - Hibernate calls the *AccessStrategy methods well
> > enough in
> > the beforeCompletion part (or I should rather say during flush()) but
> > sometimes I need to delegate some work to the afterCompletion part.
> >
> >
> > Well let's look at the example you gave in detail.  And for reference,
> > this is outlined in the EntityRegionAccessStrategy javadocs.
> >
> > So for an UPDATE to an entity we'd get the following call sequence:
> >
> > 1) Session1 transaction begins
> > 2) Session1 flush is triggered; we deem that both A and B have
> > changed and need to be written to the database.
> > 2.a) SQL UPDATE issued for A
> > 2.b) EntityRegionAccessStrategy#update called for A
> > 2.c) SQL UPDATE issued for B
> > 2.d) EntityRegionAccessStrategy#update called for B
> > 3) Session1 transaction commits[1]
> > 3.a) "before completion callbacks" (for this discussion, there are none)
> > 3.b) JDBC transaction committed
> > 3.c) "after completion callbacks"
> > 3.c.1) EntityRegionAccessStrategy#afterUpdate called for A
> > 3.c.2) EntityRegionAccessStrategy#afterUpdate called for B
> >
> > And again, that is the flow outlined in EntityRegionAccessStrategy:
> > 

[hibernate-dev] Consistency guarantees of second level cache

2015-09-09 Thread Radim Vansa
Hi,

I've been fixing a lot of consistency issues in Infinispan 2LC lately 
and also trying to improve performance. When reasoning about consistency 
guarantees I've usually assumed that we don't want to provide stale 
entries from the cache after the DB commits - that means we have to 
invalidate them before the DB commit. This is a useful property if there 
are some application constraints on the data (e.g. that two entities 
have equal attributes). On the other hand, if we want the cache 
synchronized with the DB only after the commit fully finishes, we could 
omit some pre-DB-commit RPCs and improve the performance a bit.
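
To make the two orderings concrete, here is a rough sketch of where each 
choice would hook into a JTA Synchronization. This is only an illustration 
of the ordering, not the actual hibernate-infinispan code, and the cache 
interface is a made-up stand-in:

    import javax.transaction.Status;
    import javax.transaction.Synchronization;

    // Hypothetical stand-in for the cache region, not an Infinispan API.
    interface SomeCache {
        void invalidate(Object key);
        void put(Object key, Object value);
    }

    class CacheSync implements Synchronization {
        private final SomeCache cache;
        private final Object key;
        private final Object newValue;

        CacheSync(SomeCache cache, Object key, Object newValue) {
            this.cache = cache;
            this.key = key;
            this.newValue = newValue;
        }

        // Strict ordering: invalidate before the DB commit, so a concurrent
        // TX can never read a cached value older than what the DB holds.
        @Override
        public void beforeCompletion() {
            cache.invalidate(key);
        }

        // Relaxed ordering: skip the step above and only reconcile the cache
        // once the DB outcome is known; this saves the pre-commit RPC but
        // opens the stale window shown in the timeline below.
        @Override
        public void afterCompletion(int status) {
            if (status == Status.STATUS_COMMITTED) {
                cache.put(key, newValue);
            }
        }
    }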

To illustrate the difference, imagine that we didn't require such 
atomicity of transactions: when we update two entities in TX1 and 
one of them is cached and the other is not, then in TX2 we could see the 
updated value of the non-cached entity but still hit the cache for the 
other entity, seeing a stale value, since TX1 has committed in the DB but 
has not yet finished the commit on the ORM side:

A = 1, B = 1
TX1: begin
TX1: (from flush) write A -> 2
TX1: (from flush) write B -> 2
TX1: DB (XA resource) commit
TX2: read A -> 2 (handled from DB)
TX2: read B -> 1 (cached entry)
TX1: cache commit (registered as synchronization) -> cache gets updated 
to B = 2
TX1 is completed, control flow returns to caller

Naturally, after TX1 returns from transaction commit, no stale values 
should be provided.

Since I don't have any deep experience with DBs (I assume that they 
really behave in the ACID way), I'd like to ask what guarantees we 
want from the 2LC, and whether there's anything in the session caching 
that would loosen this ACIDity. I know we have the nonstrict-read-write 
mode (that could implement the less strict way), but I imagine this as 
something that breaks the contract a bit more, allowing even larger 
performance gains (going the best-effort way without any guarantees).

Thanks for your insight!

Radim

-- 
Radim Vansa 
JBoss Performance Team



Re: [hibernate-dev] Consistency guarantees of second level cache

2015-09-09 Thread Steve Ebersole
Some comments inline and then a general discussion at the end...

On Wed, Sep 9, 2015 at 10:32 AM Radim Vansa  wrote:

> Thanks for correcting the terms, I'll try to use 'isolation'.
>
> TX2 reading B = 1 is not READ_UNCOMMITTED - value B = 1 was committed
> long ago (it's the initial value). It's reading A = 2 that can be
> considered read uncommitted (not isolated enough), but as the cache has
> nothing to do with that entry, we can't really prevent it - it's already
> in the DB. So it's really rather a stale read of B. If my terms are
> wrong, I apologize.
>

I said that "TX2 reading "B->1" before TX1 commits is a question of
isolation and preventing READ_UNCOMMITTED access to the data".  In other
words TX2 reading "B->1" in your "timeline" is in fact an example of the
cache preventing READ_UNCOMMITTED access to the data.  So we are saying the
same thing there.  But after that is where we deviate.

The issue with "isolation" is that it is always relative to "the truth of
the system".  This is a historical problem between Hibernate and every
manifestation of caching that has come from JBoss ;)  In this usage (second
level caching) the cache IS NOT the truth of the system; the database is.

So interestingly the data here *is* stale when looked at from the
perspective of the database (again the truth of the system).  And that is
fundamentally a problem.


"as close together as possible" is not enough - either you allow certain
> situation to happen (although you might try to minimize how often), or
> you guarantee that it does not happen. So, do I understand it correctly
> that 2LC should check ' hibernate.connection.isolation' and behave
> accordingly?
>

Again, the problem is that you are registering your own syncs to do
things.  I understand that getting these 2 "events" as close together as
possible is just minimizing the risk.  But that's an important
minimization.  Yes, you still need to decide what to do when something
does occur between them.  But minimizing those cases (by shrinking
the gap) is important.

And in regards to 'hibernate.connection.isolation'.. uh, no, absolutely
not.  I never said that.  Not even sure how you got there..



> In 2LC code I am sometimes registering synchronizations but always through
>
> SessionImplementor.getTransactionCoordinator()
> .getLocalSynchronizations().registerSynchronization(...)
>
> - I hope this is the right way and not asking for trouble. I usually
> just need to do something when I know whether the DB has written the
> data or not - Hibernate calls the *AccessStrategy methods well enough in
> the beforeCompletion part (or I should rather say during flush()) but
> sometimes I need to delegate some work to the afterCompletion part.
>

Well let's look at the example you gave in detail.  And for reference, this
is outlined in the EntityRegionAccessStrategy javadocs.

So for an UPDATE to an entity we'd get the following call sequence:

1) Session1 transaction begins
2) Session1 flush is triggered; we deem that both A and B have changed and
need to be written to the database.
2.a) SQL UPDATE issued for A
2.b) EntityRegionAccessStrategy#update called for A
2.c) SQL UPDATE issued for B
2.d) EntityRegionAccessStrategy#update called for B
3) Session1 transaction commits[1]
3.a) "before completion callbacks" (for this discussion, there are none)
3.b) JDBC transaction committed
3.c) "after completion callbacks"
3.c.1) EntityRegionAccessStrategy#afterUpdate called for A
3.c.2) EntityRegionAccessStrategy#afterUpdate called for B

And again, that is the flow outlined in EntityRegionAccessStrategy:
UPDATES : {@link #lockItem} -> {@link #update} -> {@link
#afterUpdate}
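
As a very rough sketch of how a cache provider might map onto that flow 
(heavily simplified - the real EntityRegionAccessStrategy methods also take 
the session, the versions and a SoftLock, so treat these signatures as 
illustrative only):

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Simplified illustration of lockItem -> update -> afterUpdate on the
    // cache side; not the real SPI.
    class SketchEntityAccess {
        private final ConcurrentMap<Object, Object> region = new ConcurrentHashMap<>();
        private final Set<Object> locked = ConcurrentHashMap.newKeySet();

        // lockItem: mark the key so readers miss the cache instead of seeing
        // a value that is about to change
        void lockItem(Object key) {
            locked.add(key);
            region.remove(key);
        }

        // update: called during flush, before the JDBC commit - keep the key
        // invalidated; the DB may still roll back, so don't publish yet
        void update(Object key, Object newValue) {
            region.remove(key);
        }

        // afterUpdate: called in the after-completion phase, once the JDBC
        // transaction committed - now it is safe to publish and unlock
        void afterUpdate(Object key, Object newValue) {
            region.put(key, newValue);
            locked.remove(key);
        }

        Object get(Object key) {
            return locked.contains(key) ? null : region.get(key); // null = miss
        }
    }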

So I am still not sure why you need to register a Synchronization.  You
already get callbacks for "after completion".  Perhaps you meant that there
are times you need to do something during "before completion"?


[1] the exact "how" in (3) will vary, but the general ordering will remain
the same.  Making this ordering consistent was one of the main drivers of
redesigning the transaction handling in 5.0


Re: [hibernate-dev] Consistency guarantees of second level cache

2015-09-09 Thread Steve Ebersole
To be precise, when you talk about the stale data you are really asking about
isolation.  TX2 reading "B->1" before TX1 commits is a question of
isolation and preventing READ_UNCOMMITTED access to the data.  The problem
is the split in the notion of "commit".  Those should be "as close together
as possible".  For what it is worth, Hibernate commits its work via
Synchronization as well.  My preference (and this is based on years of
fighting problems specifically between Hibernate and
TreeCache/JBossCache/Infinispan in regards to Synchronization ordering) is
that hibernate-infinispan piggyback on Hibernate's transaction handling.
Actually, I thought this is why we made some of the transaction changes we
did in Hibernate.. so that you could have a consistent view of the
transaction across jdbc/jta in hibernate-infinispan.  In my experience,
having hibernate-infinispan/Infinispan register its own Synchronization to
control this stuff is just asking for a lot of trouble.

Anyway, this also gets into the meaning of the concurrent access
strategies.  Which access strategy are you talking about in particular?  I
assume you mean the `transactional` strategy, just making sure.




On Wed, Sep 9, 2015 at 6:58 AM Radim Vansa  wrote:

> Hi,
>
> I've been fixing a lot of consistency issues in Infinispan 2LC lately
> and also trying to improve performance. When reasoning about consistency
> guarantees I've usually assumed that we don't want to provide stale
> entries from the cache after the DB commits - that means we have to
> invalidate them before the DB commit. This is a useful property if there
> are some application constraints on the data (e.g. that two entities
> have equal attributes). On the other hand, if we want the cache
> synchronized with the DB only after the commit fully finishes, we could
> omit some pre-DB-commit RPCs and improve the performance a bit.
>
> To illustrate the difference, imagine that we didn't require such
> atomicity of transactions: when we update two entities in TX1 and
> one of them is cached and the other is not, then in TX2 we could see the
> updated value of the non-cached entity but still hit the cache for the
> other entity, seeing a stale value, since TX1 has committed in the DB but
> has not yet finished the commit on the ORM side:
>
> A = 1, B = 1
> TX1: begin
> TX1: (from flush) write A -> 2
> TX1: (from flush) write B -> 2
> TX1: DB (XA resource) commit
> TX2: read A -> 2 (handled from DB)
> TX2: read B -> 1 (cached entry)
> TX1: cache commit (registered as synchronization) -> cache gets updated
> to B = 2
> TX1 is completed, control flow returns to caller
>
> Naturally, after TX1 returns from transaction commit, no stale values
> should be provided.
>
> Since I don't have any deep experience with DBs (I assume that they
> really behave in the ACID way), I'd like to ask what guarantees we
> want from the 2LC, and whether there's anything in the session caching
> that would loosen this ACIDity. I know we have the nonstrict-read-write
> mode (that could implement the less strict way), but I imagine this as
> something that breaks the contract a bit more, allowing even larger
> performance gains (going the best-effort way without any guarantees).
>
> Thanks for your insight!
>
> Radim
>
> --
> Radim Vansa 
> JBoss Performance Team
>


Re: [hibernate-dev] Consistency guarantees of second level cache

2015-09-09 Thread Radim Vansa
Thanks for correcting the terms, I'll try to use 'isolation'.

TX2 reading B = 1 is not READ_UNCOMMITTED - value B = 1 was committed 
long ago (it's the initial value). It's reading A = 2 that can be 
considered read uncommitted (not isolated enough), but as the cache has 
nothing to do with that entry, we can't really prevent it - it's already 
in the DB. So it's really rather a stale read of B. If my terms are 
wrong, I apologize.

"as close together as possible" is not enough - either you allow certain 
situation to happen (although you might try to minimize how often), or 
you guarantee that it does not happen. So, do I understand it correctly 
that 2LC should check ' hibernate.connection.isolation' and behave 
accordingly?

In 2LC code I am sometimes registering synchronizations but always through

SessionImplementor.getTransactionCoordinator() 
.getLocalSynchronizations().registerSynchronization(...)

- I hope this is the right way and not asking for trouble. I usually 
just need to do something when I know whether the DB has written the 
data or not - Hibernate calls the *AccessStrategy methods well enough in 
the beforeCompletion part (or I should rather say during flush()) but 
sometimes I need to delegate some work to the afterCompletion part.
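
For context, a minimal sketch of what that registration looks like in code 
(assuming the 5.0-era SPI; the import path and the body of afterCompletion 
are illustrative only):

    import javax.transaction.Status;
    import javax.transaction.Synchronization;
    import org.hibernate.engine.spi.SessionImplementor;

    class CacheSyncRegistration {
        void registerPostCommitCacheWork(SessionImplementor session) {
            session.getTransactionCoordinator()
                   .getLocalSynchronizations()
                   .registerSynchronization(new Synchronization() {
                       @Override
                       public void beforeCompletion() {
                           // nothing here - the *AccessStrategy callbacks
                           // already cover the pre-commit phase
                       }

                       @Override
                       public void afterCompletion(int status) {
                           if (status == Status.STATUS_COMMITTED) {
                               // the DB write is known to have happened -
                               // do the delegated cache work here
                           } else {
                               // rolled back - discard pending cache changes
                           }
                       }
                   });
        }
    }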

Radim


On 09/09/2015 04:51 PM, Steve Ebersole wrote:
> To be precise, when you talk about the stale data you are really asking about 
> isolation.  TX2 reading "B->1" before TX1 commits is a question of 
> isolation and preventing READ_UNCOMMITTED access to the data.  The 
> problem is the split in the notion of "commit".  Those should be "as 
> close together as possible".  For what it is worth, Hibernate commits 
> its work via Synchronization as well.  My preference (and this is 
> based on years of fighting problems specifically between Hibernate and 
> TreeCache/JBossCache/Infinispan in regards to Synchronization 
> ordering) is that hibernate-infinispan piggyback on Hibernate's 
> transaction handling.  Actually, I thought this is why we made some of 
> the transaction changes we did in Hibernate.. so that you could have a 
> consistent view of the transaction across jdbc/jta in 
> hibernate-infinispan.  In my experience, having 
> hibernate-infinispan/Infinispan register its own Synchronization to 
> control this stuff is just asking for a lot of trouble.
>
> Anyway, this also gets into the meaning of the concurrent access 
> strategies.  Which access strategy are you talking about in 
> particular?  I assume you mean the `transactional` strategy, just 
> making sure.
>
>
>
>
> On Wed, Sep 9, 2015 at 6:58 AM Radim Vansa wrote:
>
> Hi,
>
> I've been fixing a lot of consistency issues in Infinispan 2LC lately
> and also trying to improve performance. When reasoning about consistency
> guarantees I've usually assumed that we don't want to provide stale
> entries from the cache after the DB commits - that means we have to
> invalidate them before the DB commit. This is a useful property if there
> are some application constraints on the data (e.g. that two entities
> have equal attributes). On the other hand, if we want the cache
> synchronized with the DB only after the commit fully finishes, we could
> omit some pre-DB-commit RPCs and improve the performance a bit.
>
> To illustrate the difference, imagine that we didn't require such
> atomicity of transactions: when we update two entities in TX1 and
> one of them is cached and the other is not, then in TX2 we could see the
> updated value of the non-cached entity but still hit the cache for the
> other entity, seeing a stale value, since TX1 has committed in the DB but
> has not yet finished the commit on the ORM side:
>
> A = 1, B = 1
> TX1: begin
> TX1: (from flush) write A -> 2
> TX1: (from flush) write B -> 2
> TX1: DB (XA resource) commit
> TX2: read A -> 2 (handled from DB)
> TX2: read B -> 1 (cached entry)
> TX1: cache commit (registered as synchronization) -> cache gets
> updated
> to B = 2
> TX1 is completed, control flow returns to caller
>
> Naturally, after TX1 returns from transaction commit, no stale values
> should be provided.
>
> Since I don't have any deep experience with DBs (I assume that they
> really behave in the ACID way), I'd like to ask what guarantees we
> want from the 2LC, and whether there's anything in the session caching
> that would loosen this ACIDity. I know we have the nonstrict-read-write
> mode (that could implement the less strict way), but I imagine this as
> something that breaks the contract a bit more, allowing even larger
> performance gains (going the best-effort way without any guarantees).
>
> Thanks for your insight!
>
> Radim
>
> --
> Radim Vansa
> JBoss Performance Team