Re: [infinispan-dev] Atomic operations and transactions

2011-07-11 Thread Manik Surtani

On 4 Jul 2011, at 07:57, Galder Zamarreño wrote:

 I get the feeling that those atomic operations are particularly useful when 
 transactions are not used because they allow you to reduce the cache operations 
 to one, hence avoiding the need to use a lock or synchronized block, or in our 
 case, a transaction.

Precisely.  I think the atomic ops should be documented such that they are not 
used within a transaction scope, possibly either:

1) suspending any ongoing tx if used, or 
2) throwing an illegal state exception if used within a tx scope
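A minimal sketch of option 2, purely illustrative: the class, the thread-local flag, and the method names below are assumptions standing in for the real cache and the JTA transaction scope (which would normally be queried via the TransactionManager), not Infinispan API.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a thread-local flag stands in for "an ongoing tx scope".
class AtomicOpsCache<K, V> {
    static final ThreadLocal<Boolean> IN_TX = ThreadLocal.withInitial(() -> false);
    private final ConcurrentHashMap<K, V> store = new ConcurrentHashMap<>();

    private void failIfInTx() {
        // Option 2: reject the atomic op rather than silently suspending the tx.
        if (IN_TX.get())
            throw new IllegalStateException("Atomic operations are not allowed within a transaction scope");
    }

    V putIfAbsent(K key, V value) {
        failIfInTx();
        return store.putIfAbsent(key, value);
    }

    boolean replace(K key, V oldValue, V newValue) {
        failIfInTx();
        return store.replace(key, oldValue, newValue);
    }
}
```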

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Atomic operations and transactions

2011-07-11 Thread Manik Surtani

On 5 Jul 2011, at 10:23, Galder Zamarreño wrote:

 I've gone through some cases and end results would not differ at first glance 
 if the atomic ops suspend the txs. The only thing that would change would be 
 the expectations of lock acquisition timeouts by atomic ops within txs.

There is also the expectation of being able to roll back, which goes away.







Re: [infinispan-dev] transactions :: incremental locking

2011-07-11 Thread Mircea Markus

On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:

 Very nice design, congratulations.
 Second picture is missing a final 4 step returning the ACK to N1 right?
yes, this happens when 1a, 1b and 1c returned to N1.
 
 About the third proposal I was thinking about a possible improvement,
 not sure if it works:
 consider the list defining the keys being locked to be already ordered. As
 far as I understood about the ordering on the hash wheel, you can now
 split this list into the groups which need to be sent to each node:
 k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
 i.e. the sequence does not need to be reordered and we can make use of that.
 
 Now instead of sending the full list in multicast to each involved
 node, we send the list only to the first node, and wait for an ACK
 from the last node involved in the sequence.
 This does not reduce the number of async rpcs being executed, but
 sends less packets on the wire, and has the advantage of never needing
 to go back to recover from a failed node or recover from a potential
 deadlock, as all keys are always taken in proper order, not
 optimistically.
Your suggestion is a variation of the 2nd approach (Async incremental locking). If 
we approach it this way, the prepare message, which is potentially big, is sent 
in sequence from one node to the other.
With the current approach the prepare is sent in parallel, and the lock requests 
sent from N2->N3->N4 have a smaller payload.
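For reference, the per-node grouping Sanne describes — splitting an already-ordered key list into contiguous owner groups — could be sketched like this; the `owner` lookup is a placeholder for the consistent-hash location, not the real Infinispan API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;

// Sketch: split an already-ordered key list into per-owner groups while
// preserving the global lock-acquisition order. Since the keys are ordered
// on the hash wheel, each owner's keys are contiguous and the groups come
// out in forwarding order (N2, N3, N4, ...).
class LockGroups {
    static <K> LinkedHashMap<String, List<K>> group(List<K> orderedKeys,
                                                    Function<K, String> owner) {
        LinkedHashMap<String, List<K>> groups = new LinkedHashMap<>();
        for (K key : orderedKeys)
            groups.computeIfAbsent(owner.apply(key), n -> new ArrayList<>()).add(key);
        return groups;
    }
}
```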
 
 [reusing the same picture of Hybrid incremental approach]
 assuming this list of keys, owned by these nodes: k1, k2, k3, k4, k5
 - N2{k1, k2} N3{k3, k4} N4{k5}
 Now if N2 receives the list, it acquires k1, k2, in order, and sends
 the full list over to N3 with a message like
 
 N2{k1 *taken, k2 *taken} N3{k3, k4} N4{k5}
 
 and it will also wait for an async ACK. If it receives an ACK, N2's
 work is done.
 If it doesn't receive one in a reasonable time, it will assume
This failure detection functionality is in jgroups, I wouldn't want to have it 
duplicated here as well. But indeed N2 can be notified by jgroups about this. 
 N3 is
 dead: now the list of nodes changes, but the order in which these
 elements need to be acquired did not. In fact, some of these might be
 moved to N2 himself, some to N4 since the rehashing will move them to
 neighbours: N3 owned keys can be taken right away (again not violating
 the order), the rest of the list is again forwarded to the next alive
 node.
 The last node will return an ACK to the originating node to inform it
 that the prepare phase was successful, and once it has done so it will send an
 ACK back to its previous node to free its resources; this one will
 propagate the ACK back, etc.
 No deadlocks, no timeouts, no undo operations. what am I missing?
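One step of the forwarding chain described above might look like the following sketch; `Entry`, the node names, and the acquire/forward logic are all illustrative, not Infinispan code:

```java
import java.util.List;

// Sketch of one hop in the proposed chain: the receiving node "acquires"
// (marks taken) the keys it owns, in order, then the whole message is
// forwarded to the next node that still has untaken keys. The last node
// would ACK back to the originator instead of forwarding.
class ChainedLockStep {
    static final class Entry {
        final String node;
        final String key;
        boolean taken;
        Entry(String node, String key) { this.node = node; this.key = key; }
    }

    /** Marks this node's keys as taken; returns the next hop, or null when done. */
    static String acquireAndForward(List<Entry> message, String self) {
        for (Entry e : message)
            if (e.node.equals(self))
                e.taken = true;          // real lock acquisition stands in here
        for (Entry e : message)
            if (!e.taken)
                return e.node;           // first node with untaken keys is the next hop
        return null;                     // last node: send the ACK back to the originator
    }
}
```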

 
 Cheers,
 Sanne
 
 2011/7/6 Mircea Markus mircea.mar...@jboss.com:
 
 On 6 Jul 2011, at 09:00, Paolo Romano wrote:
 
 On 7/5/11 3:47 PM, Mircea Markus wrote:
 On 5 Jul 2011, at 15:39, Manik Surtani wrote:
 
 Good stuff, shame about the RPC count. ;)
 yeah. Still very valid when there are deadlocks; I guess figures will tell 
 us more precisely what the gain is
 
 I agree with Manik's observation.
 
 When benchmarking it, it would be interesting to compare it also with
 the total order multicast approach that Pedro has been developing.
 +1
 
 When do you plan to have these locking optimizations available?
 We hope to have it in for 5.1, but it has a lower priority compared with the 
 other lock related changes.
 




Re: [infinispan-dev] transactions :: incremental locking

2011-07-11 Thread Mircea Markus

On 8 Jul 2011, at 10:34, Galder Zamarreño wrote:

 Mircea, thx for writing this up. Some comments:
 
 It's not clear to me, at least given the examples in the wiki, how the problem 
 of Tx1 and Tx2 with a and b explained in the introduction is fixed by the 
 solution.
 
 The problem exposed:
 - with some right timing, during prepare time it is possible for these two 
 transactions to deadlock:
   • Tx1 lock acquired on a @ N3
   • Tx2 lock acquired on b @ N4
   • Tx1 cannot progress acquiring  lock on b @ N4, that lock is 
 acquired by Tx2
   • Tx2 cannot acquire lock on a @ N3 as that lock is held by Tx1
   • Tx1 and Tx2 are waiting for each other = deadlock
 
 The 'alleged' solution?
 
 In the example above, considering the view = {N1, N2, N3, N4} and 
 incremental lock acquisition:
   • Tx1 acquires lock on a@ N3
   • Tx2 tries to acquire lock on a@ N3. It cannot acquire and waits for 
 Tx1 to release it
   • Tx1 acquires locks on b@ N4, completes and releases locks
   • Tx2 acquires lock on a@ N3 and completes as well
 
 Is the latter block supposed to show how the example in the introduction is 
 fixed? If it is, it's not really doing it because it's not explaining what 
 happens to Tx2 that is trying to acquire locks on 'b'...
Tx2 acquires lock on b @ N4 after locking a@N3, I've updated the document. 
 
 The diagrams and the associated explanations are confusing too. Example:
 
 N1 first locks a@N3 (1) - in the diagram, (1) represents a call from N1 to 
 N2
 
 b@N4 (RPC 2) - That corresponds to the 5/6 arrows? 
I've also updated this, thanks!
 
 On Jul 5, 2011, at 3:25 PM, Mircea Markus wrote:
 
 Hi,
 
 This is the document describing the incremental optimistic locking Dan and 
 myself discussed last week: 
 http://community.jboss.org/wiki/IncrementalOptimisticLocking
 Unless I'm missing something, this together with lock reordering[1] covers 
 100% of the possible deadlock situations in the optimistic locking scheme[2] 
 - which is pretty awesome!
 
 Cheers,
 Mircea   
 
 [1] http://community.jboss.org/wiki/LockReorderingForAvoidingDeadlocks
 [2] http://community.jboss.org/wiki/OptimisticLockingInInfinispan
 
 
 --
 Galder Zamarreño
 Sr. Software Engineer
 Infinispan, JBoss Cache
 
 


Re: [infinispan-dev] transactions :: incremental locking

2011-07-11 Thread Sanne Grinovero
2011/7/11 Mircea Markus mircea.mar...@jboss.com:

 On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:

 Very nice design, congratulations.
 Second picture is missing a final 4 step returning the ACK to N1 right?
 yes, this happens when 1a, 1b and 1c returned to N1.

 About the third proposal I was thinking about a possible improvement,
 not sure if it works:
 consider the list defining the keys being locked to be already ordered. As
 far as I understood about the ordering on the hash wheel, you can now
 split this list into the groups which need to be sent to each node:
 k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
 i.e. the sequence does not need to be reordered and we can make use of that.

 Now instead of sending the full list in multicast to each involved
 node, we send the list only to the first node, and wait for an ACK
 from the last node involved in the sequence.
 This does not reduce the number of async rpcs being executed, but
 sends less packets on the wire, and has the advantage of never needing
 to go back to recover from a failed node or recover from a potential
 deadlock, as all keys are always taken in proper order, not
 optimistically.
 Your suggestion is a variation of the 2nd approach (Async incremental locking). 
 If we approach it this way, the prepare message, which is potentially big, is 
 sent in sequence from one node to the other.
 With the current approach the prepare is sent in parallel, and the lock requests 
 sent from N2->N3->N4 have a smaller payload.

Yes it's a variation, and I wouldn't have thought of it without
reading your nice proposal. Still, I'm not understanding why the
original suggestion is doing anything in parallel: since you suggest
the payload must be kept low, you're sending multiple different keys
over the wire from N1, so you can't use multicast, and assuming you
have a single wire these are sent in sequence too. About the total number
of parallel communications, the original approach has the same number
of serialized node hops in the best scenario, but is worse in case
something needs to be rolled back; imho it gets complicated if you
have to roll back a longer path when more nodes are involved!
So my intention was to a) make it simpler and b) still optimise
for the good case, while having less chattiness in the worst case.

I agree my last proposal sends keys over to nodes to which they are of no
interest; still, since it's not about the values I wouldn't
expect this to be a big change in the packet size, and I am wondering if
this token ring approach would not be simpler: there are fewer
packets going around at the same time and order is guaranteed, so I
wouldn't be sure of this being less efficient on the network without
some real world test.

Cheers,
Sanne


 [reusing the same picture of Hybrid incremental approach]
 assuming this list of keys, owned by these nodes: k1, k2, k3, k4, k5
 - N2{k1, k2} N3{k3, k4} N4{k5}
 Now if N2 receives the list, it acquires k1, k2, in order, and sends
 the full list over to N3 with a message like

 N2{k1 *taken, k2 *taken} N3{k3, k4} N4{k5}

 and it will also wait for an async ACK. If it receives an ACK, N2's
 work is done.
 If it doesn't receive one in a reasonable time, it will assume
 This failure detection functionality is in jgroups, I wouldn't want to have 
 it duplicated here as well. But indeed N2 can be notified by jgroups about 
 this.
 N3 is
 dead: now the list of nodes changes, but the order in which these
 elements need to be acquired did not. In fact, some of these might be
 moved to N2 himself, some to N4 since the rehashing will move them to
 neighbours: N3 owned keys can be taken right away (again not violating
 the order), the rest of the list is again forwarded to the next alive
 node.
 The last node will return an ACK to the originating node to inform it
 that the prepare phase was successful, and once it has done so it will send an
 ACK back to its previous node to free its resources; this one will
 propagate the ACK back, etc.
 No deadlocks, no timeouts, no undo operations. what am I missing?


 Cheers,
 Sanne

 2011/7/6 Mircea Markus mircea.mar...@jboss.com:

 On 6 Jul 2011, at 09:00, Paolo Romano wrote:

 On 7/5/11 3:47 PM, Mircea Markus wrote:
 On 5 Jul 2011, at 15:39, Manik Surtani wrote:

 Good stuff, shame about the RPC count. ;)
 yeah. Still very valid when there are deadlocks; I guess figures will tell 
 us more precisely what the gain is

 I agree with Manik's observation.

 When benchmarking it, it would be interesting to compare it also with
 the total order multicast approach that Pedro has been developing.
 +1

 When do you plan to have these locking optimizations available?
 We hope to have it in for 5.1, but it has a lower priority compared with 
 the other lock related changes.


Re: [infinispan-dev] New documentation type - Glossary

2011-07-11 Thread Mircea Markus

On 8 Jul 2011, at 13:27, Manik Surtani wrote:

 Do we want to come up with a list of terms that definitely need defining, at 
 least from where we stand now?  More will be added incrementally, for sure.
 
 * Data Grid

http://www.jroller.com/cpurdy/entry/defining_a_data_grid
It doesn't sound like a definition to me, but it is the only one I could 
find... 
 * In-memory data grid
 * Write through
 * Write behind
- read ahead
- cache aside
- near cache
 * MVCC
 * Optimistic Locking
 * Pessimistic Locking
 * Deadlock
 * Livelock
 * Write skew
 * Isolation level
 * 2-phase commit
 * 1-phase commit
 * XA resource
 * JTA synchronization
 * Split brain/network partition
 
 ... what else?
 
 
 
 On 7 Jul 2011, at 17:14, Pete Muir wrote:
 
 I've added a new docs category - Glossary. This is for defining terminology 
 we use that new users (and experienced users) might have trouble with. Data 
 Grid would be a great example.
 
 For each item in the glossary, create a page underneath Glossary, and then 
 we can link it in using an include. An item in the Glossary should use an 
 h2. heading and include the heading in the text body as well as have a page 
 title - this just makes the item reusable.
 
 
 
 
 

Re: [infinispan-dev] transactions :: incremental locking

2011-07-11 Thread Mircea Markus

On 11 Jul 2011, at 12:27, Sanne Grinovero wrote:

 2011/7/11 Mircea Markus mircea.mar...@jboss.com:
 
 On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:
 
 Very nice design, congratulations.
 Second picture is missing a final 4 step returning the ACK to N1 right?
 yes, this happens when 1a, 1b and 1c returned to N1.
 
 About the third proposal I was thinking about a possible improvement,
 not sure if it works:
 consider the list defining the keys being locked to be already ordered. As
 far as I understood about the ordering on the hash wheel, you can now
 split this list into the groups which need to be sent to each node:
 k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
 i.e. the sequence does not need to be reordered and we can make use of that.
 
 Now instead of sending the full list in multicast to each involved
 node, we send the list only to the first node, and wait for an ACK
 from the last node involved in the sequence.
 This does not reduce the number of async rpcs being executed, but
 sends less packets on the wire, and has the advantage of never needing
 to go back to recover from a failed node or recover from a potential
 deadlock, as all keys are always taken in proper order, not
 optimistically.
 Your suggestion is a variation of the 2nd approach (Async incremental locking). 
 If we approach it this way, the prepare message, which is potentially big, 
 is sent in sequence from one node to the other.
 With the current approach the prepare is sent in parallel, and the lock requests 
 sent from N2->N3->N4 have a smaller payload.
 
 Yes it's a variation, and I wouldn't have thought of it without
 reading your nice proposal. Still, I'm not understanding why the
 original suggestion is doing anything in parallel: since you suggest
 the payload must be kept low, you're sending multiple different keys
 over the wire from N1, so you can't use multicast, and assuming you
 have a single wire these are sent in sequence too.
ATM the prepare is sent as a multicast, each node receiving the full key set 
touched by the transaction. 
 About the total number
 of parallel communications, the original approach has the same number
 of serialized node hops in the best scenario,
I'm a bit lost on what the original approach is :-) Is it the Async incremental 
locking or the hybrid one?
For the hybrid, the number of serialized node hops in the best scenario is 
1.
 but worse in case
 something needs to be rolled back, imho it gets complicated if you
 have to rollback a longer path when more nodes are involved!
yes, but the rollback should be seen as an exception in this situation, and the 
optimization is for the most common scenario, in which lock acquisition 
succeeds. 
 so my intention was to a) make it simpler and b) still optimise
 for the good case, while having less chattiness in the worst case.
The good case, at least compared with the hybrid approach, is less optimal: 
hybrid has 1 serial hop, this one has numNodesInvolvedInTx hops.
 
 I agree my last proposal sends keys over to nodes to which they are of no
 interest; still, since it's not about the values I wouldn't
 expect this to be a big change in the packet size, and I am wondering if
 this token ring approach would not be simpler: there are fewer
 packets going around at the same time and order is guaranteed, so I
 wouldn't be sure of this being less efficient on the network without
 some real world test.
 
 Cheers,
 Sanne
 
 
 [reusing the same picture of Hybrid incremental approach]
 assuming this list of keys, owned by these nodes: k1, k2, k3, k4, k5
 - N2{k1, k2} N3{k3, k4} N4{k5}
 Now if N2 receives the list, it acquires k1, k2, in order, and sends
 the full list over to N3 with a message like
 
 N2{k1 *taken, k2 *taken} N3{k3, k4} N4{k5}
 
 and it will also wait for an async ACK. If it receives an ACK, N2's
 work is done.
 If it doesn't receive one in a reasonable time, it will assume
 This failure detection functionality is in jgroups, I wouldn't want to have 
 it duplicated here as well. But indeed N2 can be notified by jgroups about 
 this.
 N3 is
 dead: now the list of nodes changes, but the order in which these
 elements need to be acquired did not. In fact, some of these might be
 moved to N2 himself, some to N4 since the rehashing will move them to
 neighbours: N3 owned keys can be taken right away (again not violating
 the order), the rest of the list is again forwarded to the next alive
 node.
 The last node will return an ACK to the originating node to inform it
 that the prepare phase was successful, and once it has done so it will send an
 ACK back to its previous node to free its resources; this one will
 propagate the ACK back, etc.
 No deadlocks, no timeouts, no undo operations. what am I missing?
 
 
 Cheers,
 Sanne
 
 2011/7/6 Mircea Markus mircea.mar...@jboss.com:
 
 On 6 Jul 2011, at 09:00, Paolo Romano wrote:
 
 On 7/5/11 3:47 PM, Mircea Markus wrote:
 On 5 Jul 2011, at 15:39, Manik Surtani wrote:
 
 Good stuff, shame about the RPC count .  ;)
 yeah. Still very valid 

Re: [infinispan-dev] Atomic operations and transactions

2011-07-11 Thread Mircea Markus

On 11 Jul 2011, at 10:45, Manik Surtani wrote:

 
 On 4 Jul 2011, at 07:57, Galder Zamarreño wrote:
 
 I get the feeling that those atomic operations are particularly useful when 
 transactions are not used because they allow you to reduce the cache operations 
 to one, hence avoiding the need to use a lock or synchronized block, or in 
 our case, a transaction.
 
 Precisely.  I think the atomic ops should be documented such that they are 
 not used within a transaction scope, possibly either:
 
 1) suspending any ongoing tx if used, or 
 2) throwing an illegal state exception if used within a tx scope
+1 for the 2nd approach. At least up to the moment one comes with a use case 
for using them within a tx.


Re: [infinispan-dev] Atomic operations and transactions

2011-07-11 Thread Manik Surtani

On 11 Jul 2011, at 13:21, Mircea Markus wrote:

 
 On 11 Jul 2011, at 10:45, Manik Surtani wrote:
 
 
 On 4 Jul 2011, at 07:57, Galder Zamarreño wrote:
 
 I get the feeling that those atomic operations are particularly useful when 
 transactions are not used because they allow you to reduce the cache operations 
 to one, hence avoiding the need to use a lock or synchronized block, or in 
 our case, a transaction.
 
 Precisely.  I think the atomic ops should be documented such that they are 
 not used within a transaction scope, possibly either:
 
 1) suspending any ongoing tx if used, or 
 2) throwing an illegal state exception if used within a tx scope
 +1 for the 2nd approach. At least up to the moment one comes with a use case 
 for using them within a tx.

Yes, it is just more explicit that way.







Re: [infinispan-dev] transactions :: incremental locking

2011-07-11 Thread Sanne Grinovero
This is getting a bit confusing; sorry, I've tried to be brief but it
seems I didn't improve the message.
I'll add some more notes below as they belong to the context, but
let's talk about this on IRC.

2011/7/11 Mircea Markus mircea.mar...@jboss.com:
 On 11 Jul 2011, at 12:27, Sanne Grinovero wrote:
 2011/7/11 Mircea Markus mircea.mar...@jboss.com:
 On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:

 Very nice design, congratulations.
 Second picture is missing a final 4 step returning the ACK to N1 right?
 yes, this happens when 1a, 1b and 1c returned to N1.

 About the third proposal I was thinking about a possible improvement,
 not sure if it works:
 consider the list defining the keys being locked to be already ordered. As
 far as I understood about the ordering on the hash wheel, you can now
 split this list into the groups which need to be sent to each node:
 k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
 i.e. the sequence does not need to be reordered and we can make use of 
 that.

 Now instead of sending the full list in multicast to each involved
 node, we send the list only to the first node, and wait for an ACK
 from the last node involved in the sequence.
 This does not reduce the number of async rpcs being executed, but
 sends less packets on the wire, and has the advantage of never needing
 to go back to recover from a failed node or recover from a potential
 deadlock, as all keys are always taken in proper order, not
 optimistically.
 Your suggestion is a variation of the 2nd approach (Async incremental 
 locking). If we approach it this way, the prepare message, which is 
 potentially big, is sent in sequence from one node to the other.
 With the current approach the prepare is sent in parallel, and the lock requests 
 sent from N2->N3->N4 have a smaller payload.

 Yes it's a variation, and I wouldn't have thought of it without
 reading your nice proposal. Still, I'm not understanding why the
 original suggestion is doing anything in parallel: since you suggest
 the payload must be kept low, you're sending multiple different keys
 over the wire from N1, so you can't use multicast, and assuming you
 have a single wire these are sent in sequence too.
 ATM the prepare is sent as a multicast, each node receiving the full key set 
 touched by the transaction.
 About the total number
 of parallel communications, the original approach has the same number
 of serialized node hops in the best scenario,

 I'm a bit lost on what the original approach is :-) Is it the Async 
 incremental locking or the hybrid one?
 For the hybrid, the number of serialized node hops in the best scenario 
 is 1.
I'm sorry, I'm referring to the Hybrid incremental approach.
How can you have a value of 1 if the nodes have to confirm in
sequence? As the picture shows, there are messages labeled in sequence
1,2,3,4, and it should need a final ACK (5) too.
The small change I'm suggesting doesn't improve this much; it's
only avoiding the need to roll back lock acquisitions: since you need
1-5 messages in series, I'm suggesting we take advantage of that,
admittedly using a larger payload.
Thinking about it now, it's also making sure that locks are held for a
shorter time: all locks are acquired during the initial multicast
in phase 1, while with the additional proposal I've sent
some locks are acquired a bit later in terms of network messages; no
doubt the change is small, but this might make it possible for some
transactions to make progress quicker.

 but worse in case
 something needs to be rolled back, imho it gets complicated if you
 have to rollback a longer path when more nodes are involved!
 yes, but the rollback should be seen as an exception in this situation, and 
 the optimization is for the most common scenario, in which lock acquisition 
 succeeds.
 so my intention was to a) make it simpler and b) still optimise
 for the good case, while having less chattiness in the worst case.
 The good case, at least compared with the hybrid approach, is less optimal: 
 hybrid has 1 serial hop, this one has numNodesInvolvedInTx hops.

I'm failing to understand how you can have a single hop in series:
you must ack the lock acquisition in order to achieve your goal of
deadlock prevention; there must be a misunderstanding, as I'm
experiencing the opposite interpretation; actually I think your
explanation and solution on the wiki was brilliant but we're getting
entangled on the mailing list.
Did you try expanding your schema to more nodes? What should happen if
the first node fails in a series of a dozen? It doesn't scale with the
number of involved servers, and you're keeping some locks for the time
of multiple network hops, while these locks should actually be free
and are blocking other transactions. You might well end up in a
spinning situation as the delays are in terms of multiple network
hops: you blocked another TX which needs to restart from scratch in
the locking fight, and abort yourself to retry as well: that's a lot
of wasted 

Re: [infinispan-dev] Cutting 5.0.0.Final

2011-07-11 Thread Vladimir Blagojevic
Hey Manik,

I'd say we need a release plan. What else needs to be done: last fixes 
(e.g. LRU), performance tests, documentation, etc.

Vladimir

On 11-07-11 7:25 AM, Manik Surtani wrote:
 Guys, any thoughts on this?  Anything that's come up that needs addressing?  
 We seem to have a few minor open JIRAs, nothing significant...







Re: [infinispan-dev] Cutting 5.0.0.Final

2011-07-11 Thread Manik Surtani
Yes, agreed, I am in the process of piecing one together.  Just wanted to make 
sure that, code-wise, we're ready.

On 11 Jul 2011, at 14:41, Vladimir Blagojevic wrote:

 Hey Manik,
 
 I'd say we need a release plan. What else needs to be done: last fixes 
 (e.g. LRU), performance tests, documentation, etc.
 
 Vladimir
 
 On 11-07-11 7:25 AM, Manik Surtani wrote:
 Guys, any thoughts on this?  Anything that's come up that needs addressing?  
 We seem to have a few minor open JIRAs, nothing significant...
 
 
 
 
 







Re: [infinispan-dev] Faster LRU

2011-07-11 Thread Dan Berindei
On Fri, Jul 8, 2011 at 12:44 PM, Dan Berindei dan.berin...@gmail.com wrote:

 It's interesting that in the writeOnMiss test the new LRU performance
 dropped when I increased the concurrency level from 32 to 128. I think
 it might be because the eviction.thresholdExpired() check in
 BCHM.attemptEviction() is done without a lock and so it could return
 true simultaneously for multiple threads - which will all proceed to
 wait on the segment lock and attempt eviction at the same time.


I added another batch threshold check while holding the lock and the
performance anomaly disappeared.
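The fix described above — a cheap lock-free pre-check followed by a re-check of the batch threshold while holding the segment lock, so that only one of the racing threads actually runs the eviction batch — could be sketched like this (the class and field names are made up; the real logic lives in BCHM.attemptEviction()):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the re-check-under-lock pattern: the lock-free threshold test may
// return true for several threads simultaneously, so it is repeated while
// holding the segment lock and only the winner performs the eviction batch.
class EvictionGate {
    private final ReentrantLock segmentLock = new ReentrantLock();
    private final AtomicInteger accesses = new AtomicInteger();
    private final int batchThreshold;
    int evictionRuns;                        // counts batches, for illustration

    EvictionGate(int batchThreshold) { this.batchThreshold = batchThreshold; }

    private boolean thresholdExpired() { return accesses.get() >= batchThreshold; }

    void recordAccess() {
        accesses.incrementAndGet();
        if (thresholdExpired()) {            // cheap, lock-free pre-check
            segmentLock.lock();
            try {
                if (thresholdExpired()) {    // re-check: losers of the race skip eviction
                    evictionRuns++;          // the eviction batch would run here
                    accesses.set(0);
                }
            } finally {
                segmentLock.unlock();
            }
        }
    }
}
```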

 Another strange pattern is that neither eviction policy respects the
 capacity parameter exactly. LIRS rounds up the capacity to the next
 power of 2, and LRU/LRUOld do the same rounding and then multiply by
 0.75.


I updated BCHM to pass the initial capacity to the eviction policy,
and now all the policies keep the same number of entries.
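The rounding behaviour described in the quoted paragraph can be reproduced with simple arithmetic (illustrative only, not the actual BCHM code):

```java
// Sketch of the capacity rounding described above: the requested capacity is
// rounded up to the next power of two, and the LRU variants then keep only
// 0.75 of that.
class CapacityRounding {
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    static int lirsCapacity(int requested) { return nextPowerOfTwo(requested); }

    static int lruCapacity(int requested)  { return nextPowerOfTwo(requested) * 3 / 4; }
}
```

Under that description, a requested capacity of 500000 would give 524288 entries for LIRS and 393216 for the LRU variants, which is why neither policy would match the configured capacity exactly.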

 I'll report again once I've fixed these and updated the reporting - 
 I think the total number of misses might be more relevant than the 
 standard deviation of the keys at the end.


I got the tests running on the test cluster again and the new LRU is
clearly better in every respect than the old LRU. In fact the only
problem in the test results is that in the new writeOnMiss
scenario the hit ratio of the new LRU is much better than all the
other policies. There is probably a mistake somewhere in the test; if
it were a random thing I don't think it would have been so visible in a
20-minute test:

MapStressTest configuration: capacity 50, test running time 1200 seconds
Container BCHM:LIRS     Ops/s   11155.49  HitRatio  96.54  Size  499968  stdDev  193558.30
Container BCHM:LRU      Ops/s   31292.06  HitRatio  97.84  Size  50  stdDev  193168.07
Container BCHM:LRU_OLD  Ops/s     116.89  HitRatio  76.11  Size  500032  stdDev  197974.87

Testing write on miss performance with capacity 50, keys 200, concurrency level 32, threads 100
Container BCHM:LIRS     Ops/s    1684.01  HitRatio  63.13  Size  499968  stdDev  338637.40
Container BCHM:LRU      Ops/s    4884.57  HitRatio  84.47  Size  50  stdDev  353336.31
Container BCHM:LRU_OLD  Ops/s      50.69  HitRatio  41.34  Size  500032  stdDev  361239.68

MapStressTest configuration: capacity 50, test running time 1200 seconds
Testing independent read/write/remove performance with capacity 50, keys 100, concurrency level 32, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s   14865.45  Gets/s   13971.36  Puts/s    8424.86  Removes/s    4640.39  HitRatio  83.10  Size  499972  StdDev  168386.87
Container BCHM:LRU      Ops/s   35970.38  Gets/s   33800.61  Puts/s   19273.89  Removes/s   21815.97  HitRatio  78.64  Size  50  StdDev  192598.94
Container BCHM:LRU_OLD  Ops/s     496.10  Gets/s     467.32  Puts/s     258.91  Removes/s     269.63  HitRatio  69.66  Size  500032  StdDev  182468.36
Container CLHM          Ops/s   56074.13  Gets/s   50656.76  Puts/s   48274.11  Removes/s   53090.65  HitRatio  77.62  Size  50  StdDev  180395.29
Container SLHM          Ops/s   20603.49  Gets/s   18110.83  Puts/s   22696.60  Removes/s   20080.24  HitRatio  80.12  Size  499401  StdDev  179188.12
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded.
Test suite progress: tests succeeded: 1, failed: 0, skipped: 0.
Testing independent read/write/remove performance with capacity 50, keys 100, concurrency level 128, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s   23567.68  Gets/s   21558.38  Puts/s   19691.52  Removes/s    3617.48  HitRatio  84.73  Size  498450  StdDev  167244.66
Container BCHM:LRU      Ops/s   38313.89  Gets/s   35514.87  Puts/s   24921.89  Removes/s   27617.36  HitRatio  77.67  Size  500096  StdDev  193880.45
Container BCHM:LRU_OLD  Ops/s    2717.51  Gets/s    2600.66  Puts/s    1035.20  Removes/s    1197.20  HitRatio  79.14  Size  499573  StdDev  180107.36
Container CLHM          Ops/s   77609.22  Gets/s   70243.55  Puts/s   65605.74  Removes/s   72464.47  HitRatio  78.64  Size  499968  StdDev  182996.20
Container SLHM          Ops/s   18812.45  Gets/s   16125.15  Puts/s   25319.02  Removes/s   14003.08  HitRatio  81.44  Size  496624  StdDev  183041.62
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded.
Test suite progress: tests succeeded: 2, failed: 0, skipped: 0.
Testing independent read/write/remove performance with capacity 50, keys 200, concurrency level 32, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s   10579.80  Gets/s   10346.59  Puts/s    2109.26  Removes/s    2002.00  HitRatio  50.88  Size  499969  StdDev  245820.97
Container BCHM:LRU      Ops/s   35180.30  Gets/s   34372.36  Puts/s    6871.62  Removes/s   10867.85  HitRatio  50.08  

Re: [infinispan-dev] Faster LRU

2011-07-11 Thread Sanne Grinovero
Amazing, congratulations Vladimir & Dan!
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] Faster LRU

2011-07-11 Thread Manik Surtani
Yeah, nice work guys.  :)

On 11 Jul 2011, at 17:10, Sanne Grinovero wrote:

 Amazing, congratulations Vladimir & Dan!

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org





Re: [infinispan-dev] New documentation type - Glossary

2011-07-11 Thread Galder Zamarreño

On Jul 8, 2011, at 2:27 PM, Manik Surtani wrote:

 Do we want to come up with a list of terms that definitely need defining, at 
 least from where we stand now?  More will be added incrementally, for sure.
 
 * Data Grid
 * In-memory data grid
 * Write through
 * Write behind
 * MVCC
 * Optimistic Locking
 * Pessimistic Locking
 * Deadlock
 * Livelock
 * Write skew
 * Isolation level
 * 2-phase commit
 * 1-phase commit
 * XA resource
 * JTA synchronization
 * Split brain/network partition
 
 ... what else?

Some that come to my mind:

* HotRod
* Read Committed
* Repeatable Read
* Externalizer

 
 
 
 On 7 Jul 2011, at 17:14, Pete Muir wrote:
 
 I've added a new docs category - Glossary. This is for defining terminology 
 we use that new users (and experienced users) might have trouble with. Data 
 Grid would be a great example.
 
 For each item in the glossary, create a page underneath Glossary, and then 
 we can link it in using an include. An item in the Glossary should use an 
 h2. heading and include the heading in the text body as well as have a page 
 title - this just makes the item reusable.
 
 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani
 
 Lead, Infinispan
 http://www.infinispan.org
 
 
 
 

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache




Re: [infinispan-dev] Faster LRU

2011-07-11 Thread Vladimir Blagojevic
On 11-07-08 5:44 AM, Dan Berindei wrote:
 I haven't run the tests with concurrency level 512, as the total
 number of threads is only 100, but I suspect the old LRU still won't
 catch up with the new LRU's performance.

 It's interesting that in the writeOnMiss test the new LRU performance
 dropped when I increased the concurrency level from 32 to 128. I think
 it might be because the eviction.thresholdExpired() check in
 BCHM.attemptEviction() is done without a lock and so it could return
 true simultaneously for multiple threads - which will all proceed to
 wait on the segment lock and attempt eviction at the same time.


I am not sure about this Dan. I looked at this code for hours! I do not 
see how two threads can call eviction#execute() concurrently.





Re: [infinispan-dev] Faster LRU

2011-07-11 Thread Dan Berindei
On Mon, Jul 11, 2011 at 7:29 PM, Vladimir Blagojevic
vblag...@redhat.com wrote:
 On 11-07-08 5:44 AM, Dan Berindei wrote:

 I haven't run the tests with concurrency level 512, as the total
 number of threads is only 100, but I suspect the old LRU still won't
 catch up with the new LRU's performance.

 It's interesting that in the writeOnMiss test the new LRU performance
 dropped when I increased the concurrency level from 32 to 128. I think
 it might be because the eviction.thresholdExpired() check in
 BCHM.attemptEviction() is done without a lock and so it could return
 true simultaneously for multiple threads - which will all proceed to
 wait on the segment lock and attempt eviction at the same time.


 I am not sure about this Dan. I looked at this code for hours! I do not see
 how two threads can call eviction#execute() concurrently.


Sorry, I wasn't very clear: two threads can enter attemptEviction
simultaneously. One will get the lock and perform eviction; the other
will also try to get the lock, and when it gets it, it will proceed to
call eviction#execute() again.
So eviction#execute() is not called concurrently, but it is called
twice when it should have been called only once, and I think this
dilutes the advantages of batching.
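Dan's observation is the classic check-then-act race. A minimal sketch of both the racy pattern and the usual remedy, re-checking the condition once the lock is held (all names here are made up, not the actual BCHM internals):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Illustration of the race described above: the eviction threshold is
// read without the lock, so two threads can both see it as true; the
// second one then re-runs the eviction batch after the first already
// drained the queue. Re-checking under the lock avoids the duplicate.
class EvictionRaceSketch {
    private final ReentrantLock segmentLock = new ReentrantLock();
    private volatile boolean thresholdExpired = true;
    final AtomicInteger evictionBatches = new AtomicInteger();

    // Racy variant: the threshold is checked outside the lock only.
    void attemptEvictionRacy() {
        if (thresholdExpired) {
            segmentLock.lock();
            try {
                executeEvictionBatch();  // may run twice under contention
            } finally {
                segmentLock.unlock();
            }
        }
    }

    // Safe variant: the threshold is re-checked once the lock is held.
    void attemptEvictionChecked() {
        if (thresholdExpired) {
            segmentLock.lock();
            try {
                if (thresholdExpired) {  // second check under the lock
                    executeEvictionBatch();
                }
            } finally {
                segmentLock.unlock();
            }
        }
    }

    private void executeEvictionBatch() {
        evictionBatches.incrementAndGet();
        thresholdExpired = false;  // batch drained; reset the trigger
    }
}
```

In the checked variant, a thread that queued behind the lock sees the flag already cleared and skips the redundant batch, which is the behaviour that preserves the batching advantage.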


Re: [infinispan-dev] Extend GridFS

2011-07-11 Thread Yuri de Wit
Hi Galder,

Thanks for your reply. Let me continue this discussion here first to
validate my thinking before I create any issues in JIRA (forgive me
for the lengthy follow up).

First of all, thanks for this wonderful project! I started looking
into Ehcache as the default caching implementation, but found it
lacking on some key features when using JGroups. My guess is that all
the development there is going towards the Terracotta distribution
instead of JGroups. Terracotta does seem like a wonderful product,
but I was hoping to stick to a JGroups-based caching impl. So I was
happy to have found Infinispan.

I need to create a distributed cache that loads data from the file
system. It's a tree of folders/files containing mostly metadata info
that changes seldom, but changes. Our mid-term goal is to move the
metadata away from the file system and into a database, but that is
not feasible now due to a tight deadline and the risks of refactoring
too much of the code base.

So I was happy to see the GridFilesystem implementation in Infinispan
and the fact that clustered caches can be lazily populated (the
metadata tree in the FS can be large, and having all nodes in the
cluster preloaded with all the data would not work for us). However,
it defines its own persistence scheme with specific file names and
serialized buckets, which would require us to have a cache-aside
strategy to read our metadata tree and populate the GridFilesystem
with it.

What I am looking for is to be able to plug into the GridFilesystem a
new FileCacheStore that can load directly from an existing directory
tree, transparently. This would effectively lazy-load FS
content across the cluster without having to pre-populate the
GridFilesystem programmatically.
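Stripped of the Infinispan plumbing, the lazy-load idea could look like the following sketch (class and method names are hypothetical; a real implementation would sit behind Infinispan's cache loader SPI rather than this standalone class):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Infinispan-independent sketch: keys are paths relative to a root
// directory, and a cache miss falls through to the real file system,
// so content is pulled in lazily on first access.
class LazyFileTreeLoader {
    private final Path root;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    LazyFileTreeLoader(Path root) { this.root = root; }

    // Return cached bytes, loading from the directory tree on a miss.
    byte[] get(String relativePath) {
        byte[] cached = cache.get(relativePath);
        if (cached != null) return cached;
        Path file = root.resolve(relativePath);
        if (!Files.isRegularFile(file)) return null;  // unknown key
        try {
            byte[] bytes = Files.readAllBytes(file);
            cache.putIfAbsent(relativePath, bytes);
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isCached(String relativePath) {
        return cache.containsKey(relativePath);
    }

    // Helper for the usage example: a throwaway tree with one file.
    static LazyFileTreeLoader demoLoader() {
        try {
            Path tmp = Files.createTempDirectory("gridfs-sketch");
            Files.write(tmp.resolve("meta.txt"), "hello".getBytes());
            return new LazyFileTreeLoader(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is only the fall-through on miss: nothing is preloaded, so each node in the cluster would pull in exactly the files it actually touches.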

At first I was hoping to extend the existing FileCacheStore to support
this (hence why I was asking for a GridInputStream.skip()/available()
implementation and for making the constructors protected instead of
package-level access), but I later realized that what I needed was an
entirely new implementation, since the buckets abstraction there is not
really appropriate.

The good news is that I am close to 75% complete with the impl here.
It is working, with a few caveats, beautifully for a single node, but
I am facing some issues trying to launch a second node in the cluster
(most of it my ignorance, I am sure).

** Do you see any issues with this approach that I am not aware of?

In addition, I am having a couple of issues launching the second node
in the cluster: a couple of NPEs and a
java.net.NoRouteToHostException: No route to host. I will send the
details of these exceptions in a follow-up email.

This is where I am stuck at the moment. In my setup I have two
configuration files:
* cache-master.xml
* cache-slave.xml
Both define the data and metadata caches required by GridFilesystem,
but -master.xml configures the custom FileCacheStore I implemented and
-slave.xml uses the ClusterCacheLoader.

These are some of the items/todo's for this custom FileCacheStore impl:
** Implement chunked writes with a special chunking protocol to
trigger when the last chunk has been delivered
** custom configuration to simplify it for GridFilesystem.
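The "last chunk" protocol in the first todo could be sketched as below (entirely hypothetical, not existing GridFS code): each chunk carries its index plus a flag marking the final piece, and the receiving store only commits the assembled file once every chunk up to the flagged one has arrived.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a chunked-write protocol: chunks may arrive out of order,
// and the file is only assembled once the chunk flagged as last has
// been seen AND all earlier indices are present.
class ChunkAssembler {
    private final Map<Integer, byte[]> chunks = new TreeMap<>();
    private boolean lastSeen = false;
    private int lastIndex = -1;

    // Returns the complete payload once all chunks have been delivered,
    // or null while the file is still incomplete.
    byte[] offer(int index, byte[] data, boolean isLast) {
        chunks.put(index, data);
        if (isLast) { lastSeen = true; lastIndex = index; }
        if (!lastSeen || chunks.size() != lastIndex + 1) return null;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // TreeMap iterates in index order, so the parts concatenate correctly.
        for (byte[] part : chunks.values()) out.write(part, 0, part.length);
        return out.toByteArray();  // safe to update the underlying file now
    }
}
```

This is the property the todo asks for: the underlying file is written as a whole exactly once, when offer() finally returns non-null, instead of being touched per chunk.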

regards,
-- yuri







 With the exception of supporting a safe chunk write (for now I am
sending the whole file content when writing to the cache, since a
chunked write would require additional changes to GridFS, such as a
protocol to let the loader know that the current chunk is the last one
so it can finally update the underlying file as a whole, etc.).


Any chance to implement the skip() and available() methods in
GridInputStream, or to make the constructors in the GridFileSystem
package public so I can easily extend them?

I am trying to plug in a custom FileMetadataCacheStore and
FileDataCacheStore implementation under the metadata/data caches used
by the GridFS so that loading from an existing FS is completely
transparent and lazy (I'll be happy to contribute if it makes sense).
The problem is that any BufferedInputStream wrapped around the
GridInputStream calls available()/skip(), but they are not implemented
in GridFS.

Do you also see any issues with the above approach?

regards,


On Mon, Jul 11, 2011 at 12:06 PM, galderz
reply+m-9778632-9f501737dc4435143baf6908afcd349935f88...@reply.github.com
wrote:
 Hey Yuri,

 Why do you need two new file based stores? Can't you plug Infinispan with a 
 file based cache store to give you FS persistence?

 Anyway, I'd suggest you discuss it in the Infinispan dev list 
 (http://lists.jboss.org/pipermail/infinispan-dev/) and in parallel, create an 
 issue in https://issues.jboss.org/browse/ISPN

 Cheers,
 Galder

 --
 Reply to this email directly or view it on GitHub:
 http://github.com/inbox/9770320#reply


Re: [infinispan-dev] Extend GridFS

2011-07-11 Thread Yuri de Wit
Following up on my setup for the GridFilesystem + custom FileCacheStore...

There is a single node (master) that is responsible for reading/writing
to the file system. There will be many more nodes (slaves) that will
fetch file content when needed. So my setup has two configuration
files:
* cache-master.xml, and
* cache-slave.xml

The cache-master.xml defines (using the std jgroups-udp.xml conf):
--
<namedCache name="type-metadata">
    <clustering mode="replication">
        <stateRetrieval timeout="2"
                        fetchInMemoryState="true"
                        alwaysProvideInMemoryState="true" />
        <sync replTimeout="2" />
    </clustering>
    <loaders passivation="false" shared="true" preload="true">
        <loader class="com.my.cache.loaders.FileMetadataCacheStore"
                fetchPersistentState="false"
                purgeOnStartup="false">
            <properties>
                <property name="location" value="/data" />
            </properties>
        </loader>
    </loaders>
</namedCache>

<namedCache name="type-data">
    <clustering mode="invalidation">
        <sync replTimeout="2" />
    </clustering>
    <loaders passivation="false" shared="true" preload="false">
        <loader class="com.my.cache.loaders.FileDataCacheStore"
                fetchPersistentState="false"
                purgeOnStartup="false">
            <properties>
                <property name="location" value="/data" />
            </properties>
        </loader>
    </loaders>
</namedCache>
--
And here is the cache-slave.xml (also using the std jgroups-udp.xml conf):
--
<namedCache name="type-metadata">
    <clustering mode="replication">
        <stateRetrieval timeout="2"
                        fetchInMemoryState="true"
                        alwaysProvideInMemoryState="true" />
        <sync replTimeout="2" />
    </clustering>
    <loaders preload="true">
        <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
            <properties>
                <property name="remoteCallTimeout" value="2" />
            </properties>
        </loader>
    </loaders>
</namedCache>

<namedCache name="type-data">
    <clustering mode="invalidation">
        <sync replTimeout="2" />
    </clustering>
    <loaders preload="false">
        <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
            <properties>
                <property name="remoteCallTimeout" value="2" />
            </properties>
        </loader>
    </loaders>
</namedCache>
--

The master starts up fine, but when starting the 1st slave I get:

java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
at java.net.Socket.connect(Socket.java:529)
at org.jgroups.util.Util.connect(Util.java:276)
at 
org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:510)
at 
org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)
at 
org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)
at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)
at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)


Any ideas?

thanks,
-- yuri

On Mon, Jul 11, 2011 at 8:58 PM, Yuri de Wit yde...@gmail.com wrote:
 Hi Galder,

 Thanks for your reply. Let me continue this discussion here first to
 validate my thinking before I create any issues in