Re: [infinispan-dev] Atomic operations and transactions
On 4 Jul 2011, at 07:57, Galder Zamarreño wrote:

> I get the feeling that those atomic operations are particularly useful when transactions are not used, because they allow you to reduce the cache operations to one, hence avoiding the need for a lock or synchronized block, or in our case, a transaction.

Precisely. I think the atomic ops should be documented as not for use within a transaction scope, and enforced by either:

1) suspending any ongoing tx if they are used, or
2) throwing an IllegalStateException if they are used within a tx scope

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
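A minimal sketch of the two options, assuming a JTA TransactionManager is at hand; the guard class and its names are illustrative, not actual Infinispan API:

    import java.util.concurrent.Callable;
    import javax.transaction.Transaction;
    import javax.transaction.TransactionManager;

    // Hypothetical guard around an atomic cache operation, enforcing the
    // "no atomic ops inside a tx" rule in one of the two proposed ways.
    public class AtomicOpGuard {
        private final TransactionManager tm;
        private final boolean suspendOngoingTx; // true = option 1, false = option 2

        public AtomicOpGuard(TransactionManager tm, boolean suspendOngoingTx) {
            this.tm = tm;
            this.suspendOngoingTx = suspendOngoingTx;
        }

        public <T> T run(Callable<T> atomicOp) throws Exception {
            Transaction suspended = null;
            if (tm.getTransaction() != null) {
                if (suspendOngoingTx) {
                    suspended = tm.suspend(); // option 1: run outside the tx
                } else {
                    // option 2: fail fast, the caller must not mix the two models
                    throw new IllegalStateException(
                        "Atomic operations cannot be called within a transaction");
                }
            }
            try {
                return atomicOp.call();
            } finally {
                if (suspended != null) {
                    tm.resume(suspended); // restore the caller's tx afterwards
                }
            }
        }
    }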
Re: [infinispan-dev] Atomic operations and transactions
On 5 Jul 2011, at 10:23, Galder Zamarreño wrote:

> I've gone through some cases and the end results would not differ at first glance if the atomic ops suspend the txs. The only thing that would change would be the expectations around lock acquisition timeouts for atomic ops within txs.

There is also the expectation of being able to roll back, which goes away.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] transactions :: incremental locking
On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:

> Very nice design, congratulations. Second picture is missing a final 4th step returning the ACK to N1, right?

Yes, this happens when 1a, 1b and 1c have returned to N1.

> About the third proposal, I was thinking about a possible improvement, not sure if it works: consider the list defining the keys being locked as already ordered. As far as I understood the ordering on the hash wheel, you can now split this list into the groups which need to be sent to each node:
>
> k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
>
> i.e. the sequence does not need to be reordered, and we can make use of that. Now instead of sending the full list in multicast to each involved node, we send the list only to the first node, and wait for an ACK from the last node involved in the sequence. This does not reduce the number of async RPCs being executed, but it sends fewer packets on the wire, and has the advantage of never needing to go back to recover from a failed node or from a potential deadlock, as all keys are always taken in proper order, not optimistically.

Your suggestion is a variation of the 2nd approach (async incremental locking). If we approach it this way, the prepare message, which is potentially big, is sent in sequence from one node to the other. With the current approach the prepare is sent in parallel, and the lock request sent from N2->N3->N4 has a smaller payload.

> [reusing the same picture of the hybrid incremental approach] assuming this list of keys, owned by these nodes:
>
> k1, k2, k3, k4, k5 -> N2{k1, k2} N3{k3, k4} N4{k5}
>
> Now if N2 receives the list, it acquires k1, k2, in order, and sends the full list over to N3 with a message like
>
> N2{k1 *taken, k2 *taken} N3{k3, k4} N4{k5}
>
> and it will also wait for an async ACK. If it receives an ACK, N2's work is done. If it doesn't receive one in a reasonable time, it will assume N3 is dead:

This failure detection functionality is in JGroups; I wouldn't want to have it duplicated here as well. But indeed N2 can be notified by JGroups about this.

> now the list of nodes changes, but the order in which these elements need to be acquired did not. In fact, some of these might be moved to N2 itself, some to N4, since the rehashing will move them to neighbours: N3's keys can be taken right away (again not violating the order), and the rest of the list is again forwarded to the next alive node. The last node will return an ACK to the originating node to inform it that the prepare phase was successful, and once it has done so, it will send an ACK back to its previous node to free its resources; this one will propagate the ACK back, etc. No deadlocks, no timeouts, no undo operations. What am I missing?
>
> Cheers,
> Sanne
>
> 2011/7/6 Mircea Markus mircea.mar...@jboss.com:
>> On 6 Jul 2011, at 09:00, Paolo Romano wrote:
>>> On 7/5/11 3:47 PM, Mircea Markus wrote:
>>>> On 5 Jul 2011, at 15:39, Manik Surtani wrote:
>>>>> Good stuff, shame about the RPC count. ;)
>>>> Yeah. Still very valid when there are deadlocks; I guess figures will tell us more precisely what the gain is.
>>> I agree with Manik's observation. When benchmarking it, it would be interesting to compare it also with the total order multicast approach that Pedro has been developing.
>> +1
>>> When do you plan to have these locking optimizations available?
>> We hope to have it in for 5.1, but it has a lower priority compared with the other lock-related changes.
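To make the splitting step concrete, here is a minimal sketch of grouping an already-ordered key list by owner. The HashWheel interface is a stand-in for the real consistent-hash lookup; none of these names are Infinispan API:

    import java.util.*;

    final class LockChainBuilder {

        // Stand-in for the consistent hash: position on the wheel and owner of a key.
        interface HashWheel {
            int positionOf(Object key);
            String ownerOf(Object key);
        }

        // Sorts keys by their hash-wheel position (the global lock-acquisition
        // order that prevents deadlocks) and groups them per owner, preserving
        // that order, e.g. k1..k5 -> N2{k1,k2} N3{k3,k4} N4{k5}.
        static LinkedHashMap<String, List<Object>> splitByOwner(Collection<?> keys, HashWheel wheel) {
            List<Object> ordered = new ArrayList<>(keys);
            ordered.sort(Comparator.comparingInt(wheel::positionOf));
            LinkedHashMap<String, List<Object>> groups = new LinkedHashMap<>();
            for (Object k : ordered) {
                groups.computeIfAbsent(wheel.ownerOf(k), o -> new ArrayList<>()).add(k);
            }
            return groups; // iteration order == forwarding order N2 -> N3 -> N4
        }
    }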
Re: [infinispan-dev] transactions :: incremental locking
On 8 Jul 2011, at 10:34, Galder Zamarreño wrote:

> Mircea, thx for writing this up. Some comments:
>
> It's not clear to me, at least given the examples in the wiki, how the problem of Tx1 and Tx2 with a and b explained in the introduction is fixed by the solution. The problem exposed: with some right timing, during prepare time it is possible for these two transactions to deadlock:
>
> • Tx1 lock acquired on a @ N3
> • Tx2 lock acquired on b @ N4
> • Tx1 cannot progress acquiring the lock on b @ N4, as that lock is held by Tx2
> • Tx2 cannot acquire the lock on a @ N3, as that lock is held by Tx1
> • Tx1 and Tx2 are waiting for each other = deadlock
>
> The 'alleged' solution? In the example above, considering the view = {N1, N2, N3, N4} and incremental lock acquisition:
>
> • Tx1 acquires the lock on a @ N3
> • Tx2 tries to acquire the lock on a @ N3. It cannot acquire it and waits for Tx1 to release it
> • Tx1 acquires the lock on b @ N4, completes and releases locks
> • Tx2 acquires the lock on a @ N3 and completes as well
>
> Is the latter block supposed to show how the example in the introduction is fixed? If it is, it's not really doing it, because it's not explaining what happens to Tx2 that is trying to acquire locks on 'b'...

Tx2 acquires the lock on b @ N4 after locking a @ N3; I've updated the document.

> The diagrams and the associated explanations are confusing too. Example: N1 first locks a @ N3 (1) - in the diagram, (1) represents a call from N1 to N2. b @ N4 (RPC 2) - that corresponds to the 5/6 arrows?

I've also updated this, thanks!

> On Jul 5, 2011, at 3:25 PM, Mircea Markus wrote:
>
>> Hi,
>>
>> This is the document describing the incremental optimistic locking Dan and myself discussed last week: http://community.jboss.org/wiki/IncrementalOptimisticLocking
>>
>> Unless I'm missing something, this together with lock reordering [1] covers 100% of the possible deadlock situations in the optimistic locking scheme [2] - which is pretty awesome!
>>
>> Cheers,
>> Mircea
>>
>> [1] http://community.jboss.org/wiki/LockReorderingForAvoidingDeadlocks
>> [2] http://community.jboss.org/wiki/OptimisticLockingInInfinispan
>
> --
> Galder Zamarreño
> Sr. Software Engineer
> Infinispan, JBoss Cache
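For illustration, a self-contained sketch of why a single global acquisition order removes this deadlock; plain JDK locks stand in for the per-node locks on a @ N3 and b @ N4:

    import java.util.concurrent.locks.ReentrantLock;

    public class OrderedLocking {
        static final ReentrantLock a = new ReentrantLock(); // lock on a @ N3
        static final ReentrantLock b = new ReentrantLock(); // lock on b @ N4

        // Both transactions acquire in the same global order (a before b),
        // so a circular wait - Tx1 holding a while Tx2 holds b - cannot form.
        static void runTx(String tx) {
            a.lock();
            try {
                b.lock();
                try {
                    System.out.println(tx + " completed");
                } finally {
                    b.unlock();
                }
            } finally {
                a.unlock();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Thread t1 = new Thread(() -> runTx("Tx1"));
            Thread t2 = new Thread(() -> runTx("Tx2"));
            t1.start(); t2.start();
            t1.join(); t2.join();
        }
    }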
Re: [infinispan-dev] transactions :: incremental locking
2011/7/11 Mircea Markus mircea.mar...@jboss.com:

> On 6 Jul 2011, at 16:41, Sanne Grinovero wrote:
>> [...] Now instead of sending the full list in multicast to each involved node, we send the list only to the first node, and wait for an ACK from the last node involved in the sequence. This does not reduce the number of async RPCs being executed, but it sends fewer packets on the wire, and has the advantage of never needing to go back to recover from a failed node or from a potential deadlock, as all keys are always taken in proper order, not optimistically.
>
> Your suggestion is a variation of the 2nd approach (async incremental locking). If we approach it this way, the prepare message, which is potentially big, is sent in sequence from one node to the other. With the current approach the prepare is sent in parallel, and the lock request sent from N2->N3->N4 has a smaller payload.

Yes, it's a variation, and I wouldn't have thought of it without reading your nice proposal. Still, I'm not understanding why the original suggestion is doing anything in parallel: since you suggest the payload must be kept low, you're sending multiple different keys over the wire from N1, so you can't use multicast, and assuming you have a single wire these are sent in sequence too.

About the total number of parallel communications: the original approach has the same number of serialized node hops in the best scenario, but it is worse in case something needs to be rolled back; imho it gets complicated if you have to roll back a longer path when more nodes are involved! So my intention was to a) make it simpler, and b) while still optimising for the good case, have less chattiness in the worst case.

I agree my last proposal sends keys over to nodes that have no interest in them, but since it's not about the values I wouldn't expect this to be a big change in packet size, and I am wondering if this token-ring approach would not be simpler: there are fewer packets going around at the same time and order is guaranteed, so I wouldn't be sure of this being less efficient on the network without some real-world test.

Cheers,
Sanne
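For reference, a minimal sketch of the token-ring style forwarding under discussion; the Node interface and all names are illustrative, not Infinispan code. Each node locks its own keys in wheel order, forwards the remaining list to the next owner, and the last node starts the ACK wave back along the chain:

    import java.util.*;

    final class ChainedPrepare {

        // One hop of the chained prepare (assumed async RPC transport).
        interface Node {
            String name();
            void lockInOrder(UUID txId, List<Object> keys); // acquire in wheel order
            void forward(String nextOwner, UUID txId,
                         List<String> remainingOwners,
                         Map<String, List<Object>> keysByOwner);
            void ackBack(UUID txId); // ACK towards the previous hop / originator
        }

        static void onLockRequest(Node self, UUID txId,
                                  List<String> remainingOwners,
                                  Map<String, List<Object>> keysByOwner) {
            // Keys for this node are locked strictly in hash-wheel order, so
            // no two chains can ever wait on each other in a cycle.
            self.lockInOrder(txId, keysByOwner.get(self.name()));
            if (remainingOwners.isEmpty()) {
                self.ackBack(txId); // last node: prepare succeeded, start ACK wave
            } else {
                self.forward(remainingOwners.get(0), txId,
                             remainingOwners.subList(1, remainingOwners.size()),
                             keysByOwner);
            }
        }
    }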
Re: [infinispan-dev] New documentation type - Glossary
On 8 Jul 2011, at 13:27, Manik Surtani wrote:

> Do we want to come up with a list of terms that definitely need defining, at least from where we stand now? More will be added incrementally, for sure.
>
> * Data Grid

http://www.jroller.com/cpurdy/entry/defining_a_data_grid
It doesn't sound like a definition to me, but it is the only one I could find...

> * In-memory data grid
> * Write through
> * Write behind

- read ahead
- cache aside
- near cache

> * MVCC
> * Optimistic Locking
> * Pessimistic Locking
> * Deadlock
> * Livelock
> * Write skew
> * Isolation level
> * 2-phase commit
> * 1-phase commit
> * XA resource
> * JTA synchronization
> * Split brain/network partition
>
> ... what else?
>
> On 7 Jul 2011, at 17:14, Pete Muir wrote:
>> I've added a new docs category - Glossary. This is for defining terminology we use that new users (and experienced users) might have trouble with. Data Grid would be a great example. For each item in the glossary, create a page underneath Glossary, and then we can link it in using an include. An item in the Glossary should use an h2. heading and include the heading in the text body as well as have a page title - this just makes the item reusable.
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
Re: [infinispan-dev] transactions :: incremental locking
On 11 Jul 2011, at 12:27, Sanne Grinovero wrote:

> [...] Still, I'm not understanding why the original suggestion is doing anything in parallel: since you suggest the payload must be kept low, you're sending multiple different keys over the wire from N1, so you can't use multicast, and assuming you have a single wire these are sent in sequence too.

ATM the prepare is sent as a multicast, with each node receiving the whole key set touched by the transaction.

> About the total number of parallel communications: the original approach has the same number of serialized node hops in the best scenario,

I'm a bit lost on what the original approach is :-) Is it the async incremental locking or the hybrid one? For the hybrid, the number of serialized node hops in the best scenario is 1.

> but it is worse in case something needs to be rolled back; imho it gets complicated if you have to roll back a longer path when more nodes are involved!

Yes, but the rollback should be seen as the exception in this situation, and the optimization is for the most common scenario, which is that lock acquisition succeeds.

> So my intention was to a) make it simpler, and b) while still optimising for the good case, have less chattiness in the worst case.

The good case, at least compared with the hybrid approach, is less optimal: the hybrid has 1 serial hop, this one has numNodesInvolvedInTx hops.

> I agree my last proposal sends keys over to nodes that have no interest in them, but since it's not about the values I wouldn't expect this to be a big change in packet size, and I am wondering if this token-ring approach would not be simpler: there are fewer packets going around at the same time and order is guaranteed, so I wouldn't be sure of this being less efficient on the network without some real-world test.
Re: [infinispan-dev] Atomic operations and transactions
On 11 Jul 2011, at 10:45, Manik Surtani wrote:

> On 4 Jul 2011, at 07:57, Galder Zamarreño wrote:
>> I get the feeling that those atomic operations are particularly useful when transactions are not used, because they allow you to reduce the cache operations to one, hence avoiding the need for a lock or synchronized block, or in our case, a transaction.
>
> Precisely. I think the atomic ops should be documented as not for use within a transaction scope, and enforced by either:
>
> 1) suspending any ongoing tx if they are used, or
> 2) throwing an IllegalStateException if they are used within a tx scope

+1 for the 2nd approach, at least until someone comes up with a use case for using them within a tx.
Re: [infinispan-dev] Atomic operations and transactions
On 11 Jul 2011, at 13:21, Mircea Markus wrote:

> On 11 Jul 2011, at 10:45, Manik Surtani wrote:
>> [...] I think the atomic ops should be documented as not for use within a transaction scope, and enforced by either:
>>
>> 1) suspending any ongoing tx if they are used, or
>> 2) throwing an IllegalStateException if they are used within a tx scope
>
> +1 for the 2nd approach, at least until someone comes up with a use case for using them within a tx.

Yes, it is just more explicit that way.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] transactions :: incremental locking
This is getting a bit confusing; sorry, I've tried to be brief but it seems I didn't improve the message. I'll add some more notes below as they belong to the context, but let's talk about this on IRC.

2011/7/11 Mircea Markus mircea.mar...@jboss.com:

> I'm a bit lost on what the original approach is :-) Is it the async incremental locking or the hybrid one? For the hybrid, the number of serialized node hops in the best scenario is 1.

I'm sorry, I'm referring to the hybrid incremental approach. How can you have a value of 1 if the nodes have to confirm in sequence? As the picture shows, there are messages labeled in sequence 1, 2, 3, 4, and it should need a final ACK (5) too.

The small change I'm suggesting doesn't improve on this much; it only avoids the need to roll back lock acquisitions: since you need messages 1-5 in series anyway, I'm suggesting we take advantage of that, admittedly using a larger payload. Thinking about it now, it also makes sure that locks are kept for a shorter time: in the hybrid approach all locks are acquired during the initial multicast in phase 1, while with the additional proposal I've sent, some locks are acquired a bit later in terms of network messages; no doubt the change is small, but this might make it possible for some transactions to make progress quicker.

> Yes, but the rollback should be seen as the exception in this situation, and the optimization is for the most common scenario, which is that lock acquisition succeeds.
>
> The good case, at least compared with the hybrid approach, is less optimal: the hybrid has 1 serial hop, this one has numNodesInvolvedInTx hops.

I'm failing to understand how you can have a single serial hop: you must ack the lock acquisition in order to achieve your goal of deadlock prevention. There must be a misunderstanding, as I'm experiencing the opposite interpretation; actually I think your explanation and solution on the wiki were brilliant, but we're getting entangled on the mailing list.

Did you try expanding your scheme to more nodes? What should happen if the first node fails in a series of a dozen? It doesn't scale with the number of involved servers, and you're keeping some locks for the time of multiple network hops, while these locks should actually be free and are blocking other transactions. You might well end up in a spinning situation, as the delays are in terms of multiple network hops: you block another TX, which needs to restart from scratch in the locking fight, and abort yourself to retry as well: that's a lot of wasted work.
Re: [infinispan-dev] Cutting 5.0.0.Final
Hey Manik,

I'd say we need a release plan: what else needs to be done, last fixes (e.g. LRU), performance tests, documentation, etc.

Vladimir

On 11-07-11 7:25 AM, Manik Surtani wrote:

> Guys, any thoughts on this? Anything that's come up that needs addressing? We seem to have a few minor open JIRAs, nothing significant...
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
Re: [infinispan-dev] Cutting 5.0.0.Final
Yes, agreed, I am in the process of piecing one together. Just wanted to make sure that, code-wise, we're ready.

On 11 Jul 2011, at 14:41, Vladimir Blagojevic wrote:

> Hey Manik,
>
> I'd say we need a release plan: what else needs to be done, last fixes (e.g. LRU), performance tests, documentation, etc.
>
> Vladimir
>
> On 11-07-11 7:25 AM, Manik Surtani wrote:
>> Guys, any thoughts on this? Anything that's come up that needs addressing? We seem to have a few minor open JIRAs, nothing significant...

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] Faster LRU
On Fri, Jul 8, 2011 at 12:44 PM, Dan Berindei dan.berin...@gmail.com wrote:

> It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time. I added another batch threshold check while holding the lock and the performance anomaly disappeared.
>
> Another strange pattern is that neither eviction policy respects the capacity parameter exactly: LIRS rounds up the capacity to the next power of 2, and LRU/LRUOld do the same rounding and then multiply by 0.75. I updated BCHM to pass the initial capacity to the eviction policy, and now all the policies keep the same number of entries.
>
> I'll report again once I've fixed these and updated the reporting - I think the total number of misses might be more relevant than the standard deviation of the keys at the end.

I got the tests running on the test cluster again and the new LRU is clearly better in every respect than the old LRU. In fact the only problem in the test results is that in the writeOnMiss scenario the hit ratio of the new LRU is much better than that of all the other policies. There is probably a mistake somewhere in the test; if it were a random thing, I don't think it would have been so visible in a 20-minute test:

MapStressTest configuration: capacity 500000, test running time 1200 seconds

Container       Ops/s      HitRatio  Size    stdDev
BCHM:LIRS       11155.49   96.54     499968  193558.30
BCHM:LRU        31292.06   97.84     500000  193168.07
BCHM:LRU_OLD      116.89   76.11     500032  197974.87

Testing write on miss performance with capacity 500000, keys 2000000, concurrency level 32, threads 100

Container       Ops/s      HitRatio  Size    stdDev
BCHM:LIRS        1684.01   63.13     499968  338637.40
BCHM:LRU         4884.57   84.47     500000  353336.31
BCHM:LRU_OLD       50.69   41.34     500032  361239.68

MapStressTest configuration: capacity 500000, test running time 1200 seconds

Testing independent read/write/remove performance with capacity 500000, keys 1000000, concurrency level 32, readers 90, writers 9, removers 1

Container       Ops/s      Gets/s     Puts/s     Removes/s  HitRatio  Size    StdDev
BCHM:LIRS       14865.45   13971.36    8424.86    4640.39   83.10     499972  168386.87
BCHM:LRU        35970.38   33800.61   19273.89   21815.97   78.64     500000  192598.94
BCHM:LRU_OLD      496.10     467.32     258.91     269.63   69.66     500032  182468.36
CLHM            56074.13   50656.76   48274.11   53090.65   77.62     500000  180395.29
SLHM            20603.49   18110.83   22696.60   20080.24   80.12     499401  179188.12

[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded.
Test suite progress: tests succeeded: 1, failed: 0, skipped: 0.

Testing independent read/write/remove performance with capacity 500000, keys 1000000, concurrency level 128, readers 90, writers 9, removers 1

Container       Ops/s      Gets/s     Puts/s     Removes/s  HitRatio  Size    StdDev
BCHM:LIRS       23567.68   21558.38   19691.52    3617.48   84.73     498450  167244.66
BCHM:LRU        38313.89   35514.87   24921.89   27617.36   77.67     500096  193880.45
BCHM:LRU_OLD     2717.51    2600.66    1035.20    1197.20   79.14     499573  180107.36
CLHM            77609.22   70243.55   65605.74   72464.47   78.64     499968  182996.20
SLHM            18812.45   16125.15   25319.02   14003.08   81.44     496624  183041.62

[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded.
Test suite progress: tests succeeded: 2, failed: 0, skipped: 0.

Testing independent read/write/remove performance with capacity 500000, keys 2000000, concurrency level 32, readers 90, writers 9, removers 1

Container       Ops/s      Gets/s     Puts/s     Removes/s  HitRatio  Size    StdDev
BCHM:LIRS       10579.80   10346.59    2109.26    2002.00   50.88     499969  245820.97
BCHM:LRU        35180.30   34372.36    6871.62   10867.85   50.08
Re: [infinispan-dev] Faster LRU
Amazing, congratulations Vladimir & Dan!
Re: [infinispan-dev] Faster LRU
Yeah, nice work guys. :)

On 11 Jul 2011, at 17:10, Sanne Grinovero wrote:

> Amazing, congratulations Vladimir & Dan!

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] New documentation type - Glossary
On Jul 8, 2011, at 2:27 PM, Manik Surtani wrote:

> Do we want to come up with a list of terms that definitely need defining, at least from where we stand now? More will be added incrementally, for sure.
>
> * Data Grid
> * In-memory data grid
> * Write through
> * Write behind
> * MVCC
> * Optimistic Locking
> * Pessimistic Locking
> * Deadlock
> * Livelock
> * Write skew
> * Isolation level
> * 2-phase commit
> * 1-phase commit
> * XA resource
> * JTA synchronization
> * Split brain/network partition
>
> ... what else?

Some that come to my mind:

* HotRod
* Read Committed
* Repeatable Read
* Externalizer

> On 7 Jul 2011, at 17:14, Pete Muir wrote:
>> I've added a new docs category - Glossary. This is for defining terminology we use that new users (and experienced users) might have trouble with. Data Grid would be a great example. For each item in the glossary, create a page underneath Glossary, and then we can link it in using an include. An item in the Glossary should use an h2. heading and include the heading in the text body as well as have a page title - this just makes the item reusable.

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
Re: [infinispan-dev] Faster LRU
On 11-07-08 5:44 AM, Dan Berindei wrote:

> I haven't run the tests with concurrency level 512, as the total number of threads is only 100, but I suspect the old LRU still won't catch up with the new LRU's performance.
>
> It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time.

I am not sure about this, Dan. I looked at this code for hours! I do not see how two threads can call eviction#execute() concurrently.
Re: [infinispan-dev] Faster LRU
On Mon, Jul 11, 2011 at 7:29 PM, Vladimir Blagojevic vblag...@redhat.com wrote:

> On 11-07-08 5:44 AM, Dan Berindei wrote:
>> It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time.
>
> I am not sure about this, Dan. I looked at this code for hours! I do not see how two threads can call eviction#execute() concurrently.

Sorry, I wasn't very clear: two threads can enter attemptEviction simultaneously; one will get the lock and perform eviction, and the other will also try to get the lock, and when it gets it, it will proceed to call eviction#execute() again. So eviction#execute() is not called concurrently, but it is called twice when it should have been called only once, and I think this dilutes the advantages of batching.
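A minimal sketch of the fix Dan describes - re-checking the eviction threshold after acquiring the segment lock - with hypothetical names, not the actual BCHM source:

    import java.util.concurrent.locks.ReentrantLock;

    final class Segment {
        interface EvictionPolicy {
            boolean thresholdExpired(); // has the eviction batch threshold been reached?
            void execute();             // drain the batch and evict
        }

        private final ReentrantLock lock = new ReentrantLock();
        private final EvictionPolicy eviction;

        Segment(EvictionPolicy eviction) { this.eviction = eviction; }

        void attemptEviction() {
            if (!eviction.thresholdExpired()) return; // cheap unlocked check
            lock.lock();
            try {
                // Re-check under the lock: another thread may have raced past
                // the unlocked check and already drained the batch, in which
                // case running eviction again would waste the batching win.
                if (eviction.thresholdExpired()) {
                    eviction.execute();
                }
            } finally {
                lock.unlock();
            }
        }
    }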
Re: [infinispan-dev] Extend GridFS
Hi Galder,

Thanks for your reply. Let me continue this discussion here first to validate my thinking before I create any issues in JIRA (forgive me for the lengthy follow-up).

First of all, thanks for this wonderful project! I started looking into Ehcache as the default caching implementation, but found it lacking some key features when using JGroups. My guess is that all the development there is going towards the Terracotta distribution instead of JGroups. Terracotta does seem like a wonderful product, but I was hoping to stick to a JGroups-based caching impl. So I was happy to have found Infinispan.

I need to create a distributed cache that loads data from the file system. It's a tree of folders/files containing mostly metadata info that changes seldom, but changes. Our mid-term goal is to move the metadata away from the file system and into a database, but that is not feasible now due to a tight deadline and the risks of refactoring too much of the code base.

So I was happy to see the GridFilesystem implementation in Infinispan and the fact that clustered caches can be lazily populated (the metadata tree in the FS can be large, and having all nodes in the cluster preloaded with all the data would not work for us). However, it defines its own persistence scheme with specific file names and serialized buckets, which would require us to have a cache-aside strategy to read our metadata tree and populate the GridFilesystem with it.

What I am looking for is to be able to plug into the GridFilesystem a new FileCacheStore that can load directly from an existing directory tree, transparently. This would automatically lazy-load FS content across the cluster without having to pre-populate the GridFilesystem programmatically. At first I was hoping to extend the existing FileCacheStore to support this (hence why I was asking for a GridInputStream.skip()/available() implementation and for making the constructors protected instead of package-level access), but I later realized that what I needed was an entirely new implementation, since the buckets abstraction there is not really appropriate.

The good news is that I am close to 75% complete with the impl here. It is working, with a few caveats, beautifully for a single node, but I am facing some issues trying to launch a second node in the cluster (most of it my ignorance, I am sure).

** Do you see any issues with this approach that I am not aware of?

In addition, I am having a couple of issues launching the second node in the cluster: a couple of NPEs and a java.net.NoRouteToHostException: No route to host. I will send the details of these exceptions in a follow-up email. This is where I am stuck at the moment.

In my setup I have two configuration files:

* cache-master.xml
* cache-slave.xml

Both define the data and metadata caches required by GridFilesystem, but -master.xml configures the custom FileCacheStore I implemented and -slave.xml uses the ClusterCacheLoader.

These are some of the items/todos for this custom FileCacheStore impl:

** Implement chunked writes with a special chunking protocol to signal when the last chunk has been delivered
** Custom configuration to simplify it for GridFilesystem

regards,
-- yuri

> With the exception of supporting a safe chunk write (for now I am sending the whole file content when writing to the cache, since a chunked write would require additional changes to GridFS, such as a protocol to let the loader know that the current chunk is the last one so it can finally update the underlying file as a whole, etc.), [...] can be parsed and translated into file reads on the real file system.
>
> Any chance to implement the skip() and available() methods in GridInputStream, or make the constructors in the GridFileSystem package public so I can easily extend them? I am trying to plug in custom FileMetadataCacheStore and FileDataCacheStore implementations under the metadata/data caches used by the GridFS, so that loading from an existing FS is completely transparent and lazy (I'll be happy to contribute if it makes sense). The problem is that any BufferedInputStream wrapped around the GridInputStream calls available/skip, but they are not implemented in GridFS. Do you also see any issues with the above approach?
>
> regards,
>
> On Mon, Jul 11, 2011 at 12:06 PM, galderz reply+m-9778632-9f501737dc4435143baf6908afcd349935f88...@reply.github.com wrote:
>> Hey Yuri,
>>
>> Why do you need two new file based stores? Can't you plug Infinispan with a file based cache store to give you FS persistence?
>>
>> Anyway, I'd suggest you discuss it in the Infinispan dev list (http://lists.jboss.org/pipermail/infinispan-dev/) and in parallel, create an issue in https://issues.jboss.org/browse/ISPN
>>
>> Cheers,
>> Galder
>>
>> --
>> Reply to this email directly or view it on GitHub: http://github.com/inbox/9770320#reply
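To make the intent concrete, here is a minimal sketch of the key-to-file translation such a store would perform; plain JDK code with illustrative names, omitting the wiring into Infinispan's CacheStore SPI:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical helper behind FileDataCacheStore/FileMetadataCacheStore:
    // grid paths are resolved directly against the existing directory tree,
    // so content is lazily loaded on cache misses without pre-population.
    final class DirectoryBackedLoader {
        private final Path root; // the 'location' property, e.g. /data

        DirectoryBackedLoader(String location) { this.root = Paths.get(location); }

        private Path resolve(String gridPath) {
            return root.resolve(gridPath.startsWith("/") ? gridPath.substring(1) : gridPath);
        }

        // Raw file bytes for a data-cache key, or null when the file is absent.
        byte[] loadData(String gridPath) throws IOException {
            Path f = resolve(gridPath);
            return Files.isRegularFile(f) ? Files.readAllBytes(f) : null;
        }

        // Basic metadata (directory flag, length, mtime) for a metadata-cache key.
        String loadMetadata(String gridPath) throws IOException {
            Path f = resolve(gridPath);
            if (!Files.exists(f)) return null;
            return String.format("dir=%b,length=%d,modified=%d",
                    Files.isDirectory(f), Files.size(f),
                    Files.getLastModifiedTime(f).toMillis());
        }
    }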
Re: [infinispan-dev] Extend GridFS
Following up on my setup for the GridFilesystem + custom FileCacheStore...

There is a single node (master) that is responsible for reading/writing to the file system. There will be many more nodes (slaves) that will fetch file content when needed. So my setup has two configuration files:

* cache-master.xml, and
* cache-slave.xml

The cache-master.xml defines (using the std jgroups-udp.xml conf):

--
<namedCache name="type-metadata">
   <clustering mode="replication">
      <stateRetrieval timeout="20000" fetchInMemoryState="true" alwaysProvideInMemoryState="true" />
      <sync replTimeout="20000" />
   </clustering>
   <loaders passivation="false" shared="true" preload="true">
      <loader class="com.my.cache.loaders.FileMetadataCacheStore" fetchPersistentState="false" purgeOnStartup="false">
         <properties>
            <property name="location" value="/data" />
         </properties>
      </loader>
   </loaders>
</namedCache>

<namedCache name="type-data">
   <clustering mode="invalidation">
      <sync replTimeout="20000" />
   </clustering>
   <loaders passivation="false" shared="true" preload="false">
      <loader class="com.my.cache.loaders.FileDataCacheStore" fetchPersistentState="false" purgeOnStartup="false">
         <properties>
            <property name="location" value="/data" />
         </properties>
      </loader>
   </loaders>
</namedCache>
--

And here is the cache-slave.xml (also using the std jgroups-udp.xml conf):

--
<namedCache name="type-metadata">
   <clustering mode="replication">
      <stateRetrieval timeout="20000" fetchInMemoryState="true" alwaysProvideInMemoryState="true" />
      <sync replTimeout="20000" />
   </clustering>
   <loaders preload="true">
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
         <properties>
            <property name="remoteCallTimeout" value="20000" />
         </properties>
      </loader>
   </loaders>
</namedCache>

<namedCache name="type-data">
   <clustering mode="invalidation">
      <sync replTimeout="20000" />
   </clustering>
   <loaders preload="false">
      <loader class="org.infinispan.loaders.cluster.ClusterCacheLoader">
         <properties>
            <property name="remoteCallTimeout" value="20000" />
         </properties>
      </loader>
   </loaders>
</namedCache>
--

The master starts up fine, but when starting the 1st slave I get:

java.net.NoRouteToHostException: No route to host
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:432)
    at java.net.Socket.connect(Socket.java:529)
    at org.jgroups.util.Util.connect(Util.java:276)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:510)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:418)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:400)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
    at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)
    at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)

Any ideas?

thanks,
-- yuri

On Mon, Jul 11, 2011 at 8:58 PM, Yuri de Wit yde...@gmail.com wrote:

> Hi Galder,
>
> Thanks for your reply. Let me continue this discussion here first to validate my thinking before I create any issues in