Re: [infinispan-dev] Partial state transfer
On 6/20/11 10:02 PM, Manik Surtani wrote:
> 5.1 is possible, 5.0 would be very tough.

Fine with me then; this gives me more time for 3.0.

> What are the implications for our current implementation on 5.0 though? State going missing?

No, Infinispan 5.0 uses JGroups 2.12.x, which still has partial state.

--
Bela Ban
Lead JGroups / Clustering Team
JBoss

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Partial state transfer
On 21 Jun 2011, at 07:01, Bela Ban wrote:
>> What are the implications for our current implementation on 5.0 though? State going missing?
>
> No, Infinispan 5.0 uses JGroups 2.12.x, which still has partial state.

Yes, but I was asking about the inconsistency in the design of partial state transfer that you mentioned at the start of this thread.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] Partial state transfer
5.1 is possible, 5.0 would be very tough.

What are the implications for our current implementation on 5.0 though? State going missing?

On 16 Jun 2011, at 17:56, Bela Ban wrote:
> Correct. Time frame would ideally be 5.0, but realistically it will probably be 5.1. Is that feasible from a roadmap point of view?
>
> On 6/16/11 6:47 PM, Vladimir Blagojevic wrote:
>> In other words, the essential problem is that digest and channel state are per-channel abstractions, and they do not fit nicely with higher-level abstractions like substates? We use partial state transfer in Infinispan and we need to address this. What is the time frame here? 5.0 final release?
>>
>> Vladimir
>>
>> On 11-06-15 11:39 AM, Bela Ban wrote:
>>> I looked into adding partial state transfer back into JGroups, but found out that partial state transfer is fundamentally flawed, something I've always suspected! (Regular state transfer is correct, and has always been correct.)
>>>
>>> - Say we have nodes A and B. B requests the state from A
>>> - There are partial states X and Y
>>> - Message M1 modifies X, M2 modifies Y
>>>
>>> Here's what happens:
>>> T1: A multicasts M1
>>> T2: A delivers M1, and changes X
>>> T3: B sends a GET_STATE(Y) request to A // partial state request for state Y
>>> T4: A multicasts M2
>>> T5: A delivers M2, changing Y
>>> T6: A receives the GET_STATE request, sends a SET_STATE response back including Y and the digest (including M1's and M2's seqnos)
>>> T7: B receives the SET_STATE response, sets its digest (which now includes M1 and M2) and state Y *BUT NOT* state X!
>>> T8: *** B receives M1, discards it because it is already in its digest ***
>>> T9: B receives M2, and also discards it
>>>
>>> At time T8, M1 (which would have changed state X) is discarded, because it is already in the digest sent with the SET_STATE response. Therefore state X is now incorrect, as M1 was never applied!
>>>
>>> As a summary: if we get a number of updates to partial states, and don't receive all of them before requesting the partial state, the last update included in the digest wins...
>>>
>>> I'm a real idiot, as I've written this down before, in 2006: see [1] for details. In a nutshell, [1] shows that partial state transfer doesn't work unless virtual synchrony (FLUSH) is used. So I propose Infinispan and JBoss AS look into how they can replace their use of partial state transfer. I suggest Infinispan uses the same approach already used for state transfer with mode=distribution.
>>>
>>> Opinions?
>>>
>>> [1] https://github.com/belaban/JGroups/blob/master/doc/design/PartialStateTransfer.txt
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups / Clustering Team
>>> JBoss

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] Partial state transfer in Infinispan
On 17 Jun 2011, at 13:49, Mircea Markus wrote:
>> But yes, there is no reason why we can't replace this with RPC as per Distribution; however I think we do need a streaming solution - not just for replication but distribution as well. As such I'd only want to re-implement this bit once, rather than a temp RPC-based solution first. So we need a mechanism to either:
>
> Now this might sound a bit too radical, but do we really need REPLICATED mode?

Yes. :-) REPL is ideal for certain common usage patterns: read-heavy, small clusters (under 10 nodes), small overall data volume (fits in any single JVM's heap). This gives fast reads, since reads are always local, etc.

> This is not fully brewed, but if e.g. we set numOwners = Integer.MAX_VALUE the cluster is effectively in replicated mode, so can't we just drop REPLICATION entirely? This would reduce the code size significantly...

We could in theory achieve this as you suggest with numOwners = Integer.MAX_VALUE, but internally we'd still be best off implementing this as we have right now. It saves on a lot of overhead (memory as well as processing) compared to a DIST-like setup with an unlimited number of data owners.

>> (1) open a separate TCP socket for the sake of streaming state, or
>> (2) reuse the sockets JGroups opens.
>>
>> They both have their pros and cons. (1) is more configuration, firewall setup, and a spiderweb of connections in a large grid. (2) would mean multiplexing with JGroups' use of the socket.
>
> Having our own sockets might cause administration complications. Also, borrowing sockets from JGroups doesn't seem nice... I'm not a fan of either solution, really: I think this should be the transport's responsibility, and we should enhance JGroups to offer the streaming service.

Yes, it would be implemented in our Transport abstraction, for sure. So code wouldn't leak, but all the same there is no hard requirement that JGroups provides this (since it could be impl'd in JGroupsTransport).
--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
Re: [infinispan-dev] Partial state transfer in Infinispan
On 9 Jun 2011, at 15:26, Manik Surtani wrote:
> We use partial state transfer not to generate partial state per cache, but the entire state per cache; but since we have more than one cache sharing a given JGroups channel, as far as JGroups is concerned this *is* partial state of a node. I.e., the state of just one cache on a channel, not all the caches. So we actually use cacheName as the state identifier (in JGroups' ExtendedMessageListener).
>
> But yes, there is no reason why we can't replace this with RPC as per Distribution; however I think we do need a streaming solution - not just for replication but distribution as well. As such I'd only want to re-implement this bit once, rather than a temp RPC-based solution first. So we need a mechanism to either:

Now this might sound a bit too radical, but do we really need REPLICATED mode? This is not fully brewed, but if e.g. we set numOwners = Integer.MAX_VALUE the cluster is effectively in replicated mode, so can't we just drop REPLICATION entirely? This would reduce the code size significantly...

> (1) open a separate TCP socket for the sake of streaming state, or
> (2) reuse the sockets JGroups opens.
>
> They both have their pros and cons. (1) is more configuration, firewall setup, and a spiderweb of connections in a large grid. (2) would mean multiplexing with JGroups' use of the socket.

Having our own sockets might cause administration complications. Also, borrowing sockets from JGroups doesn't seem nice... I'm not a fan of either solution, really: I think this should be the transport's responsibility, and we should enhance JGroups to offer the streaming service.
Re: [infinispan-dev] Partial state transfer in Infinispan
On 6/17/11 2:49 PM, Mircea Markus wrote:
> Now this might sound a bit too radical, but do we really need REPLICATED mode? This is not fully brewed, but if e.g. we set numOwners = Integer.MAX_VALUE the cluster is effectively in replicated mode, so can't we just drop REPLICATION entirely? This would reduce the code size significantly...

This is not the same as replicated mode. With numOwners >= cluster size N, we send N unicasts for updates. This is inefficient; a multicast is much better here. Also, for gets, we pick a node, which might not be the local node, whereas in replicated mode we always pick the local node. Plus, we'd have to de-activate the entire rebalancing code, as it's not needed in replicated mode.

One thought though is to see if we should (internally) switch to replicated mode if numOwners >= cluster size. A bit dangerous though, because if this condition changes (e.g. more nodes are added), we'd have to switch back...

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
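Bela's cost argument above can be put in back-of-the-envelope form. The sketch below is an illustrative model only (the class and method names are made up, and real Infinispan traffic also involves acks and retransmissions): in DIST mode with numOwners >= cluster size, each update costs one unicast per remote owner, while REPL mode can push the same update with a single multicast.

```java
// Toy message-count model for Bela's point: numOwners = Integer.MAX_VALUE
// is not equivalent to replicated mode on the wire. Illustrative only.
public class UpdateCost {

    // DIST mode: one unicast per remote owner, per update.
    static int distMessages(int clusterSize, int numOwners, int updates) {
        int owners = Math.min(numOwners, clusterSize); // numOwners is capped at the view size
        return updates * (owners - 1);                 // the originator is itself an owner
    }

    // REPL mode: one multicast per update, regardless of cluster size.
    static int replMessages(int updates) {
        return updates;
    }

    public static void main(String[] args) {
        int n = 10, updates = 1000;
        System.out.println("DIST, numOwners=MAX_VALUE: " + distMessages(n, Integer.MAX_VALUE, updates));
        System.out.println("REPL, multicast:           " + replMessages(updates));
    }
}
```

For a 10-node cluster and 1000 updates, the model counts 9000 unicasts versus 1000 multicasts, which is the overhead Bela refers to.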
Re: [infinispan-dev] Partial state transfer in Infinispan
On Wed, 2011-06-01 at 16:46 +0200, Bela Ban wrote:
> On 6/1/11 4:21 PM, Sanne Grinovero wrote:
>> Hi Bela,
>> 2011/6/1 Bela Ban b...@redhat.com:
>>> We currently use JGroups' partial state transfer to transfer individual caches from one Infinispan instance to another. Since I got rid of partial state transfer in JGroups 3.0, and don't like to add it back, I'd like to know whether this is still needed.
>>>
>>> I thought that we currently require the same set of caches to be available in all Infinispan instances, and the reason (IIRC) was that distribution wouldn't work if we have caches 1 and 2 available on instances A and B, but not on C, because consistent hashing distributes the data based on views, and we didn't want to have to keep track of individual caches...
>>
>> Well, I really don't like this limitation in Infinispan and was actually hoping that we could remove it at some point. Imagine the scenario in which you have a running cluster, and at some point the new release of your application needs an additional cache: there's no way to start a new node having this new cache.
>
> Yes, I fully agree. Another example is HTTP web sessions: currently 1 webapp == 1 cache, so we currently require the same webapps to be deployed in all JBoss instances *if* we use replication (distribution is different)!

As of AS7, web sessions for all webapps are stored in a single cache:
https://issues.jboss.org/browse/JBCLUSTER-293

>> Also, right now when an application starts, it's possible with the proper timing that it joins the cluster before having defined and started all caches (starting the cache manager and the caches is not an atomic operation), basically failing to start because of this limitation.
>
> Yep.
>
>> Maybe it's still possible to build such a thing on top of non-partial state transfer? As it doesn't exist, we didn't design it.
>
> Yes. Well, Infinispan already uses its own state transfer for distribution; I wonder why this isn't the case for replication.
>
>>> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?
>>
>> I don't know, but guess it's because each node has a different set of keys, so no node has the same state as another?
>
> You could still use JGroups state transfer; getState() would list the state provider as the target node.
>
> In general, partial state transfer involves transferring (1) the partial state and (2) the digest, which is a vector of the highest seqnos seen for every member. When we get a partial state, we always overwrite our own digest with the one received, and update our state accordingly. However, when this is done a couple of times, for different partial states, I'm not sure that we won't receive a few messages multiple times, due to the digest overwriting...
>
> I think the cleanest solution would be for you guys to reuse the state transfer you already use in distribution mode.
Re: [infinispan-dev] Partial state transfer
Correct. Time frame would ideally be 5.0, but realistically it will probably be 5.1. Is that feasible from a roadmap point of view?

On 6/16/11 6:47 PM, Vladimir Blagojevic wrote:
> In other words, the essential problem is that digest and channel state are per-channel abstractions, and they do not fit nicely with higher-level abstractions like substates? We use partial state transfer in Infinispan and we need to address this. What is the time frame here? 5.0 final release?
>
> Vladimir
>
> On 11-06-15 11:39 AM, Bela Ban wrote:
>> I looked into adding partial state transfer back into JGroups, but found out that partial state transfer is fundamentally flawed, something I've always suspected! (Regular state transfer is correct, and has always been correct.)
>>
>> - Say we have nodes A and B. B requests the state from A
>> - There are partial states X and Y
>> - Message M1 modifies X, M2 modifies Y
>>
>> Here's what happens:
>> T1: A multicasts M1
>> T2: A delivers M1, and changes X
>> T3: B sends a GET_STATE(Y) request to A // partial state request for state Y
>> T4: A multicasts M2
>> T5: A delivers M2, changing Y
>> T6: A receives the GET_STATE request, sends a SET_STATE response back including Y and the digest (including M1's and M2's seqnos)
>> T7: B receives the SET_STATE response, sets its digest (which now includes M1 and M2) and state Y *BUT NOT* state X!
>> T8: *** B receives M1, discards it because it is already in its digest ***
>> T9: B receives M2, and also discards it
>>
>> At time T8, M1 (which would have changed state X) is discarded, because it is already in the digest sent with the SET_STATE response. Therefore state X is now incorrect, as M1 was never applied!
>>
>> As a summary: if we get a number of updates to partial states, and don't receive all of them before requesting the partial state, the last update included in the digest wins...
>>
>> I'm a real idiot, as I've written this down before, in 2006: see [1] for details. In a nutshell, [1] shows that partial state transfer doesn't work unless virtual synchrony (FLUSH) is used. So I propose Infinispan and JBoss AS look into how they can replace their use of partial state transfer. I suggest Infinispan uses the same approach already used for state transfer with mode=distribution.
>>
>> Opinions?
>>
>> [1] https://github.com/belaban/JGroups/blob/master/doc/design/PartialStateTransfer.txt

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
[infinispan-dev] Partial state transfer
I looked into adding partial state transfer back into JGroups, but found out that partial state transfer is fundamentally flawed, something I've always suspected! (Regular state transfer is correct, and has always been correct.)

- Say we have nodes A and B. B requests the state from A
- There are partial states X and Y
- Message M1 modifies X, M2 modifies Y

Here's what happens:

T1: A multicasts M1
T2: A delivers M1, and changes X
T3: B sends a GET_STATE(Y) request to A // partial state request for state Y
T4: A multicasts M2
T5: A delivers M2, changing Y
T6: A receives the GET_STATE request, sends a SET_STATE response back including Y and the digest (including M1's and M2's seqnos)
T7: B receives the SET_STATE response, sets its digest (which now includes M1 and M2) and state Y *BUT NOT* state X!
T8: *** B receives M1, discards it because it is already in its digest ***
T9: B receives M2, and also discards it

At time T8, M1 (which would have changed state X) is discarded, because it is already in the digest sent with the SET_STATE response. Therefore state X is now incorrect, as M1 was never applied!

As a summary: if we get a number of updates to partial states, and don't receive all of them before requesting the partial state, the last update included in the digest wins...

I'm a real idiot, as I've written this down before, in 2006: see [1] for details. In a nutshell, [1] shows that partial state transfer doesn't work unless virtual synchrony (FLUSH) is used.

So I propose Infinispan and JBoss AS look into how they can replace their use of partial state transfer. I suggest Infinispan uses the same approach already used for state transfer with mode=distribution.

Opinions?

[1] https://github.com/belaban/JGroups/blob/master/doc/design/PartialStateTransfer.txt

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
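The T1-T9 scenario above can be reproduced with a few lines of plain Java. This is a toy model, not JGroups code - the field and method names are invented for illustration - but it shows the mechanism: the digest installed at T7 covers the whole channel, so at T8 node B silently drops the only copy of the update to state X it was ever going to see.

```java
// Toy model of the partial state transfer flaw (scenario T1-T9).
// All names are illustrative; this is not the JGroups implementation.
public class PartialStateFlaw {

    // Node B's copies of the two partial states. On node A, after M1 and M2,
    // the authoritative values are x = 1 and y = 1.
    static int xOnB = 0, yOnB = 0;

    // B's digest entry for A: the highest seqno from A considered "already seen".
    static long digestOnB = 0;

    // Delivery on B: messages at or below the digest are discarded.
    static void deliverOnB(long seqno, char target) {
        if (seqno <= digestOnB) return;       // discarded: covered by the digest
        if (target == 'X') xOnB++; else yOnB++;
        digestOnB = seqno;
    }

    public static void main(String[] args) {
        // T1/T4: A multicasts M1 (seqno 1, modifies X) and M2 (seqno 2, modifies Y).
        // T6/T7: before B delivers either message, it installs SET_STATE(Y):
        // partial state Y, plus a digest that already includes both seqnos.
        yOnB = 1;        // state Y transferred...
        digestOnB = 2;   // ...with a digest covering M1 and M2 - but state X was NOT transferred.

        // T8/T9: M1 and M2 finally arrive at B and are both discarded.
        deliverOnB(1, 'X');
        deliverOnB(2, 'Y');

        System.out.println("X on B = " + xOnB + " (A has 1), Y on B = " + yOnB + " (A has 1)");
    }
}
```

Running this leaves `xOnB` at 0 while A holds 1: Y is correct (it came with the state), but the update to X is permanently lost, exactly the inconsistency described above.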
Re: [infinispan-dev] Partial state transfer in Infinispan
We use partial state transfer not to generate partial state per cache, but the entire state per cache; but since we have more than one cache sharing a given JGroups channel, as far as JGroups is concerned this *is* partial state of a node. I.e., the state of just one cache on a channel, not all the caches. So we actually use cacheName as the state identifier (in JGroups' ExtendedMessageListener).

But yes, there is no reason why we can't replace this with RPC as per Distribution; however I think we do need a streaming solution - not just for replication but distribution as well. As such I'd only want to re-implement this bit once, rather than a temp RPC-based solution first. So we need a mechanism to either:

(1) open a separate TCP socket for the sake of streaming state, or
(2) reuse the sockets JGroups opens.

They both have their pros and cons. (1) is more configuration, firewall setup, and a spiderweb of connections in a large grid. (2) would mean multiplexing with JGroups' use of the socket.

Any thoughts and suggestions?

Cheers
Manik

On 1 Jun 2011, at 15:14, Bela Ban wrote:
> We currently use JGroups' partial state transfer to transfer individual caches from one Infinispan instance to another. Since I got rid of partial state transfer in JGroups 3.0, and don't like to add it back, I'd like to know whether this is still needed.
>
> I thought that we currently require the same set of caches to be available in all Infinispan instances, and the reason (IIRC) was that distribution wouldn't work if we have caches 1 and 2 available on instances A and B, but not on C, because consistent hashing distributes the data based on views, and we didn't want to have to keep track of individual caches...
>
> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?
>
> Opinions?
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
[infinispan-dev] Partial state transfer in Infinispan
We currently use JGroups' partial state transfer to transfer individual caches from one Infinispan instance to another. Since I got rid of partial state transfer in JGroups 3.0, and don't like to add it back, I'd like to know whether this is still needed.

I thought that we currently require the same set of caches to be available in all Infinispan instances, and the reason (IIRC) was that distribution wouldn't work if we have caches 1 and 2 available on instances A and B, but not on C, because consistent hashing distributes the data based on views, and we didn't want to have to keep track of individual caches...

Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?

Opinions?

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
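The "consistent hashing distributes the data based on views" problem mentioned above can be sketched in a few lines. This is a deliberately simplistic placeholder, not Infinispan's consistent hash implementation: if owners are picked purely from the channel view, a cache that is not started on every member can have keys whose owner does not even host that cache.

```java
import java.util.List;

// Sketch of view-based owner selection (hypothetical, not Infinispan's CH):
// the owner of a key is derived only from the channel view, so the view must
// match the set of nodes actually running the cache.
public class ViewBasedHashing {

    // Pick the owner of a key purely from the channel view.
    static String ownerOf(String key, List<String> view) {
        int idx = Math.floorMod(key.hashCode(), view.size());
        return view.get(idx);
    }

    public static void main(String[] args) {
        List<String> view = List.of("A", "B", "C");
        List<String> nodesRunningCache = List.of("A", "B"); // the cache was never started on C

        for (String key : List.of("k1", "k2", "k3", "k4", "k5")) {
            String owner = ownerOf(key, view);
            boolean hosted = nodesRunningCache.contains(owner);
            System.out.println(key + " -> " + owner
                    + (hosted ? "" : "  (owner does not run this cache!)"));
        }
    }
}
```

Any key hashed to C lands on a node without the cache, which is why the thread's stated workaround is to require the same set of caches on all instances, rather than tracking per-cache membership.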
Re: [infinispan-dev] Partial state transfer in Infinispan
Hi Bela,

2011/6/1 Bela Ban b...@redhat.com:
> We currently use JGroups' partial state transfer to transfer individual caches from one Infinispan instance to another. Since I got rid of partial state transfer in JGroups 3.0, and don't like to add it back, I'd like to know whether this is still needed.
>
> I thought that we currently require the same set of caches to be available in all Infinispan instances, and the reason (IIRC) was that distribution wouldn't work if we have caches 1 and 2 available on instances A and B, but not on C, because consistent hashing distributes the data based on views, and we didn't want to have to keep track of individual caches...

Well, I really don't like this limitation in Infinispan and was actually hoping that we could remove it at some point. Imagine the scenario in which you have a running cluster, and at some point the new release of your application needs an additional cache: there's no way to start a new node having this new cache.

Also, right now when an application starts, it's possible with the proper timing that it joins the cluster before having defined and started all caches (starting the cache manager and the caches is not an atomic operation), basically failing to start because of this limitation.

Maybe it's still possible to build such a thing on top of non-partial state transfer? As it doesn't exist, we didn't design it.

> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?

I don't know, but guess it's because each node has a different set of keys, so no node has the same state as another?

Cheers,
Sanne

> Opinions?
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
Re: [infinispan-dev] Partial state transfer in Infinispan
On 6/1/11 4:21 PM, Sanne Grinovero wrote:
> Hi Bela,
> 2011/6/1 Bela Ban b...@redhat.com:
>> We currently use JGroups' partial state transfer to transfer individual caches from one Infinispan instance to another. Since I got rid of partial state transfer in JGroups 3.0, and don't like to add it back, I'd like to know whether this is still needed.
>>
>> I thought that we currently require the same set of caches to be available in all Infinispan instances, and the reason (IIRC) was that distribution wouldn't work if we have caches 1 and 2 available on instances A and B, but not on C, because consistent hashing distributes the data based on views, and we didn't want to have to keep track of individual caches...
>
> Well, I really don't like this limitation in Infinispan and was actually hoping that we could remove it at some point. Imagine the scenario in which you have a running cluster, and at some point the new release of your application needs an additional cache: there's no way to start a new node having this new cache.

Yes, I fully agree. Another example is HTTP web sessions: currently 1 webapp == 1 cache, so we currently require the same webapps to be deployed in all JBoss instances *if* we use replication (distribution is different)!

> Also, right now when an application starts, it's possible with the proper timing that it joins the cluster before having defined and started all caches (starting the cache manager and the caches is not an atomic operation), basically failing to start because of this limitation.

Yep.

> Maybe it's still possible to build such a thing on top of non-partial state transfer? As it doesn't exist, we didn't design it.

Yes. Well, Infinispan already uses its own state transfer for distribution; I wonder why this isn't the case for replication.

>> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?
>
> I don't know, but guess it's because each node has a different set of keys, so no node has the same state as another?

You could still use JGroups state transfer; getState() would list the state provider as the target node.

In general, partial state transfer involves transferring (1) the partial state and (2) the digest, which is a vector of the highest seqnos seen for every member. When we get a partial state, we always overwrite our own digest with the one received, and update our state accordingly. However, when this is done a couple of times, for different partial states, I'm not sure that we won't receive a few messages multiple times, due to the digest overwriting...

I think the cleanest solution would be for you guys to reuse the state transfer you already use in distribution mode.

--
Bela Ban
Lead JGroups / Clustering Team
JBoss
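Bela's worry about "digest overwriting" can be made concrete with a toy example. The digest values below are hypothetical and this is not JGroups code: if a second partial state arrives carrying a digest that was snapshotted earlier, wholesale replacement moves the receiver's seqno horizon backwards, so already-delivered messages get delivered again, whereas a per-member max-merge would not regress.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of overwriting vs. merging digests. A digest maps each
// member to the highest seqno seen from it. Values are hypothetical.
public class DigestOverwrite {

    // What partial state transfer does: replace the digest wholesale.
    static Map<String, Long> overwrite(Map<String, Long> own, Map<String, Long> received) {
        return new HashMap<>(received);
    }

    // A safer alternative: keep the per-member maximum, so the seqno horizon
    // never moves backwards.
    static Map<String, Long> merge(Map<String, Long> own, Map<String, Long> received) {
        Map<String, Long> out = new HashMap<>(own);
        received.forEach((member, seqno) -> out.merge(member, seqno, Math::max));
        return out;
    }

    public static void main(String[] args) {
        // First partial state transfer said "seen up to 5 from A"...
        Map<String, Long> own = Map.of("A", 5L);
        // ...then a second partial state arrives whose digest was taken earlier.
        Map<String, Long> stale = Map.of("A", 3L);

        System.out.println("overwrite: " + overwrite(own, stale)); // horizon drops to 3: 4 and 5 redelivered
        System.out.println("merge:     " + merge(own, stale));     // horizon stays at 5: no duplicates
    }
}
```

Note that merging only fixes the duplicate-delivery half of the problem; the missed-update flaw from the T1-T9 scenario remains, which is why the thread's conclusion is to drop partial state transfer rather than patch the digest handling.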
Re: [infinispan-dev] Partial state transfer in Infinispan
>>> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?
>>
>> I don't know, but guess it's because each node has a different set of keys, so no node has the same state as another?
>
> You could still use JGroups state transfer; getState() would list the state provider as the target node.
>
> In general, partial state transfer involves transferring (1) the partial state and (2) the digest, which is a vector of the highest seqnos seen for every member. When we get a partial state, we always overwrite our own digest with the one received, and update our state accordingly. However, when this is done a couple of times, for different partial states, I'm not sure that we won't receive a few messages multiple times, due to the digest overwriting...
>
> I think the cleanest solution would be for you guys to reuse the state transfer you already use in distribution mode.

+1. There's also transaction logic* related to state transfer, which would need to be maintained in two implementations - not good!

* Actually, the transaction-failover logic needs to be revisited, especially after the new rebalancing code. Dan and I will go over it next week - we'll be working at his place, and this will make things easier.
Re: [infinispan-dev] Partial state transfer in Infinispan
On 6/1/11 6:05 PM, Mircea Markus wrote:
>>>> Why are we actually using JGroups' state transfer with replication, but our own state transfer with distribution?
>>>
>>> I don't know, but guess it's because each node has a different set of keys, so no node has the same state as another?
>>
>> You could still use JGroups state transfer; getState() would list the state provider as the target node.
>>
>> In general, partial state transfer involves transferring (1) the partial state and (2) the digest, which is a vector of the highest seqnos seen for every member. When we get a partial state, we always overwrite our own digest with the one received, and update our state accordingly. However, when this is done a couple of times, for different partial states, I'm not sure that we won't receive a few messages multiple times, due to the digest overwriting...
>>
>> I think the cleanest solution would be for you guys to reuse the state transfer you already use in distribution mode.
>
> +1. There's also transaction logic* related to state transfer, which would need to be maintained in two implementations - not good!
>
> * Actually, the transaction-failover logic needs to be revisited, especially after the new rebalancing code. Dan and I will go over it next week - we'll be working at his place, and this will make things easier.

Excellent! Let us know what the outcome is; naturally I'm especially interested in the new rebalancing code since I wrote it! :-)

Note that I'll be on vacation until June 11th.

Cheers,

--
Bela Ban
Lead JGroups / Clustering Team
JBoss