Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On Mon, Oct 24, 2011 at 4:42 PM, Sanne Grinovero sa...@infinispan.org wrote:

On 24 October 2011 12:58, Dan Berindei dan.berin...@gmail.com wrote:

Hi Galder

On Mon, Oct 24, 2011 at 1:46 PM, Galder Zamarreño gal...@redhat.com wrote:

On Oct 24, 2011, at 12:04 PM, Dan Berindei wrote:

ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes.

It's been defined to work that way: https://docs.jboss.org/author/display/ISPN/CacheLoaders Tbh, that will only happen in shared cache stores. In non-shared ones, you'll only have data that belongs to that node.

Not really... in distributed mode, every time the cache starts it will have another position on the hash wheel. That means even with a non-shared cache store, it's likely most of the stored keys will no longer be local. Actually I just noticed that you've fixed ISPN-1404, which looks like it would solve my problem when the cache is created by a HotRod server. I would like to extend it to work like this by default, e.g. by using the transport's nodeName as the seed.

I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys).

+1, only for shared cache stores.

What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data?

Discard only for shared cache stores. Cache configurations should be symmetrical, so if other nodes preload, they'll preload only data local to them with your change.

Discarding works fine from the correctness POV, but for performance it's not that great: we may do a lot of work to preload keys and have nothing to show for it at the end.

Can't you just skip loading state and be happy with the state you receive from peers? More data will be lazily loaded. Applying of course only when you're not the only/first node in the grid, in which case you have to load.

Right, we could preload only on the first node. With a shared cache store this should work great; we just have to start preloading after we connect to the cluster and before we send the join request. But I have trouble visualizing how a persistent (purgeOnStartup = false) non-shared cache store should work until we have some validation mechanism like in https://issues.jboss.org/browse/ISPN-1195. Should we even allow this kind of setup?

The only alternative I see is to be able to find the boundaries of keys you own, and change the CacheLoader API to load keys by the identified range - should work with multiple boundaries too for virtual nodes, but this is something that not all CacheLoaders will be able to implement, so it should be an optional API; for now I'd stick with the first option above as I don't see how we can be more efficient in loading the state from CacheLoaders than via JGroups.

Sanne

___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
Can't you just skip loading state and be happy with the state you receive from peers? More data will be lazily loaded. Applying of course only when you're not the only/first node in the grid, in which case you have to load.

Right, we could preload only on the first node. With a shared cache store this should work great; we just have to start preloading after we connect to the cluster and before we send the join request. But I have trouble visualizing how a persistent (purgeOnStartup = false) non-shared cache store should work until we have some validation mechanism like in https://issues.jboss.org/browse/ISPN-1195. Should we even allow this kind of setup?

Right, I don't think it makes much sense. The current node might have been down for a long time and its dedicated cacheloader will likely contain stale values; we might update older values via versions of optimistic locking, but we won't be able to remove those which should have been removed. I don't think we should support that, at least until these problems are solved.
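One possible reading of the policy discussed in this message, written as a minimal sketch. `PreloadPolicySketch`, `Action` and `decide` are hypothetical names used only for illustration; they are not Infinispan API, and the thread does not settle on this exact rule.

```java
// Hypothetical sketch of the "preload only on the first node" idea.
// None of these names exist in Infinispan; they only illustrate the discussion.
final class PreloadPolicySketch {

    enum Action {
        PRELOAD,                 // read the cache store into memory at startup
        RELY_ON_STATE_TRANSFER,  // skip preload; peers send state, the rest is loaded lazily
        UNSUPPORTED              // joiner with a persistent, non-shared store: data may be stale
    }

    static Action decide(boolean firstNode, boolean sharedStore, boolean purgeOnStartup) {
        if (firstNode) {
            // Nobody to receive state from, so the store is the only source of data.
            return Action.PRELOAD;
        }
        if (!sharedStore && !purgeOnStartup) {
            // The private store may hold entries that were deleted while this node was down.
            return Action.UNSUPPORTED;
        }
        return Action.RELY_ON_STATE_TRANSFER;
    }
}
```

For example, `decide(false, true, false)` models a joiner with a shared store, which would skip preloading and wait for state from its peers.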
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On Oct 24, 2011, at 12:58 PM, Dan Berindei wrote:

Hi Galder

On Mon, Oct 24, 2011 at 1:46 PM, Galder Zamarreño gal...@redhat.com wrote:

On Oct 24, 2011, at 12:04 PM, Dan Berindei wrote:

ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes.

It's been defined to work that way: https://docs.jboss.org/author/display/ISPN/CacheLoaders Tbh, that will only happen in shared cache stores. In non-shared ones, you'll only have data that belongs to that node.

Not really... in distributed mode, every time the cache starts it will have another position on the hash wheel. That means even with a non-shared cache store, it's likely most of the stored keys will no longer be local. Actually I just noticed that you've fixed ISPN-1404, which looks like it would solve my problem when the cache is created by a HotRod server. I would like to extend it to work like this by default, e.g. by using the transport's nodeName as the seed.

I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys).

+1, only for shared cache stores.

What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data?

Discard only for shared cache stores. Cache configurations should be symmetrical, so if other nodes preload, they'll preload only data local to them with your change.

Discarding works fine from the correctness POV, but for performance it's not that great: we may do a lot of work to preload keys and have nothing to show for it at the end.

I agree, I thought of that when replying to this. It'd be great if you could only bring in the data that will belong to you, but for that we'd need to store the hash of the key as well.

Enabling the fixed hash seed by default should make the performance issue go away. I think it would also require virtual nodes enabled by default and a way to ensure that the nodeNames are unique across the cluster.

Cheers
Dan

Cheers
Dan

On Mon, Oct 3, 2011 at 3:09 AM, Manik Surtani ma...@jboss.org wrote:

On 28 Sep 2011, at 10:56, Dan Berindei wrote:

I'm not sure if the comment is valid though, since the old StateTransferManager had priority 55 and it also cleared the data container before applying the state from the coordinator. I'm not sure how preloading and state transfer are supposed to interact, maybe Manik can help clear this up?

Hmm - this is interesting. I think preloading should happen first, since the cache store may contain old data.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
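The fixed-seed idea in this message can be illustrated with a small sketch: if a node's positions on the hash wheel are derived only from its name, a restarted node lands on the same positions and keeps owning roughly the same keys. Everything here is an assumption made for illustration: `SeededWheelSketch`, the wheel size, and the use of CRC32 are not taken from Infinispan, which uses its own hash function and wheel layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical sketch: wheel positions derived deterministically from the node name.
final class SeededWheelSketch {
    static final int WHEEL_SIZE = 10240; // arbitrary size chosen for the example

    // Position of one virtual node; depends only on the node name and the index.
    static int position(String nodeName, int virtualNode) {
        CRC32 crc = new CRC32();
        crc.update((nodeName + "#" + virtualNode).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % WHEEL_SIZE);
    }

    // All positions claimed by a node. Two nodes with the same name would claim
    // identical positions, which is why the names must be unique in the cluster.
    static SortedSet<Integer> positions(String nodeName, int numVirtualNodes) {
        SortedSet<Integer> result = new TreeSet<>();
        for (int i = 0; i < numVirtualNodes; i++) {
            result.add(position(nodeName, i));
        }
        return result;
    }
}
```

Calling `positions("node-a", 4)` before and after a restart yields the same set, which is the property the fixed seed is meant to provide. Several virtual nodes per name spread a node's share of the wheel instead of leaving it at one fixed point.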
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On Oct 24, 2011, at 2:42 PM, Sanne Grinovero wrote:

On 24 October 2011 12:58, Dan Berindei dan.berin...@gmail.com wrote:

Hi Galder

On Mon, Oct 24, 2011 at 1:46 PM, Galder Zamarreño gal...@redhat.com wrote:

On Oct 24, 2011, at 12:04 PM, Dan Berindei wrote:

ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes.

It's been defined to work that way: https://docs.jboss.org/author/display/ISPN/CacheLoaders Tbh, that will only happen in shared cache stores. In non-shared ones, you'll only have data that belongs to that node.

Not really... in distributed mode, every time the cache starts it will have another position on the hash wheel. That means even with a non-shared cache store, it's likely most of the stored keys will no longer be local. Actually I just noticed that you've fixed ISPN-1404, which looks like it would solve my problem when the cache is created by a HotRod server. I would like to extend it to work like this by default, e.g. by using the transport's nodeName as the seed.

I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys).

+1, only for shared cache stores.

What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data?

Discard only for shared cache stores. Cache configurations should be symmetrical, so if other nodes preload, they'll preload only data local to them with your change.

Discarding works fine from the correctness POV, but for performance it's not that great: we may do a lot of work to preload keys and have nothing to show for it at the end.

Can't you just skip loading state and be happy with the state you receive from peers? More data will be lazily loaded. Applying of course only when you're not the only/first node in the grid, in which case you have to load.

The only alternative I see is to be able to find the boundaries of keys you own, and change the CacheLoader API to load keys by the identified range - should work with multiple boundaries too for virtual nodes, but this is something that not all CacheLoaders will be able to implement, so it should be an optional API; for now I'd stick with the first option above as I don't see how we can be more efficient in loading the state from CacheLoaders than via JGroups.

Before, when state transfer meant that state came from a single node, that node could be overloaded and so cache loader access might have been more efficient, particularly if it's a non-shared one that's available on your machine. The benefit of loading state from a cache loader is that the rest of the nodes don't have to stop what they're doing, whereas when loading it from other nodes they have to, in the current design.

Sanne

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes.

I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys).

What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data?

Cheers
Dan

On Mon, Oct 3, 2011 at 3:09 AM, Manik Surtani ma...@jboss.org wrote:

On 28 Sep 2011, at 10:56, Dan Berindei wrote:

I'm not sure if the comment is valid though, since the old StateTransferManager had priority 55 and it also cleared the data container before applying the state from the coordinator. I'm not sure how preloading and state transfer are supposed to interact, maybe Manik can help clear this up?

Hmm - this is interesting. I think preloading should happen first, since the cache store may contain old data.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani
Lead, Infinispan
http://www.infinispan.org
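The "discard the keys that don't map locally" option from this message can be sketched as a filter applied after preloading, once the node knows the hash wheel. This is a toy model under stated assumptions: one owner per key, a wheel of 1000 positions, and `hashCode()` as the key hash. `PreloadFilterSketch`, `ownerOf` and `keepLocal` are hypothetical names, not Infinispan API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical sketch: drop preloaded entries that another node owns.
final class PreloadFilterSketch {
    static final int WHEEL_SIZE = 1000; // arbitrary size chosen for the example

    // A key is owned by the first node at or after the key's position, wrapping around.
    static String ownerOf(Object key, SortedMap<Integer, String> wheel) {
        int pos = Math.floorMod(key.hashCode(), WHEEL_SIZE);
        SortedMap<Integer, String> tail = wheel.tailMap(pos);
        Integer nodePos = tail.isEmpty() ? wheel.firstKey() : tail.firstKey();
        return wheel.get(nodePos);
    }

    // Keeps only the entries whose owner is this node; the rest are discarded.
    static <K, V> Map<K, V> keepLocal(Map<K, V> preloaded,
                                      SortedMap<Integer, String> wheel,
                                      String self) {
        Map<K, V> local = new HashMap<>();
        for (Map.Entry<K, V> e : preloaded.entrySet()) {
            if (self.equals(ownerOf(e.getKey(), wheel))) {
                local.put(e.getKey(), e.getValue());
            }
        }
        return local;
    }
}
```

The sketch also shows the cost raised later in the thread: every entry is read from the store before it can be tested, so a node that owns few of the stored keys does most of that work for nothing.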
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On 24 October 2011 12:58, Dan Berindei dan.berin...@gmail.com wrote:

Hi Galder

On Mon, Oct 24, 2011 at 1:46 PM, Galder Zamarreño gal...@redhat.com wrote:

On Oct 24, 2011, at 12:04 PM, Dan Berindei wrote:

ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes.

It's been defined to work that way: https://docs.jboss.org/author/display/ISPN/CacheLoaders Tbh, that will only happen in shared cache stores. In non-shared ones, you'll only have data that belongs to that node.

Not really... in distributed mode, every time the cache starts it will have another position on the hash wheel. That means even with a non-shared cache store, it's likely most of the stored keys will no longer be local. Actually I just noticed that you've fixed ISPN-1404, which looks like it would solve my problem when the cache is created by a HotRod server. I would like to extend it to work like this by default, e.g. by using the transport's nodeName as the seed.

I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys).

+1, only for shared cache stores.

What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data?

Discard only for shared cache stores. Cache configurations should be symmetrical, so if other nodes preload, they'll preload only data local to them with your change.

Discarding works fine from the correctness POV, but for performance it's not that great: we may do a lot of work to preload keys and have nothing to show for it at the end.

Can't you just skip loading state and be happy with the state you receive from peers? More data will be lazily loaded. Applying of course only when you're not the only/first node in the grid, in which case you have to load.

The only alternative I see is to be able to find the boundaries of keys you own, and change the CacheLoader API to load keys by the identified range - should work with multiple boundaries too for virtual nodes, but this is something that not all CacheLoaders will be able to implement, so it should be an optional API; for now I'd stick with the first option above as I don't see how we can be more efficient in loading the state from CacheLoaders than via JGroups.

Sanne
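The optional range-loading API suggested in this message might look roughly like the sketch below. It is only an illustration of the shape of the idea: `SimpleLoader`, `RangeLoader`, `RangeLoadSketch` and the wheel size are hypothetical, and the real CacheLoader interface is not reproduced here. A loader that can index entries by hash position implements the optional interface; any other loader falls back to loading everything and filtering. Ranges are assumed not to wrap around the wheel, so a wrapping range would be passed as two ranges.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for a loader that can only load everything.
interface SimpleLoader<K, V> {
    Map<K, V> loadAll();
}

// Hypothetical optional extension: load only keys whose position is in [from, to).
interface RangeLoader<K, V> extends SimpleLoader<K, V> {
    Map<K, V> loadRange(int fromInclusive, int toExclusive);
}

final class RangeLoadSketch {
    static final int WHEEL_SIZE = 1000; // arbitrary size chosen for the example

    static int positionOf(Object key) {
        return Math.floorMod(key.hashCode(), WHEEL_SIZE);
    }

    // Toy store that keeps entries indexed by wheel position.
    static final class InMemoryRangeLoader<K, V> implements RangeLoader<K, V> {
        private final NavigableMap<Integer, Map<K, V>> byPosition = new TreeMap<>();

        void store(K key, V value) {
            byPosition.computeIfAbsent(positionOf(key), p -> new HashMap<>()).put(key, value);
        }

        @Override
        public Map<K, V> loadAll() {
            Map<K, V> all = new HashMap<>();
            byPosition.values().forEach(all::putAll);
            return all;
        }

        @Override
        public Map<K, V> loadRange(int fromInclusive, int toExclusive) {
            Map<K, V> out = new HashMap<>();
            byPosition.subMap(fromInclusive, true, toExclusive, false)
                      .values().forEach(out::putAll);
            return out;
        }
    }

    static boolean inAnyRange(int pos, List<int[]> ranges) {
        for (int[] r : ranges) {
            if (pos >= r[0] && pos < r[1]) {
                return true;
            }
        }
        return false;
    }

    // Uses range loading when the loader supports it, otherwise loads all and filters.
    static <K, V> Map<K, V> preload(SimpleLoader<K, V> loader, List<int[]> ownedRanges) {
        Map<K, V> result = new HashMap<>();
        if (loader instanceof RangeLoader) {
            RangeLoader<K, V> ranged = (RangeLoader<K, V>) loader;
            for (int[] r : ownedRanges) {
                result.putAll(ranged.loadRange(r[0], r[1]));
            }
        } else {
            for (Map.Entry<K, V> e : loader.loadAll().entrySet()) {
                if (inAnyRange(positionOf(e.getKey()), ownedRanges)) {
                    result.put(e.getKey(), e.getValue());
                }
            }
        }
        return result;
    }
}
```

Passing several ranges to `preload` corresponds to the multiple boundaries a node would own when virtual nodes are enabled.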
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On 3 Oct 2011, at 01:09, Manik Surtani wrote:

On 28 Sep 2011, at 10:56, Dan Berindei wrote:

I'm not sure if the comment is valid though, since the old StateTransferManager had priority 55 and it also cleared the data container before applying the state from the coordinator. I'm not sure how preloading and state transfer are supposed to interact, maybe Manik can help clear this up?

Hmm - this is interesting. I think preloading should happen first, since the cache store may contain old data.

I can't find Dan's original email - was it sent to the entire list? I don't get the entire context, but I don't think preloading *first* would resolve the consistency problem in the case of deletions: what if you preload something that was deleted from memory in the meantime?
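The deletion problem raised here can be shown with a toy model, assuming a replicated setup where the joiner receives the full cluster state. `StaleDeleteSketch` and `join` are hypothetical names, and plain maps stand in for the cache store, the data container and the transferred state. If the joiner preloads first and then merges the received state on top, an entry deleted from the cluster while the node was down survives; clearing the container before applying the state, as the old StateTransferManager is described as doing earlier in the thread, removes it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of preload followed by state transfer.
final class StaleDeleteSketch {

    static Map<String, String> join(Map<String, String> localStore,
                                    Map<String, String> clusterState,
                                    boolean clearBeforeApplying) {
        // Step 1: preload everything found in the local cache store.
        Map<String, String> container = new HashMap<>(localStore);
        // Step 2: optionally drop the preloaded data before applying cluster state.
        if (clearBeforeApplying) {
            container.clear();
        }
        // Step 3: apply the state received from the other nodes.
        container.putAll(clusterState);
        return container;
    }
}
```

With a store holding keys `a` and `b` and a cluster that has since deleted `b`, merging without clearing leaves `b` in memory, while clearing first yields only what the cluster currently holds.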