Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break
On Mon, May 13, 2013 at 8:44 PM, Sanne Grinovero sa...@infinispan.org wrote: On 13 May 2013 18:32, Manik Surtani msurt...@redhat.com wrote: On 13 May 2013, at 16:25, Mircea Markus mmar...@redhat.com wrote: On 13 May 2013, at 15:05, Manik Surtani wrote: 100% agree, most users will have to interact with AdvancedCache at some point - if only because of lock() and withFlags(). I've seen quite a bit of end-user code that doesn't touch AdvancedCache. I'm on Dan's side here, I think it's pretty popular among users and should be considered public API. A note on the same lines, we also recommend all our users use Flag.IGNORE_RETURN_VALUES, which again goes through AdvancedCache. So you're saying getTimeService() should be in EmbeddedCacheManager? That's Dan's argument... I really don't think this should be accessible by end-user applications. +1 to keep it hidden, but an SPI kind of API wouldn't be too bad. If we want to keep it hidden, then I think it would be best to leave the getTimeService() method only in ComponentRegistry/GlobalComponentRegistry and remove it from the AdvancedCache interface. We might want to remove it from the configuration, too. More importantly, I'd design it in such a way that different Caches could be using a different one. It doesn't have to be supported in the current implementation, I just mean API-wise this should not be on a global component but on a Cache-specific one. Any particular usage in mind for having a different time service in each cache? We definitely need a global component, because JGroupsTransport uses it, so we'd have to support both. Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] commit not failing but transaction reported as in-doubt
Mircea, I think I'm missing something here. Why would the originator send a TxCompletionNotificationCommand at all if the commit command was asynchronous? I don't think recovery should require the originator to send a TxCompletionNotificationCommand. Our commit commands can't fail anyway, so the only way for a transaction to become in-doubt would be if the cache crashed before sending the command. (Or maybe if another resource's commit phase failed.) Cheers Dan On Wed, May 15, 2013 at 5:06 PM, Mircea Markus mmar...@redhat.com wrote: Thanks again for the nice explanation Jonathan! @Pedro - seems like you're doing the right thing by encouraging people to be properly paranoid :-) Otherwise we'd leak tx logs (in Infinispan parlance, the PrepareCommands in the recovery cache), which would not be nice. On 15 May 2013, at 13:32, Jonathan Halliday jonathan.halli...@redhat.com wrote: No, it's out of scope for the TM, at least as far as the JTA/XA specs are concerned. The TM would not retain any txlog information to allow it to perform useful recovery anyhow. Usually you just log it in the hope a human notices and sorts out the mess. Of course, properly paranoid humans don't use async commit in the first place. There has been various talk around making JTA TM.commit() support an async callback, such that the business logic thread can continue as soon as the prepare phase is successful, whilst still receiving a callback handler invocation if the commit phase subsequently fails. Extending that to the XA protocol would be nice, but won't happen as there is no upward (RM-TM) communication in XA - it's all driven top down. So as you point out, adding the failed tx to the in-doubt list is the only way of signalling a problem. That's bad, since you'd also need a positive 'it worked' callback in the proto to allow GC of the txlog, otherwise you have to throw away the log eagerly and can't then do anything useful with the subsequent error callback anyhow.
Associated with that discussion is the expectation around the semantics of afterCompletion, which may mean 'after successful prepare' or 'after successful commit' - in such a case the latter effectively removes the need for a new JTA callback API in the first place. If you don't need a callback at all, then there is already an async commit option in the TM config; it's just non-standard and marginally dangerous. It simply logs commit phase failures and hopes a human notices. Jonathan. On 05/15/2013 01:13 PM, Mircea Markus wrote: Hi Jonathan, In the scope of ISPN-3063 [1] we came to a problem we need some advice on :-) Would a transaction manager expect/handle this situation: for a transaction the commit is successful, but at a further point the same transaction is reported as in-doubt to the recovery process. In our case this can happen when we send the commit async, and it might only fail after the commit is acknowledged to the TM. [1] https://issues.jboss.org/browse/ISPN-3063 Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
[infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode
Hi guys I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I've come across what I think is underspecified behaviour in AtomicHashMap. Say we have two transactions, tx1 and tx2, and they both work with the same atomic map in a pessimistic cache:
1. tx1: am1 = AtomicMapLookup.get(cache, key)
2. tx2: am2 = AtomicMapLookup.get(cache, key)
3. tx1: am1.put(subkey1, value1) // locks the map
4. tx2: am2.get(subkey1) // returns null
5. tx1: commit // the map is now {subkey1=value1}
6. tx2: am2.put(subkey2, value2) // locks the map
7. tx2: commit // the map is now {subkey2=value2}
It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is ok or if it's a bug... Note that today the map is overwritten by tx2 even without step 4 (tx2: am2.get(subkey1)). I'm pretty sure that's a bug, and I fixed it locally by using FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite. However, when the Tree API moves a node it first checks for the existence of the destination node, which means NodeMoveAPIPessimisticTest is still failing. I'm not sure if I should fix that by forcing a write lock for all AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move(). Cheers Dan
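The interleaving above can be reproduced with a toy, Infinispan-free model of how a whole-map delta is applied at commit. All class and method names here (TxMapView, commit(), etc.) are made up for illustration - this is not AtomicHashMap's actual code, just a sketch of the last-writer-wins behaviour being described:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each tx works on a copy of the map taken when it first read it,
// and commit replaces the committed state wholesale with that copy.
class TxMapView {
    private final Map<String, String> snapshot; // map state read into the tx context

    TxMapView(Map<String, String> committed) {
        this.snapshot = new HashMap<>(committed); // copy-on-read
    }

    void put(String k, String v) { snapshot.put(k, v); }

    String get(String k) { return snapshot.get(k); }

    // Commit overwrites the committed state with this tx's whole view,
    // discarding anything committed in the meantime.
    void commit(Map<String, String> committed) {
        committed.clear();
        committed.putAll(snapshot);
    }
}

public class AtomicMapLostUpdate {
    public static void main(String[] args) {
        Map<String, String> committed = new HashMap<>();
        TxMapView tx1 = new TxMapView(committed);  // 1. tx1 reads the map
        TxMapView tx2 = new TxMapView(committed);  // 2. tx2 reads the map
        tx1.put("subkey1", "value1");              // 3.
        System.out.println(tx2.get("subkey1"));    // 4. null
        tx1.commit(committed);                     // 5. {subkey1=value1}
        tx2.put("subkey2", "value2");              // 6.
        tx2.commit(committed);                     // 7. {subkey2=value2}
        System.out.println(committed);             // subkey1 is silently lost
    }
}
```

The merge behaviour discussed below would correspond to commit() applying only the keys the tx actually wrote, instead of clearing the committed map first.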
Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode
On Fri, May 17, 2013 at 1:59 PM, Mircea Markus mmar...@redhat.com wrote: On 17 May 2013, at 07:35, Dan Berindei dan.berin...@gmail.com wrote: On Thu, May 16, 2013 at 8:27 PM, Mircea Markus mmar...@redhat.com wrote: On 16 May 2013, at 15:04, Dan Berindei dan.berin...@gmail.com wrote: Hi guys I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I've come across what I think is underspecified behaviour in AtomicHashMap. Say we have two transactions, tx1 and tx2, and they both work with the same atomic map in a pessimistic cache:
1. tx1: am1 = AtomicMapLookup.get(cache, key)
2. tx2: am2 = AtomicMapLookup.get(cache, key)
3. tx1: am1.put(subkey1, value1) // locks the map
4. tx2: am2.get(subkey1) // returns null
5. tx1: commit // the map is now {subkey1=value1}
6. tx2: am2.put(subkey2, value2) // locks the map
7. tx2: commit // the map is now {subkey2=value2}
It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is ok or if it's a bug... As a user I find that a bit confusing, so I think tx2 should merge stuff in the AtomicMap. I'd be curious to hear Manik's (author) and Sanne's (user) opinion on this. Merging should work with pessimistic locking, but I don't think we could do it with optimistic locking and write skew check enabled: we only do the write skew check for the whole map. If the WSC is enabled, then the 2nd transaction should fail: tx2 reads the version at 2. and at 7. the WSC should forbid it to commit, so we shouldn't have this problem at all. Right, the 2nd transaction must fail with WSC enabled, so we can't implement merging. Would it be worth making this change if it meant making the behaviour of AtomicHashMap more complex? How much more complex? If it's not a quick fix (2h) I'd say no, as this is more of a nice-to-have/no user requires this functionality ATM. The behaviour of AtomicMap will be more complex because we're adding a bit of functionality that only works with pessimistic locking.
Or maybe with optimistic locking as well, only not when write skew check is enabled. This is definitely not a 2h fix. As you can see, it's taking more than 2h just to figure out what needs to change :) What other options do we have? Leave it as it is and document the limitation? On the other hand, I believe FineGrainedAtomicHashMap doesn't do separate write skew checks for each key in the map either, so users probably have to deal with this difference between pessimistic and optimistic locking already. For FGAM I think the WSC should be performed on a per-key basis, and not for the whole map. I agree, but I think implementing fine-grained WSC will be tricky. I'll create a feature request in JIRA. Note that today the map is overwritten by tx2 even without step 4 (tx2: am2.get(subkey1)). I'm pretty sure that's a bug and I fixed it locally by using FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite. However, when the Tree API moves a node it first checks for the existence of the destination node, which means NodeMoveAPIPessimisticTest is still failing. I'm not sure if I should fix that by forcing a write lock for all AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move(). I tried using the FORCE_WRITE_LOCK flag for all TreeCache reads. This seems to work fine, and move() doesn't throw any exceptions in pessimistic mode any more. In optimistic mode, it doesn't change anything, and concurrent moves still fail with WriteSkewException. The only downside is the performance: having extra locks will certainly slow things down.
Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode
On Mon, May 20, 2013 at 1:57 PM, Manik Surtani msurt...@redhat.com wrote: On 16 May 2013, at 15:04, Dan Berindei dan.berin...@gmail.com wrote: Hi guys I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I've come across what I think is underspecified behaviour in AtomicHashMap. Say we have two transactions, tx1 and tx2, and they both work with the same atomic map in a pessimistic cache:
1. tx1: am1 = AtomicMapLookup.get(cache, key)
2. tx2: am2 = AtomicMapLookup.get(cache, key)
3. tx1: am1.put(subkey1, value1) // locks the map
4. tx2: am2.get(subkey1) // returns null
5. tx1: commit // the map is now {subkey1=value1}
6. tx2: am2.put(subkey2, value2) // locks the map
7. tx2: commit // the map is now {subkey2=value2}
It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is ok or if it's a bug... If optimistic, step 7 should fail with a write skew check. If pessimistic, step 2 would *usually* block assuming that another thread is updating the map, but since neither tx1 nor tx2 has started updating the map yet, neither has a write lock on the map. So that succeeds. I'm not sure if this is any different from not using an atomic map:
1. tx1: v = cache.get(k); // reads into tx context
2. tx2: v = cache.get(k);
3. tx1: cache.put(k, v + 1);
4. tx1: commit
5. tx2: cache.put(k, v + 1);
6. tx2: commit
Here as well, if using optimistic locking, step 6 will fail with a WSC, but if pessimistic this will work (since tx2 only requested a write lock after tx1 committed/released its write lock). The difference is that in your scenario, you see in the code that tx2 writes to key k, so it's not surprising to find that tx2 overwrote the value written by tx1. But it would be surprising if tx2 also overwrote an unrelated key k2. With an atomic map, you only see map.put(subkey2, value) in the code. Tx2 doesn't touch subkey1, so it's not that obvious that it should remove it.
It is clear to me why it behaves the way it does now, after reading the implementation, but I don't think it's what most users would expect. (The proof, I guess, is in the current implementation of TreeCache.move().) With a FineGrainedAtomicMap in optimistic mode, it's not obvious why tx1 writing to subkey1 should cause tx2's write to subkey2 to fail, either (see https://issues.jboss.org/browse/ISPN-3123). Note that today the map is overwritten by tx2 even without step 4 (tx2: am2.get(subkey1)). I'm pretty sure that's a bug and I fixed it locally by using FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite. However, when the Tree API moves a node it first checks for the existence of the destination node, which means NodeMoveAPIPessimisticTest is still failing. I'm not sure if I should fix that by forcing a write lock for all AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move(). I think only in TreeCache.move(). I tend to disagree. I think it's way too easy to introduce a read of a node's structure in a transaction and start losing data without knowing it.
Re: [infinispan-dev] configuring fetchInMemoryState for topology caches
I wouldn't want to deprecate CCL, I think it definitely has a purpose - at least in invalidation mode. Even in replication mode, having a lazy alternative to state transfer may be useful. Maybe not for the topology cache, but it might make sense for large caches. On Tue, May 21, 2013 at 4:36 PM, Mircea Markus mmar...@redhat.com wrote: On 21 May 2013, at 08:30, Tristan Tarrant ttarr...@redhat.com wrote: On 05/21/2013 08:58 AM, Galder Zamarreño wrote: Shouldn't it be enabled by default/enforced? ^ Either that, or the cluster cache loader is used, both of which serve the same purpose. I think what Mircea is getting at is that there is an intention to deprecate/remove the CCL. I think that we can do that in 6.0 (with the CacheStore redesign) and remove all potential users of CCL (including the lazy topology transfer). Mind reader :-)
Re: [infinispan-dev] How to get Grouper&lt;T&gt;#computeGroup(key) return value to map to physical Node?
I guess the grouper could use a KeyAffinityService (or something similar) to generate a key local to each node and return that instead of "0" or "1". However, you won't have any guarantee that the keys will stay on the same node if the cache topology changes (e.g. another node joins). It used to be that the actual address of the node would (almost) always be mapped to that node, but that was just an implementation detail, and it never worked 100%. Ben, do you think being able to pin a key permanently to a node would be useful? Cheers Dan On Tue, May 21, 2013 at 6:31 PM, Galder Zamarreño gal...@redhat.com wrote: Hmmm, one hacky way might be to hold on to the grouper instance passed in via configuration, and once the cache manager has been started, set it in the grouper and use it to query either the address, or the physical address (via EmbeddedCacheManager.getTransport…)? On May 14, 2013, at 6:34 PM, cotton-ben ben.cot...@alumni.rutgers.edu wrote: I am playing with the Infinispan 5.3 quick-start package to exercise my usage of the Grouping API. As we know, the quick-start package is made up of AbstractNode.java, Node0.java, Node1.java and Node2.java (plus a util/listener). My ambition is to demonstrate 1. that any Cache<K,V>.put("DIMENSION.xxx", v) will flow through my Grouper and pin that key in the Cache at @Node=0; 2. that any Cache<K,V>.put("POSITION.xxx", v) will flow through my Grouper and pin that key in the Cache at either @Node=1 or @Node=2.
Here is my AbstractNode#createCacheManagerProgramatically() config:

private static EmbeddedCacheManager createCacheManagerProgramatically() {
    return new DefaultCacheManager(
        GlobalConfigurationBuilder.defaultClusteredBuilder()
            .transport().addProperty("configurationFile", "jgroups.xml")
            .build(),
        new org.infinispan.configuration.cache.ConfigurationBuilder()
            .clustering()
                .cacheMode(CacheMode.DIST_SYNC)
                .hash().numOwners(1).groups().enabled(Boolean.TRUE)
                    .addGrouper(new com.jpmorgan.ct.lri.cs.ae.test.DimensionGrouper<String>())
            .build()
    );
}

And here is my Grouper<T> implementation:

public class DimensionGrouper<T> implements Grouper<String> {
    public String computeGroup(String key, String group) {
        if (key.indexOf("DIMENSION.") == 0) {
            String groupPinned = "0";
            System.out.println("Pinning Key=[" + key + "] @Node=[" + groupPinned + "]"); // node = exactly 0
            return groupPinned;
        } else if (key.indexOf("POSITION.") == 0) {
            String groupPinned = "" + (1 + (int) (Math.random() * 2));
            System.out.println("Pinning Key=[" + key + "] @Node=[" + groupPinned + "]"); // node = {1, 2}
            return groupPinned;
        } else {
            return null;
        }
    }

    public Class<String> getKeyType() {
        return String.class;
    }
}

The logic is working correctly ... i.e. when from Node2.java I call

for (int i = 0; i < 10; i++) {
    cacheDP.put("DIMENSION." + i, "DimensionValue." + i);
    cacheDP.put("POSITION." + i, "PositionValue." + i);
}

my DimensionGrouper is returning "0" from computeGroup(). My question is: how in Infinispan can I map the computeGroup() return value to a physical node? I.e. how can I make it so that when computeGroup() returns "0", I will *only* add that <K,V> entry to the Cache @Node 0? -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/Re-How-to-get-Grouper-T-computeGroup-key-return-value-to-map-to-physical-Node-tp4027134.html Sent from the Infinispan Developer List mailing list archive at Nabble.com.
-- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
Re: [infinispan-dev] Baselining on Java 7
Oracle will support Java 7 at least until March 2015 [1]. And it's very likely that some people will keep using 7 after that, just like they are using 6 now. So unless we want to support Java 6 until 2016, I don't think we can skip 7. [1] http://www.oracle.com/technetwork/java/eol-135779.html On Thu, May 23, 2013 at 4:51 PM, Bela Ban b...@redhat.com wrote: If the majority of users go directly to 8 from 6, then yes, why not. But if that's not the case, then we can't do it, either... On 5/23/13 3:07 PM, Galder Zamarreño wrote: What about skipping 7 and waiting for 8? On May 23, 2013, at 12:38 PM, Manik Surtani msurt...@redhat.com wrote: Yup, it needs to happen sometime. But the question is, when? Is there anyone out there who desperately needs Infinispan to work on Java 6? Cheers Manik -- Bela Ban, JGroups lead (http://www.jgroups.org)
Re: [infinispan-dev] Baselining on Java 7
* ForkJoinPool (assuming we can get AS to inject one)
* AutoCloseable
* Throwable.addSuppressed()
On Fri, May 24, 2013 at 2:01 AM, Mircea Markus mmar...@redhat.com wrote: On 23 May 2013, at 12:06, Bela Ban b...@redhat.com wrote: On 5/23/13 12:38 PM, Manik Surtani wrote: Yup, it needs to happen sometime. But the question is, when? Is there anyone out there who desperately needs Infinispan to work on Java 6? I got a lot of pushback when I moved the required JDK in JGroups to 1.5 (people were still using 1.4), and the same for moving from 1.5 to 1.6, so I anticipate this won't be different for moving to 1.7 as the baseline. I personally would not mind waiting for JGroups 5 to move to JDK 7, as this gives users a bit more time, and I won't require JDK 7 before JGroups 5 anyway, as the NIO2 stuff won't be in JGroups 4, only in 5. +1 to stick to 6 at least till JGroups 5. Besides the NIO2 stuff, is there anything from 7 we want badly? But if you guys need to move to JDK 7 in Infinispan 6 already, then that's fine with me, too. I guess the AS has some requirements, too, so if they decide to move to JDK 7, then we'll have to make that move too.
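For reference, the last two features on the list above can be shown in a few lines. This is a standalone sketch (not Infinispan code): try-with-resources works on any AutoCloseable, and when both the body and close() throw, the close() exception is attached to the primary one via Throwable.addSuppressed():

```java
// A resource whose work and close() both fail, to show exception suppression.
class NoisyResource implements AutoCloseable {
    public void work() {
        throw new IllegalStateException("work failed");
    }

    @Override
    public void close() {
        throw new IllegalStateException("close failed");
    }
}

public class Java7FeaturesDemo {
    public static void main(String[] args) {
        try (NoisyResource r = new NoisyResource()) { // close() called automatically
            r.work();
        } catch (IllegalStateException e) {
            // The exception from the body wins; the one from close() is suppressed.
            System.out.println(e.getMessage());                    // work failed
            System.out.println(e.getSuppressed()[0].getMessage()); // close failed
        }
    }
}
```

On Java 6 the close() exception would have replaced the original one (or required verbose finally-block bookkeeping), losing the root cause.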
Re: [infinispan-dev] CommandAwareRpcDispatcher desired logic
On Wed, May 29, 2013 at 4:50 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi all, // related to https://issues.jboss.org/browse/ISPN-3100 In here [1] I've added three simple tests of the CARD logic with the ResponseFilter. #1 testResponses() = tests if all the responses are returned. This test is failing because the FutureCollator only returns the last response. *question* is this the desired behaviour, or is it a bug (and this test is incorrect)? I think it was intentional, because it only really had to work with ClusteredGetResponseValidityFilter at the time. But I'm all for changing it to return all the (validated) responses. #2 testTimeout() = tests the timeout when a ResponseFilter is used. It sets the timeout to 1000 milliseconds and sends a command that sleeps for 5000 milliseconds before replying. This test is failing because no TimeoutException was thrown (as expected). That's why I've created ISPN-3100. *question* is the desired behaviour to wait infinitely? If yes, then ISPN-3100 is invalid. I'm pretty sure that's not intentional. FutureCollator.futureDone() should be called when a UnicastRequest finishes with a TimeoutException, and it should decrement expectedResponses. When all the UnicastRequests are done, it should notify() the caller thread, which should throw a TimeoutException. #3 testTimeoutWithoutFilter() = tests the timeout without a filter. It's working correctly! Thanks! Cheers, Pedro [1] https://github.com/pruivo/infinispan/commit/1fc26d740b773bf3864298615f61ca57b964d9cf#diff-0
Re: [infinispan-dev] Suppressing state transfer via JMX
If we only want to deal with full cluster shutdown, then I think stopping all application requests, calling Cache.clear() on one node, and then shutting down all the nodes should be simpler. On start, assuming no cache store, the caches will start empty, so starting all the nodes at once and only allowing application requests when they've all joined should also work without extra work. If we only want to stop a part of the cluster, suppressing rebalancing would be better, because we wouldn't lose all the data. But we'd still lose the keys whose owners are all among the nodes we want to stop. I've discussed this with Adrian, and we think that if we want to stop a part of the cluster without losing data, we need a JMX operation on the coordinator that will atomically remove a set of nodes from the CH. After the operation completes, the user will know it's safe to stop those nodes without losing data. When it comes to starting a part of the cluster, a pause rebalancing option would probably be better - but again, on the coordinator, not on each joining node. And clearly, if more than numOwners nodes leave while rebalancing is suspended, data will be lost. Cheers Dan On Fri, May 31, 2013 at 12:17 PM, Manik Surtani msurt...@redhat.com wrote: Guys We've discussed ISPN-3140 elsewhere before, I'm bringing it to this forum now. https://issues.jboss.org/browse/ISPN-3140 Any thoughts/concerns? Particularly looking to hear from Dan or Adrian about viability, complexity, ease of implementation. Thanks Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Platform Architect, JBoss Data Grid http://red.ht/data-grid
[infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
Hi guys CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations. What do you think? Cheers Dan
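The proposed semantics can be sketched with a toy, Infinispan-free model (class and method names here - ToyCache, withIgnoreReturnValues() - are invented for illustration, they are not Infinispan API): a flagged handle over the same data skips computing the previous value on writes, while reads still return the stored value.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposal: the IGNORE_RETURN_VALUES view affects only
// what put() returns; get() always returns the stored value.
class ToyCache {
    private final Map<String, String> store;
    private final boolean ignoreReturnValues;

    ToyCache(Map<String, String> store, boolean ignoreReturnValues) {
        this.store = store;
        this.ignoreReturnValues = ignoreReturnValues;
    }

    // Same underlying data, different flag view.
    ToyCache withIgnoreReturnValues() {
        return new ToyCache(store, true);
    }

    String put(String key, String value) {
        String previous = store.put(key, value);
        return ignoreReturnValues ? null : previous; // writes honour the flag
    }

    String get(String key) {
        return store.get(key); // reads ignore the flag, per the proposal
    }
}

public class IgnoreReturnValuesSketch {
    public static void main(String[] args) {
        ToyCache base = new ToyCache(new HashMap<>(), false);
        ToyCache flagged = base.withIgnoreReturnValues();
        System.out.println(flagged.put("k", "v1")); // null: previous value not returned
        System.out.println(flagged.get("k"));       // v1: read still works
    }
}
```

With these semantics a caller can hold the single flagged reference for all operations, which is the convenience being argued for.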
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
Sanne, I'm only talking about get operations. I was thinking that if you call cache.get(key), you want the value of that key, regardless of where it is stored... Obviously, write operations would still behave as they do now. On Mon, Jun 3, 2013 at 12:59 PM, Sanne Grinovero sa...@infinispan.org wrote: Hi Dan, I'm not sure I understood this. How can I prevent it from returning values if the flag is ignored? Note that in some cases it makes a huge performance difference. Sanne On 3 June 2013 10:52, Dan Berindei dan.berin...@gmail.com wrote: Hi guys CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations. What do you think? Cheers Dan
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
I agree, eventually we should have separate getAndXxx methods like JSR-107. The Jokre is cool, but running with an instrumentation agent is never going to be ok for everyone. I was thinking about this in the context of AtomicHashMap. It uses withFlags(SKIP_REMOTE_LOOKUP, DELTA_WRITE, FORCE_WRITE_LOCK) for write operations, but it still has to keep a reference to the original cache for read operations. Of course, after more digging, I don't think the SKIP_REMOTE_LOOKUP flag is that useful for writes either (since we already have a copy of the map in the invocation context at that point). But it seemed like an interesting idea in the general case. On Mon, Jun 3, 2013 at 1:15 PM, Sanne Grinovero sa...@infinispan.org wrote: Ha, got it, good point. But I'm not persuaded: doesn't it get even more confusing for users? Imho it would be more helpful to throw an exception. These flags are confusing and ideally we should evolve the API, as the JSR did, or push on The Jokre. On 3 Jun 2013 11:08, Dan Berindei dan.berin...@gmail.com wrote: Sanne, I'm only talking about get operations. I was thinking that if you call cache.get(key), you want the value of that key, regardless of where it is stored... Obviously, write operations would still behave as they do now. On Mon, Jun 3, 2013 at 12:59 PM, Sanne Grinovero sa...@infinispan.org wrote: Hi Dan, I'm not sure I understood this. How can I prevent it from returning values if the flag is ignored? Note that in some cases it makes a huge performance difference. Sanne On 3 June 2013 10:52, Dan Berindei dan.berin...@gmail.com wrote: Hi guys CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations.
What do you think? Cheers Dan
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
Fair point... ok, let's leave it as it is now. On Mon, Jun 3, 2013 at 5:23 PM, Galder Zamarreño gal...@redhat.com wrote: On Jun 3, 2013, at 11:52 AM, Dan Berindei dan.berin...@gmail.com wrote: Hi guys CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations. What do you think? If I was to take the role of a colleague of the person who's written the Infinispan code, I'd be very confused to see a cache reference created with IGNORE_RETURN_VALUES being used for a get() operation… I can see myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES? Cheers Dan
Re: [infinispan-dev] mongodb cache store added in Infinispan 5.3 - courtesy of Guillaume Scheibel
There's a link 'Login as guest' at the bottom of the login form at ci.infinispan.org, you can use that and you'll see all the builds. You can also create a user for yourself. On Mon, Jun 3, 2013 at 6:01 PM, Guillaume SCHEIBEL guillaume.schei...@gmail.com wrote: I don't think I have access to the CI platform. If I do, what kind of credentials do I have to use? Guillaume 2013/6/3 Galder Zamarreño gal...@redhat.com Btw, it seems that since the introduction of this, 21 failures have come in from the MongoDBCacheStoreTest [1]. Guillaume, can you please look into this? Cheers, [1] http://ci.infinispan.org/viewLog.html?buildId=1650&tab=buildResultsDiv&buildTypeId=bt2 On Jun 3, 2013, at 4:02 PM, Guillaume SCHEIBEL guillaume.schei...@gmail.com wrote: Thanks guys, I'm glad you like it :) BTW, if you would like to have other implementations, ... (I like doing this translation work) Guillaume 2013/6/3 Galder Zamarreño gal...@redhat.com Indeed great stuff Guillaume!! :) On May 31, 2013, at 2:18 PM, Sanne Grinovero sa...@infinispan.org wrote: Looks great, thanks Guillaume! On 31 May 2013 10:27, Guillaume SCHEIBEL guillaume.schei...@gmail.com wrote: I have updated the main page but I haven't found how to delete the child page. The blog post has been updated, you can publish whenever you want to. Guillaume 2013/5/30 Mircea Markus mmar...@redhat.com Also better place the content in the parent document directly instead of making it a separate child document (see the rest of the cache stores): https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores Thanks! On 30 May 2013, at 21:49, Mircea Markus mmar...@redhat.com wrote: Thanks, looks good! I think it would be nice to add a code snippet with configuring the store using the fluent API as well. And release the blog :-) On 30 May 2013, at 17:05, Guillaume SCHEIBEL guillaume.schei...@gmail.com wrote: Hello, I have written a small documentation page here: https://docs.jboss.org/author/display/ISPN/MongoDB+CacheStore WDYT?
About the blog post, a gist (script tag) is already there but it's not visible in the editor view. Guillaume 2013/5/27 Mircea Markus mmar...@redhat.com On 22 May 2013, at 11:01, Guillaume SCHEIBEL guillaume.schei...@gmail.com wrote: Mircea, I have just created a quick blog post titled Using MongoDB as a cache store. Could you have a look at it? If it suits you, the documentation will follow soon. Thank you Guillaume. Looks good to me once the link to the documentation and the sample configuration snippet are in. Cheers Guillaume 2013/5/22 Randall Hauch rha...@redhat.com There is a way to download (via Maven) and run MongoDB locally from within Java, via Flapdoodle's Embedded MongoDB: https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de ModeShape uses this in our builds in support of our storage of binary values inside MongoDB. The relevant Maven POM parts and JUnit test case are: https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/pom.xml#L147 https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/src/test/java/org/modeshape/jcr/value/binary/MongodbBinaryStoreTest.java On May 21, 2013, at 1:04 PM, Mircea Markus mmar...@redhat.com wrote: Thanks to Guillaume Scheibel, Infinispan now has a MongoDB cache store that will be shipped as part of 5.3.0.CR1. The tests for the MongoDB cache store are not run by default. In order to be able to run them you need to: - install MongoDB locally - run the mongodb profile The cache store was added to the CI build on all 5.3 configs (together with a running instance of MongoDB). Guillaume, would you mind adding a blog entry describing this new functionality? (I've invited you to be a member of the infinispan.blogspot.com team.)
Also can you please update the user doc: https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
On Tue, Jun 4, 2013 at 12:27 PM, Mircea Markus mmar...@redhat.com wrote: On 3 Jun 2013, at 19:01, Dan Berindei dan.berin...@gmail.com wrote: Fair point... ok, let's leave it as it is now. On Mon, Jun 3, 2013 at 5:23 PM, Galder Zamarreño gal...@redhat.com wrote: On Jun 3, 2013, at 11:52 AM, Dan Berindei dan.berin...@gmail.com wrote: Hi guys CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to a cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations. What do you think? If I were to take the role of a colleague of the person who wrote the Infinispan code, I'd be very confused to see a cache reference created with IGNORE_RETURN_VALUES being used for a get() operation… I can see myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES? Isn't Galder's point not to allow invoking get with IGNORE_RETURN_VALUES? As both of you pointed out, Get + IGNORE_RETURN_VALUES doesn't make any sense :-) You'd think conditional operations with IGNORE_RETURN_VALUES don't make sense either, yet we have a special case to handle those as if the flag wasn't present :)
Re: [infinispan-dev] Using infinispan as quorum-based nosql
On Mon, Jun 3, 2013 at 4:23 PM, vitalii.tymchys...@ubs.com wrote: Hello. Thanks for your information. I will subscribe and vote for the issues noted. In the meantime I've implemented a hacky JGroupsTransport that downgrades all SYNCHRONOUS invokeRemotely calls (except CacheViewControlCommand and StateTransferControlCommand) to SYNCHRONOUS_IGNORE_LEAVERS and checks with a filter whether the required number of answers was received (I've tried to use the original invokeRemotely return value, but it often returns some strange value, like an empty map). It seems to do the trick for me, but I am still not sure if this has any side effects. Indeed, I started working on a solution, but I over-engineered it and then I got side-tracked with other stuff. Sorry about that. The problem with using SYNCHRONOUS_IGNORE_LEAVERS everywhere, as I found out, is that you don't want to ignore the primary owner of a key leaving during a prepare/lock command (or the coordinator, in REPL mode prior to 5.3.0.CR1/ISPN-2772). If that happens, you have to retry on the new primary owner, otherwise you can't know if the prepare command has locked the key or not. A similar problem appears in non-transactional caches with supportsConcurrentUpdates=true: there the primary owner can ignore any of the backup owners leaving, but the originator can't ignore the primary owner leaving. For now I can see a merge problem in my test: different values are picked during merge. I am going to dig a little deeper and follow up. But it's already a little strange to me, since the test algorithm is:
1) Assign the old value to the full cluster (it's REPL_SYNC mode)
2) Block the coordinator
3) Write the new value to one of the two remaining nodes. It's synchronized to the second remaining node
4) Unblock the coordinator
5) Wait (I could not find a good way to wait for state transfer, so I just wait in this case)
6) Check the value on the coordinator
And in my test I am randomly getting old or new in the assert. I am now going to check why.
Maybe I will need to reinitialize the smaller cluster part to ensure data is taken from the quorum part of the cluster. We don't handle merges properly. See https://issues.jboss.org/browse/ISPN-263 and the discussion at http://markmail.org/message/meyczotzobuva7js What happens right now is that after a merge, all the caches are assumed to have up-to-date data, so there is no state transfer. We had several ideas floating around on how we could force the smaller partition to receive data from the quorum partition, but I think with the public API your best option is to stop all the caches in the smaller partition after the split and start them back up after the merge. Cheers Dan Best regards, Vitalii Tymchyshyn -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto: infinispan-dev-boun...@lists.jboss.org] On Behalf Of Galder Zamarreno Sent: Monday, June 03, 2013 9:04 AM To: infinispan-dev List Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql On May 30, 2013, at 5:10 PM, vitalii.tymchys...@ubs.com wrote: Hello. We are going to use Infinispan in our project as a NoSQL solution. It performs quite well for us, but currently we've faced the next problem. Note: We are using Infinispan 5.1.6 in SYNC_REPL mode in a small cluster. The problem is that when any node fails, any running transactions wait for JGroups to decide whether it has really failed or not and roll back because of a SuspectException after that. While we can live with a delay, we'd really like to skip rolling back. As for me, I actually don't see a reason for rollback because transactions started after the leave will succeed. So, as for me, previously running transactions could do the same. We're aware of the problem (https://issues.jboss.org/browse/ISPN-2402). @Dan, have there been any updates on this? The question for me is whether a node that left will synchronize its state after merge (even if the merge was done without an Infinispan restart). As for me, it should, or it won't work correctly at all.
This is not in yet: https://issues.jboss.org/browse/ISPN-263 So, I've found RpcManager's ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS and am thinking of switching to it for RpcManager calls that don't specify a ResponseMode explicitly. As for me, it should do the trick. Also, I am going to enforce a quorum number of responses, but that's another story. So, what do you think, would it work? ^ Not sure if that'll work. @Dan? P.S. Another Q for me: how does it work now, when a SuspectException is thrown from CommitCommand broadcasting? As far as I can see, the commit is still done on some remote nodes (that are still in the cluster), but rolled back on the local node because of this exception. Am I correct? ^ How Infinispan reacts in these situations depends a lot on the type of communications (synchronous or asynchronous) and the transaction configuration. Mircea can provide more details on this. Cheers, This can cause inconsistencies, but
Re: [infinispan-dev] Using infinispan as quorum-based nosql
Say you have two transactions, tx1 and tx2. They both send a LockControlCommand(k1) to the primary owner of k1 (let's call it B). If the lock commands use SYNCHRONOUS_IGNORE_LEAVERS and B dies while processing the commands, both tx1 and tx2 will think they have succeeded in locking k1. So you're right, everything should be locked before prepare in pessimistic mode, but LockControlCommands are also susceptible to SuspectExceptions. On the other hand, you can use SYNCHRONOUS mode for LockControlCommands and you can just retry the transaction in case of a SuspectException. Unfortunately, you can't retry the transaction if the PrepareCommand fails (in pessimistic mode; or the CommitCommand in optimistic mode), because it is executed in the commit phase. The transaction manager swallows all the exceptions in the commit phase, making it impossible to see if it failed because of a node leaving. I guess this means I should increase the priority of https://issues.jboss.org/browse/ISPN-2402 ... On Thu, Jun 6, 2013 at 11:49 AM, vitalii.tymchys...@ubs.com wrote: Hello. We are using pessimistic transaction mode. In this case everything's already locked by the time of prepare, isn't it? As for the merge, for quorum mode it's simple - take data from the quorum. I think I will try to simply suppress sending data from non-quorum members on merge, because currently everyone sends its data and it creates a complete mess with unsynchronized data after merge (depending on the timing). Best regards, Vitalii Tymchyshyn From: infinispan-dev-boun...@lists.jboss.org [mailto: infinispan-dev-boun...@lists.jboss.org] On Behalf Of Dan Berindei Sent: Wednesday, June 05, 2013 12:04 PM To: infinispan-dev List Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql On Mon, Jun 3, 2013 at 4:23 PM, vitalii.tymchys...@ubs.com wrote: Hello. Thanks for your information. I will subscribe and vote for the issues noted.
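The retry-on-SuspectException pattern Dan describes above (lock commands in SYNCHRONOUS mode, retrying the whole transaction when a node leaves mid-operation) can be sketched as follows. This is a self-contained illustration: `SuspectException` here is a local stand-in for Infinispan's real exception class, and `withRetry` is a hypothetical helper, not an Infinispan API.

```java
import java.util.function.Supplier;

public class RetryOnSuspect {
    // Local stand-in for org.infinispan.remoting.transport.jgroups.SuspectException
    static class SuspectException extends RuntimeException {}

    // Hypothetical helper: rerun the transactional work when a node left
    // mid-operation, because the lock state on the new primary is unknown.
    static <T> T withRetry(Supplier<T> work, int maxRetries) {
        SuspectException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return work.get();
            } catch (SuspectException e) {
                last = e; // a node left; retry the whole transaction
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulate two leavers before the cluster stabilizes.
        String result = withRetry(() -> {
            if (++attempts[0] < 3) throw new SuspectException();
            return "committed";
        }, 5);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

As the thread notes, this only works while the failure is visible to the caller; once the failure happens inside the TM-driven commit phase, the exception is swallowed and no retry is possible.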
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
On Thu, Jun 6, 2013 at 9:08 PM, Ray Tsang saturn...@gmail.com wrote: On Jun 6, 2013, at 13:26, Mircea Markus mmar...@redhat.com wrote: On 4 Jun 2013, at 13:55, Dan Berindei dan.berin...@gmail.com wrote: CacheLoaderInterceptor and DistributionInterceptor both honour the IGNORE_RETURN_VALUES flag for get commands, but I think it would be more useful if they ignored it - just like they ignore it for conditional commands. That would make it possible for users to only keep a reference to a cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for both read and write operations. What do you think? If I were to take the role of a colleague of the person who wrote the Infinispan code, I'd be very confused to see a cache reference created with IGNORE_RETURN_VALUES being used for a get() operation… I can see myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES? Isn't Galder's point not to allow invoking get with IGNORE_RETURN_VALUES? As both of you pointed out, Get + IGNORE_RETURN_VALUES doesn't make any sense :-) You'd think conditional operations with IGNORE_RETURN_VALUES don't make sense either, yet we have a special case to handle those as if the flag wasn't present :) I guess you're referring to ISPN-3141? Exactly. Does it make sense to call cache.withFlags(IGNORE_RETURN_VALUES).putIfAbsent(k, v)? What should it return? Still I think Get + IGNORE_RETURN_VALUES doesn't make any sense :-) +1. It definitely threw me off... Ok, maybe IGNORE_RETURN_VALUES wouldn't be the best flag name for what I had in mind... I was thinking of a scenario where the application needs to do both reads and writes, but for writes it never needs to know the previous value. In that scenario it would make sense to call something like cache = cacheManager.getCache().getAdvancedCache().withFlags(IGNORE_RETURN_VALUES_ON_WRITES) at the beginning and only ever use that reference in the application.
I agree that using the existing IGNORE_RETURN_VALUES flag for that would be a bit misleading, though. Should we change anything about IGNORE_RETURN_VALUES, then? I guess it would be relatively simple to make it so that get() operations with the flag throw an exception and (optionally) put() operations always return null. Should I create an issue in JIRA for that? Cheers Dan
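The semantics Dan proposes at the end of the thread — get() failing fast under the flag, put() always returning null — can be modeled with a tiny stand-alone class. This is an illustrative mock over a HashMap, not Infinispan's AdvancedCache API; the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a cache created with an "ignore return values" flag.
public class FlaggedCache<K, V> {
    private final Map<K, V> data = new HashMap<>();
    private final boolean ignoreReturnValues;

    public FlaggedCache(boolean ignoreReturnValues) {
        this.ignoreReturnValues = ignoreReturnValues;
    }

    // With the flag set, skip returning the previous value (in the real cache
    // this is what saves the remote/store lookup) and always return null.
    public V put(K key, V value) {
        V previous = data.put(key, value);
        return ignoreReturnValues ? null : previous;
    }

    // With the flag set, a get() is almost certainly a programming error,
    // so fail fast instead of silently returning a value.
    public V get(K key) {
        if (ignoreReturnValues)
            throw new IllegalStateException("get() called with IGNORE_RETURN_VALUES");
        return data.get(key);
    }

    public static void main(String[] args) {
        FlaggedCache<String, String> cache = new FlaggedCache<>(true);
        System.out.println(cache.put("user:1", "alice")); // null: previous value suppressed
        System.out.println(cache.put("user:1", "bob"));   // null again, even though a value existed
    }
}
```

Under this model an application would keep two references: the flagged one for writes and a plain one for reads, which matches the "IGNORE_RETURN_VALUES_ON_WRITES" idea discussed above.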
Re: [infinispan-dev] [infinispan-internal] PutMapCommand is ineffective
Yes, putAll is really heavy in non-tx (concurrent) mode, because the same PutMapCommand is forwarded from each primary owner to all the backup owners of the keys it primary-owns. However, I don't think a separate lock command would help: in non-tx mode locks are owned by threads. A separate lock command would acquire a lock and associate it with its execution thread, making it impossible for a following write command to use the same lock. Changing putAll to implement Radim's proposal would indeed make it very similar to a transactional putAll: you'd need a pseudo-transaction object to associate the locks with, and a reaper to clean up the pseudo-transaction objects when the originator leaves the cluster. On Mon, Jun 10, 2013 at 1:33 PM, Manik Surtani msurt...@redhat.com wrote: Agreed. It does sound pretty heavy. We should investigate a better implementation - the two approaches you suggest both sound good, could you create a JIRA for this? Adding infinispan-dev, that's the correct place to discuss this. Cheers Manik On 7 Jun 2013, at 13:39, Radim Vansa rva...@redhat.com wrote: Hi, recently I was looking into the performance of PutMapCommand and what's in fact going on under the hood. From what I've seen (not from the code but from message flow analysis), in non-transactional synchronous mode this happens: A wants to execute a PutMapCommand with many keys - let's assume that in fact the keys span all nodes in the cluster. 1. A locks all local keys and sends via unicast a message to each primary owner of some of the keys in the map 2. A sends a unicast message to each node, requesting the operation 3. Each node locks its keys and sends a multicast message to ALL other nodes in the cluster I don't think that's right... Each primary owner only sends this message to the backup owners of the keys for which that node is the primary owner. So it will only send the message to all the other nodes (optimized as a multicast) if every other node is a backup owner for one of its primary keys.
This happens N - 1 times: 4. Each node receives the multicast message, (updates the non-primary segments) and sends a reply back to the sender of the mcast message. 5. The primary owners send confirmation back to A. Let's compute how many messages are received here:
N - 1 // A's request
(N - 1) * (N - 1) // multicast messages received
(N - 1) * (N - 1) // replies to the multicast messages received
N - 1 // responses to A
That's 2*N^2 - 2*N messages, assuming nobody needs flow control replenishments, nothing is lost, etc. I don't like that ^2 exponent - it does not look like the cluster is really scaling. It could be fun to execute it on a 64-node cluster, spawning thousands of messages just for one putAll (with, say, 100 key-value pairs - I don't want to compute the exact probability of how many nodes such a set of keys would have primary segments on). Could the requestor orchestrate the whole operation? The idea is that all messages are sent only between the requestor and other nodes, never between the other nodes. The requestor would lock the primary keys with one set of messages (waiting for replies), update the non-primaries with another set of messages and then unlock all primaries with a last message. The set of messages could be either unicasts with only the keys selected for each recipient, or a multicast with the whole map - which one is actually better is a subject for performance testing. This results in 6*N - 6 messages (or 5*N - 5 if the last message doesn't require a reply). You can easily see when 5*(N - 1) is better than 2*N*(N - 1). Or is this too similar to transactions with multiple keys? I think that with the current implementation, the putAll operation should be discouraged, as it does not provide better performance than multiple puts (and in terms of atomicity it's probably not much better either). WDYT? Radim --- Radim Vansa Quality Assurance Engineer JBoss Datagrid tel. +420532294559 ext. 62559 Red Hat Czech, s.r.o.
Brno, Purkyňova 99/71, PSČ 612 45 Czech Republic -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Platform Architect, JBoss Data Grid http://red.ht/data-grid
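Radim's message-count arithmetic above can be double-checked with a quick calculation. The worst-case assumption (every node is a backup owner for some primary key of every other node) is taken from his scenario; the class and method names are just for the example.

```java
// Back-of-the-envelope check of the message counts for a non-tx putAll
// spanning all N nodes, ignoring flow control and retransmissions.
public class PutMapMessages {
    // Current scheme: originator -> primaries, primaries -> everyone, replies.
    static int current(int n) {
        return (n - 1)               // A's requests to the primary owners
             + (n - 1) * (n - 1)     // each owner's multicast, received N-1 times
             + (n - 1) * (n - 1)     // replies to those multicasts
             + (n - 1);              // responses back to A
    }                                // simplifies to 2*N^2 - 2*N

    // Proposed originator-orchestrated scheme: lock, update, unlock rounds,
    // each a request plus a reply to/from every other node: 3 * 2 * (N - 1).
    static int orchestrated(int n) {
        return 6 * (n - 1);
    }

    public static void main(String[] args) {
        for (int n : new int[]{4, 16, 64})
            System.out.println(n + " nodes: current=" + current(n)
                + " orchestrated=" + orchestrated(n));
    }
}
```

For 64 nodes the current scheme costs 8064 messages against 378 for the orchestrated one, which is the scaling concern raised in the thread.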
Re: [infinispan-dev] Suppressing state transfer via JMX
On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani msurt...@redhat.com wrote: On 10 Jun 2013, at 15:12, Dan Berindei dan.berin...@gmail.com wrote: Erik, I think in your case you'd be better served by a ConsistentHashFactory that always assigns at most one owner from each machine for each segment. I guess the fix for ISPN-3140 should work as well, but it wouldn't be very straightforward: you'd have to keep the rebalancingEnabled attribute set to false by default, and you'd have to enable it temporarily every time you have a topology change that you do want to process. Why? Does the workflow detailed in ISPN-3140 not work? ISPN-3140 is geared toward planned shutdowns, my understanding was that Erik's scenario involves an unexpected failure. Say we have a cluster with 4 nodes spread on 2 machines: A(m1), B(m1), C(m2), D(m2). If m2 fails, rebalancing will start automatically and m1 will have 2 copies of each entry (one on A and one on B). Trying to suspend rebalancing after m2 has already failed won't have any effect - if state transfer is already in progress it won't be cancelled. In order to avoid the unnecessary transfers, rebalancing would have to be suspended before the failure - i.e. rebalancing should be suspended by default. It's certainly possible to do this automatically from your app or from a monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would be a better fit. Do explain. A custom ConsistentHashFactory could distribute segments so that a machine never has more than 1 copy of each segment. If m2 failed, there would be just one machine in the cluster, and just one copy of each segment. The factory would not change the consistent hash, and there wouldn't be any state transfer. It could be even simpler - the existing TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory implementations already ensure just one copy per machine if the number of machines is >= numOwners.
So a custom ConsistentHashFactory could just extend one of these and skip calling super.rebalance() when the number of machines in the cluster is < numOwners. On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter an1...@hotmail.com wrote: I'd like something similar. If I have equal keys on two machines (given an orthogonal setup and a TACH), I'd like to suppress state transfer and run with only one copy until I can recover my machines. The business case is that in a degraded scenario, additional replicas aren't going to buy me anything, as a failure will most likely be at the machine level and will cause me to lose data. Once I've recovered the other machine, I can turn back on state transfer to get my data redundancy. Erik -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Mircea Markus Sent: Tuesday, June 04, 2013 5:44 AM To: infinispan-dev List Subject: Re: [infinispan-dev] Suppressing state transfer via JMX Manik, what's wrong with Dan's suggestion of clearing the cache before shutdown? On 31 May 2013, at 14:20, Manik Surtani msurt...@redhat.com wrote: If we only want to deal with full cluster shutdown, then I think stopping all application requests, calling Cache.clear() on one node, and then shutting down all the nodes should be simpler. On start, assuming no cache store, the caches will start empty, so starting all the nodes at once and only allowing application requests when they've all joined should also work without extra work. If we only want to stop a part of the cluster, suppressing rebalancing would be better, because we wouldn't lose all the data. But we'd still lose the keys whose owners are all among the nodes we want to stop. I've discussed this with Adrian, and we think if we want to stop a part of the cluster without losing data we need a JMX operation on the coordinator that will atomically remove a set of nodes from the CH.
After the operation completes, the user will know it's safe to stop those nodes without losing data. I think the no-data-loss option is bigger scope, perhaps part of ISPN-1394. And that's not what I am asking about. When it comes to starting a part of the cluster, a pause rebalancing option would probably be better - but again, on the coordinator, not on each joining node. And clearly, if more than numOwner nodes leave while rebalancing is suspended, data will be lost. Yup. This sort of option would only be used where data loss isn't an issue (such as a distributed cache). Where data loss is an issue, we'd need more control - ISPN-1394. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
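The decision Dan sketches earlier in this thread — skip rebalancing while fewer machines than numOwners remain — boils down to a single check. In a real implementation this check would guard the call to super.rebalance() in a subclass of TopologyAwareSyncConsistentHashFactory; since those Infinispan signatures are version-specific, the snippet below is a self-contained illustration of the rule only, with invented names.

```java
public class DegradedModeCheck {
    // Rebalance only while every segment can still get one copy per machine.
    // Below that threshold, rebalancing would only create redundant copies on
    // machines that already hold the data, so keep the CH (and the data) as-is.
    static boolean shouldRebalance(int machinesInCluster, int numOwners) {
        return machinesInCluster >= numOwners;
    }

    public static void main(String[] args) {
        // numOwners = 2, both machines (m1, m2) up: rebalance normally
        System.out.println(shouldRebalance(2, 2)); // true
        // m2 failed, only m1 left: no state transfer until m2 is recovered
        System.out.println(shouldRebalance(1, 2)); // false
    }
}
```

This matches Erik's business case: while degraded, a second copy on the surviving machine buys nothing, and normal rebalancing resumes automatically once the machine count recovers.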
Re: [infinispan-dev] Suppressing state transfer via JMX
On Wed, Jun 12, 2013 at 2:57 PM, Manik Surtani msurt...@redhat.com wrote: On 11 Jun 2013, at 16:27, Dan Berindei dan.berin...@gmail.com wrote: A custom ConsistentHashFactory could distribute segments so that a machine never has more than 1 copy of each segment. If m2 failed, there would be just one machine in the cluster, and just one copy of each segment. The factory would not change the consistent hash, and there wouldn't be any state transfer. But that's bad for unplanned failures, as you lose data in that case.
Erik's assumption was that machine failures were much more likely than individual node failures, and having a second copy on m1 wouldn't help if m1 failed as well.
Re: [infinispan-dev] New bundler performance
On Wed, Jun 12, 2013 at 5:39 PM, Radim Vansa rva...@redhat.com wrote: - Original Message - | From: Bela Ban b...@redhat.com | To: infinispan-dev@lists.jboss.org | Sent: Wednesday, June 12, 2013 3:16:58 PM | Subject: Re: [infinispan-dev] New bundler performance | | | | On 06/12/2013 04:54 AM, Radim Vansa wrote: | Hi, | | I was going through the commits (running tests on each of them) to | seek the performance regression we've recently discovered and it | seems that our test (replicated udp non-transactional stress test on | 4 nodes) experiences a serious regression on the commit | | ISPN-2848 Use the new bundling mechanism from JGroups 3.3.0 | (73da108cdcf9db4f3edbcd6dbda6938d6e45d148) | | The performance drops from about 7800 writes/s to 4800 writes/s, and | from 1.5M reads/s to 1.2M reads/s (having slower reads in replicated | mode is really odd). | | | Is this using sync or async replication ? Sync | | You could set UDP.bundler_type=old to see if the old bundler makes a | difference. Yeah, that does not work at all, performance drops very low as everything is bundled | | Yes, reads should have *nothing* to do with the bundler as there should | not be any communication on reads ! | | Maybe you should instrument the reads with byteman and see if they cause | any communication. Maybe that's really just about overall system behaviour. If we waited for some communication, there wouldn't be million reads per second :) Higher CPU usage for the write commands will slow down everything else on the same machine. So as long as the test does reads and writes in parallel, there will be some interference. I suggest running a read-only test and a write-only test to get a more reliable picture of the performance for each of them. | | | It seems that the bundler is not really as good as we hoped for - it | may be the bottleneck. 
I have tried to create another bundler which | shares the queue between 4 instances of TransportQueueBundler (so, 4 | threads are actually sending the messages which go into one queue) | and the performance mildly improved - to 5200 writes/s, but that's | not enough. | | I'm currently at the summit, but do you have a test that I could run | when I'm back to reproduce the performance issue? Would the RadarGun benchmark be enough for you? See attachments. You have to compile Infinispan from snapshot (113842c8cf91cbb5a1bbd26e05fab7024fdec081 is the last OK, 73da108cdcf9db4f3edbcd6dbda6938d6e45d148 is slower) and compile RadarGun from snapshot with mvn install -Pinfinispan53 (this also creates the plugin using infinispan-core-5.3.0-SNAPSHOT) Radim | | | | Radim | | Note: you may have seen my conversation with Pedro Ruivo on IRC about | the bundler several days ago, in that time our configuration had the old | bundler. This was fixed, but as I had not built Infinispan properly | (something got cached), I had not noticed the regression between | these builds. | | -- | Bela Ban | Lead JGroups / Clustering Team | JBoss | ___ | infinispan-dev mailing list | infinispan-dev@lists.jboss.org | https://lists.jboss.org/mailman/listinfo/infinispan-dev | ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag
On Wed, Jun 12, 2013 at 3:39 PM, Mircea Markus mmar...@redhat.com wrote: On 12 Jun 2013, at 13:14, Galder Zamarreño gal...@redhat.com wrote: On Jun 10, 2013, at 12:01 PM, Adrian Nistor anis...@redhat.com wrote: Maybe we could just clarify the javadoc of IGNORE_RETURN_VALUES and say that it only applies to write operations and is ignored for everything else? Why punish the user with an exception when doing a 'get'? We already document there's a (very common-sense) exception for conditional writes where the flag is ignored (ISPN-3141). I wonder if anyone noticed my reply earlier... The flag business does need a big re-think. Not only to separate internal from external flags (we have a jira for that [1]), but also to have a way to define which flags can be passed to a particular operation, in a way that's type-safe, and without resulting in a runtime error of the likes of X flag cannot be used with Y operation. IOW, any error on which flag can be used with what operation should ideally be caught at compilation time. I don't have specific ideas on this right now, but I think it'd be good to achieve this. IOW, I suggest we leave it as it is. We need to re-think it anyway. So let's tackle it in 6.0 so that a get operation can never be passed the IGNORE_RETURN_VALUES flag, and this being something that's caught at **compilation time**. this would be the elegant way of doing it. An even more elegant solution would be to make put return void by default, and force the user to state explicitly that he wants a return value - a la JSR-107. I know that would break backwards compatibility, so it may not be an option even in 6.0, but still I think we should focus on that. Maybe it's worth making other flags type-safe, but I don't think it's worth doing it for IGNORE_RETURN_VALUES. 
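The compile-time idea discussed above can be made concrete with a small hypothetical API sketch. This is not Infinispan's actual Flag enum or Cache interface; the types below are invented to show the shape: read and write flags live in separate enums, so get() simply cannot be handed IGNORE_RETURN_VALUES, and put() returns void by default, à la JSR-107.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type-safe flag API (illustration only, not the real Infinispan API).
public class TypedFlagsSketch {
    enum ReadFlag { SKIP_CACHE_LOAD }
    enum WriteFlag { IGNORE_RETURN_VALUES, SKIP_REMOTE_LOOKUP }

    static class TypedCache<K, V> {
        private final Map<K, V> data = new HashMap<>();

        // get() only accepts read flags; passing a WriteFlag is a compile error.
        V get(K key, ReadFlag... flags) { return data.get(key); }

        // put() returns void by default; no previous value is fetched.
        void put(K key, V value, WriteFlag... flags) { data.put(key, value); }

        // Callers who really want the old value must ask for it explicitly.
        V getAndPut(K key, V value) { return data.put(key, value); }
    }

    public static void main(String[] args) {
        TypedCache<String, String> c = new TypedCache<>();
        c.put("k", "v1", WriteFlag.IGNORE_RETURN_VALUES);
        // c.get("k", WriteFlag.IGNORE_RETURN_VALUES);   // would not compile
        System.out.println(c.get("k"));              // prints v1
        System.out.println(c.getAndPut("k", "v2"));  // prints v1 (the old value)
    }
}
```

The commented-out line is the whole point: the invalid flag/operation combination is rejected by javac rather than producing a runtime error.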
I'm just about to add another internal flag to Flag as a result of the JCache 0.7 upgrade…, so need to tackle ISPN-2201 to avoid causing more confusion, and alongside avoid the issues that have been highlighted WRT which operations are allowed which flags. I'm happy to do this for 6.0. [1] https://issues.jboss.org/browse/ISPN-2201 I've updated the JIRA to track the fact that IGNORE_RETURN_VALUES + get should not be possible. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] partial shutdown problem when shutting down the coordinator
Done: https://github.com/infinispan/infinispan/pull/1900 On Thu, Jun 13, 2013 at 4:15 PM, Mircea Markus mmar...@redhat.com wrote: Hi Dan, snip But unfortunately the partial shutdown case does NOT work, if the coordinator node is a member of the partial shutdown nodes. When the coordinator is down while suspending rebalance, one of the rest servers is becoming a new coordinator with rebalanceEnabled=true, and after that, the rebalance process is starting. Once this happened, it will cause the unexpected cache data accumulation on the nodes that are awaiting shutdown request. /snip Seems like this scenario is not covered. I guess we should broadcast (to all the members) the clean shutdown request so that the new coordinator would pick it up and not start the rebalance? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] KeyAffinityService
Ben, I'm pretty sure it's not possible to define getKeyForAddress as returning the same type as its input key. Let's say f1(address)(x) = getKeyForAddress(address, x) f1(address) has to be injective for each address. So if the size of the key domain is size(K) and the number of addresses is N, the number of unique output values of getKeyForAddress is N * size(K), which is clearly not possible. It's probably possible to define something like this instead: PinnedKey<K> getKeyForAddress(address, key) Even so, keeping the result key pinned to the same address after a rebalance would be tricky - probably requiring some changes to the ConsistentHash implementation. Of course, the node to which the key is pinned may also die, and I'm not sure where the keys should be located then. For the existing KeyAffinityService methods it's simple: they give you a key that is currently mapped to a specific node, but they don't make any guarantees about where that key will be located in the future. Speaking of which, how do you know which keys should map to which node in your application? If the rules are simple enough, you may be able to create multiple caches, each of them with a custom ConsistentHashFactory that locates *all* the keys to the same primary node (with random backup owners). Cheers Dan On Fri, Jun 14, 2013 at 3:43 PM, cotton-ben ben.cot...@alumni.rutgers.edu wrote: Thanks for this reply, Mircea. Very interesting approach. By dispatching a distributed executor back to the node (node 0) that produced the pinned key affinity for the object's natural key, we could indeed do an iterative search (from node 2) to find the original 'myObject' in the pinned cache (on node 0). As you point out, however, this approach is prohibitively costly -- we are effectively doing an iterative search across nodes. Again, we see the best approach as being to use the enhancement to the Key Affinity Service API suggested in https://issues.jboss.org/browse/ISPN-3112. 
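Dan's counting argument earlier in the thread is a plain pigeonhole check, which can be stated numerically: per-address injections K -> K whose images must stay disjoint (a pinned key can only map to one node) would need N * size(K) distinct values inside a domain of only size(K) values. A trivial sketch of the arithmetic:

```java
// The pigeonhole check behind Dan's argument: with a key domain of size k and
// n addresses, pairwise-disjoint injective images need n * k <= k values,
// which fails for any n > 1.
public class PigeonholeDemo {
    static boolean disjointInjectionsExist(int k, int n) {
        return n * k <= k;
    }

    public static void main(String[] args) {
        System.out.println(disjointInjectionsExist(4, 3)); // false: would need 12 values in a domain of 4
        System.out.println(disjointInjectionsExist(4, 1)); // true: the identity function works
    }
}
```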
Once *K getKeyForAddress(Address pinnedNodeAddress, K objectNaturalKey);* is provided, we will be able to proceed as simply as follows: //given that all nodes know that all birthday Entry(s) are pinned to Node 0 @t=0, from node=0, pin Entry to cache at Node 0 Object pinnedKey = getKeyForAddress(getAddress(Node0), "1962-08-10"); cache.put(pinnedKey, "Ben D. Cotton III birthday"); Later @t=1, from node=2, we can use the improved API (suggested in the JIRA) from a 'foreign' node without any need to do a distributed, iterative search, i.e. Object pinnedKey = getKeyForAddress(getAddress(Node0), "1962-08-10"); Object birthdayCelebrant = cache.get(pinnedKey); //returns "Ben D. Cotton III birthday" Of course, for us to be able to do this, the ISPN team would have to deliver to us the API recommended in the JIRA https://issues.jboss.org/browse/ISPN-3112. Don't you agree such an API enhancement would be an awesome capability improvement for ISPN users wishing to operate in a distributed fashion on pinned Caches? :-) As always, thanks to you and this forum for its fine support, Ben -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/KeyAffinityService-nice-2-recommendations-tp4027152p4027401.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] KeyAffinityService
On Fri, Jun 14, 2013 at 5:46 PM, cotton-ben ben.cot...@alumni.rutgers.edu wrote: /It's probably possible to define something like this instead: PinnedKey<K> getKeyForAddress(address, key) / This is fine. Do it this way. /Even so, keeping the result key pinned to the same address after a rebalance would be tricky - probably requiring some changes to the ConsistentHash implementation. Of course, the node to which the key is pinned may also die, and I'm not sure where the keys should be located then. / But, shouldn't some combination of my usage of the ISPN Grouper<T> and the ISPN KeyAffinityService capabilities cover me? In my understanding, the Grouper capability can allow me to group my keys to a specific node (however node identity is anonymous). The Grouper capability's ambition is to transparently survive topology changes of nodes participating in my grid. In my understanding, the KAS capability can allow me to pin my keys to a specific node (by node identity). A potentially great tutorial document for ISPN users might be "How to harmoniously use the ISPN Grouper and ISPN KeyAffinityService capabilities". How to use these 2 meritorious capabilities together (and in harmony) is still unclear to me. Just to be clear, KAS doesn't really allow you to pin a key to a certain node. It only gives you a key that maps to the given node the moment you called KAS.getKeyForAddress(Address) - by the time you call cache.put() with the pinned key, it may have already moved to a different node. Would that be enough for you? If yes, adding the new method should be relatively easy to accomplish with what we have now (although the fact that a grouping key can only be a String does impose some limitations). But it has been my impression that you want something that would keep the key pinned to the same node as long as that node was alive. If that's the case, then implementing it would require significant additional work. 
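The Grouper/KAS distinction above can be shown with a tiny plain-Java routing sketch (this is not Infinispan's Grouper interface; the names are invented for illustration): grouping routes by a group string instead of the whole key, so related keys co-locate, but the owner's identity is whatever the hash says - exactly the difference from KAS's pin-to-a-named-node ask.

```java
import java.util.List;

// Illustration of group-based routing: keys sharing a group string land on the
// same (anonymous) owner, and that owner is chosen by the hash, not by the caller.
public class GroupingSketch {
    static String ownerFor(String group, List<String> nodes) {
        // Hash the group, not the key, so all keys in the group co-locate.
        return nodes.get(Math.floorMod(group.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("nodeA", "nodeB", "nodeC");
        // Any two lookups for the same group agree on the owner...
        System.out.println(ownerFor("birthdays", nodes).equals(ownerFor("birthdays", nodes))); // prints true
        // ...but nothing lets the caller demand that the owner be, say, nodeA.
        System.out.println(ownerFor("birthdays", nodes));
    }
}
```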
/Speaking of which, how do you know which keys should map to which node in your application?/ We just know. :-) Seriously, we do have a mechanism for resolving pinnedKey identity to Node identity custody. I'm just wondering, wouldn't it be simpler if you created a custom ConsistentHashFactory to map keys to nodes using the same mechanism you already have? Not as easy as the API you proposed, but it wouldn't require a separate cache either... /If the rules are simple enough, you may be able to create multiple caches, each of them with a custom ConsistentHashFactory that locates *all* the keys to the same primary node (with random backup owners)./ Very true. But we don't think that is as nice as extending the API in the way recommended. To be blunt, we could just do something as simple as this: @t=0, from node=0, pin Entry to cache at Node 0 Object pinnedKey = getKeyForAddress(getAddress(Node0), "1962-08-10"); I suppose you meant pinnedKey = getKeyForAddress(getAddress(Node0)) here. With this solution, pinned keys could also move from the initial target node to another node as the cache topology changes (and not all keys will move to the same node). globalReplicatedPinnedKeyCache.put("1962-08-10", pinnedKey); //cache the pinnedKeys globally pinnedCache.put(pinnedKey, "Ben D. Cotton III birthday"); Later @t=1, from node=2 Object pinnedKey = globalReplicatedPinnedKeyCache.get("1962-08-10"); Object birthdayCelebrant = pinnedCache.get(pinnedKey); //returns "Ben D. Cotton III birthday" But that is effectively forcing us to do 2 explicit Cache<K,V>#get() operations in application code. First to get the pinnedKey (given the naturalKey). Second to get the pinnedValue (given the pinnedKey). Not preferred -- but DOABLE, FOR SURE. What do you think? Just for the niceness of it, do you think we might be able to have the API suggested at https://issues.jboss.org/browse/ISPN-3112 ? 
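The two-step lookup Ben describes can be replayed with plain maps standing in for the two caches (names taken from his example; the pinned-key value itself is a made-up placeholder for what getKeyForAddress would return):

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of the two-cache pattern: a replicated natural-key -> pinned-key
// index, plus the pinned cache itself. Two gets instead of one.
public class TwoStepLookup {
    public static void main(String[] args) {
        Map<String, String> globalReplicatedPinnedKeyCache = new HashMap<>();
        Map<String, String> pinnedCache = new HashMap<>();

        // t=0, on node 0: store the pinned key under the natural key, and the
        // value under the pinned key.
        String pinnedKey = "pinned-key-for-node0";  // placeholder for getKeyForAddress(...)
        globalReplicatedPinnedKeyCache.put("1962-08-10", pinnedKey);
        pinnedCache.put(pinnedKey, "Ben D. Cotton III birthday");

        // t=1, on node 2: first get resolves the pinned key, second gets the value.
        String k = globalReplicatedPinnedKeyCache.get("1962-08-10");
        System.out.println(pinnedCache.get(k));  // prints Ben D. Cotton III birthday
    }
}
```

The proposed ISPN-3112 API would collapse these two gets into one by recomputing the pinned key deterministically on any node.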
My main concern is that you actually want something more than the KeyAffinityService is able to give you ATM - even if we did add the niceness in the API, the keys still won't be pinned to a certain node for as long as that node is alive. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation
On Mon, Jun 17, 2013 at 3:58 PM, Pedro Ruivo pe...@infinispan.org wrote: On 06/17/2013 12:56 PM, Mircea Markus wrote: On 17 Jun 2013, at 11:52, Pedro Ruivo pe...@infinispan.org wrote: I've been looking at TxDistributionInterceptor and I have a couple of questions (assuming REPEATABLE_READ isolation level): #1. why are we doing a remote get each time we write on a key? (huge performance impact if the key was previously read) indeed this is suboptimal for transactions that write the same key repeatedly and repeatable read. Can you please create a JIRA for this? created: https://issues.jboss.org/browse/ISPN-3235 Oops... when I fixed https://issues.jboss.org/browse/ISPN-3124 I removed the SKIP_REMOTE_LOOKUP, thinking that the map is already in the invocation context so there shouldn't be any perf penalty. I can't put the SKIP_REMOTE_LOOKUP flag back, otherwise delta writes won't have the previous value during state transfer, so +1 to fixing ISPN-3235. #2. why are we doing a dataContainer.get() if the remote get returns a null value? Shouldn't the interactions with data container be performed only in the (Versioned)EntryWrappingInterceptor? This was added in the scope of ISPN-2688 and covers the scenario in which a state transfer is in progress, the remote get returns null as the remote value was dropped (no longer owner) and this node has become the owner in between. ok :) Yeah, this should be correct as long as we check if we already have the key in the invocation context before doing the remote + local get. #3. (I didn't verify this) why are we acquiring the lock if the remote get is performed for a write? This looks correct for pessimistic locking but not for optimistic... I think that, given that the local node is not owner, the lock acquisition is redundant even for pessimistic caches. Mind creating a test to check if dropping that lock acquisition doesn't break things? 
I created a JIRA with low priority since it does not affect the transaction outcome/isolation and I believe the performance impact should be lower (you can increase the priority if you want). https://issues.jboss.org/browse/ISPN-3237 If we don't lock the L1 entry, I think something like this could happen: tx1@A: remote get(k1) from B - stores k1=v1 in invocation context tx2@A: write(k1, v2) tx2@A: commit - writes k1=v2 in L1 tx1@A: commit - overwrites k1=v1 in L1 After this analysis, it is possible to break the isolation between transactions if I do a get on a key that does not exist: tm.begin() cache.get(k) //returns null //in the meanwhile a transaction writes on k and commits cache.get(k) //return the new value. IMO, this is not valid for REPEATABLE_READ isolation level! Indeed sounds like a bug, well spotted. Can you please add a UT to confirm it and raise a JIRA? created: https://issues.jboss.org/browse/ISPN-3236 IMO, this should be the correct behaviour (I'm going to add the test cases later): tm.begin() cache.get(k) //returns null (op#1) //in the meanwhile a transaction writes on k and commits write operation performed: * put: must return the same value as op#1 * conditional put //if op#1 returns null the operation should be always successful (i.e. the key is updated, return true). Otherwise, the key remains unchanged (return false) * replace: must return the same value as op#1 * conditional replace: replace should be successful if checked with the op#1 return value (return true). Otherwise, the key must remain unchanged (return false). * remove: must return the same value as op#1 * conditional remove: the key should be removed if checked with the op#1 return value (return true). Otherwise, the key must remain unchanged (return false) Also, the description above should be valid after a removal of a key. 
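Pedro's rules amount to: a transaction remembers the first value it read per key and all later operations check against that view, not the latest committed value. A minimal sketch of the desired semantics (an illustration, not Infinispan's implementation; the class and method names are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Repeatable-read sketch: the "transaction" caches the first value it read for
// each key and keeps answering with it, matching the rules in the thread.
public class RepeatableReadSketch {
    private final Map<String, String> store;                  // shared cache contents
    private final Map<String, String> readSet = new HashMap<>();

    RepeatableReadSketch(Map<String, String> store) { this.store = store; }

    String get(String key) {
        // First read pulls from the store; every later read repeats it (even null).
        if (!readSet.containsKey(key)) readSet.put(key, store.get(key));
        return readSet.get(key);
    }

    // Conditional replace is checked against the transaction's own view (op#1),
    // not against the latest committed value.
    boolean replace(String key, String expected, String value) {
        String seen = get(key);
        if (seen == null ? expected != null : !seen.equals(expected)) return false;
        readSet.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        RepeatableReadSketch tx = new RepeatableReadSketch(store);
        System.out.println(tx.get("k"));                    // null (op#1)
        store.put("k", "other");                            // concurrent committed write
        System.out.println(tx.get("k"));                    // still null: repeatable read
        System.out.println(tx.replace("k", "other", "x"));  // false: checked vs. op#1, not "other"
    }
}
```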
Cheers, ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation
On Mon, Jun 17, 2013 at 7:00 PM, Mircea Markus mmar...@redhat.com wrote: On 17 Jun 2013, at 16:11, Dan Berindei dan.berin...@gmail.com wrote: I think that, given that the local node is not owner, the lock acquisition is redundant even for pessimistic caches. Mind creating a test to check if dropping that lock acquisition doesn't break things? I created a JIRA with low priority since it does not affect the transaction outcome/isolation and I believe the performance impact should be lower (you can increase the priority if you want). https://issues.jboss.org/browse/ISPN-3237 If we don't lock the L1 entry, I think something like this could happen: There is a lock happening *without* L1 enabled. Nope, tx1 doesn't lock k1 on B because it doesn't do a put(k1, v3) - it only reads the value from B. So even if tx2 does lock k1 on B, it doesn't add any synchronization between tx1 and tx2. But tx1 does write the entry to L1 on A, so it should acquire an L1 lock on A - and tx2 should also acquire the same lock. tx1@A: remote get(k1) from B - stores k1=v1 in invocation context tx2@A: write(k1, v2) tx2@A: commit - writes k1=v2 in L1 tx1@A: commit - overwrites k1=v1 in L1 Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
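The interleaving under discussion can be replayed deterministically with a plain map standing in for node A's L1 (a simulation of the schedule, not the actual L1 code path):

```java
import java.util.HashMap;
import java.util.Map;

// Deterministic replay of the race from the thread: without an L1 lock on A,
// tx1's commit can overwrite tx2's newer value with the stale one it read earlier.
public class StaleL1Demo {
    public static void main(String[] args) {
        Map<String, String> l1 = new HashMap<>();

        String tx1Read = "v1";       // tx1@A: remote get(k1) from B sees v1
        l1.put("k1", "v2");          // tx2@A: write(k1, v2), commit -> L1 holds v2
        l1.put("k1", tx1Read);       // tx1@A: commit -> blindly caches what it read

        System.out.println(l1.get("k1"));   // prints v1 -- stale; v2 was lost
    }
}
```

With a per-key L1 lock held across the get-then-cache step, tx2's write would have to wait (or invalidate tx1's pending entry), and the final L1 value could never be older than the last committed one.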
Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation
On Mon, Jun 17, 2013 at 6:35 PM, William Burns mudokon...@gmail.com wrote: On Mon, Jun 17, 2013 at 11:11 AM, Dan Berindei dan.berin...@gmail.com wrote: On Mon, Jun 17, 2013 at 3:58 PM, Pedro Ruivo pe...@infinispan.org wrote: On 06/17/2013 12:56 PM, Mircea Markus wrote: On 17 Jun 2013, at 11:52, Pedro Ruivo pe...@infinispan.org wrote: I've been looking at TxDistributionInterceptor and I have a couple of questions (assuming REPEATABLE_READ isolation level): #1. why are we doing a remote get each time we write on a key? (huge performance impact if the key was previously read) indeed this is suboptimal for transactions that write the same key repeatedly and repeatable read. Can you please create a JIRA for this? created: https://issues.jboss.org/browse/ISPN-3235 Oops... when I fixed https://issues.jboss.org/browse/ISPN-3124 I removed the SKIP_REMOTE_LOOKUP, thinking that the map is already in the invocation context so there shouldn't be any perf penalty. I can't put the SKIP_REMOTE_LOOKUP flag back, otherwise delta writes won't have the previous value during state transfer, so +1 to fixing ISPN-3235. #2. why are we doing a dataContainer.get() if the remote get returns a null value? Shouldn't the interactions with data container be performed only in the (Versioned)EntryWrappingInterceptor? This was added in the scope of ISPN-2688 and covers the scenario in which a state transfer is in progress, the remote get returns null as the remote value was dropped (no longer owner) and this node has become the owner in between. ok :) Yeah, this should be correct as long as we check if we already have the key in the invocation context before doing the remote + local get. #3. (I didn't verify this) why are we acquiring the lock if the remote get is performed for a write? This looks correct for pessimistic locking but not for optimistic... I think that, given that the local node is not owner, the lock acquisition is redundant even for pessimistic caches. 
Mind creating a test to check if dropping that lock acquisition doesn't break things? I created a JIRA with low priority since it does not affect the transaction outcome/isolation and I believe the performance impact should be lower (you can increase the priority if you want). https://issues.jboss.org/browse/ISPN-3237 If we don't lock the L1 entry, I think something like this could happen: tx1@A: remote get(k1) from B - stores k1=v1 in invocation context tx2@A: write(k1, v2) tx2@A: commit - writes k1=v2 in L1 tx1@A: commit - overwrites k1=v1 in L1 This one is just like the one referenced in https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780 Yep, it's the same thing. And even locking doesn't help in this case since it doesn't lock the key for a remote get, only a remote get in the context of a write - which means the L1 could be updated concurrently in either order - causing possibly an inconsistency. This will be solved when I port the same fix I have for https://issues.jboss.org/browse/ISPN-3197 for tx caches. I thought the locking happened for all remote gets, and that's how I think it should work. We don't have to keep the lock for the entire duration of the transaction, though. If we write the L1 entry to the data container during the remote get, like you suggested in your comment, then we could release the L1 lock immediately and remote invalidation commands would be free to remove the entry. After this analysis, it is possible to break the isolation between transactions if I do a get on a key that does not exist: tm.begin() cache.get(k) //returns null //in the meanwhile a transaction writes on k and commits cache.get(k) //return the new value. IMO, this is not valid for REPEATABLE_READ isolation level! Indeed sounds like a bug, well spotted. Can you please add a UT to confirm it and raise a JIRA? 
created: https://issues.jboss.org/browse/ISPN-3236 IMO, this should be the correct behaviour (I'm going to add the test cases later): tm.begin() cache.get(k) //returns null (op#1) //in the meanwhile a transaction writes on k and commits write operation performed: * put: must return the same value as op#1 * conditional put //if op#1 returns null the operation should be always successful (i.e. the key is updated, return true). Otherwise, the key remains unchanged (return false) * replace: must return the same value as op#1 * conditional replace: replace should be successful if checked with the op#1 return value (return true). Otherwise, the key must remain unchanged (return false). * remove: must return the same value as op#1 * conditional remove: the key should be removed if checked with the op#1 return value (return true). Otherwise, the key must remain unchanged (return false) Also, the description above should be valid after a removal of a key.
Re: [infinispan-dev] FAO Infinispan Developers: Must set JAVA_HOME_7 environment variable
If I run the whole test suite with OpenJDK 1.7, do I still need to set the JAVA_HOME_7 variable? On Wed, Jun 19, 2013 at 11:47 AM, Galder Zamarreño gal...@redhat.com wrote: Hi all, Starting with Infinispan 5.3.0.CR2, anyone who builds Infinispan requires an environment variable called JAVA_HOME_7 to be set and point to a JDK7 installation. This is required in order to run the JCache TCK, which requires Java 7, and runs automatically as part of the Infinispan build (when testing...). @Mircea, the TCK is not running on CI cos the environment variable is not set, see [1] log snippet taken from [2]. Can you sort out this in the CI machine? The TCK run build failure was ignored cos test failures are ignored, and this property is only used when running the TCK per se. Cheers, [1] https://gist.github.com/galderz/39851953ca4d6b4af899 [2] http://ci.infinispan.org/viewLog.html?buildId=2072&buildTypeId=bt2&tab=buildLog#_ -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
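For anyone setting this up locally: it is just an environment variable pointing at a JDK 7 installation, exported before running the Maven build. The path below is an example and will differ per machine.

```shell
# Example only -- point this at wherever your JDK 7 lives.
export JAVA_HOME_7=/usr/lib/jvm/java-1.7.0-openjdk

# Quick sanity check before building:
if [ ! -x "$JAVA_HOME_7/bin/java" ]; then
  echo "JAVA_HOME_7 does not look like a JDK: $JAVA_HOME_7" >&2
fi
```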
Re: [infinispan-dev] L1 Data Container
This would only affect the Cache internals (use two DataContainers instead of one). Users still won't have direct access to the L1 cache. On Wed, Jun 19, 2013 at 4:24 PM, cotton-ben ben.cot...@alumni.rutgers.edu wrote: Instinctively, this is a very attractive idea. Would the L1 data container be a separate, but equal container? i.e. would it be a separate instance from the DIST_SYNC cache but the exact same data structure? or Would the L1 data container be a separate, and different container? i.e. would it be a completely different data structure (and API) from the DIST_SYNC cache? -- View this message in context: http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-L1-Data-Container-tp4027447p4027449.html Sent from the Infinispan Developer List mailing list archive at Nabble.com. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] L1 Data Container
On Wed, Jun 19, 2013 at 5:19 PM, Sanne Grinovero sa...@infinispan.org wrote: On 19 June 2013 13:44, William Burns mudokon...@gmail.com wrote: All the L1 data for a DIST cache is stored in the same data container as the actual distributed data itself. I wanted to propose breaking this out so there is a separate data container for the L1 cache as compared to the distributed data. I thought of a few quick benefits/drawbacks: Benefits: 1. L1 cache can be separately tuned - L1 maxEntries for example -1! I don't think that's a benefit actually, from the point of view of a user: as a user I only know I have a certain amount of memory available on each node, and the application is going to use certain data way more often than others. The eviction strategy should be put in condition to be able to make an optimal choice about which entries - among all - are better kept in memory vs. passivated. I don't see a specific reason to favour keeping in memory owned entries over an L1 entry: the L1 entry might be very hot, and the owned entry might be almost-never read. Considering that even serving a Get operation to another node (as owners of the entry) makes the entry less likely to be passivated (it counts as a hit), the current design naturally provides an optimal balance for memory usage. At the opposite side, I don't see how - as a user - I could optimally tune a separate container. 2. L1 values will not cause eviction of real data -1 That's not a benefit, as I explained above. Real Data is not less important, especially if it's never used. Granted I'm making some assumptions about the application having some hot-data and some less hot data, and not being able to take advantage of node pinning or affinity strategies.. but that is another way to say that I'm assuming the user needs L1: if it was possible to apply these more advanced strategies I'd disable L1 altogether. 
You're also assuming that data can always be recreated from another source, but I don't think that's always the case. If Infinispan is the canonical data store and there is no cache store, you can't enable eviction for the real data and with a single container you can't enable eviction for the L1 entries either. 3. Would make https://issues.jboss.org/browse/ISPN-3229 an easy fix 4. Could add a new DataContainer implementation specific to L1 with additional optimizations You have some example of what you have in mind? Considering you would need to consider the optimal balance usage of the available heap space, I suspect that would be quite hard. 5. Help with some concurrency issues with L1 without requiring wider locking (such as locking a key for an entire ClusteredGet rpc call) - https://issues.jboss.org/browse/ISPN-3197. I don't understand this. L1 entries require the same level of consistency as any other entry so I suspect you would need the same locking patterns replicated. The downside would be that you get duplication of the same logic. Remember also that L1 has some similarities with entries still hanging around when they were previously stored in this node before a state transfer.. today these are considered L1-active entries, if you change the storage you would need to design a migration of state from one container to the other; migration of state might not be too hard, doing it while guaranteeing consistent locking is going to be I guess as hard as considering the L1 problem today. Drawbacks: 1. Would require, depending on configuration, an additional thread for eviction 2. Users upgrading could have double memory used up due to 2 data containers This drawback specifically is to be considered very seriously. I don't think people would be happy to buy and maintain a twice as large datacenter than what they actually need. Sanne Both?: 1. Additional configuration available a. 
Add maxEntries just like the normal data container (use data container size if not configured?) b. Eviction wakeup timer? We could just reuse the task cleanup frequency? c. Eviction strategy? I would think the default data container's would be sufficient. I was wondering what you guys thought. Thanks, - Will ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
[infinispan-dev] Infinispan 5.3.0.Final is out!
Dear Infinispan community, We're proud to announce the final release of Infinispan 5.3.0 Tactical Nuclear Penguin. Besides increased stability (130+ bug fixes) this release also brings some highly demanded features: - Total Order transaction protocol (http://infinispan.blogspot.co.uk/2013/04/faster-transaction-protocols-in.html), developed within the scope of the CloudTM project, yielding significant performance improvements under concurrent access - Support for the JSR-107 (Java Caching API) implementation (http://infinispan.blogspot.co.uk/2013/04/give-java-caching-standard-api-go-using.html) - A new implementation of the Lucene Directory for Infinispan based on Lucene 4 - A new packaging for the Infinispan server modules, based on the JBoss AS (http://infinispan.blogspot.co.uk/2013/04/infinispan-server-530alpha1.html) - SSL access to Hot Rod (http://infinispan.blogspot.co.uk/2013/05/infinispan-server-remote-protocols.html) - Interoperability between Hot Rod, Memcached, REST and embedded mode (http://infinispan.blogspot.co.uk/2013/05/interoperability-between-embedded-and.html) - Storing arrays in Infinispan as keys and values (https://docs.jboss.org/author/display/ISPN/Storing+objects+%28e.g.+arrays%29+with+custom+Equivalence+functions) - Several new cache stores: a mongoDB cache store (http://infinispan.blogspot.co.uk/2013/06/using-mongodb-as-cache-store.html, courtesy of Guillaume Scheibel), a JPA based cache store (http://infinispan.blogspot.co.uk/2013/05/introducing-jpa-cache-store.html) and a LevelDB cache store (https://issues.jboss.org/browse/ISPN-2657, courtesy of Ray Tsang) For a complete list of features included in this release refer to the release notes (https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12320550).
Visit our downloads section (http://www.jboss.org/infinispan/downloads) to find the latest release, and if you have any questions please check our forums (http://www.jboss.org/infinispan/forums), our mailing lists (https://lists.jboss.org/mailman/listinfo/infinispan-dev) or ping us directly on IRC. Time to move forward now: Infinispan 6.0 will bring some significant improvements (https://community.jboss.org/en/infinispan?view=documents) to the Infinispan ecosystem and also a shift in licence to the Apache Software Licence (http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html). Stay tuned! Cheers, Dan Berindei ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Introducing builder subpackages
-1 as well, as a user you only use ConfigurationBuilder and GlobalConfigurationBuilder directly, and moving them to a subpackage would make them less accessible. If anything, I'd move these two classes to the parent package (org.infinispan.configuration), but like Sanne said there's no point in breaking the configuration again. On Wed, Jun 26, 2013 at 10:02 AM, Mircea Markus mmar...@redhat.com wrote: On 26 Jun 2013, at 09:54, Sanne Grinovero sa...@infinispan.org wrote: Please don't break the configuration API again :-( On 26 June 2013 06:29, Navin Surtani nsurt...@redhat.com wrote: While working through ISPN-2463 and its sub-tasks, I was wondering about the organisation of the ConfigurationBuilder classes. Currently, they are located in org.infinispan.configuration.cache.etc or org.infinispan.configuration.global.etc. The actual Configuration classes are within the same package already as well. To me, this sounds a little bit cluttered and perhaps not very intuitive, and I was wondering if it might be a better idea to have something like: org.infinispan.configuration.cache.builders.ConfigurationBuilder (and others) org.infinispan.configuration.global.builders.GlobalConfigurationBuilder (etc etc) Another suggestion could be: org.infinispan.configuration.builders.cache.etc org.infinispan.configuration.builders.global.etc The only problem with that would be breaking backward compatibility, but from ISPN 6.x onwards I think that there are a fair few classes and packages being moved around anyway. It's just an idea that might make the API seem a little bit cleaner as to where it's located. Thoughts? It is indeed a good suggestion, but as Sanne mentioned this would break backward compatibility, which is something we want to avoid.
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] TeamCity posting a message in pull req if any test failed?
On Thu, Jun 27, 2013 at 2:52 PM, Galder Zamarreño gal...@redhat.com wrote: On Jun 27, 2013, at 12:09 PM, Manik Surtani msurt...@redhat.com wrote: On 27 Jun 2013, at 09:57, Galder Zamarreño gal...@redhat.com wrote: On Jun 26, 2013, at 12:28 PM, Dan Berindei dan.berin...@gmail.com wrote: TeamCity also updates the status of the PR (right below the description). ^ Sure, it updates the status, but we get no notification. IOW, it's up to you to go and check the status which is not ideal. I'd like to see some kind of mail notification, or comment when the test finishes. The AS guys did this in their extension... Unfortunately the status disappears the moment the PR is updated OR the merge branch is updated (because the status is set on the commit that is the merge result). I also receive emails from TeamCity and notifications in IntelliJ for all my PRs with failing tests - if you don't, make sure you have set up the VCS user name in your TeamCity settings. ^ Emails from TeamCity? Notifications in IntelliJ? I wasn't aware of either… Any docu on this? http://www.jetbrains.com/idea/features/teamcity_integration.html Oh, there's a special plugin for this… I was looking for it in the preferences :) Thanks! You can also configure your email notifications here: http://ci.infinispan.org/profile.html?tab=userGeneralSettings ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com wrote: First off I apologize for the length. There have been a few Jiras recently that have identified L1 consistency issues with both TX and non TX sync caches. Async caches with L1 have their own issues as well, but I only wanted to talk about sync caches. https://issues.jboss.org/browse/ISPN-3197 https://issues.jboss.org/browse/ISPN-2965 https://issues.jboss.org/browse/ISPN-2990 I have proposed a solution in https://github.com/infinispan/infinispan/pull/1922 which should start L1 consistency down the right track. There are quite a few comments on it if you want to look into it more, but because of that I am moving this to the dev mailing list. The key changes in the PR are the following (non-tx): 1. Concurrent reads for a key that can retrieve a remote value are corralled into a single thread of execution for that given key. This would reduce network traffic with concurrent gets for the same key. Note the corralling only happens on a per-key basis. Get commands on owners should not be serialized. Get commands on non-owners should not be serialized either, if the key already exists in L1. So I'd say L1ReadSynchronizer should be L1WriteSynchronizer instead :) 2. The single thread that is doing the remote get would update the L1 if able (without locking) and make the value available to all the requests waiting on the get. Well, L1ReadSynchronizer does prevent other threads from modifying the same key, so we are locking the key - just not using LockManager. It would also require StateTransferLock.acquireSharedTopologyLock() to make sure it doesn't write an L1 entry after the node became a proper owner. 3. Invalidations that are received would first check to see if there is a current remote get occurring for its keys. If there is, it will attempt to cancel the L1 write(s) before it occurs.
If it cannot cancel the L1 write, then it must also wait on the current remote get completion and subsequently run the invalidation. Note the cancellation would fail when the remote get was done and it is in the middle of updating the L1, so this would be a very small window. I think it would be clearer to describe this as the L1 invalidation cancelling the remote get, not the L1 update, because the actual L1 update can't be cancelled. We also have to remove the logic in AbstractLockingInterceptor that skips L1 invalidation for a key if it can't acquire a lock with a 0 timeout. 4. Local writes will also do the same thing as the invalidation with cancelling or waiting. Note that non tx local writes only do L1 invalidations and don't write the value to the data container. The reasons why can be found at https://issues.jboss.org/browse/ISPN-3214 I didn't know about ISPN-3214 or that non-tx writes don't write to L1, but it sounds fair. 5. Writes that require the previous value and don't have it in the L1 would also do their get operations using the same corralling method. The remoteGetBeforeWrites are a bit different - they don't happen on non-owners, they only happen on writeCH-owners that didn't receive that entry via state transfer yet. They put the value in the InvocationContext, but they don't write it to the data container - nor do they invalidate the L1 entry, if it exists. 4/5 are not currently implemented in PR. This approach would use no locking for non tx caches for all L1 operations. The synchronization point would be done through the corralling method and invalidations/writes communicating with it. Transactional caches would do almost the same thing as non-tx. Note these changes are not done in any way yet. 1. Gets would now update the L1 immediately after retrieving the value without locking, but still using the corralling technique that non-tx does. Previously the L1 update from a get was transactional. This actually would remedy issue [1] 2.
Writes currently acquire the remote lock when committing, which is why tx caches are able to update the L1 with the value. Writes would use the same cancellation/wait method as non-tx. 3. Writes that require the previous value and don't have it in the L1 would also do their get operations using the same method. Just like for non-tx caches, I don't think these remote gets have to be stored in L1. 4. For tx caches, [2] would also have to be done. [1] - https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780 [2] - https://issues.jboss.org/browse/ISPN-1540 Also rehashing is another issue, but we should be able to acquire the state transfer lock before updating the L1 on a get, just like when an entry is committed to the data container. The same for L1 invalidations - we don't want to remove real entries from the data container after the local node became an owner. Any comments/concerns would be appreciated. Thanks, -
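The per-key "corralling" of point 1 can be sketched roughly as below. The class and method names (RemoteGetCorraller, remoteGet) are illustrative, not Infinispan's actual L1ReadSynchronizer API; the point is only that concurrent reads for the same key share a single remote fetch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical sketch of the "corralling" idea: concurrent reads for the same
// key are funneled through a single in-flight remote fetch per key.
public class RemoteGetCorraller<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> remoteGet) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            // Another thread is already fetching this key: wait for its result
            // instead of issuing a second RPC.
            return existing.join();
        }
        try {
            V value = remoteGet.apply(key); // only one thread per key reaches the network
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);  // propagate the failure to any waiters
            throw e;
        } finally {
            inFlight.remove(key, mine);     // later gets must fetch a fresh value
        }
    }
}
```

As discussed above, a real implementation would skip this path entirely on owners, and on non-owners whose L1 already holds the key.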
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com wrote: Comments that were outstanding on PR: @danberindei: +1 to move the discussion to the mailing list, could you summarize your changes (preferably for both non-tx and tx cases) and send an email to the list? And now to add some more to this already unwieldy discussion :) 1. Many interceptors check ownership, I don't think that would be a problem. Besides, I think creating a new L1ReadSynchronizer for every read is just as bad for performance as locking the key on every read, so you'd need that check either way. We can't use a try lock approach for L1 invalidation when gets are locked without possibly dropping an L1 update. We can't use a timed lock approach for L1 invalidation when writes lock the key, as we could get into a timed deadlock situation when another node concurrently writes to a value/stripe. I still don't see how we can get away with locking on a get. What are you proposing? I'm proposing something like this: 1. For get commands: acquire local lock + remote get + store in L1 + release local lock 2. For invalidate commands: acquire local lock + remove entry from L1 + release local lock 3. For write commands: invoke on primary owner, and the primary owner sends invalidation back to originator; alternatively, skip the invalidation command to the originator and do invoke on primary owner + acquire local lock + remove entry from L1 + release local lock All the lock acquisitions would use a regular timeout, not a try lock approach. 2. By default L1 invalidations are sent as multicasts, so I'm not sure ISPN-3273 really matters here. BTW, I wonder if we have a check to only send L1 invalidations from one node if the threshold is 0... I agree that is the default, but we should support the operation, although it doesn't matter for this discussion. Also I am curious as to why the multicast threshold for L1 isn't set to, say, 2 by default?
It seems wasteful to send a multicast that all members must process when only one of them would do anything about it. Do you know why this is like that? I suppose it's because we don't have good perf numbers for different L1 invalidation threshold numbers... The problem is, we don't have a way to count all the requestors of a key in the cluster, so it's reasonably likely that with a threshold of 2 you'd get 1 unicast invalidation from one owner + 1 multicast invalidation from the other owner, making it less efficient than a single multicast invalidation. 3a. Right, for put commands we can't hold the local lock while executing the remote put, or we'll have a deadlock. But I think a shorter lock, held only after the remote put completed (or after the lock on the primary owner was acquired, with txs) should work. Same point as under 1. I don't see how we could get a deadlock if we don't hold the local lock during the remote write invocation. 3b. We'd also have an ownership check before, so we'd only serialize the get commands that need to go remotely for the same key. I think it would be almost the same as your solution (although it does have one possible disadvantage - if the key doesn't exist in the cache, all the get commands will go remotely). The number of L1 writes should be very small compared to the number of L1 reads anyway, otherwise it would be more efficient to get the key from the owner every time. You are saying an optimization for owner nodes so they don't do the corralling for keys they own? I like that. Also I don't think it has the disadvantage, it only goes remote if it isn't an owner. I meant your corralling strategy means if you have 2 concurrent get commands and one of them retrieves a null from the entry owners, the other command will return null directly. With regular locking, the other command wouldn't find anything in L1 and it would do another remote get.
I don't think there's any disadvantage in skipping the corralling for key owners, in fact I think we need to skip it if the key already exists in L1, too. It would be nice to agree on what guarantees we want to provide for L1 invalidation in non-tx caches, I'm not sure if we can do anything to prevent this scenario: Actually this scenario doesn't occur with non-tx since writes don't update the L1 with their value, they just invalidate. Tx caches are fine with this because they acquire the primary owner lock for the duration of the write including the L1 update so you can't have this ordering. Sounds good. A initiates a put(k, v1) to the primary owner B B performs the put(k, v1), invalidates every non-owner and returns B performs another put(k, v2), invalidating every non-owner A receives the result from B and puts k=v1 in its L1 @pruivo: The invalidation does not need to wait for the remote get. When you receive an invalidation, you can mark the current remote get invalid. The invalidation command can return immediately and the remote get
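The lock-based alternative proposed above (get: acquire local lock + remote get + store in L1 + release; invalidate: acquire local lock + remove + release) could look, in a simplified single-node sketch, like the code below. Class and method names are illustrative, not Infinispan's API; real code would go through the LockManager, use lock timeouts, and also take the shared topology lock before the L1 store.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Simplified sketch of the lock-based L1 scheme (names are hypothetical).
public class LockingL1<K, V> {
    private final ConcurrentHashMap<K, V> l1 = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(K key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // get: an L1 hit needs no lock; a miss does
    // "acquire local lock + remote get + store in L1 + release local lock".
    public V get(K key, Function<K, V> remoteGet) {
        V cached = l1.get(key);
        if (cached != null) return cached;
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            cached = l1.get(key);           // a concurrent get may have filled L1 meanwhile
            if (cached != null) return cached;
            V value = remoteGet.apply(key); // RPC to the key's owner
            if (value != null) l1.put(key, value);
            return value;                   // a null result is fetched again next time
        } finally {
            lock.unlock();
        }
    }

    // invalidate: "acquire local lock + remove entry from L1 + release".
    public void invalidate(K key) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            l1.remove(key);
        } finally {
            lock.unlock();
        }
    }
}
```

Note the sketch keeps the disadvantage mentioned above: when the key doesn't exist on the owners, every get still goes remotely, because nothing is stored in L1 for null results.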
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Fri, Jun 28, 2013 at 12:17 AM, William Burns mudokon...@gmail.com wrote: Trying to leave my points that would most likely have responses to second email so we can try to get back to a single thread :) No such luck :) Sorry for sending 2 replies in the first place, but it seemed more natural - I meant to comment on your proposal in one email and to describe my alternative proposal in the second email. On Thu, Jun 27, 2013 at 4:12 PM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com wrote: First off I apologize for the length. There have been a few Jiras recently that have identified L1 consistency issues with both TX and non TX sync caches. Async caches with L1 have their own issues as well, but I only wanted to talk about sync caches. https://issues.jboss.org/browse/ISPN-3197 https://issues.jboss.org/browse/ISPN-2965 https://issues.jboss.org/browse/ISPN-2990 I have proposed a solution in https://github.com/infinispan/infinispan/pull/1922 which should start L1 consistency down the right track. There are quite a few comments on it if you want to look into it more, but because of that I am moving this to the dev mailing list. The key changes in the PR are the following (non-tx): 1. Concurrent reads for a key that can retrieve a remote value are corralled into a single thread of execution for that given key. This would reduce network traffic with concurrent gets for the same key. Note the corralling only happens on a per key basis. Get commands on owners should not be serialized. Get commands on non-owners should not be serialized either, if the key already exists in L1. So I'd say L1ReadSynchronizer should be L1WriteSynchronizer instead :) You are suggesting to check the context to see if the key is present before attempting the synchronizer right? Reading your second email that seems the case :) Nope, I meant we should check the data container (aka the L1 cache).
But obviously we have to check the invocation context first in a tx cache; if the tx read the key before, it should see the same value. 2. The single thread that is doing the remote get would update the L1 if able (without locking) and make the value available to all the requests waiting on the get. Well, L1ReadSynchronizer does prevent other threads from modifying the same key, so we are locking the key - just not using LockManager. It would also require StateTransferLock.acquireSharedTopologyLock() to make sure it doesn't write an L1 entry after the node became a proper owner. Agree, when I was saying locking I was meaning through the use of the lock manager. 3. Invalidations that are received would first check to see if there is a current remote get occurring for its keys. If there is, it will attempt to cancel the L1 write(s) before it occurs. If it cannot cancel the L1 write, then it must also wait on the current remote get completion and subsequently run the invalidation. Note the cancellation would fail when the remote get was done and it is in the middle of updating the L1, so this would be a very small window. I think it would be clearer to describe this as the L1 invalidation cancelling the remote get, not the L1 update, because the actual L1 update can't be cancelled. When I say L1 update I mean the write to the data container after the remote get. The invalidation can't stop the remote get, all it does is tell the caller, "Hey, don't write the remote value you retrieved into the L1." Oh right, the get command will still use the value it got from the remote node, it just won't write it. That makes me wonder, though, if something like this can happen: 1. A invokes get(k), starts a L1ReadSynchronizer and a remote get to B 2. B invokes put(k, v) and sends an invalidation command to A 3. The invalidation command cancels the L1 put on A 4. A invokes get(k) again, finds the L1ReadSynchronizer from step 1) and queues on it 5.
Both get(k) commands return the same value, even though the value has changed on the owner(s). We also have to remove the logic in AbstractLockingInterceptor that skips L1 invalidation for a key if it can't acquire a lock with a 0 timeout. 4. Local writes will also do the same thing as the invalidation with cancelling or waiting. Note that non tx local writes only do L1 invalidations and don't write the value to the data container. Reasons why I found at https://issues.jboss.org/browse/ISPN-3214 I didn't know about ISPN-3214 or that non-tx writes don't write to L1, but it sounds fair. Yeah I really wanted that to work, but without some additional checks such as versioned data, I don't see a way to do this without locking at the primary node like tx caches. In theory, the primary owner could send a synchronous RPC back to the originator while it is holding
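The cancellation handshake of point 3 — an invalidation tries to cancel the pending L1 store, and if it arrives inside the "very small window" where the store already started, it waits for the store and then removes the entry itself — could be sketched as below. This is an illustrative sketch, not Infinispan's actual implementation, and it deliberately ignores the stale-read scenario just described.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical handshake between the remote-get thread (which wants to store
// the fetched value in L1) and an incoming invalidation (which wants to
// prevent or undo that store).
public class L1WriteHandshake {
    private final AtomicBoolean cancelled = new AtomicBoolean();
    private final AtomicBoolean writeStarted = new AtomicBoolean();
    private final CountDownLatch writeDone = new CountDownLatch(1);

    // Called by the remote-get thread just before storing the value in L1.
    // Returns false if an invalidation won the race and the store must be skipped.
    public boolean startWrite() {
        writeStarted.set(true);
        if (cancelled.get()) {
            writeDone.countDown(); // nothing will be written; release any waiter
            return false;
        }
        return true;
    }

    // Called by the remote-get thread once the L1 store has finished.
    public void writeCompleted() {
        writeDone.countDown();
    }

    // Called by an incoming invalidation. Returns true if the L1 store was
    // cancelled before it happened; false means the store ran (or is running),
    // so the caller must wait for it and then invalidate the entry itself.
    public boolean tryCancel() {
        cancelled.set(true);
        if (!writeStarted.get()) {
            return true;           // store not started yet: it will see 'cancelled'
        }
        try {
            writeDone.await();     // the "very small window": wait for the store
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
        return false;
    }
}
```

Because both sides publish their own flag before reading the other's, at least one of them always observes the race: either the store is skipped, or the invalidation waits and then removes the freshly written entry.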
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Fri, Jun 28, 2013 at 12:51 AM, William Burns mudokon...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com wrote: Comments that were outstanding on PR: @danberindei: +1 to move the discussion to the mailing list, could you summarize your changes (preferably for both non-tx and tx cases) and send an email to the list? And now to add some more to this already unwieldy discussion :) 1. Many interceptors check ownership, I don't think that would be a problem. Besides, I think creating a new L1ReadSynchronizer for every read is just as bad for performance as locking the key on every read, so you'd need that check either way. We can't use a try lock approach for L1 invalidation when gets are locked without possibly dropping an L1 update. We can't use a timed lock approach for L1 invalidation when writes lock the key, as we could get into a timed deadlock situation when another node concurrently writes to a value/stripe. I still don't see how we can get away with locking on a get. What are you proposing? I'm proposing something like this: 1. For get commands: acquire local lock + remote get + store in L1 + release local lock 2. For invalidate commands: acquire local lock + remove entry from L1 + release local lock 3. For write commands: invoke on primary owner, and the primary owner sends invalidation back to originator; alternatively, skip the invalidation command to the originator and do invoke on primary owner + acquire local lock + remove entry from L1 + release local lock I do like this and think it would be simpler for non-tx. However, I wouldn't think this will be as performant (just a gut feeling). It would indeed be less performant in some cases, e.g. if we have two get commands for a remote key at the same time and the key doesn't exist, my proposal would request the key twice.
The writes to L1 *should* be rare enough that the extra lock contention there would be minimal. There is a chance that an invalidation will be stuck waiting for a remote get, so it will again be slower than your approach. However I am more than certain it will be slower when collisions occur, which is more likely with lock striping. I'm not that worried about lock striping - the user can always increase the number of stripes to get less contention. And I hope they know better than to use lock striping and transactions with multiple keys, or they'll get deadlocks with or without L1 enabled. I don't think this however is feasible for tx caches. Optimistic - we don't want to have gets being blocked when the tx is being committed - that could be quite a bit of a performance hit, especially if it has to do a 2 phase commit. Updating the L1 on a remoteGetBeforeWrites could help remedy this (since get wouldn't need to lock), however then you would be almost implementing what I have and still have locking overhead. Maybe this would occur so infrequently it wouldn't matter though? Yes, I would propose removing the L1 commit in EntryWrappingInterceptor as well, and moving it to the L1 interceptor - immediately after the value was retrieved from the remote owner. The local lock would also be released immediately. The part I wanted to avoid from your proposal was L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to understand. I guess I'd find it more approachable if all the synchronization (the map of L1ReadSynchronizers and the interaction with it) were in a separate class, with its own tests, and the L1 interceptor would only know about the high-level stuff. Like how the LockManager also hides a map of locks; not like the LockManager unit tests, because they're kind of lacking, but LockManager has the advantage that almost all the tests touch it indirectly.
Pessimistic - I haven't looked into pessimistic closely yet, but in that case I don't think it ever acquires the local locks, so it would have to acquire a remote lock on a get, which would be pretty disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK enabled). Making this acquire local locks might work as well, but then it would still have the same issue as optimistic when committing - I really am not familiar enough with Pessimistic to say for sure. Nope, acquiring a local lock should be enough. We don't want to prevent other txs from modifying the key at all, we just want to delay the invalidation commands triggered by those modifications on the local node. And the lock would only be held for the duration of the remote get + L1 update, not all the way to the commit. All the lock acquisitions would use a regular timeout, not a try lock approach. In this case they could, yes. 2. By default L1 invalidations are sent as multicasts, so I'm not sure ISPN-3273 really matters here. BTW, I wonder if we have
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Fri, Jun 28, 2013 at 3:41 PM, William Burns mudokon...@gmail.com wrote: On Fri, Jun 28, 2013 at 5:53 AM, Dan Berindei dan.berin...@gmail.com wrote: On Fri, Jun 28, 2013 at 12:51 AM, William Burns mudokon...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com wrote: Comments that were outstanding on PR: @danberindei: +1 to move the discussion to the mailing list, could you summarize your changes (preferably for both non-tx and tx cases) and send an email to the list? And now to add some more to this already unwieldy discussion :) 1. Many interceptors check ownership, I don't think that would be a problem. Besides, I think creating a new L1ReadSynchronizer for every read is just as bad for performance as locking the key on every read, so you'd need that check either way. We can't use a try lock approach for L1 invalidation when gets are locked without possibly dropping an L1 update. We can't use a timed lock approach for L1 invalidation when writes lock the key, as we could get into a timed deadlock situation when another node concurrently writes to a value/stripe. I still don't see how we can get away with locking on a get. What are you proposing? I'm proposing something like this: 1. For get commands: acquire local lock + remote get + store in L1 + release local lock 2. For invalidate commands: acquire local lock + remove entry from L1 + release local lock 3. For write commands: invoke on primary owner, and the primary owner sends invalidation back to originator; alternatively, skip the invalidation command to the originator and do invoke on primary owner + acquire local lock + remove entry from L1 + release local lock I do like this and think it would be simpler for non-tx. However, I wouldn't think this will be as performant (just a gut feeling). It would indeed be less performant in some cases, e.g.
if we have two get commands for a remote key at the same time and the key doesn't exist, my proposal would request the key twice. The writes to L1 *should* be rare enough that the extra lock contention there would be minimal. There is a chance that an invalidation will be stuck waiting for a remote get, so it will again be slower than your approach. However I am more than certain it will be slower when collisions occur, which is more likely with lock striping. I'm not that worried about lock striping - the user can always increase the number of stripes to get less contention. And I hope they know better than to use lock striping and transactions with multiple keys, or they'll get deadlocks with or without L1 enabled. I didn't think of the deadlock problem with tx and lock striping. I am wondering, should we log a warning message so that users get an explicit warning? Yeah, I think a warning would be nice - especially in the documentation, it looks way too innocuous. I don't think this however is feasible for tx caches. Optimistic - we don't want to have gets being blocked when the tx is being committed - that could be quite a bit of a performance hit, especially if it has to do a 2 phase commit. Updating the L1 on a remoteGetBeforeWrites could help remedy this (since get wouldn't need to lock), however then you would be almost implementing what I have and still have locking overhead. Maybe this would occur so infrequently it wouldn't matter though? Yes, I would propose removing the L1 commit in EntryWrappingInterceptor as well, and moving it to the L1 interceptor - immediately after the value was retrieved from the remote owner. The local lock would also be released immediately. The part I wanted to avoid from your proposal was L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to understand.
I guess I'd find it more approachable if all the synchronization (the map of L1ReadSynchronizers and the interaction with it) were in a separate class, with its own tests, and the L1 interceptor would only know about the high-level stuff. Like how the LockManager also hides a map of locks; not like the LockManager unit tests, because they're kind of lacking, but LockManager has the advantage that almost all the tests touch it indirectly. True, I agree I like it as a separate class better - much better separation. Maybe I can move that over and add tests and see if that makes it more clear? If not we can still always use the locking approach. +1 Pessimistic - I haven't looked into pessimistic closely yet, but in that case I don't think it ever acquires the local locks, so it would have to acquire a remote lock on a get, which would be pretty disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK enabled
Re: [infinispan-dev] L1 Consistency with Sync Caches
On Fri, Jun 28, 2013 at 4:39 PM, William Burns mudokon...@gmail.com wrote: On Fri, Jun 28, 2013 at 5:14 AM, Dan Berindei dan.berin...@gmail.com wrote: On Fri, Jun 28, 2013 at 12:17 AM, William Burns mudokon...@gmail.com wrote: Trying to leave my points that would most likely have responses to second email so we can try to get back to a single thread :) No such luck :) Sorry for sending 2 replies in the first place, but it seemed more natural - I meant to comment on your proposal in one email and to describe my alternative proposal in the second email. On Thu, Jun 27, 2013 at 4:12 PM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com wrote: First off I apologize for the length. There have been a few Jiras recently that have identified L1 consistency issues with both TX and non TX sync caches. Async caches with L1 have their own issues as well, but I only wanted to talk about sync caches. https://issues.jboss.org/browse/ISPN-3197 https://issues.jboss.org/browse/ISPN-2965 https://issues.jboss.org/browse/ISPN-2990 I have proposed a solution in https://github.com/infinispan/infinispan/pull/1922 which should start L1 consistency down the right track. There are quite a few comments on it if you want to look into it more, but because of that I am moving this to the dev mailing list. The key changes in the PR are the following (non-tx): 1. Concurrent reads for a key that can retrieve a remote value are corralled into a single thread of execution for that given key. This would reduce network traffic with concurrent gets for the same key. Note the corralling only happens on a per key basis. Get commands on owners should not be serialized. Get commands on non-owners should not be serialized either, if the key already exists in L1. So I'd say L1ReadSynchronizer should be L1WriteSynchronizer instead :) You are suggesting to check the context to see if the key is present before attempting the synchronizer right? 
Reading your second email that seems to be the case :) Nope, I meant we should check the data container (aka the L1 cache). But obviously we have to check the invocation context first in a tx cache: if the tx read the key before, it should see the same value. I was thinking because the non-tx wraps the value if it is in the data container beforehand. But actually for non-tx it seems I should check the data container only, and for tx I should only check the ctx (to guarantee read consistency) For tx you probably have to check the context first, and the data container only if you don't find the entry in the context (non-existent entries that were previously read in the tx should have a NullMarkerEntry in the context).
The invalidation can't stop the remote get, all it does is tell the caller that Hey don't write the remote value you retrieved into the L1. Oh right, the get command will still use the value it got from the remote node, it just won't write it. That makes me wonder, though, if something like this can happen: 1. A invokes get(k), starts a L1ReadSynchronizer and a remote get to B 2. B invokes put(k, v) and sends an invalidation command to A 3. The invalidation command cancels the L1 put on A 4. A invokes get(k) again, finds the L1ReadSynchronizer from step 1) and queues on it 5. Both get(k) commands return the same value, even though the value has changed on the owner(s). Yeah
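The per-key "corralling" described above can be sketched with a map of in-flight futures. This is an illustrative sketch only (the class and names below are invented, not Infinispan's actual L1ReadSynchronizer API): the first reader of a key installs a future and performs the remote fetch, and concurrent readers of the same key wait on that future instead of issuing their own remote get.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of corralling concurrent remote gets per key:
// only one thread per key performs the remote fetch; other concurrent
// callers for the same key wait on the same future.
public class CorralledReader<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();
    private final Function<K, V> remoteGet; // stands in for the real remote RPC

    public CorralledReader(Function<K, V> remoteGet) {
        this.remoteGet = remoteGet;
    }

    public V get(K key) throws Exception {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.get(); // corralled: reuse the in-flight fetch
        }
        try {
            V value = remoteGet.apply(key); // only one thread per key gets here
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);  // wake up any corralled waiters
            throw e;
        } finally {
            inFlight.remove(key, mine);     // later gets fetch a fresh value
        }
    }
}
```

Note that this sketch also makes the race in the scenario above visible: a second get that queues on an in-flight future after the owner's value has changed will still observe the first fetch's value.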
Re: [infinispan-dev] L1 consistency for transactional caches.
On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value I assume you meant NonOwner here? PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. At some point, BackupOwner has to execute the commit as well, and it should send an InvalidateL1Command(key) to NonOwner. However, one of the bugs that Will is working on could prevent that invalidation from working (maybe https://issues.jboss.org/browse/ISPN-2965). The test finishes and I perform a check for the key value in all the caches. The NonOwner returns the L1 cached value (==test fail). IMO, this is a bug (or not) depending on what guarantees we provide. wdyt? It's a bug! IMO, at least in DIST_SYNC mode with sync commit, we should guarantee that stale entries are removed from non-owners before the TM.commit() call returns on the originator. With other configurations we probably can't guarantee that; instead we should guarantee that stale entries are removed from non-owners soon after the TM.commit() call returns on the originator. I don't think it's ok to say that a stale entry can stay in L1 indefinitely in any configuration - otherwise why have L1 invalidation at all? ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] L1 consistency for transactional caches.
On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:21 PM, Sanne Grinovero wrote: +1 for considering it a BUG Didn't we decide a year ago that GET operations should be sent to a single node only (the primary)? +1 :) Manik had a patch for staggering remote GET calls, but it was slowing down reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. The test finishes and I perform a check for the key value in all the caches. The NonOwner returns the L1 cached value (==test fail). IMO, this is a bug (or not) depending on what guarantees we provide. wdyt? Pedro
Re: [infinispan-dev] L1 consistency for transactional caches.
On Tue, Jul 2, 2013 at 6:35 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:29 PM, Dan Berindei wrote: On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value I assume you meant NonOwner here? yes PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. At some point, BackupOwner has to execute the commit as well, and it should send an InvalidateL1Command(key) to NonOwner. However, one of the bugs that Will is working on could prevent that invalidation from working (maybe https://issues.jboss.org/browse/ISPN-2965). only the primary owner is sending the invalidation command. Oops, you're right! And I'm pretty sure I made the same assumptions in my replies to Will's L1 thread... I guess we could make it work either by sending the invalidations from all the owners (slowing down writes, because most of the time we'd send the same commands twice), or by sending remote get commands ONLY to the primary owner (which will slow down remote reads). Staggering remote GET commands won't work, because with staggering you still have the possibility of the first request reaching the primary owner only after the second request returned and the entry was written to L1. Sending the invalidations from the tx originator after the commit might work as well, but only if L1.invalidationThreshold == 0 and all the L1 invalidations are sent as multicasts. The test finishes and I perform a check for the key value in all the caches. The NonOwner returns the L1 cached value (==test fail).
IMO, this is a bug (or not) depending on what guarantees we provide. wdyt? It's a bug! IMO, at least in DIST_SYNC mode with sync commit, we should guarantee that stale entries are removed from non-owners before the TM.commit() call returns on the originator. With other configurations we probably can't guarantee that; instead we should guarantee that stale entries are removed from non-owners soon after the TM.commit() call returns on the originator. I don't think it's ok to say that a stale entry can stay in L1 indefinitely in any configuration - otherwise why have L1 invalidation at all?
Re: [infinispan-dev] L1 consistency for transactional caches.
It's not wrong, sending the invalidation only from the primary owner is wrong :) On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero sa...@infinispan.org wrote: I see, so we keep the wrong implementation because it's faster? :D On 2 July 2013 16:38, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:21 PM, Sanne Grinovero wrote: +1 for considering it a BUG Didn't we decide a year ago that GET operations should be sent to a single node only (the primary)? +1 :) Manik had a patch for staggering remote GET calls, but it was slowing down reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. The test finishes and I perform a check for the key value in all the caches. The NonOwner returns the L1 cached value (==test fail). IMO, this is a bug (or not) depending on what guarantees we provide. wdyt?
Pedro
Re: [infinispan-dev] L1 consistency for transactional caches.
On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero sa...@infinispan.org wrote: On 2 July 2013 17:24, Dan Berindei dan.berin...@gmail.com wrote: It's not wrong, sending the invalidation only from the primary owner is wrong :) Agreed, sending a GET operation to multiple nodes might not be wrong per se but is the root cause of such race conditions, and other subtle complexities we might not even be aware of yet. I don't know why it was slower, but since the result doesn't make sense we should look at it a second time rather than throwing the code away. It does make sense: statistically, the backup owner will sometimes reply faster than the primary owner. http://markmail.org/message/qmpn7yueym4tbnve http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/ Sending invalidations from a non-primary owner is an interesting approach, but then each owner would have to maintain an independent list of nodes who have read the value. For each write, the primary node would send an invalidation to each registered node, plus the copy to the secondary nodes, which in turn send more L1 invalidation messages to each of their registered nodes. What's the likelihood of duplication of invalidation messages here? Sounds like a big network traffic amplifier, lots of network traffic triggered for each write. The likelihood of duplication is very near to 100%, indeed, and in non-tx caches it would add another RPC to the critical path. As always, it's a compromise: if we do something to speed up writes, it will slow down reads. Perhaps we could send the request to the primary owners only when L1 is enabled, as the number of remote gets should be smaller, and send the request to all the owners when L1 is disabled, and the number of remote gets is higher. Pedro's suggestion to send the request to all the owners, but only write the value to L1 if the first reply was from the primary owner, sounds like it should work just as well.
It would make L1 slightly less efficient, but it wouldn't have latency spikes caused by a delay on the primary owner. It also implies that we don't have reliability on the list of registered nodes, as each owner will be maintaining a different set. In this case we should also have each node invalidate its L1 stored entries when the node from which they got these entries has left the cluster. Right now we invalidate from L1 all the keys for which the list of owners changed, whether they're still alive or not, because we don't keep track of the node we got each entry from. If we only sent remote get commands to the primary owner, we'd have to invalidate from L1 all the keys for which the primary owner changed. One thing that we don't do at the moment, but we should do whether we send the invalidations from the primary owner or from all the owners, is to clean up the requestor lists for the keys that a node no longer owns. Having it all dealt by the primary owner makes for a much simpler design and also makes it more likely that a single L1 invalidate message is sent via multicast, or at least with less duplication. The simplest design would be to never keep track of requestors and always send a multicast from the originator. In fact, the default configuration is to always send multicasts (but we still keep track of requestors and we send the invalidation from the primary owner). Intuitively, unicasts would be preferable for keys that have a low read:write ratio, as in a write-intensive scenario, but I wonder if disabling L1 wouldn't be even better for that scenario. Cheers Dan Cheers, Sanne On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero sa...@infinispan.org wrote: I see, so we keep the wrong implementation because it's faster? 
:D On 2 July 2013 16:38, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:21 PM, Sanne Grinovero wrote: +1 for considering it a BUG Didn't we decide a year ago that GET operations should be sent to a single node only (the primary)? +1 :) Manik had a patch for staggering remote GET calls, but it was slowing down reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. The test finishes and I perform a check
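The requestor-list trade-off discussed in this thread (unicasting an invalidation to each node that read the key versus one multicast to the whole cluster, governed by a threshold like L1's invalidationThreshold setting) can be sketched as follows. This is a hypothetical model, not Infinispan's actual code; the class name, the string node ids, and the "threshold == 0 means always multicast" convention are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: the primary owner records which nodes requested a key,
// and on a write decides between unicasting an invalidation to each
// requestor or multicasting to everyone.
public class L1InvalidationSketch {
    private final ConcurrentHashMap<String, Set<String>> requestors = new ConcurrentHashMap<>();
    private final int invalidationThreshold; // assumed semantics: 0 = always multicast

    public L1InvalidationSketch(int invalidationThreshold) {
        this.invalidationThreshold = invalidationThreshold;
    }

    public void recordRemoteGet(String key, String requestorNode) {
        requestors.computeIfAbsent(key, k -> ConcurrentHashMap.newKeySet()).add(requestorNode);
    }

    /** Returns the unicast destinations for the invalidation, or null to signal a multicast. */
    public Set<String> invalidationTargets(String key) {
        Set<String> nodes = requestors.remove(key); // the list is consumed by the write
        if (nodes == null) return Collections.emptySet();      // nobody cached it in L1
        if (invalidationThreshold == 0 || nodes.size() > invalidationThreshold)
            return null;                                       // cheaper as one multicast
        return nodes;                                          // few requestors: unicast
    }
}
```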
Re: [infinispan-dev] L1 consistency for transactional caches.
On Tue, Jul 2, 2013 at 9:51 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:55 PM, Dan Berindei wrote: On Tue, Jul 2, 2013 at 6:35 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/02/2013 04:29 PM, Dan Berindei wrote: On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi all, simple question: What are the consistency guarantees that are supposed to be ensured? I have the following scenario (happened in a test case): NonOwner: remote get key BackupOwner: receives the remote get and replies (with the correct value) BackupOwner: put in L1 the value I assume you meant NonOwner here? yes PrimaryOwner: [at the same time] is committing a transaction that will update the key. PrimaryOwner: receives the remote get after sending the commit. The invalidation for L1 is not sent to NonOwner. At some point, BackupOwner has to execute the commit as well, and it should send an InvalidateL1Command(key) to NonOwner. However, one of the bugs that Will is working on could prevent that invalidation from working (maybe https://issues.jboss.org/browse/ISPN-2965). only the primary owner is sending the invalidation command. Oops, you're right! And I'm pretty sure I made the same assumptions in my replies to Will's L1 thread... I guess we could make it work either by sending the invalidations from all the owners (slowing down writes, because most of the time we'd send the same commands twice), or by sending remote get commands ONLY to the primary owner (which will slow down remote reads). Staggering remote GET commands won't work, because with staggering you still have the possibility of the first request reaching the primary owner only after the second request returned and the entry was written to L1. This may be a stupid idea, but what if we only store the entry in L1 if the reply came from the primary owner? Of course this will reduce the L1 hit ratio...
:( I like that... writing the entry to L1 only if the primary owner replied first would also allow for staggering get requests. Sending the invalidations from the tx originator after the commit might work as well, but only if L1.invalidationThreshold == 0 and all the L1 invalidations are sent as multicasts. The test finishes and I perform a check for the key value in all the caches. The NonOwner returns the L1 cached value (==test fail). IMO, this is a bug (or not) depending on what guarantees we provide. wdyt? It's a bug! IMO, at least in DIST_SYNC mode with sync commit, we should guarantee that stale entries are removed from non-owners before the TM.commit() call returns on the originator. With other configurations we probably can't guarantee that; instead we should guarantee that stale entries are removed from non-owners soon after the TM.commit() call returns on the originator. I don't think it's ok to say that a stale entry can stay in L1 indefinitely in any configuration - otherwise why have L1 invalidation at all?
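Pedro's "store in L1 only if the primary owner replied first" suggestion can be sketched with CompletableFuture.anyOf. This is an illustrative sketch, not Infinispan code: the class, the Reply holder, and the string node names are all invented for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: send the remote get to all owners, return whichever reply arrives
// first, but only store the value in L1 when that first reply came from the
// primary owner (which serializes writes, so its value is current).
public class PrimaryGatedL1 {
    public static class Reply {
        public final String fromNode;
        public final String value;
        public Reply(String fromNode, String value) { this.fromNode = fromNode; this.value = value; }
    }

    private final ConcurrentHashMap<String, String> l1 = new ConcurrentHashMap<>();

    @SafeVarargs
    public final String remoteGet(String key, String primaryOwner,
                                  CompletableFuture<Reply>... ownerReplies) throws Exception {
        // anyOf gives us the first owner to respond
        Reply first = (Reply) CompletableFuture.anyOf(ownerReplies).get();
        if (first.fromNode.equals(primaryOwner)) {
            l1.put(key, first.value); // primary replied first: safe to cache in L1
        }                             // otherwise: use the value but skip the L1 store
        return first.value;
    }

    public String l1Peek(String key) { return l1.get(key); }
}
```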
Re: [infinispan-dev] L1 consistency for transactional caches.
On Mon, Jul 8, 2013 at 12:19 AM, Sanne Grinovero sa...@infinispan.org wrote: On 3 July 2013 10:26, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero sa...@infinispan.org wrote: On 2 July 2013 17:24, Dan Berindei dan.berin...@gmail.com wrote: It's not wrong, sending the invalidation only from the primary owner is wrong :) Agreed, sending a GET operation to multiple nodes might not be wrong per se but is the root cause of such race conditions, and other subtle complexities we might not even be aware of yet. I don't know why it was slower, but since the result doesn't make sense we should look at it a second time rather than throwing the code away. It does make sense: statistically, the backup owner will sometimes reply faster than the primary owner. http://markmail.org/message/qmpn7yueym4tbnve http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/ Of course, I remember the discussion, but you can put many question marks on this decision. First off, this is doubling the load on the network, which is supposedly our most precious resource, so I highly question how we're measuring the benefit of the second request. If you read the articles you linked to, you'll see Google applies such a strategy to improve the tail latency, but only sends the second request when the first one is not getting a fast answer, and in their tests this seems to pay off at a mere 5% increase in network usage. I would say that's a significantly different level of trade-off. Also, as opposed to Google's BigTable and Apache Cassandra, which use these techniques, Infinispan does not support an eventually consistent model, which makes it far more dangerous to read a slightly out-of-date value from the non-owner... sure, we can resolve those things, but it gets hairy.
In this case specifically, reading from the primary owner only seems a much cleaner solution, as IMHO it's a move towards simplification rather than adding yet another special case in the codebase. It's cleaner, but it slows down reads, as we've seen in our own tests. Also, the consistency problems with staggered remote gets are the same as the consistency problems with simultaneous remote gets. So it would be much harder to go back and experiment with staggered gets once we simplify the code on the assumption that we should only ever read from the primary owner. Sending invalidations from a non-primary owner is an interesting approach, but then we're having each owner to maintain an independent list of nodes who have read the value. For each write, the primary node would send an invalidation to each registered node, plus the copy to the secondary nodes, which in turn sends more L1 invalidation nodes to each of their registered nodes.. what's the likelihood of duplication of invalidation messages here? Sounds like a big network traffic amplifier, lots of network traffic triggered for each write. The likelihood of duplication is very near to 100%, indeed, and in non-tx caches it would add another RPC to the critical path. As always, it's a compromise: if we do something to speed up writes, it will slow down reads. Perhaps we could send the request to the primary owners only when L1 is enabled, as the number of remote gets should be smaller, and send the request to all the owners when L1 is disabled, and the number of remote gets is higher. Pedro's suggestion to send the request to all the owners, but only write the value to L1 if the first reply was from the primary owner, sounds like it should work just as well. It would make L1 slightly less efficient, but it wouldn't have latency spikes caused by a delay on the primary owner. 
That's far from an ideal solution; we don't have a clue how to measure what "slightly less efficient" means: it might prove to be unbearably worse for some usage pattern. True, in a worst-case scenario, the primary owner could be replying consistently slower than the others, and the entry may never be stored in L1. And it wouldn't make sense to try this with numOwners = 10, the chances of the primary owner replying first would be slim. I think my "slightly less efficient" description would be more accurate if we had staggered gets, if we ever get that patch in... While we have no clue how much worse it can be, it will definitely always provide a worse cache hit/miss ratio, so it's easily proven that it's going to be suboptimal in all cases. If numOwners = 1 there is only one way to read the value, so it's clearly optimal in one case :) If you really go for something like that, at least take the value from the primary owner when it arrives (second, third, .. whatever, but at some point you might get it) and then store it in L1: it will cost you a second unmarshalling operation, but that's far better than causing (several?) cache misses.
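The staggered-get idea referenced in this thread (Manik's patch, and the hedged-request strategy from the linked Google/Bailis articles) can be sketched like this. All names below are illustrative, not Infinispan API: the request goes to the primary owner first, and a backup request is fired only if the primary hasn't answered within a short stagger delay; whichever reply arrives first wins.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Sketch of a hedged ("staggered") remote get: ask the primary, and only
// hedge to a backup if the primary is slow to respond.
public class StaggeredGet {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // don't keep the JVM alive for the hedge timer
            return t;
        });

    public String get(Supplier<String> primaryCall, Supplier<String> backupCall,
                      long staggerMillis) throws Exception {
        CompletableFuture<String> result = new CompletableFuture<>();
        CompletableFuture.supplyAsync(primaryCall).thenAccept(result::complete);
        // hedge: fire the backup request only if the primary hasn't replied yet
        scheduler.schedule(() -> {
            if (!result.isDone()) {
                CompletableFuture.supplyAsync(backupCall).thenAccept(result::complete);
            }
        }, staggerMillis, TimeUnit.MILLISECONDS);
        return result.get(5, TimeUnit.SECONDS); // first reply wins, later ones are ignored
    }
}
```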
Re: [infinispan-dev] Isolation level in Repeatable Read + remote get
Pedro, I'll integrate the PR as it is, and then you can experiment with my proposal in another branch/PR. On Tue, Jul 9, 2013 at 1:48 PM, Pedro Ruivo pe...@infinispan.org wrote: On 07/09/2013 11:46 AM, Galder Zamarreño wrote: On Jul 8, 2013, at 11:02 AM, Pedro Ruivo pe...@infinispan.org wrote: Hi guys, re: ISPN-2840, ISPN-3235, ISPN-3236 short: transaction isolation in repeatable read Dan came up with an idea (a good idea IMO) to change a little the logic of how entries are put in the context for transactional caches. One of the Repeatable Read properties is that after a key is accessed, the transaction should never see other concurrent transactions' values, even if they commit first. As a result, we can optimize the distributed mode by only doing a remote get the first time a transaction accesses a key. My idea (and the one implemented in the Pull Request) ^ Which pull req, link? You mean [1]? yes [1] https://github.com/infinispan/infinispan/pull/1937 was adding a flag to mark the entry as accessed. All future accesses to that key will not try to fetch the data from the container nor from a remote location (we have a small bug with the last one). Dan's idea is simpler but requires some changes in the EntryFactory logic. ^ Which is Dan's idea exactly? It is the description below: At this stage, we only put an entry in the transaction context if the following conditions are respected: * the entry exists in the data container (i.e. the node is an owner or it is in L1) * we put a RepeatableReadEntry with null value in the transaction context if ** the entry does not exist in the container but the node is an owner ** the entry does not exist in the data container but the command has flags to skip the remote fetch (like IGNORE_RETURN_VALUE or SKIP_REMOTE_LOOKUP). Of course the conditional commands need special attention here. Note: as usual, if the entry exists in the context, then nothing is done.
At the TxDistributionInterceptor level, the check to see if the remote get should be done is as simple as checking if lookupEntries(k)==null. Dan, if I said something wrong let me know. If I was not clear at some point let me know too. Anyone see any issue with this approach? Any other suggestions? Cheers, Pedro Ruivo -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
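The repeatable-read behaviour this thread is designing — consult the transaction's own context first, fetch a key at most once, and remember even a "not found" answer so later reads repeat it — can be modelled minimally like this. This is a toy sketch under those assumptions, not the real EntryFactory/RepeatableReadEntry implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a repeatable-read transaction context: the first access to a
// key goes through the fetch function (standing in for the data container or
// a remote get); every later access returns the recorded answer, including a
// recorded "not found" (null marker).
public class RepeatableReadContext {
    private static final Object NULL_MARKER = new Object(); // plays the NullMarkerEntry role

    private final Map<String, Object> ctx = new HashMap<>();

    @SuppressWarnings("unchecked")
    public <V> V read(String key, Function<String, V> fetch) {
        Object cached = ctx.get(key);
        if (cached != null) {
            return cached == NULL_MARKER ? null : (V) cached; // repeatable: same answer
        }
        V value = fetch.apply(key);                           // first access only
        ctx.put(key, value == null ? NULL_MARKER : value);    // remember even "not found"
        return value;
    }
}
```

With this shape, the interceptor-level check the email mentions reduces to "is the key already in the context?" - if it is, no container lookup or remote get is needed.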
Re: [infinispan-dev] MongoDB cachestore 5.3.0.Final now in jboss maven repo
Do we need to change something in pom.xml as well so that it's included automatically in the next release? On Mon, Jul 8, 2013 at 10:30 PM, Mircea Markus mmar...@redhat.com wrote: Thanks for updating the release doc :-) On 5 Jul 2013, at 10:02, Tristan Tarrant ttarr...@redhat.com wrote: Dear all, I have released to Maven the missing MongoDB cachestore for 5.3.0.Final. I have not re-released the zip distributions to avoid confusion, but let's make sure it is included in future releases. Tristan Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] configuring fetchInMemoryState for topology caches
On Tue, Jul 9, 2013 at 12:48 PM, Mircea Markus mmar...@redhat.com wrote: On 21 May 2013, at 17:09, Dan Berindei dan.berin...@gmail.com wrote: I wouldn't want to deprecate CCL, I think it definitely has a purpose - at least in invalidation mode. The only use case I'm aware of for invalidation is 2nd level cache and I don't think that needs a remote cache, Galder wdyt? I don't know if 2nd level cache actually uses ClusterCacheLoader, but it could definitely use it without exposing the cache through HotRod. Even in replication mode, having a lazy alternative to state transfer may be useful. Maybe not for the topology cache, but it might make sense for large caches. this makes me wonder: what would be the case in which one doesn't want to use state transfer for a distributed/replicated cache. In replication mode, if the cache has a huge (non-shared) cache store, some users may prefer to disable fetchPersistentState and rely on ClusterCacheLoader instead for less frequently accessed keys. I don't know if it ever makes sense to use ClusterCacheLoader in distribution mode, since it always sends the ClusteredGetCommand to the entire cluster. Cheers Dan
[infinispan-dev] Infinispan 6.0.0.Alpha2 is out!
Dear Infinispan community, We're proud to announce the second Alpha release of Infinispan 6.0.0, and also the second release using the Apache Software License. New features in this release: - A new query DSL https://issues.jboss.org/browse/ISPN-3169 that is usable both in embedded/library mode and over HotRod. - Support for JCache 0.8 https://issues.jboss.org/browse/ISPN-3234, the Public Review Draft version of the JSR-107 specification. - A slimmer distribution, thanks to moving non-core cache store implementations to separate repositories https://issues.jboss.org/browse/ISPN-3377. For a complete list of features and fixes included in this release please refer to the release notes https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12321854. Visit our downloads http://www.jboss.org/infinispan/downloads section to find the latest release and if you have any questions please check our forums http://www.jboss.org/infinispan/forums, our mailing lists https://lists.jboss.org/mailman/listinfo/infinispan-dev, or ping us directly on IRC. Thanks to everyone for their involvement and contribution! Cheers Dan
Re: [infinispan-dev] loaders/shared
+1 to make it per loader On Thu, Aug 8, 2013 at 6:30 PM, Galder Zamarreño gal...@redhat.com wrote: On Aug 8, 2013, at 11:43 AM, Mircea Markus mmar...@redhat.com wrote: Currently the shared attribute is configured at *loaders* level, so all the defined cache loaders inherit this attribute. It should be configured on a per loader basis. Does anyone see any problem with that? No. I think you're right in that it should be per loader. You could have a shared JDBC store and a non-shared local FS. Cheers, Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
Re: [infinispan-dev] Eventual Consistency in Infinispan
Are you using explicit transactions or only implicit transactions? As of 5.2, non-transactional caches use a scheme where the modification is sent to the primary owner, which then sends the modification to the backup owner(s). The primary owner waits for responses from all the backup owners, and the originator waits for a response from the primary owner. In optimistic caches, one-phase transactions are much simpler: the originator just sends the modification to all the owners, and they all apply it. Execution order is preserved for transactions originating from the same node, but transactions originating from different nodes can be applied by the owners in any order. So implicit transactions with use1PcForAutoCommitTransactions=true can certainly be faster than non-transactional operations, but I wouldn't recommend using them because they aren't atomic. (To be fair, there are also some problems with atomicity of non-tx operations during state transfer that we're just fixing now - e.g. https://issues.jboss.org/browse/ISPN-3357) Cheers Dan On Mon, Aug 19, 2013 at 4:47 PM, Faseela K faseel...@ericsson.com wrote: Hi Tristan, I was just comparing the performance of synchronous transactional and synchronous non-transactional writes in replication mode. And I see that transactional writes were taking less time than non-transactional ones. Is it expected behaviour? My transactional config has the following properties : syncRollbackPhase=true syncCommitPhase=true cacheStopTimeout=3 use1PcForAutoCommitTransactions=true autoCommit=true lockingMode=OPTIMISTIC useSynchronization=true transactionMode=TRANSACTIONAL Or am I using a wrong configuration? Thanks, Faseela -Original Message- From: Tristan Tarrant [mailto:ttarr...@redhat.com] Sent: Monday, August 19, 2013 6:28 PM To: infinispan -Dev List Cc: Faseela K Subject: Re: [infinispan-dev] Eventual Consistency in Infinispan Infinispan does not implement eventual consistency yet.
Tristan On 08/19/2013 02:39 PM, Faseela K wrote: Hi, I am using Infinispan - 5.3.0 in clustering mode. I have a transactional configuration, in replication-synchronous mode. I want to know whether eventual consistency is supported for synchronous replication in 5.3. Could someone please briefly explain how eventual consistency works in the Infinispan context? Thanks, Faseela ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
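For reference, the property list Faseela posted corresponds roughly to the following programmatic configuration. This is a hedged sketch against the 5.x ConfigurationBuilder fluent API, reconstructed from the thread; treat the exact method names as approximate rather than verified against 5.3:

```java
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;

// Configuration fragment only: replication-sync with optimistic transactions,
// using 1PC for auto-commit transactions (faster, but not atomic, as Dan
// explains above).
ConfigurationBuilder builder = new ConfigurationBuilder();
builder.clustering().cacheMode(CacheMode.REPL_SYNC)
       .transaction()
          .transactionMode(TransactionMode.TRANSACTIONAL)
          .lockingMode(LockingMode.OPTIMISTIC)
          .useSynchronization(true)
          .autoCommit(true)
          .use1PcForAutoCommitTransactions(true)
          .syncCommitPhase(true)
          .syncRollbackPhase(true);
Configuration config = builder.build();
```

As the thread notes, dropping use1PcForAutoCommitTransactions trades the speed advantage back for two-phase atomicity.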
[infinispan-dev] ISPN-3051 configuration
Hi guys As you know, I'm working on ISPN-3051, allowing each node to take a higher or lower proportion of the entries in the cache. I've implemented this by adding a float loadFactor setting in each node's configuration, with 1 being the default and any non-negative value being accepted (including 0). There are two questions I wanted to ask you about the configuration: 1. What do you think about the loadFactor name? I started having doubts about it, since it has a very different meaning in HashMap. I have come up with a couple of alternatives, but I don't love any of them: proportionalLoad and proportionalCapacity. 2. Where should we put this setting? I have added it as CacheConfiguration.clustering().hash().loadFactor(), but I can't think of a reason for having different values for each cache, so we might as well put it in the global configuration. Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] ISPN-3051 configuration
On Mon, Sep 9, 2013 at 12:34 PM, Tristan Tarrant ttarr...@redhat.com wrote: On 09/09/2013 11:18 AM, Dan Berindei wrote: Hi guys As you know, I'm working on ISPN-3051, allowing each node to take a higher or lower proportion of the entries in the cache. I've implemented this by adding a float loadFactor setting in each node's configuration, with 1 being the default and any non-negative value being accepted (including 0). There are two questions I wanted to ask you about the configuration: 1. What do you think about the loadFactor name? I started having doubts about it, since it has a very different meaning in HashMap. I have come up with a couple of alternatives, but I don't love any of them: proportionalLoad and proportionalCapacity. Since this is per-node, you want to use the word node in there, so nodeCapacity would be good. Can this value change at runtime? nodeCapacity by itself would sound like we're limiting the actual number of keys held on this node, so I feel it would be misleading. You do have a point about having node in the name though... how about nodeScale? The value can't change at runtime. TBH, I haven't even considered it. 2. Where should we put this setting? I have added it as CacheConfiguration.clustering().hash().loadFactor(), but I can't think of a reason for having different values for each cache, so we might as well put it in the global configuration. Yes, global sounds good. And don't forget server :) I guess the others disagree, so I'll keep it at the cache level. I'll try to remember about server ;) Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
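Whatever name wins, the behaviour under discussion, a per-node multiplier on the share of hash segments a node owns, can be sketched as follows. The class and method names here are hypothetical illustrations, not ISPN-3051's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: how a per-node factor could weight the number of
// segments each node owns. The rounding scheme is a simplification of
// what a real consistent hash would do.
public class ProportionalSegments {
    public static Map<String, Integer> assign(Map<String, Float> factors, int numSegments) {
        float total = 0;
        for (float f : factors.values()) total += f;
        Map<String, Integer> owned = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Float> e : factors.entrySet()) {
            // each node's share is proportional to its factor
            int share = Math.round(numSegments * e.getValue() / total);
            owned.put(e.getKey(), share);
            assigned += share;
        }
        // push any rounding remainder onto the first node
        String first = factors.keySet().iterator().next();
        owned.merge(first, numSegments - assigned, Integer::sum);
        return owned;
    }
}
```

With factors {A=2, B=1, C=1} and 60 segments, A owns 30 while B and C own 15 each; a factor of 0 makes a node own nothing, matching the "including 0" rule above.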
Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction
On Wed, Oct 2, 2013 at 12:48 PM, Pedro Ruivo pe...@infinispan.org wrote: On 10/02/2013 12:03 AM, Sanne Grinovero wrote: I'd love to brainstorm about the clear() operation and what it means on Infinispan. I'm not sure to what extent, but it seems that clear() is designed to work in a TX, or even create an implicit transaction if needed, but I'm not understanding how that can work. I think it's pretty straightforward to define the semantics of operations after clear(): they should assume that every key in the cache was removed. True, like Pedro mentioned, the current implementation allows subsequent operations in the same tx to see keys inserted by other txs. But I don't think fixing that would be harder than ensuring that if a tx inserts key k owned by nodes A and B, clear() from outside the tx either deletes k on A and B or doesn't delete k on either node. Obviously a clear() operation isn't listing all keys explicitly. (offtopic) like entrySet(), keySet(), values() and iterator() when executed in a transactional context. however, I think that these operations are more difficult to handle, because they have to return consistent data... I agree, defining how entrySet() and the others should work in a tx seems harder than defining clear(). Will, I'm guessing your fix for ISPN-3560 still allows entrySet() after clear() to see entries inserted by another tx in the meantime? Which implies that it's undefined on which keys it's going to operate when it's fired.. that seems terribly wrong in a distributed key/value store as we can't possibly freeze the global state and somehow define a set of keys which are going to be affected, while an explicit enumeration is needed to acquire the needed locks. you may not need to explicitly acquire the lock for the affected key. you can use a global lock that is acquired in shared mode by write transactions without a clear operation and exclusively by transactions with a clear operation (exactly what I use for Total Order).
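The global lock Pedro describes maps naturally onto a read-write lock. A minimal sketch (hypothetical code, not Infinispan's):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the shared/exclusive "clear lock" idea: ordinary write
// transactions take the lock in shared (read) mode, so they run
// concurrently with each other, while a transaction containing clear()
// takes it exclusively, serializing it against all in-flight writes.
public class ClearLockSketch {
    private final ReentrantReadWriteLock clearLock = new ReentrantReadWriteLock();
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    public void put(String key, String value) {
        clearLock.readLock().lock();      // shared: many writers at once
        try {
            data.put(key, value);
        } finally {
            clearLock.readLock().unlock();
        }
    }

    public void clear() {
        clearLock.writeLock().lock();     // exclusive: waits for in-flight writes
        try {
            data.clear();
        } finally {
            clearLock.writeLock().unlock();
        }
    }

    public int size() { return data.size(); }
}
```

Writes to different keys still proceed in parallel, since the read lock is shared; only a clear() serializes against everything else.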
also, the clear transactions (i.e. the transactions with a clear operation) only need to acquire the global lock. +1 for the global lock. It may slow things down a bit in the general case, because every tx will have to acquire it in shared mode, but it will guarantee consistency for clear(). It still won't help entrySet() and its brethren... It might give a nice safe feeling that, when invoking a clear() operation in a transaction, I can still abort the transaction to make it cancel the operation; that's the only good part I can think of: we can cancel it. yep Wouldn't it be harder to integrate it with Total Order if it didn't have a tx? But most of all, I think we have to keep it because it's a part of the Map interface and a part of JCache's Cache interface. I don't think it has anything to do with consistency though? To make sure you're effectively involving all replicas of all entries in a consistent way, a lock would need to be acquired on each affected key, which again implies a need to enumerate all keys, including the unknown keys which might be hiding in a CacheStore: it's not enough to broadcast the clear() operation to all nodes and have them simply wipe their local state as that's never going to deal correctly (consistently) with in-flight transactions working on different nodes at different times (I guess enabling Total Order could help but you'd need to make it mandatory). see the comment above about the locks... may not be the most efficient, but it handles the problem. Note that we need to iterate the keys anyway, in order to fire the CacheEntryRemoved listeners. Even if we optimized it to skip the iteration when there are no registered listeners, it would still have to handle the general case. So let's step back a second and consider what is the use case for clear()?
I suspect it's primarily a method needed during testing, or maybe cleanup before a backup is restored (operations), maybe a manually activated JMX operation to clear the cache in exceptional cases. I'm not aware of the use cases of a clear() operation, but using it as a JMX operation can be a problem in transactional caches (depending on the semantics you have in mind). If it is going to clear all the caches (i.e. from all replicas/nodes), then we have to find a way to set an order with the current transactions, or we will have some nodes delivering a transaction (say tx1) before the clear() and other nodes delivering tx1 after the clear(), leading to an inconsistent state. (offtopic) I'm wondering if we have that problem in the current implementation if tx1 only inserts new keys... in that case, we have no lock to serialize an order between the clear() and tx1... I'm pretty sure we do have this problem. We definitely need to add some tests for clear concurrent with other write operations. In my view, as long as we're going to expose a clear()
Re: [infinispan-dev] Warnings vs. Fail Fast
+1 to apply it, but maybe we should use the occasion to move the error message to the Log interface as well. On Fri, Oct 4, 2013 at 2:43 AM, Sanne Grinovero sa...@infinispan.org wrote: Currently if a cache is configured with indexing enabled, but the Query module isn't on the classpath, you get a simple warning. I think this should fail with a configuration validation error; it's not just safer but also consistent with many other validations. I've created ISPN-3583 and the patch is ready.. any good reason to not apply it? Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction
On Thu, Oct 3, 2013 at 1:57 PM, Sanne Grinovero sa...@infinispan.org wrote: On 3 October 2013 10:29, Dan Berindei dan.berin...@gmail.com wrote: On Wed, Oct 2, 2013 at 12:48 PM, Pedro Ruivo pe...@infinispan.org wrote: On 10/02/2013 12:03 AM, Sanne Grinovero wrote: I'd love to brainstorm about the clear() operation and what it means on Infinispan. I'm not sure to what extent, but it seems that clear() is designed to work in a TX, or even create an implicit transaction if needed, but I'm not understanding how that can work. I think it's pretty straightforward to define the semantics of operations after clear(): they should assume that every key in the cache was removed. True, like Pedro mentioned, the current implementation allows subsequent operations in the same tx to see keys inserted by other txs. But I don't think fixing that would be harder than ensuring that if a tx inserts key k owned by nodes A and B, clear() from outside the tx either deletes k on A and B or doesn't delete k on either node. Obviously a clear() operation isn't listing all keys explicitly. (offtopic) like entrySet(), keySet(), values() and iterator() when executed in a transactional context. however, I think that these operations are more difficult to handle, because they have to return consistent data... I agree, defining how entrySet() and the others should work in a tx seems harder than defining clear(). I agree, those other nasty beasts are next on my list: see below.. But we're not arguing about which method deserves the gold medal for complex design ;-) Will, I'm guessing your fix for ISPN-3560 still allows entrySet() after clear() to see entries inserted by another tx in the meantime? Which implies that it's undefined on which keys it's going to operate when it's fired..
that seems terribly wrong in a distributed key/value store as we can't possibly freeze the global state and somehow define a set of keys which are going to be affected, while an explicit enumeration is needed to acquire the needed locks. you may not need to explicitly acquire the lock for the affected key. you can use a global lock that is acquired in shared mode by write transactions without a clear operation and exclusively by transactions with a clear operation (exactly what I use for Total Order). also, the clear transactions (i.e. the transactions with a clear operation) only need to acquire the global lock. +1 for the global lock. It may slow things down a bit in the general case, because every tx will have to acquire it in shared mode, but it will guarantee consistency for clear(). I had thought of that too, but it will slow down all operations, and for what benefit? To have a nice transactional clear() operation which guarantees consistency? For me, the most important reason is to support the JCache API. The goal of this thread is to challenge if that makes sense for real use cases; it seems in fact we would highly prefer a non-transactional version, and considering that we can't run a single Cache in both modes anymore, that's a problem. Side note on the implementation complexity: even the AsyncCacheStore decorator code is made extremely more tricky just to handle clear() appropriately, to keep it in the correct order of execution compared to the other operations, while still allowing coalescing of other operations, which essentially implies we allow reordering of operations as long as they affect independent keys.. where clear() is of course a wild card which requires its own execution path, its own global lock and generally makes it far more complex. Off-topic: I'm not very familiar with the async cache store code, but I really don't think coalescing writes is good for consistency. In case of a crash, you'd have lots of incomplete transactions on disk.
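The coalescing problem Sanne mentions can be pictured with a toy write queue (hypothetical code, not the real AsyncCacheStore decorator): writes to independent keys coalesce freely, but a clear() cannot be reordered past them, so it drops everything queued before it and marks that the backing store itself must be wiped before the next batch is applied:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of why clear() complicates write coalescing in an async store
// decorator. Only the last value per key needs to be flushed, but clear()
// acts as a barrier: earlier queued writes become moot, while writes
// queued after it must be applied only after the store is wiped.
public class CoalescingQueue {
    private final LinkedHashMap<String, String> pending = new LinkedHashMap<>();
    private boolean clearPending;

    public synchronized void put(String key, String value) {
        pending.put(key, value);          // overwrites an earlier queued write
    }

    public synchronized void clear() {
        pending.clear();                  // earlier queued writes are moot
        clearPending = true;              // but the store itself must be wiped
    }

    public synchronized Map<String, String> flush() {
        // a real decorator would clear the backing store here if clearPending,
        // then apply the batch that accumulated after the clear()
        Map<String, String> batch = new LinkedHashMap<>(pending);
        pending.clear();
        clearPending = false;
        return batch;
    }
}
```

As Dan's off-topic remark notes, this kind of reordering is exactly what makes crash consistency hard: the flushed batch is not any transaction's consistent snapshot.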
It still won't help entrySet() and its brethren... It might give a nice safe feeling that, when invoking a clear() operation in a transaction, I can still abort the transaction to make it cancel the operation; that's the only good part I can think of: we can cancel it. yep Wouldn't it be harder to integrate it with Total Order if it didn't have a tx? But most of all, I think we have to keep it because it's a part of the Map interface and a part of JCache's Cache interface. I don't think it has anything to do with consistency though? To make sure you're effectively involving all replicas of all entries in a consistent way, a lock would need to be acquired on each affected key, which again implies a need to enumerate all keys, including the unknown keys which might be hiding in a CacheStore: it's not enough to broadcast the clear() operation to all nodes and have them simply wipe their local state as that's never going to deal correctly (consistently) with in-flight
Re: [infinispan-dev] Running stress tests on CI ?
We don't have a configured maximum duration for the stress tests. The stress build takes about 13 minutes now, but enabling all the stress tests will probably increase the duration. I didn't want to delay regular builds, so I've scheduled it to run every night at 03:00 GMT instead of running after every commit. On Mon, Oct 7, 2013 at 4:09 PM, Sanne Grinovero sa...@infinispan.org wrote: Nice! Is that going to work out with our hardware? We'll need to be careful now with the configured duration of each such test. On 7 October 2013 13:48, Dan Berindei dan.berin...@gmail.com wrote: I've created a build in CI for the stress tests: http://ci.infinispan.org/viewType.html?buildTypeId=Infinispan_StressHotspotJdk6 Cheers Dan On Sun, Oct 6, 2013 at 12:07 AM, Sanne Grinovero sa...@infinispan.org wrote: Hi all, the following change introduced a critical issue in the Lucene Directory: final Set<String> filesList = fileOps.getFileList(); - String[] array = filesList.toArray(new String[0]); - return array; + return filesList.toArray(new String[filesList.size()]); I'll leave it as a puzzler to figure out why the change is able to cause trouble ;-) This generates an NPE in just a single second of running one of the stress tests or performance tests, but I'm guilty of not being able to make a normal unit test for this case. That module contains such limited code that in the very rare occasions in which I apply some changes I re-run the included benchmarks; I realize I can't expect that from all of you, so.. Should we enable some stress tests on CI? As a warning about a side consequence of this, the Lucene Directory in release 6.0.0.CR1 is very unreliable [ISPN-3592].
--Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
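For the curious, the puzzler resolves as follows: Collection.toArray(T[]) sets the element immediately after the collection's contents to null when the supplied array is larger than the collection, so pre-sizing the array with size() opens a window in which a concurrent removal leaves a trailing null, while toArray(new String[0]) sizes the result after reading the contents. A deterministic sketch of the race (no threads needed, hypothetical names):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

// Why toArray(new String[set.size()]) can yield a null element under
// concurrency while toArray(new String[0]) cannot: if the set shrinks
// between the size() call and the copy, the pre-sized array's trailing
// slot is null-terminated by the toArray(T[]) contract.
public class ToArrayRace {
    public static void main(String[] args) {
        Set<String> files = new ConcurrentSkipListSet<>(List.of("a", "b", "c"));
        String[] sized = new String[files.size()]; // size() returns 3 here...
        files.remove("c");                         // ...another thread removes an element...
        String[] result = files.toArray(sized);    // ...the copy sees only 2 elements
        System.out.println(Arrays.toString(result)); // [a, b, null]
        // Allocating after reading the contents never pads with nulls:
        System.out.println(Arrays.toString(files.toArray(new String[0]))); // [a, b]
    }
}
```

Any caller iterating the returned array then hits the unexpected null, which matches the NPE Sanne observed.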
Re: [infinispan-dev] Is anyone else experiencing JGRP-1675
I've seen StateTransferLargeObjectTest hang on my machine with all OOB threads waiting in FlowControl$Credit.decrementIfEnoughCredits, but I thought that was because the JGroups internal thread pool wasn't enabled in the test configuration. Now that I've seen it's enabled by default, I think it could be a JGroups problem after all. I'll enable TRACE logging for org.jgroups and hopefully I'll reproduce it again. For the record, the test suite is running with TCP, and the internal thread pool has queueing enabled by default, so I don't think any REPLENISH message could be lost. Cheers Dan On Fri, Oct 11, 2013 at 4:12 PM, Radim Vansa rva...@redhat.com wrote: I remember seeing that mostly on UDP, but I've had an issue with this over TCP as well on the cross-site link with multiple site masters (synchronous 1pc xs replication). Radim On 10/11/2013 02:51 PM, Ray Tsang wrote: Udp or tcp? On Oct 11, 2013, at 8:41, Radim Vansa rva...@redhat.com wrote: Hi, since Infinispan moved to JGroups 3.4, we're experiencing occasional deadlocks in some tests - most of the threads that send anything over JGroups are waiting in JGroups' FlowControl.decrementCredits. The problem sometimes goes away after several seconds, but it produces some ugly spikes in our throughput/response time charts. Originally this affected just some RadarGun tests but it is appearing in some client-server tests as well (we've recently investigated an issue where this appeared in a regular soak test). I was looking into that [1] for some time but haven't really figured out the cause. The workaround is to set the MFC and UFC credits high enough (I use 10M) and stuff works then. I was trying to reproduce it on pure JGroups, but unsuccessfully. I am not asking anyone to dig into that, but I wanted to know whether QA is alone in experiencing that or if there are more of us.
Radim [1] https://issues.jboss.org/browse/JGRP-1675 -- Radim Vansa rva...@redhat.com JBoss DataGrid QA ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
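For readers unfamiliar with MFC/UFC, the credit scheme these stack traces point at can be pictured with a minimal counter (hypothetical code, not JGroups'): senders block in decrement() when credits run out and only resume once the receiver replenishes, which is why a lost or delayed REPLENISH message stalls every sender at exactly this spot:

```java
// Minimal sketch of credit-based flow control in the style of JGroups'
// MFC/UFC. A sender consumes credits per message; when none are left it
// blocks until the receiver replenishes or the timeout expires.
public class Credits {
    private long credits;

    public Credits(long initial) { credits = initial; }

    public synchronized boolean decrement(long amount, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (credits < amount) {
            long left = deadline - System.currentTimeMillis();
            if (left <= 0) return false;   // timed out waiting for a replenish
            try {
                wait(left);                // blocked sender, as in the stack traces
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        credits -= amount;
        return true;
    }

    public synchronized void replenish(long amount) {
        credits += amount;
        notifyAll();                       // wake all blocked senders
    }
}
```

Raising the initial credits (Radim's 10M workaround) simply makes the blocking branch much rarer; it hides the stall rather than fixing whatever delays the replenishment.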
Re: [infinispan-dev] Is anyone else experiencing JGRP-1675
I changed the default config some time ago, but I forgot to close ISPN-3334. On Fri, Oct 11, 2013 at 4:35 PM, Ray Tsang saturn...@gmail.com wrote: also, don't forget this one: https://issues.jboss.org/browse/ISPN-3334 On Fri, Oct 11, 2013 at 9:22 AM, Erik Salter an1...@hotmail.com wrote: Hi Radim, If you're using VMs, this may be of use. I actually experienced a JGroups lockup in production with FC in decrementCredits. What happened in my case is that virtualization tools had actually screwed with the system clock -- it was reported that JBoss took ~1 year to start. http://www.vmware.com/pdf/vmware_timekeeping.pdf The upshot is that I had to take FC out of my stack going forward. Erik -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Radim Vansa Sent: Friday, October 11, 2013 8:41 AM To: infinispan-dev@lists.jboss.org Subject: [infinispan-dev] Is anyone else experiencing JGRP-1675 Hi, since Infinispan moved to JGroups 3.4, we're experiencing occasional deadlocks in some tests - most of the threads that send anything over JGroups are waiting in JGroups' FlowControl.decrementCredits. The problem sometimes goes away after several seconds, but it produces some ugly spikes in our throughput/response time charts. Originally this affected just some RadarGun tests but it is appearing in some client-server tests as well (we've recently investigated an issue where this appeared in a regular soak test). I was looking into that [1] for some time but haven't really figured out the cause. The workaround is to set the MFC and UFC credits high enough (I use 10M) and stuff works then. I was trying to reproduce it on pure JGroups, but unsuccessfully. I am not asking anyone to dig into that, but I wanted to know whether QA is alone in experiencing that or if there are more of us.
Radim [1] https://issues.jboss.org/browse/JGRP-1675 -- Radim Vansa rva...@redhat.com JBoss DataGrid QA ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] The windup of 6.0.0
Bela, do you really need to rename the thread? You already pass a name argument to the Thread constructor, why not create the thread with the correct name directly? On Thu, Oct 17, 2013 at 5:48 PM, Dennis Reed der...@redhat.com wrote: On 10/17/2013 05:26 AM, Bela Ban wrote: The other thing to look at is the apparent cost of Thread.setName(): if we cannot avoid many thread creations, we could experiment with *not* naming threads, although this is IMO not a good idea. I agree that not naming the threads is a bad idea. The thread names are vital for debugging -- both in log messages and in thread dumps. Not naming the threads would lose a whole lot of information. -Dennis ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Improve WriteCommand processing code and [possibly] performance
On Tue, Oct 29, 2013 at 1:33 PM, Pedro Ruivo pe...@infinispan.org wrote: On 10/29/2013 07:58 AM, Radim Vansa wrote: On 10/25/2013 08:17 PM, Pedro Ruivo wrote: Hi guys. I've opened a JIRA to track this: https://issues.jboss.org/browse/ISPN-3664 Suggestions/feedback is appreciated. This will probably be integrated in the next major (but no promises). If I was not clear, just ping me. Have a nice weekend folks :) Regards, Pedro Description is the following: Major refactoring of the write commands with the following goals: * Base WriteCommand: all the write commands have the same workflow through the interceptor chain * Create a concrete WriteCommand for each operation (put, remove, replace, replaceIfEquals, removeIfEquals, putIfAbsent) * Extend the interceptor chain to process each one of the commands and add a new visitWriteCommand, that is invoked by the default visitX methods. * (minor) change GetKeyValueCommand to ReadCommand to make the name consistent with WriteCommand. Note that most of the interceptors only need to implement visitWriteCommand, because all the write commands have the same execution flow. The basic flow of the write commands is: (non-tx) lock, fetch value (cachestore/remote), check condition and apply change. For tx mode: lock (if pessimistic), fetch value (cache loader, remote, etc.), apply change and add it to the transaction (if successful). I've been wondering for a while why the fetch part is a separate message in the write flow. Is the return value of much use when it does not return really the replaced value but only some of the previous values? And this way you double* the latency. I think that returning the value from the primary owner as the response for the write would make more sense. Naturally, for optimistic txs you can only do the get, and for pessimistic ones you'd have to return the value together with the locking, but still, the operations would make more sense. Actually, I think you are right.
It makes no sense to fetch the value before the write operation for non-tx caches. I also like your idea of fetching the value when the key is locked :) Do you mind creating a new thread for it (I'm not sure if everybody reads this thread)? I don't think we actually fetch the value before the write operation for non-tx caches, at most we do an extra lookup in the local data container (in NonTxDistributionInterceptor.remoteGetBeforeWrite). We do fetch the value explicitly in transactional caches, but we don't invoke the write command remotely in that case. There is an opportunity for optimization in pessimistic tx caches, because we invoke both a remote get and a remote lock command for the same write command. Perhaps we can make LockControlCommand return the previous value (only if needed, of course). *: OK, it's not doubling, as with the write the primary owner replicates the write to the backups, but it's doubling the amount of communication originating from the requestor node. Also, another advantage is the simplification of the EntryFactory, because if we think a little about it, independent of the write command we need to wrap the entry anyway.
Suggested implementation:

abstract class WriteCommand
    Object key, Object newValue
    boolean match(Object currentValue)      // true by default
    boolean needsRemoteGetBeforeWrite()     // true by default
    Object perform()                        // common implementation, like:
        if (match(entry.getValue())) then
            entry.setValue(newValue); entry.setChanged(true); entry.setRemoved(newValue == null)

* Concrete implementations *

{PutCommand|RemoveCommand} extends WriteCommand
    override needsRemoteGetBeforeWrite() { return !flags.contains(IGNORE_RETURN_VALUE) }
ReplaceIfPresentCommand extends WriteCommand
    override match(Object currentValue) { return currentValue != null }
PutIfAbsentCommand extends WriteCommand
    override match(Object currentValue) { return currentValue == null }

* Special base class for operations with an expected value to compare *

abstract class AdvancedWriteCommand extends WriteCommand
    Object expectedValue
    match(Object currentValue) { return currentValue.equals(expectedValue) }

I'd call that rather ConditionalWriteCommand - AdvancedWC sounds too general to me. +1 for ConditionalWriteCommand My 2c. Radim ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
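The hierarchy sketched in this thread can be rendered in compilable form. The following is an illustrative, self-contained version (plain Java, no Infinispan types; the Entry class stands in for CacheEntry, and Radim's ConditionalWriteCommand name is used):

```java
import java.util.Objects;

// Self-contained rendering of the proposed write-command hierarchy.
// Every command shares one perform() workflow; subclasses only vary the
// match() condition and whether the previous value must be fetched.
public class WriteCommands {

    public static class Entry {
        public Object value;
        public boolean changed, removed;
        public Entry(Object value) { this.value = value; }
    }

    public abstract static class WriteCommand {
        protected final Object key, newValue;
        protected WriteCommand(Object key, Object newValue) {
            this.key = key;
            this.newValue = newValue;
        }
        public boolean match(Object currentValue) { return true; }     // true by default
        public boolean needsRemoteGetBeforeWrite() { return true; }    // true by default
        // Common workflow shared by all write commands.
        public final boolean perform(Entry entry) {
            if (!match(entry.value)) return false;
            entry.value = newValue;
            entry.changed = true;
            entry.removed = newValue == null;
            return true;
        }
    }

    public static class PutCommand extends WriteCommand {
        public PutCommand(Object k, Object v) { super(k, v); }
        // An unconditional put only needs the previous value when the caller
        // asked for it (i.e. no IGNORE_RETURN_VALUES flag); simplified here.
        @Override public boolean needsRemoteGetBeforeWrite() { return false; }
    }

    public static class PutIfAbsentCommand extends WriteCommand {
        public PutIfAbsentCommand(Object k, Object v) { super(k, v); }
        @Override public boolean match(Object currentValue) { return currentValue == null; }
    }

    public static class ReplaceIfPresentCommand extends WriteCommand {
        public ReplaceIfPresentCommand(Object k, Object v) { super(k, v); }
        @Override public boolean match(Object currentValue) { return currentValue != null; }
    }

    public static class ConditionalWriteCommand extends WriteCommand {
        protected final Object expectedValue;
        public ConditionalWriteCommand(Object k, Object v, Object expected) {
            super(k, v);
            this.expectedValue = expected;
        }
        @Override public boolean match(Object currentValue) {
            return Objects.equals(currentValue, expectedValue);
        }
    }
}
```

With this shape, an interceptor's visitWriteCommand needs no per-operation logic: the command itself decides via match() and needsRemoteGetBeforeWrite().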
Re: [infinispan-dev] Retrieving value before write
On Tue, Oct 29, 2013 at 3:56 PM, Mircea Markus mmar...@redhat.com wrote: On Oct 29, 2013, at 3:08 PM, William Burns mudokon...@gmail.com wrote: I agree, and I could have sworn there was a JIRA that detailed this, but I can't seem to locate it now. I was hoping to get to it next release. +1. This is also a very big improvement for non-tx caches, as the updating cache only has to send a single message, possibly reducing the overall number of required JGroups messages by a third. We're already doing this for non-tx caches, at least since lock delegation was implemented. Although the number of bytes saved is only a fixed amount. Also it would make the consistency more correct, since right now there is technically a window where you could have 2 concurrent updates but only 1 sees the correct returned value. +1 I'm not sure about that... in non-tx caches we already return the previous value from the write command, and in pessimistic caches we acquire the remote lock *before* we retrieve the value. On Tue, Oct 29, 2013 at 9:42 AM, Radim Vansa rva...@redhat.com wrote: Hi, Pedro suggested moving the following idea to a separate thread (from 'Improve WriteCommand processing code...'). Re: I've been wondering for a while why the fetch part is a separate message in the write flow. Is the return value of much use when it does not return really the replaced value but only some of the previous values? And this way you double* the latency. I think that returning the value from the primary owner as the response for the write would make more sense. Naturally, for optimistic txs you can only do the get, and for pessimistic ones you'd have to return the value together with the locking, but still, the operations would make more sense. *: OK, it's not doubling, as with the write the primary owner replicates the write to the backups, but it's doubling the amount of communication originating from the requestor node. WDYT? +1 Care to create a JIRA for this please?
Radim Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] PutForExternalRead consistency
On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi, Simple question: shouldn't PFER ensure some consistency? I know that PFER is asynchronous but (IMO) it can create inconsistencies in the data. If the primary owner replicates a PFER followed by a PUT (the PFER is sent async so the lock is released immediately) for the same key, we have no way to be sure whether the PFER is delivered before or after it on all the backup owners. comments? Assuming that PFER and PUT happen in the same thread, we're normally relying on the JGroups sequence of events to send the first, wait for no response, and then send the second put. That should guarantee the order in which the puts are received on the other nodes, but after that, yeah, there's a risk that it could happen. PFER and PUT for a given key normally happen in the same thread in cache-heavy use cases such as Hibernate 2LC, but there's no guarantee. I don't think that's correct. If the cache is synchronous, the PUT will be sent as an OOB message, and as such it can be delivered on the target before the previous PFER command. That's regardless of whether the PFER command was sent as a regular or as an OOB message. Cheers, Pedro -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Running master: some failing test
I have already issued a PR for NonTxStateTransferOverwritingValue2Test: https://github.com/infinispan/infinispan/pull/2228 On Mon, Nov 18, 2013 at 10:13 PM, Mircea Markus mmar...@redhat.com wrote: Your machines are notorious for the high number of intermittent failures, so this doesn't look that bad then :-) I've disabled AsyncReplExtendedStatisticTest [1] as it is not that critical. Dan, can you please look at NonTxStateTransferOverwritingValue2Test? We seem to be quite close to a stable test suite and it would not be ideal to lose the momentum. [1] https://issues.jboss.org/browse/ISPN-3727 On Nov 18, 2013, at 4:10 PM, Sanne Grinovero sa...@infinispan.org wrote: Attempt 1# Failed tests: AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62 expected [true] but found [false] Attempt 2# Failed tests: NonTxStateTransferOverwritingValue2Test.testBackupOwnerJoiningDuringReplace:78-doTest:170 » Timeout Attempt 3# Failed tests: AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62 expected [true] but found [false] Attempt 4# Failed tests: AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62 expected [true] but found [false] ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] merging all the github projects back?
Mircea, what exactly are the problems we're trying to solve by moving everything back into a single repository? To me, the biggest problem right now is that builds take way too long, and if I integrate something in master it will take almost an entire day for TeamCity to update the status on all the open pull requests. I don't think moving everything to a single repository will fix that, TBH. BTW, Maven rebuilds every module in the reactor every time you run a build. On my machine, it takes 6 minutes to build everything in the Infinispan reactor. Would you really want to wait 6 minutes every time you want to test something in the server? Cheers Dan On Mon, Nov 18, 2013 at 10:26 PM, Mircea Markus mmar...@redhat.com wrote: I've created ISPN-3728 to track this. On Nov 18, 2013, at 2:16 PM, Tristan Tarrant ttarr...@redhat.com wrote: +1, also only the gui-demo should be in the main repo. All other examples/demos should be moved to quickstarts or their own repos. Tristan On 11/18/2013 09:15 AM, Galder Zamarreño wrote: On Nov 15, 2013, at 2:02 PM, Sanne Grinovero sa...@infinispan.org wrote: +1 (as I've always been, damn my pessimistic opinions) However there is an alternative. If I recall correctly, one of your motivations for the split was to not keep maintaining all the non-essential components. I really like that idea, I just think it was implemented the wrong way. If you identify which components are non-essential (and you don't need them on the release day), you should keep them out but unlock them to a different version numbering, and especially never ever depend on snapshot versions. Let's try to define the requirements: - any project should compile successfully out of the box - last-minute surprises just before a release needs to be done are not manageable You could get this easily if you allow the projects which are in a different repository to lag behind: you can't require releasing those other projects each time you release an Infinispan core version.
Which inherently implies that if you have something which is essential for a core release, it needs to be moved back, to catch failures early on and keep them in perfect sync. Other module maintainers would then catch up on compatibility-breaking changes of Infinispan core as (and when) they can. We can of course have a goal of keeping aligned quickly, but it should be optional (i.e. non-blocking for innovation and progress on the core module). +1 to all said above and Mircea's suggestion. The thing is that Infinispan Server, and a subset of the cache stores, really need to be released at the same time as Infinispan core. All those modules would benefit from living in the same repo, making the compilation, CI and release much easier. Some examples: the LevelDB store needs to be brought back, whereas the Cassandra cache store can live on its own. It's perfectly doable, all our other projects depending on Infinispan have lived well so far, with the occasional pain to fix some backwards compatibility issue but that's part of the dance. For example Search depends on Hibernate ORM, JGroups and Infinispan.. we try to catch backwards-compatibility problems with ad-hoc CI jobs but it's not going to catch all problems, nor is it our intention to block other projects from innovating: it's rather for us to be aware of possible problems asap.. but none of these problems are allowed to prevent us from releasing early and often. [BTW the reason the Search case isn't perfect is because we target specific platforms like AS/Wildfly, not necessarily the latest Infinispan] Cheers, Sanne On 15 November 2013 12:43, Mircea Markus mmar...@redhat.com wrote: Hi guys, Given all the compilation problems we've had since we split into multiple GitHub repos (server, stores and embedded), I'm starting to think that the split wasn't such a great idea after all (and that I was, hmm, wrong). Shall we move everything back into a single repo?
We can still keep different CI runs for cache stores, server etc, but at least all these builds will compile everything. wdyt? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
Re: [infinispan-dev] merging all the github projects back?
On Tue, Nov 19, 2013 at 11:21 AM, Martin Gencur mgen...@redhat.com wrote: On 19.11.2013 09:59, Dan Berindei wrote: BTW, Maven rebuilds every module in the reactor every time you run a build. On my machine, it takes 6 minutes to build everything in the Infinispan reactor. Would you really want to wait 6 minutes every time you want to test something in the server? Actually, we often end up doing that cos we first need to build ISPN core and then the server which is dependent on ISPN core. It happens quite often that ISPN server works only with the latest version of ISPN core. So if I understand it correctly, the ISPN server would be integrated into the main ISPN repository. I guess it would be a specific module which you could build on its own (and also run its tests). This would cover the case when you don't need to build ISPN core in order to run the server. Or did I miss something? I'm pretty sure it wouldn't be a single module... ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] rethinking ISPN transactions
Sorry, I was confusing things - the timeout I was thinking of happened in the transaction manager itself, and it didn't matter whether we were registering a synchronization or a full XA resource. Replication timeout sounds like something that could be easily removed for commit command RPCs, but there may be plenty of other things that could fail without the caller knowing. So the question remains, should we give up on synchronizations (except maybe with asynchronous commit/1PC, if we decide to keep that)? On Tue, Nov 19, 2013 at 6:01 PM, Dan Berindei dan.berin...@gmail.com wrote: Maybe synchronization support is another thing that we should scrap in 7.0, then? BTW, I've also seen the transaction timeout that Radim mentioned, but only recently. I wonder if we could do anything to increase it for the stress tests. On Tue, Nov 19, 2013 at 4:40 PM, Erik Salter an1...@hotmail.com wrote: - this is quite a subtle aspect of JTA transactions (i.e. Synchronization.afterCompletion failures are not propagated to the user) - you have a very good point here, thanks! This right here has burned me very badly. I think this really needs to be called out, especially if we're using this mode rather than full-blown XA. Erik ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
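Erik's point - that `Synchronization.afterCompletion` failures never reach the caller of `commit()` - can be sketched with a plain-JDK mock. The `Synchronization` interface below is a hypothetical stand-in for `javax.transaction.Synchronization` (defined locally so the sketch compiles without a JTA jar), and the `commit()` method only mimics the TM behavior the thread describes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for javax.transaction.Synchronization, to keep the sketch JDK-only.
interface Synchronization {
    void beforeCompletion();
    void afterCompletion(int status);
}

public class AfterCompletionDemo {
    static final int STATUS_COMMITTED = 3; // mirrors the value in javax.transaction.Status

    private final List<Synchronization> syncs = new ArrayList<>();

    void register(Synchronization s) { syncs.add(s); }

    // Sketch of what a TM does on commit: afterCompletion runs after the outcome
    // is already decided, so any failure is caught and logged, never propagated.
    void commit() {
        syncs.forEach(Synchronization::beforeCompletion);
        // ... the 2PC outcome is decided here ...
        for (Synchronization s : syncs) {
            try {
                s.afterCompletion(STATUS_COMMITTED);
            } catch (RuntimeException e) {
                System.out.println("swallowed: " + e.getMessage()); // the user never sees this
            }
        }
    }

    public static void main(String[] args) {
        AfterCompletionDemo tm = new AfterCompletionDemo();
        tm.register(new Synchronization() {
            public void beforeCompletion() { }
            public void afterCompletion(int status) {
                // e.g. Infinispan's commit-phase work failing
                throw new RuntimeException("commit replication timed out");
            }
        });
        tm.commit(); // returns normally despite the failure
        System.out.println("commit() returned without error");
    }
}
```

This is why a commit-phase failure in synchronization mode can only be logged: by the time `afterCompletion` runs, the transaction outcome is final and there is no channel back to the application.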
Re: [infinispan-dev] merging all the github projects back?
When it comes to IDEs, I'd rather remove some more modules from the main project in order to make it more responsive and easier to use. Launching a test from IntelliJ takes minutes to build (or parse, or analyze, or whatever) everything, when it works. Extra dependencies on old versions in the compatibility modules broke the Analyze stacktrace feature, which used to be very nice for analyzing logs. I'd argue that having the cache store builds break in CI actually serves as a warning that we are breaking the API/SPI for our users. I don't think making API/SPI breakage easier should be the goal of our setup. On Wed, Nov 20, 2013 at 1:43 AM, Sanne Grinovero sa...@infinispan.org wrote: Hi Manik, the problem the team has been facing this past week is that the other modules are depending on a -SNAPSHOT version, and after having done some final touches to the modules in the core repo, we then find out that the other modules don't build anymore. This happened several times, and while it's normal for dependent projects to occasionally break, it's not so nice that the current structure doesn't allow for it to be spotted, or managed time-wise: it is important to be able to release at any point in time, but having potential surprises in different modules makes it hard to predict how long it will take to get to a stable tag. My suggestion is not to necessarily return all external modules to the core repository, but rather to make sure the modules which are kept separately also have the benefit of a different release cycle, so that you can always release any module at any time. This could mean that they have different version numbers but not necessarily: we could make it a convention to release compatible optional modules using the same version. For example - the optional CacheStore implementations for Infinispan 6.0.0.Final are released also as 6.0.0.Final but not _necessarily_ on the same day as the core module.
Probably works best to have the same major/minor number, and be flexible with micro releases. You'd never want to depend on a snapshot version unless it's a module in your same repo. As mentioned, if a new Infinispan improvement breaks the Hibernate Search build, I have the option to decide to not upgrade; you don't have this flexibility if you're depending on snapshots. Cheers, Sanne On 19 November 2013 23:17, Manik Surtani ma...@infinispan.org wrote: As in, for easy refactoring? True that helps. But it doesn't help you with modular releases. On 19 November 2013 15:00, Mircea Markus mmar...@redhat.com wrote: That only half-solves the problem: having everything in the same IDE at the same time is more preventive. On 19 Nov 2013, at 22:44, Manik Surtani ma...@infinispan.org wrote: Can't this be achieved by checking out and building all relevant repos? This could be scripted. On 15 November 2013 04:43, Mircea Markus mmar...@redhat.com wrote: Hi guys, Given all the compilation problems we've had since we split into multiple GitHub repos (server, stores and embedded), I'm starting to think that the split wasn't such a great idea after all (and that I was, hmm, wrong). Shall we move everything back into a single repo? We can still keep different CI runs for cache stores, server etc, but at least all these builds will compile everything. wdyt?
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] merging all the github projects back?
Sorry Sanne for ignoring your message, I meant to send the reply to Manik's :) I agree that depending on SNAPSHOTs from another repository is not a good idea. After all, we couldn't even make the dependencies work in CI for the jobs we have now - the REST cache store's build still fails because it can't find the infinispan-persistence-parent-6.0.1-SNAPSHOT POM in the repository. In this case we did have an option: we already have a compatibility layer that allows cache stores to use the old, stable SPI, so the cache stores could have depended on the latest Alpha/Beta/CR version, and (at least in theory) the server could have kept on using the 5.2.6.Final version of the cache stores until the final version of the core was released. The problem, I guess, is that we still want to release Infinispan Server on the same day as Infinispan Embedded, and we also want it to use the latest and greatest cache stores. If we don't change that policy, we can't use your suggestion. Cheers Dan On Wed, Nov 20, 2013 at 1:43 AM, Sanne Grinovero sa...@infinispan.org wrote: Hi Manik, the problem the team has been facing this past week is that the other modules are depending on a -SNAPSHOT version, and after having done some final touches to the modules in the core repo, we then find out that the other modules don't build anymore. This happened several times, and while it's normal for dependent projects to occasionally break, it's not so nice that the current structure doesn't allow for it to be spotted, or managed time-wise: it is important to be able to release at any point in time, but having potential surprises in different modules makes it hard to predict how long it will take to get to a stable tag. My suggestion is not to necessarily return all external modules to the core repository, but rather to make sure the modules which are kept separately also have the benefit of a different release cycle, so that you can always release any module at any time.
This could mean that they have different version numbers but not necessarily: we could make it a convention to release compatible optional modules using the same version. For example - the optional CacheStore implementations for Infinispan 6.0.0.Final are released also as 6.0.0.Final but not _necessarily_ on the same day as the core module. Probably works best to have the same major/minor number, and be flexible with micro releases. You'd never want to depend on a snapshot version unless it's a module in your same repo. As mentioned, if a new Infinispan improvement breaks the Hibernate Search build, I have the option to decide to not upgrade; you don't have this flexibility if you're depending on snapshots. Cheers, Sanne On 19 November 2013 23:17, Manik Surtani ma...@infinispan.org wrote: As in, for easy refactoring? True that helps. But it doesn't help you with modular releases. On 19 November 2013 15:00, Mircea Markus mmar...@redhat.com wrote: That only half-solves the problem: having everything in the same IDE at the same time is more preventive. On 19 Nov 2013, at 22:44, Manik Surtani ma...@infinispan.org wrote: Can't this be achieved by checking out and building all relevant repos? This could be scripted. On 15 November 2013 04:43, Mircea Markus mmar...@redhat.com wrote: Hi guys, Given all the compilation problems we've had since we split into multiple GitHub repos (server, stores and embedded), I'm starting to think that the split wasn't such a great idea after all (and that I was, hmm, wrong). Shall we move everything back into a single repo? We can still keep different CI runs for cache stores, server etc, but at least all these builds will compile everything. wdyt?
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Infinispan 6.0.0.Final is out!
Yes, it was implemented in Beta1: http://blog.infinispan.org/2013/09/heterogenous-clusters-with-infinispan.html Cheers Dan On Wed, Nov 20, 2013 at 11:37 AM, Radim Vansa rva...@redhat.com wrote: Hi, the heterogenous clusters link does not work. I also don't see any related JIRA in the release notes - is it really implemented? Radim On 11/19/2013 09:08 PM, Adrian Nistor wrote: Dear Infinispan community, We're pleased to announce the final release of Infinispan 6.0 Infinium. As announced (http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html), this is the first Infinispan stable version to be released under the terms of the Apache License v2.0 (http://www.apache.org/licenses/LICENSE-2.0.html). This release brings some highly demanded features besides many stability enhancements and bug fixes: - Support for remote query (http://blog.infinispan.org/2013/09/embedded-and-remote-query-in-infinispan.html). It is now possible for the HotRod clients to query an Infinispan grid using a new expressive query DSL. This querying functionality is built on top of Apache Lucene and Google Protobuf and lays the foundation for storing information and querying an Infinispan server in a language-neutral manner. The Java HotRod client has already been enhanced to support this; the soon-to-be announced C++ HotRod client will also contain this functionality (initially for write/read, then full-blown querying). - C++ HotRod client. Allows C++ applications to read and write information from an Infinispan server. This is a fully fledged HotRod client that is topology (level 2) and consistent hash aware (level 3) and will be released in the following days. Some features (such as Remote Query and SSL support) will be developed during the next iteration so that it maintains feature parity with its Java counterpart. - Better persistence integration.
We’ve revisited the entire cache loader API and we’re quite pleased with the result: the new Persistence API (http://blog.infinispan.org/2013/09/new-persistence-api-in-infinispan.html) brought by Infinispan 6.0 supports parallel iteration of the stored entries, reduces the overall serialization overhead and also is aligned with the JSR-107 (http://jcp.org/en/jsr/detail?id=107) specification, which makes implementations more portable. - A more efficient FileCacheStore implementation (http://blog.infinispan.org/2013/07/faster-file-cache-store-no-extra.html). This file store is built with efficiency in mind: it outperforms the existing file store by up to two orders of magnitude. This comes at a cost though, as keys need to be kept in memory. Thanks to Karsten Blees (https://github.com/kblees) for contributing this! - Support for heterogeneous clusters (http://blog.infinispan.org/2013/09/heterogenous-clusters-with-infinispan.html). Up to this release, every member of the cluster owns an equal share of the cluster’s data. This doesn’t work well if one machine is more powerful than the other cluster participants. This functionality allows specifying the amount of data, compared with the average, held by a particular machine. - A new set of usage and performance statistics (https://issues.jboss.org/browse/ISPN-2861) developed within the scope of the CloudTM project (https://issues.jboss.org/browse/ISPN-3234). - JCache (JSR-107) implementation upgrade (https://issues.jboss.org/browse/ISPN-3234). First released in Infinispan 5.3.0, the standard caching support is now upgraded to version 1.0.0-PFD. For a complete list of features included in this release please refer to the release notes (https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12322480). The user documentation for this release has been revamped and migrated to the new website (http://infinispan.org/documentation/) - we think it looks much better and hope you’ll like it too!
This release has spread over a period of 5 months: a sustained effort from the core development team, QE team and our growing community - a BIG thanks to everybody involved! Please visit our downloads section (http://infinispan.org/download/) to find the latest release. Also if you have any questions please check our forums (http://infinispan.org/community/), our mailing lists (https://lists.jboss.org/mailman/listinfo/infinispan-dev) or ping us directly on IRC. Cheers, Adrian ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Radim Vansa rva...@redhat.com JBoss DataGrid QA
Re: [infinispan-dev] rethinking ISPN transactions
On Thu, Nov 21, 2013 at 5:39 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 03:18 PM, Dan Berindei wrote: Hmm, couldn't you just disable recovery in the TM to get the same performance with an XA resource as with a synchronization? If you are suggesting to mess around with the TM, I don't think that is a good idea. First, the TM is external to Infinispan, and second, you can have other XAResources associated with the TM. I'm not suggesting that Infinispan should change the TM settings, I'm just wondering if there's a difference between using synchronization in Infinispan and disabling recovery completely (or using an in-memory transaction log) from the user's point of view. Actually, that might be irrelevant if using synchronization can lead to stale locks in Infinispan. If the commit command fails and doesn't release the locks, how will the user be able to find out that there are stale locks and release them? On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 11:34 AM, Galder Zamarreño wrote: It's way faster actually. The speed difference comes from all the extra work required by the Transaction Manager to deal with multiple XA resources, make transactions recoverable..etc. We've done tests in the past (i.e. Hibernate 2LC) comparing both and the difference was quite big. you are right. I forgot the recovery mechanism :) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] PutForExternalRead consistency
On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.com wrote: On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com wrote: On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi, Simple question: shouldn't PFER ensure some consistency? I know that PFER is asynchronous but (IMO) it can create inconsistencies in the data. If the primary owner replicates the PFER followed by a PUT (PFER is sent async so the lock is released immediately) for the same key, we have no way to be sure if the PFER is delivered after or before the PUT in all the backup owners. comments? Assuming that PFER and PUT happen in the same thread, we're normally relying on the JGroups sequence of events to send the first without waiting for a response, and then send the second put. That should guarantee the order in which puts are received on the other nodes, but after that yeah, there's a risk that it could happen. PFER and PUT for a given key normally happen in the same thread in cache-heavy use cases such as Hibernate 2LC, but there's no guarantee. I don't think that's correct. If the cache is synchronous, the PUT will be sent as an OOB message, and as such it can be delivered on the target before the previous PFER command. That's regardless of whether the PFER command was sent as a regular or as an OOB message. ^ Hmm, that's definitely risky. I think we should make PFER local only. The fact that PFER is asynchronous is nice to have. IOW, if you read a value from a database and you want to store it in the cache for later reads, the fact that it's replicated asynchronously is just so that other nodes can take advantage of the value being in the cache. Since it's asynchronous some nodes could fail to apply it, but that's fine since you can go to the database and re-retrieve it from there. So, making PFER local only would be the degenerate case, where all nodes fail to apply it except the local node, which is fine.
This is better than having the reordering above. In a chat I had with Dan, he pointed out that having PFER local only would be problematic for DIST mode w/ L1 enabled, since the local write would not invalidate other nodes, but this is fine because PFER only really makes sense for situations where Infinispan is used as a cache. So, if the data is in the DB, you might as well go there (1 network trip), as opposed to asking the other nodes for data and then the database in the worst case (2 network trips). PFER is really designed for replication or invalidation use cases, which are precisely the ones configured for Hibernate 2LC. Thoughts? +1 to make PFER local-only in replicated caches, but I now think we should go all the way and disallow PFER completely in dist caches. I still think having L1 enabled would be a problem, because a regular put() won't invalidate the entry on all the nodes that did a PFER for that key (there are no requestors, and even if we assume that we do a remote get before the PFER we'd still have race conditions). With L1 disabled, we have the problem that you mentioned: we're trying to read the value from the proper owners, but we never write it to the proper owners, so the hit ratio will be pretty bad. Using the SKIP_REMOTE_LOOKUP flag on reads, we'll avoid the extra RPC in Infinispan, but that will make the hit ratio even worse. E.g. in a 4-node cluster with numOwners=2, the hit ratio will never go above 50%. I don't think anyone would use a cache knowing that its hit ratio can never get above 50%, so we should just save ourselves some effort and stop supporting PFER in DIST mode. Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
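One plausible reading of Dan's 50% bound, sketched as a tiny calculation (assumptions: local-only PFER writes, reads served by the key's owners, keys and readers uniformly distributed): a read can only hit if the reading/owning node happens to be the one that did the PFER, which caps the hit ratio at numOwners/numNodes.

```java
public class PferHitRatioBound {
    public static void main(String[] args) {
        int numNodes = 4;
        int numOwners = 2;
        // With local-only PFER writes, a cached value lives only on the node that
        // wrote it. A later read is served by the key's owners, so it can hit only
        // if the writer is among them: at most numOwners out of numNodes nodes.
        double maxHitRatio = (double) numOwners / numNodes;
        System.out.println(maxHitRatio);
    }
}
```

With numOwners=2 and 4 nodes this prints 0.5, matching the "never above 50%" figure in the thread; the exact bound in a real deployment depends on access patterns, but the shape of the argument is the same.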
Re: [infinispan-dev] rethinking ISPN transactions
Hmm, couldn't you just disable recovery in the TM to get the same performance with an XA resource as with a synchronization? On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 11:34 AM, Galder Zamarreño wrote: It's way faster actually. The speed difference comes from all the extra work required by the Transaction Manager to deal with multiple XA resources, make transactions recoverable..etc. We've done tests in the past (i.e. Hibernate 2LC) comparing both and the difference was quite big. you are right. I forgot the recovery mechanism :) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] merging all the github projects back?
On Thu, Nov 21, 2013 at 1:55 PM, Galder Zamarreño gal...@redhat.com wrote: On Nov 20, 2013, at 11:03 AM, Dan Berindei dan.berin...@gmail.com wrote: When it comes to IDEs, I'd rather remove some more modules from the main project in order to make it more responsive and easier to use. Launching a test from IntelliJ takes minutes to build (or parse, or analyze, or whatever) everything, when it works. Extra dependencies on old versions in the compatibility modules broke the Analyze stacktrace feature, which used to be very nice for analyzing logs. ^ Isn't that a problem with the IDE which is messing up the code that's really in use? I guess the IDE could be smarter here, but I think it's a pretty niche feature, so it's not going to be high on their priority list. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] PutForExternalRead consistency
That doesn't sound right, we don't keep any lock for the duration of the replication. In non-tx mode, we have to do an RPC to the primary owner before we acquire the lock on any key. So there's nothing stopping the PFER from writing its value after a regular (sync) put when the put was initiated after the PFER. On Fri, Nov 22, 2013 at 2:49 PM, William Burns mudokon...@gmail.com wrote: I wonder if we are over-analyzing this. It seems the main issue is that the replication is done asynchronously. Infinispan has many ways to make something asynchronous. In my opinion we just chose the wrong way. Wouldn't it be easier to change PFER so that instead of passing along the FORCE_ASYNCHRONOUS flag we just have the operation performed asynchronously using putIfAbsentAsync? This way the lock is held for the duration of the replication and should be consistent with other operations. Also the user can regain control faster as it doesn't even have to process the local interceptor chain. We could also change the putForExternalRead method declaration to return a NotifiableFuture<Void> or something so they know when the operation is completed (if they want). - Will On Thu, Nov 21, 2013 at 9:54 AM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.com wrote: On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com wrote: On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi, Simple question: shouldn't PFER ensure some consistency? I know that PFER is asynchronous but (IMO) it can create inconsistencies in the data. If the primary owner replicates the PFER followed by a PUT (PFER is sent async so the lock is released immediately) for the same key, we have no way to be sure if the PFER is delivered after or before the PUT in all the backup owners. comments?
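Will's proposal above - PFER delegating to an async putIfAbsent and handing the caller a future instead of being fire-and-forget - could be sketched roughly as follows. This is a JDK-only sketch using ConcurrentHashMap and CompletableFuture, not Infinispan's actual putIfAbsentAsync or NotifiableFuture; all names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PferAsyncSketch {
    static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    static final ExecutorService pool = Executors.newSingleThreadExecutor();

    // Hypothetical shape of the proposal: PFER becomes an async putIfAbsent,
    // so the write goes through the normal (ordered, locked) path and the
    // caller gets a future instead of a fire-and-forget command.
    static CompletableFuture<Void> putForExternalRead(String key, String value) {
        return CompletableFuture.runAsync(
                () -> cache.putIfAbsent(key, value), // locking/replication would happen here
                pool);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> f = putForExternalRead("k", "from-db");
        f.join(); // the caller may wait for completion, or simply drop the future
        System.out.println(cache.get("k"));
        pool.shutdown();
    }
}
```

The key design point Will raises is that the asynchrony moves from the replication layer (where ordering with later sync puts is lost) into the caller's thread handoff, so the operation itself stays ordered and lock-protected like any other write.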
Assuming that PFER and PUT happen in the same thread, we're normally relying on the JGroups sequence of events: send the first put, without waiting for a response, and then send the second put. That should guarantee the order in which the puts are received on the other nodes, but after that, yeah, there's a risk that it could happen. PFER and PUT for a given key normally happen in the same thread in cache-heavy use cases such as Hibernate 2LC, but there's no guarantee. I don't think that's correct. If the cache is synchronous, the PUT will be sent as an OOB message, and as such it can be delivered on the target before the previous PFER command. That's regardless of whether the PFER command was sent as a regular or as an OOB message. ^ Hmmm, that's definitely risky. I think we should make PFER local-only. The fact that PFER is asynchronous is nice to have. IOW, if you read a value from a database and you want to store it in the cache for later reads, the fact that it's replicated asynchronously is just so that other nodes can take advantage of the value being in the cache. Since it's asynchronous, some nodes could fail to apply it, but that's fine since you can go to the database and re-retrieve it from there. So, making PFER local-only would be the degenerate case, where all nodes fail to apply except the local node, which is fine. This is better than having the reordering above. In a chat I had with Dan, he pointed out that having PFER local-only would be problematic for DIST mode w/ L1 enabled, since the local write would not invalidate other nodes, but this is fine because PFER only really makes sense for situations where Infinispan is used as a cache. So, if the data is in the DB, you might as well go there (1 network trip), as opposed to asking the other nodes for the data and then the database in the worst case (2 network trips). PFER is really designed for replication or invalidation use cases, which are precisely the ones configured for Hibernate 2LC. Thoughts?
+1 to make PFER local-only in replicated caches, but I now think we should go all the way and disallow PFER completely in dist caches. I still think having L1 enabled would be a problem, because a regular put() won't invalidate the entry on all the nodes that did a PFER for that key (there are no requestors, and even if we assume that we do a remote get before the PFER we'd still have race conditions). With L1 disabled, we have the problem that you mentioned: we're trying to read the value from the proper owners, but we never write it to the proper owners, so the hit ratio will be pretty bad. Using the SKIP_REMOTE_LOOKUP flag on reads, we'll avoid the extra RPC in Infinispan, but that will make the hit ratio even worse. E.g. in a 4-node cluster with numOwners=2, the hit ratio will never go above 50%. I don't think
Re: [infinispan-dev] rethinking ISPN transactions
On Fri, Nov 22, 2013 at 10:26 AM, Radim Vansa rva...@redhat.com wrote: On 11/21/2013 05:09 PM, Dan Berindei wrote: On Thu, Nov 21, 2013 at 5:39 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 03:18 PM, Dan Berindei wrote: Hmm, couldn't you just disable recovery in the TM to get the same performance with an XA resource as with a synchronization? If you are suggesting messing around with the TM, I don't think that is a good idea. First, the TM is external to Infinispan, and second, you can have other XAResources associated with the TM. I'm not suggesting that Infinispan should change the TM settings, I'm just wondering if there's a difference between using a synchronization in Infinispan and disabling recovery completely (or using an in-memory transaction log), from the user's point of view. Actually, that might be irrelevant if using a synchronization can lead to stale locks in Infinispan. If the commit command fails and doesn't release the locks, how will the user be able to find out that there are stale locks and release them? Infinispan detects the failed commit and sends a rollback command. Only if this fails as well do the locks stay unreleased. The problem is just that the application thinks it has succeeded while the transaction may have been rolled back (in the Infinispan way; the JTA transaction was already committed). Ok, maybe I was overreacting a bit :) But like you said, the rollback command can fail as well. True, the rollback doesn't have to write anything to the cache stores, so it's less likely to fail, but we should still consider the possibility. Radim On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 11:34 AM, Galder Zamarreño wrote: It's way faster actually. The speed difference comes from all the extra work required by the Transaction Manager to deal with multiple XA resources, make transactions recoverable, etc. We've done tests in the past (i.e.
Hibernate 2LC) comparing both and the difference was quite big. You are right, I forgot the recovery mechanism :) -- Radim Vansa rva...@redhat.com JBoss DataGrid QA
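The trade-off discussed above — a synchronization skips the prepare vote and the recovery log that an XA resource requires, which is where the speed difference comes from — can be sketched with minimal stand-in interfaces. Note these are NOT the real javax.transaction types, just their shape for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the two enlistment styles discussed above.
interface Sync {
    void beforeCompletion();          // no prepare vote, no recovery log
    void afterCompletion(int status); // a failure here can only be reported
}

interface XaRes {
    void prepare(); // after all resources prepare, the TM logs the decision
    void commit();  // a failure here can be retried from the log (recovery)
}

public class SyncVsXa {
    static List<String> run() {
        List<String> events = new ArrayList<>();
        Sync sync = new Sync() {
            public void beforeCompletion() { events.add("sync:beforeCompletion"); }
            public void afterCompletion(int status) { events.add("sync:afterCompletion:" + status); }
        };
        XaRes xa = new XaRes() {
            public void prepare() { events.add("xa:prepare"); }
            public void commit() { events.add("xa:commit"); }
        };
        // Rough order a TM drives with one XA resource and one synchronization;
        // the extra cost of the XA path is between prepare and commit: forcing
        // a tx log record so the commit can be retried after a crash.
        sync.beforeCompletion();
        xa.prepare();
        xa.commit();
        sync.afterCompletion(3 /* Status.STATUS_COMMITTED */);
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The sketch also shows why the synchronization path can leave stale locks: if the work done in afterCompletion fails, the JTA transaction has already committed and there is no log to drive a retry.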
Re: [infinispan-dev] PutForExternalRead consistency
I think I need to clarify my earlier email a bit: the problem I'm worried about is that we could have a thread execute a putForExternalRead(k, v1), then a put(k, v2), then a remove(k), and in the end leave k = v1 in the cache. (Without the remove(k) we'd be ok, because PFER uses putIfAbsent() under the hood.) This is quite different from the problem that Pedro raised, that of different owners ending up with different values for the same key. Will's suggestion to implement PFER as a regular put from a background thread does fix that problem. Writing the value only locally, like Galder suggested, would also address my concern, but at the expense of extra misses from the cache - especially in DIST mode. Hence my proposal to not support PFER in DIST mode at all. On Fri, Nov 22, 2013 at 3:45 PM, Dan Berindei dan.berin...@gmail.com wrote: That doesn't sound right, we don't keep any lock for the duration of the replication. In non-tx mode, we have to do an RPC to the primary owner before we lock any key. So there's nothing stopping the PFER from writing its value after a regular (sync) put, when the put was initiated after the PFER. On Fri, Nov 22, 2013 at 2:49 PM, William Burns mudokon...@gmail.com wrote: I wonder if we are over-analyzing this. It seems the main issue is that the replication is done asynchronously. Infinispan has many ways to make something asynchronous. In my opinion we just chose the wrong way. Wouldn't it be easier to change PFER so that, instead of passing along the FORCE_ASYNCHRONOUS flag, we just have the operation performed asynchronously using putIfAbsentAsync? This way the lock is held for the duration of the replication and should be consistent with other operations. Also the user regains control faster, as it doesn't even have to process the local interceptor chain.
We could also change the putForExternalRead method declaration to return a NotifiableFuture<Void> or something, so callers know when the operation is completed (if they want). - Will On Thu, Nov 21, 2013 at 9:54 AM, Dan Berindei dan.berin...@gmail.com wrote: On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.com wrote: On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com wrote: On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote: Hi, Simple question: shouldn't PFER ensure some consistency? I know that PFER is asynchronous but (IMO) it can create inconsistencies in the data. If the primary owner replicates the PFER followed by a PUT (PFER is sent async, so the lock is released immediately) for the same key, we have no way to be sure whether the PFER is delivered before or after the PUT on all the backup owners. Comments? Assuming that PFER and PUT happen in the same thread, we're normally relying on the JGroups sequence of events: send the first put, without waiting for a response, and then send the second put. That should guarantee the order in which the puts are received on the other nodes, but after that, yeah, there's a risk that it could happen. PFER and PUT for a given key normally happen in the same thread in cache-heavy use cases such as Hibernate 2LC, but there's no guarantee. I don't think that's correct. If the cache is synchronous, the PUT will be sent as an OOB message, and as such it can be delivered on the target before the previous PFER command. That's regardless of whether the PFER command was sent as a regular or as an OOB message. ^ Hmmm, that's definitely risky. I think we should make PFER local-only. The fact that PFER is asynchronous is nice to have. IOW, if you read a value from a database and you want to store it in the cache for later reads, the fact that it's replicated asynchronously is just so that other nodes can take advantage of the value being in the cache.
Since it's asynchronous, some nodes could fail to apply it, but that's fine since you can go to the database and re-retrieve it from there. So, making PFER local-only would be the degenerate case, where all nodes fail to apply except the local node, which is fine. This is better than having the reordering above. In a chat I had with Dan, he pointed out that having PFER local-only would be problematic for DIST mode w/ L1 enabled, since the local write would not invalidate other nodes, but this is fine because PFER only really makes sense for situations where Infinispan is used as a cache. So, if the data is in the DB, you might as well go there (1 network trip), as opposed to asking the other nodes for the data and then the database in the worst case (2 network trips). PFER is really designed for replication or invalidation use cases, which are precisely the ones configured for Hibernate 2LC. Thoughts
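The reordering hazard discussed in this thread boils down to a tiny sequence: a backup owner applies the sync operations in order, but receives the async PFER replication last. A toy model (the single-map "backup owner" and all names are illustrative only, not Infinispan internals):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race: the origin thread runs pfer(k, v1); put(k, v2);
// remove(k), but because PFER replication is async, a backup owner can
// receive it after the two sync operations -- and keep the stale value.
public class PferReordering {
    static String run() {
        Map<String, String> backupOwner = new ConcurrentHashMap<>();

        // Sync replication applies the later operations first...
        backupOwner.put("k", "v2");   // replication of put(k, v2)
        backupOwner.remove("k");      // replication of remove(k)

        // ...then the delayed async PFER arrives (PFER uses putIfAbsent).
        backupOwner.putIfAbsent("k", "v1");

        return backupOwner.get("k");  // the backup is stuck with v1
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both proposed fixes attack the same step: a background putIfAbsentAsync keeps the operation in the ordered replication stream, and a local-only PFER never sends the delayed message at all.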
Re: [infinispan-dev] Today's topic: eviction
On Fri, Nov 22, 2013 at 6:15 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/22/2013 11:40 AM, Sanne Grinovero wrote: Do we still need to support the non-V8 implementations? Clearly there are cases in which we need it to be available: org.infinispan.util.concurrent.locks.containers.AbstractPerEntryLockContainer<L> refers explicitly to EquivalentConcurrentHashMapV8 I don't think we need to support the non-v8 implementation, but I'm not aware of any BoundedConcurrentHashMapV8. So, if I have to choose, for me it is simpler to change the current BoundedConcurrentHashMap than to implement all the eviction policies on top of ECHM-V8. To put it another way, we do need to support the non-v8 implementation because that's the only implementation of size-based eviction we've got :) AFAIK there are some lock-free implementations out there for LRU, and there may be some for LIRS as well. But our current implementations require a lock on the entire segment, so there is no simple way to make them work with CHMV8.
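For context on why the segment lock is hard to avoid: the eviction policies mutate shared bookkeeping (the LRU chain) on every access, not just on writes. A minimal per-segment LRU in the spirit of BoundedConcurrentHashMap's segments — purely illustrative, not the actual implementation — could look like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a lock-per-segment LRU: every access reorders the LRU chain,
// so even get() takes the segment-wide lock. (Illustrative only.)
public class LruSegment<K, V> {
    private final Map<K, V> entries;

    public LruSegment(int maxSize) {
        // accessOrder=true moves entries on get(); the eldest entry is
        // evicted automatically when a put() exceeds maxSize.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public synchronized V get(K key) { return entries.get(key); }
    public synchronized void put(K key, V value) { entries.put(key, value); }
    public synchronized int size() { return entries.size(); }
}
```

A lock-free LRU/LIRS would have to replace that chain mutation with something like CAS-based recency sampling, which is why the policies can't simply be dropped on top of CHMV8's lock-free buckets.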
Re: [infinispan-dev] Today's topic: eviction
On Wed, Nov 27, 2013 at 10:15 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 22, 2013, at 12:31 PM, Pedro Ruivo pe...@infinispan.org wrote: On 11/21/2013 10:08 PM, Sanne Grinovero wrote: Hi Pedro, great analysis. Conceptually I agree, and since I'm not too familiar with the core code, I actually expected this to be the case, as we discussed the approach when we first discussed the single-lock-owner pattern. More specifically on the DataContainer: I don't think you should lock the segment, you need to lock the key entry only, as the CacheStore could be quite slow, so that is a long lock, at least relative to normal usage. Agree with you. I based my solution on the current BoundedConcurrentHashMap [1] implementation that uses locks per segment. Also, the eviction is kicked off inside these locks, so we already have a long lock to write the entry in the cache writer. [1] Segment.put(): https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/util/concurrent/BoundedConcurrentHashMap.java#L1554 This should NOT be the normal key lock however; as you might remember, I've long been advocating that there needs to be a split between the logical lock as the user sees it (and as used by transactions) vs the locks we use internally to keep our very own data structure consistency. It's trivial to apply if you use EquivalentConcurrentHashMapV8.computeIfAbsent(K, Fun<? super K, ? extends V>) This is a great idea :) +1 The EquivalentConcurrentHashMapV8 will solve the problem for the unbounded data container, but we still have to do more work for the bounded data container I'm afraid… We need a BoundedEquivalentConcurrentHashMapV8. The reason we haven't built one yet is that ConcurrentHashMapV8 has been in flux over the past few months, so keeping it up to date required some effort. As it gets more stable, we can start looking into a BoundedEquivalentConcurrentHashMapV8. This might be a good time to do it.
+1 to write a BoundedEquivalentConcurrentHashMapV8, but changing the LRU and LIRS implementations is going to be tricky business, so I'd rather we do it separately from the DataContainer API change. Doesn't sound like a patch which requires a major release? Sanne On 21 November 2013 18:48, Pedro Ruivo pe...@infinispan.org wrote: (re: https://issues.jboss.org/browse/ISPN-3048) (ps. sorry for the long mail. I hope that it is clear) * Automatic eviction: I've been writing some scenarios involving eviction and concurrent operations. This test shows a very reliable eviction, but we have a problem when passivation is enabled. The passivation phase is ok, but when we need to activate a key, we have a huge window in which a read/write operation does not find the key either in the in-memory data container or in the cache loader. This happens because the check in the in-memory data container is not synchronized with the check in the cache loader. The passivation phase works fine, as I said, because it happens inside the data container, i.e., the bounded concurrent hash map (BCHM) evicts the key under the segment lock. * Manual eviction: I haven't finished my test suite, but so far it does not work as expected. If you are a backup owner, you are unable to evict any keys (because the EvictCommand is handled as a RemoveCommand and the key is not wrapped in EntryWrappingInterceptor). In addition, I believe that it has the same problems with activation as above. Also, I believe that the passivation phase does not work well because it first tries to passivate, then it removes from the DataContainer. Nothing prevents passivating an old value: a put occurs and modifies the data in the DataContainer, and finally the EvictCommand removes the new value (that is lost). * New implementation: Since we are starting a new major release, I would like to propose the following improvements. One aspect of the implementation is the dropping of the *Cache[Writer or Loader]Interceptor and the EvictCommand.
Also, it would add a new method to DataContainer (void evict(K key)) and possibly change the interface of PersistenceManager. My idea is based on the current implementation of the BCHM. The BCHM implementation is aware of the persistence and it performs the passivate() and activate() operations internally. I would like to extend this and make it fully aware of the PersistenceManager. Also, it would be required to have a ConcurrentHashMap (CHM) with the same features. In more detail, the (B)CHM.get() will be responsible for loading the key from persistence (and activating it) if it is not found in-memory. It also stores the entry in memory. On the other hand, the put() stores the entry in-memory and in the persistence (if passivation==false), and we add an evict(k) to the CHM interface to remove a single key from memory and passivate it to persistence.
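The persistence-aware container idea can be sketched with a plain JDK 8 ConcurrentHashMap standing in for (B)CHM — class and method names below are hypothetical, but the atomicity property is the point: activation happens inside the map's per-entry locking, closing the window where a key is in neither the memory container nor the store:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a container that performs activation/passivation internally,
// so readers never observe the "in neither place" window. Hypothetical API.
public class ActivatingContainer<K, V> {
    private final ConcurrentHashMap<K, V> memory = new ConcurrentHashMap<>();
    private final Map<K, V> store = new ConcurrentHashMap<>(); // stand-in cache store

    // get() loads from the store and activates atomically for this key:
    // concurrent readers of the same key block until the load completes.
    public V get(K key) {
        return memory.computeIfAbsent(key, k -> store.remove(k));
    }

    // evict(k) passivates the *current* value and removes it in one atomic
    // step, so a concurrent put() cannot be lost to a stale passivation.
    public void evict(K key) {
        memory.computeIfPresent(key, (k, v) -> { store.put(k, v); return null; });
    }

    public void put(K key, V value) { memory.put(key, value); }
}
```

The evict() shape also illustrates the fix for the manual-eviction bug described above: passivation and removal are a single compute step on the entry, rather than two separately ordered operations.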
Re: [infinispan-dev] Design of Remote Hot Rod events
On Mon, Dec 2, 2013 at 11:44 AM, Galder Zamarreño gal...@redhat.com wrote: On Nov 27, 2013, at 3:06 PM, Mircea Markus mmar...@redhat.com wrote: - how does the server know that a request originated from a certain client, in order not to send it to that client again? There's no clientId in the request… Well spotted :). There are two ways to solve this. The first one is adding the source id to each cache operation sent from the client. This would require a change in the way the header is parsed for all operations. This is the simplest solution, with a little addition to the header. The second option is a bit more complicated but avoids the need to send the source id per request. At least in the Java client, each connection opened sends a ping at the start. You could add the source id to the ping command, and then the server could track all incoming connections that send a particular id. There could be multiple in the case of clients pooling connections. The server can track disconnections and keep this collection up to date, but it'd be quite a bit of work on top of the rest of the stuff. I'd prefer the first option. +1, for simplicity. We also don't enforce the client connections to start with a ping either. ^ Indeed we don't. It's something the Java client does by default but it's not mandatory at all. How would you generate the client id? ip+port perhaps? Or something the server would issue (shared server counter) when a client asks for it? The client or source id should ideally be composed of two parts: 1. Something the Hot Rod client provides via configuration. 2. Something that's dynamically generated whenever the RemoteCacheManager is started. The former could be anything from a simple application id, to an application id alongside client host and port. This is the static part of the source or client id. The one that's always the same for a RemoteCacheManager unless the configuration changes.
The second part, which is dynamic, should be created by the Hot Rod client implementation in order to avoid client resurrection issues (a similar method to what JGroups does). Regardless, the source or client id will be a variable-length byte array. I think this is easier than relying on some kind of server-side state, and having to sync that. You could have many clients connecting, so having to produce something different for each, cluster-wide, could be challenging. Thoughts? Why not a UUID? Cheers, On Nov 12, 2013, at 3:17 PM, Galder Zamarreño gal...@redhat.com wrote: Hi all, Re: https://github.com/infinispan/infinispan/wiki/Remote-Hot-Rod-Events I've just finished writing up the Hot Rod remote events design document. Amongst many other use cases, this will enable near caching use cases with the help of Hot Rod client callbacks. Cheers, -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
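The two-part id described above — a configured static part plus a part regenerated on every RemoteCacheManager start — could be assembled along these lines, using a random UUID for the dynamic part as Mircea suggests. The method name and byte layout are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch of the proposed variable-length source id: static configured part
// followed by 16 bytes regenerated per client start, so a restarted client
// is never mistaken for its previous incarnation. Hypothetical names.
public class SourceId {
    public static byte[] newSourceId(String configuredAppId) {
        byte[] staticPart = configuredAppId.getBytes(StandardCharsets.UTF_8);
        UUID dynamicPart = UUID.randomUUID(); // fresh on every start
        ByteBuffer buf = ByteBuffer.allocate(staticPart.length + 16);
        buf.put(staticPart);
        buf.putLong(dynamicPart.getMostSignificantBits());
        buf.putLong(dynamicPart.getLeastSignificantBits());
        return buf.array();
    }
}
```

Because the id is generated entirely on the client, the server needs no shared counter or cluster-wide coordination to keep ids unique.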
Re: [infinispan-dev] Denormalizing hashes
Hi Radim Actually, it's me that wrote the denormalization code :) It was meant as a stop-gap measure before we upgraded the HotRod protocol to support the segment-based consistent hash, but the denormalization worked well enough (or so we thought) that we didn't get around to changing the protocol yet. That's not a big change in itself, but we also wanted to make the consistent hash per-cache on the client (it's now per-cache-manager), and that's a bit more complicated to do. And it's not like it would have been a good idea to change this before starting the C++ client, as the client would still have to support the current style of consistent hash. On Tue, Dec 10, 2013 at 8:17 PM, Radim Vansa rva...@redhat.com wrote: Hi Galder, as I am trying to debug some problem in the C++ client, I was looking into the server code. And I am not sure whether I understand the code correctly, but it seems to me that the server denormalizes the consistent hash for each client anew (after each topology change or client joining). Is this true? Looking into trace logs, I can see stuff like 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$] (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222 From the denormalizeSegmentHashIds() method I see that this means that we have executed the hash function 639768 times just to notify one client. Is my understanding correct? Yes, this happens every time a client joins and/or every time the cache topology changes. We could easily cache the result of denormalizeSegmentHashIds, as it only depends on the number of segments. It's just that I wasn't expecting it to take so many iterations. Also, there is nothing like the concept of primary owner, is this right? The client CH doesn't have a concept of backup owners. But for each (hash id, server) pair that gets sent to the client, it means all the hash codes between the previous hash id and this hash id have this server as the primary owner.
The server in the next (hash id, server) pair is the first backup, and so on. For each segment, the server generates numOwners (hash id, server) pairs. That means, for most of the hash codes in the segment, the list of owners on the client will be the same as the list of owners on the server. But for 0.0002 (leewayFraction) of the hash codes, the client primary owner will actually be one of the server backup owners. I thought that every first request in HotRod would go to the primary owner, so that a PUT does not have to do the first hop and is executed directly on the primary. But it seems to me that it goes to any of the owners (practically a random one, as you are only looking for the numOwner ids in the leeway at the beginning of the range - then, 99.98% or more of requests should go to the server with the last position in the leeway). This looks pretty suboptimal for writes, doesn't it? I'm not sure what you mean here, but I'm pretty sure the request goes to the correct server because we have a test for it: ConsistentHashV1IntegrationTest Cheers Dan Cheers Radim PS: for every line of code you write in Scala, God kills a kitten -- Radim Vansa rva...@redhat.com JBoss DataGrid QA
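Dan's suggestion — caching the denormalized hash ids, which depend only on the segment count — amounts to a one-map memoization. In this sketch, denormalize is a placeholder for the expensive hashing loop, not the real server code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Sketch: compute the denormalized hash ids once per numSegments value,
// instead of re-running the ~640k hash iterations for every joining client
// or topology change. Class and method names are illustrative.
public class DenormalizedIdCache {
    private final Map<Integer, int[]> cache = new ConcurrentHashMap<>();
    private final IntFunction<int[]> denormalize; // the expensive loop

    public DenormalizedIdCache(IntFunction<int[]> denormalize) {
        this.denormalize = denormalize;
    }

    public int[] hashIdsFor(int numSegments) {
        // computeIfAbsent runs the loop at most once per distinct segment count
        return cache.computeIfAbsent(numSegments, denormalize::apply);
    }
}
```

Since segment counts rarely change at runtime, the cache stays tiny while removing the per-client recomputation entirely.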
Re: [infinispan-dev] Denormalizing hashes
On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa rva...@redhat.com wrote: Hi Dan I am not speaking about changing something for the C++ client, I understand that the client code cannot be changed in order to keep backward compatibility. Sure, I was just trying to give some background information on what we discussed and why we still have the wheel-based CH in the client. The current hash-wheel approach is working well, but there are a few flaws that could be fixed while keeping the client code untouched. Please correct me if I am wrong. 1) The denormalization is executed for every client for every topology change/client join. I don't have any numbers, but calling the hashing algorithm a million times on every such occasion sounds like a waste of computing power. - cache the denormalized stuff on the server +1, like I said it would be easy to do, but it never came up as a problem before. 2) The server is sending numOwners hashIds per segment, one for each owner. What's the reason for that? I think that only primary owners should be inserted there. This would: The main reason is to support clients from Infinispan 5.1, which pick a random owner instead of always choosing the primary ( https://issues.jboss.org/browse/ISPN-2655). a) target all PUT requests to the primary owner, reducing PUT latency and lowering the general load in the cluster Nope, it wouldn't. The same fraction of requests would go to the primary owner as before, because we won't find the exact denormalized hash id that maps to the segment border when normalized. b) reduce the routing information For 7.0, I guess we could say that 5.1 clients are no longer supported and we could switch to sending only the primary owners to the clients. But I'm not sure whether the loss of backwards compatibility is worth a couple hundred bytes sent once for every client. And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather orthogonal to the segment vs. hash wheel approach and its details. Agree.
Could you create issues in JIRA for both your proposals? Radim On 12/11/2013 09:18 AM, Dan Berindei wrote: [full quote of the earlier message trimmed]
Re: [infinispan-dev] Denormalizing hashes
On Wed, Dec 11, 2013 at 11:37 AM, Radim Vansa rva...@redhat.com wrote: On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa rva...@redhat.com wrote: Hi Dan I am not speaking about changing something for the C++ client, I understand that the client code cannot be changed in order to keep backward compatibility. Sure, I was just trying to give some background information on what we discussed and why we still have the wheel-based CH in the client. The current hash-wheel approach is working well, but there are a few flaws that could be fixed while keeping the client code untouched. Please correct me if I am wrong. 1) The denormalization is executed for every client for every topology change/client join. I don't have any numbers, but calling the hashing algorithm a million times on every such occasion sounds like a waste of computing power. - cache the denormalized stuff on the server +1, like I said it would be easy to do, but it never came up as a problem before. Fair enough. 2) The server is sending numOwners hashIds per segment, one for each owner. What's the reason for that? I think that only primary owners should be inserted there. This would: The main reason is to support clients from Infinispan 5.1, which pick a random owner instead of always choosing the primary ( https://issues.jboss.org/browse/ISPN-2655). You can always report numKeyOwners=1 and the old code should handle that. Yeah, it looks like it would work. I was thinking that when retrying a failed operation, the pre-5.2 client would still try one of the key owners, but I see now that it always chose a random server when retrying. Also, the client doesn't expose numKeyOwners to the user, like I had assumed. a) target all PUT requests to the primary owner, reducing PUT latency and lowering the general load in the cluster Nope, it wouldn't. The same fraction of requests would go to the primary owner as before, because we won't find the exact denormalized hash id that maps to the segment border when normalized.
Oh, only now have I noticed that the hashIds are sorted by the normalized ID, therefore the primary owner always picks the first position (and most requests will hit the first). Mea culpa. Still, it makes no sense to include the backup owners in the routing table, as the probability that a read will hit them is negligible. b) reduce the routing information For 7.0, I guess we could say that 5.1 clients are no longer supported and we could switch to sending only the primary owners to the clients. But I'm not sure whether the loss of backwards compatibility is worth a couple hundred bytes sent once for every client. And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather orthogonal to the segment vs. hash wheel approach and its details. Agree. Could you create issues in JIRA for both your proposals? OK. By the way, shouldn't we tag the features that should be included in Hot Rod protocol v1.4 with some tag, such as hotrod14? But as I said, if we fake the numOwners to be always 1, it should be backwards compatible even now. Nevertheless, as you in fact do route most requests to the primary owner, it won't provide much performance gain and it's not as hot as I first thought. Radim
Re: [infinispan-dev] Clean ThreadLocals
On Thu, Dec 12, 2013 at 12:52 AM, David M. Lloyd david.ll...@redhat.comwrote: On 12/11/2013 04:47 PM, Pedro Ruivo wrote: Hi, I've created a method to clean a specific ThreadLocal variable from all live threads [1]. My goal is to clean the ThreadLocal variables after a cache stops. It's a somewhat expensive method (it uses reflection) but I think it is fine. Currently, I'm using it to clear the Marshaller ThreadLocal in here [2]. Does anyone see any problem with this approach? Maybe it can be used for other potentially leaking ThreadLocals? It's an interesting idea (I've done something similar in the past to simply drop all thread locals). I would recommend that you check that all the JDKs you want to support use the same technique for thread locals though. Because these fields are not part of the JDK, they may not always exist in all environments. Yeah, the implementation of ThreadLocal has changed in the past and is likely to change again in the future. If that happens and we have to support different JDKs with different methods for clearing ThreadLocals, it won't be pretty. Also, it is important to only ever clean the thread locals of your current thread, or you're inviting all kinds of bizarre concurrency problems (maybe even locking threads into infinite loops), especially if the thread is running and actively using thread locals at the time. Indeed, ThreadLocalMap doesn't look to be thread-safe. I'm not sure if a remove() from another thread can cause an infinite loop like in HashMap because it uses open addressing instead of chaining, but it looks like it may cause a get() for a different ThreadLocal to return the wrong instance. (I would be ok with it if it would only cause breakage for threads using the cache that's being stopped, but it looks like it can touch more than one entry.) 
I think we shouldn't concern ourselves with actually removing the ThreadLocal from the map, we just have to keep track of the ThreadLocal instances and clean any references to other objects they might have (a la ExternalizerTableProxy). Or maybe give up on ThreadLocals and use a ConcurrentWeakKeyHashMap<Thread, PerThreadInstanceHolder> instead. Dan -- - DML ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
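Dan's suggestion — keep the ThreadLocal entries in place but clear the references they hold — could be sketched as below. The class and field names (`ClearableThreadLocal`, `Holder`) are illustrative, not the actual Infinispan code; the key point is that `clearAll()` only writes a volatile field in a holder object and never touches another thread's ThreadLocalMap:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of "clear the payload, not the ThreadLocal": each thread's entry is
// a mutable Holder; on cache stop, clearAll() nulls every holder's payload
// from any thread, which is safe because the thread-local map itself is
// never modified concurrently. (The Holder objects themselves are small and
// are still retained until the owning thread dies.)
public class ClearableThreadLocal<T> {
    static final class Holder<T> { volatile T value; }

    private final Set<Holder<T>> holders = ConcurrentHashMap.newKeySet();
    private final ThreadLocal<Holder<T>> local = ThreadLocal.withInitial(() -> {
        Holder<T> h = new Holder<>();
        holders.add(h); // track the holder so clearAll() can reach it
        return h;
    });

    public T get() { return local.get().value; }

    public void set(T value) { local.get().value = value; }

    /** Drop all per-thread payloads, e.g. when the cache stops. */
    public void clearAll() {
        for (Holder<T> h : holders) {
            h.value = null;
        }
    }
}
```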
Re: [infinispan-dev] Design of Remote Hot Rod events - round 2
On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Thu 2013-12-19 9:46, Galder Zamarreño wrote: == Example of continuous query atop remote listeners Thinking about how to implement continuous query atop this infrastructure I am missing a few things. The primary problem is that I don't want to enlist a filter id per continuous query I want to run. Not only that but I'd love to be able to add a continuous query on the fly and disable it on the fly as well per client. For that filters and converters are not flexible enough. What is missing is the ability to pass parameters from the client to the remote filter and remote converter. Parameters should be provided *per client*. Say Client 1 registers the continuous query listener with where age > 19 and client 2 registers the CQ listener with where name = emmanuel. Since the filter knows which client it filters for, it will be able to return only the keys that match the query. This all sounds a bit like remote code execution to me? You're asking for the client to pass some kind of executable thing that acts as a filter. That's a separate feature IMO, which I believe @Tristan is looking into. Once that's in place, I'm happy to enhance stuff in the remote event side to support it. I don't think you are correct. This is not remote execution in the sense of arbitrary code driven by the client. Remote execution will likely be triggered, render a result and stop. It will not send matching events in a continuous fashion. Plus remote execution will likely involve dynamic languages and I'm not sure we want to go that route for things like continuous query. To be clear, this is exactly the same as the filter parameters that Radim was asking for, right? From Infinispan's point of view, the filter just takes a String parameter, and the fact that that string can be parsed by the filter in a particular language is irrelevant. 
___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Design of Remote Hot Rod events - round 2
On Thu, Dec 19, 2013 at 8:43 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Thu 2013-12-19 19:57, Dan Berindei wrote: On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Thu 2013-12-19 9:46, Galder Zamarreño wrote: == Example of continuous query atop remote listeners Thinking about how to implement continuous query atop this infrastructure I am missing a few things. The primary problem is that I don't want to enlist a filter id per continuous query I want to run. Not only that but I'd love to be able to add a continuous query on the fly and disable it on the fly as well per client. For that filters and converters are not flexible enough. What is missing is the ability to pass parameters from the client to the remote filter and remote converter. Parameters should be provided *per client*. Say Client 1 registers the continuous query listener with where age > 19 and client 2 registers the CQ listener with where name = emmanuel. Since the filter knows which client it filters for, it will be able to return only the keys that match the query. This all sounds a bit like remote code execution to me? You're asking for the client to pass some kind of executable thing that acts as a filter. That's a separate feature IMO, which I believe @Tristan is looking into. Once that's in place, I'm happy to enhance stuff in the remote event side to support it. I don't think you are correct. This is not remote execution in the sense of arbitrary code driven by the client. Remote execution will likely be triggered, render a result and stop. It will not send matching events in a continuous fashion. Plus remote execution will likely involve dynamic languages and I'm not sure we want to go that route for things like continuous query. To be clear, this is exactly the same as the filter parameters that Radim was asking for, right? 
From Infinispan's point of view, the filter just takes a String parameter, and the fact that that string can be parsed by the filter in a particular language is irrelevant. Not sure what string you are talking about. The filterid? I was referring to the condition string: where age > 19, or where name = emmanuel. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Design of Remote Hot Rod events - round 2
On Fri, Dec 13, 2013 at 4:11 PM, Radim Vansa rva...@redhat.com wrote: On 12/13/2013 02:44 PM, Galder Zamarreño wrote: On Dec 6, 2013, at 10:45 AM, Radim Vansa rva...@redhat.com wrote: Hi, 1) IMO, filtering for specific key is a very important use case. Registering a filterId is a very powerful feature, but as long as you don't provide runtime parameter for this filter, you cannot implement one-key filtering. What do you mean by runtime parameter exactly? Can you give a concrete example of what you want to achieve that is not possible with what I've written up? As I stressed, if the client wants to listen for events on key_123456, then you can deploy a filter matching key_{number} (and additional constraints) but the 123456 is not known at deployment time. 2) setting ack/no ack in listener, and then configuring server-wise whether you should ack each / only last event sounds weird. I'd replace the boolean with enum { NO_ACK, ACK_EACH, ACK_LAST }. Makes a lot of sense, +1. 3) should the client provide source id when registering listener or when starting RemoteCacheManager? No API for that. Every operation will require a source ID from now on, so clients must provide it from first operation sent to the server. From a Java client perspective, you'd have this from the start via the configuration. 4) clustered events design does not specify any means to replicating the clustered event listener - all it does is that you register the listener on one node and the other nodes then route events to this node, until the node dies/deregisters the listener. No replication. Please specify, how should it piggyback on clustered events, and how should the listener list be replicated. In clustered listeners, the other nodes you talk about are gonna need to know about the clustered listeners so that they route events. Some kind of information about these clustered listeners will need to be sent around the cluster. 
The exact details are probably implementation details but we have a clustered registry already in place for this kind of thing. In any case, it'd make a lot of sense for both use cases to reuse as much logic as possible in this area. OK, this is probably the desired behaviour, it just is not covered by the Clustered Events design draft. Probably something to add - I'll ping Mircea about that. And you're right that it would make a lot of sense to have a shared structure for the listeners, and two implementations of the delivery boy (one to the node where a clustered event has been registered and second to the local component handling HotRod clients). 5) non-acked events: how exactly do you expect the ack data to be replicated, and updated? I see three options: A) Let non-acked list be a part of the listener record in replicated cache, and the primary owner which executes the event should update these via delta messages. I guess for proper reliability it should add operation record synchronously before confirming the operation to the originator, and then it might asynchronously remove it after the ack from client. When a node becomes primary owner, it should send events to client for all non-acked events. B) Having the non-acked list attached directly to cache entry (updating it together with regular backup), and then asynchronously updating the non-ack list after ack comes C) Separate cache for acks by entry keys, similar to B, consistent hash synced with the main entry cache Definitely not B. I don't wanna tie the internal cache entry to the ACKs. The two should be independent. Either C or A. For C, you'd wish to have a single cache for all listeners+caches, but you'd have to think about the keys and to have the same consistent hash, you'd have to have same keys. A might be better, but you certainly don't want this ACK info in a replicated structure. You'd want ACKs in a distributed cache preferably, and clustered listener info in the clustered replicated registry. 
There already is some CH implementation which aims at sharing the same distribution for all caches, SyncConsistentHash. Is there some problem with C and forcing this for the caches? Dan? I'm not sure what the exact requirements would be here. SyncConsistentHashFactory does ensure that the same key is mapped to the same owners in all the caches using it, most of the time. However, it requires both caches to have the same members, and since topologies aren't applied at exactly the same time there will be periods when the owners in the two caches won't match. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
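The agreement property SyncConsistentHash aims for — owners derived deterministically from the key and the member list alone, so any two caches with identical members pick the same owners — can be illustrated with a toy sketch. This is not the real `SyncConsistentHashFactory` algorithm; the class name `KeyOwner` and the trivial modulo mapping are assumptions for illustration only:

```java
import java.util.List;

// Toy illustration of deterministic key-to-owner mapping: the primary owner
// is a pure function of (key, member list). Two caches using this function
// agree on owners exactly when their member lists are identical - which is
// why mismatched memberships (or topologies applied at different times)
// break the agreement, as noted in the thread.
public class KeyOwner {
    public static String primaryOwner(Object key, List<String> members) {
        int h = key.hashCode() & Integer.MAX_VALUE; // force non-negative
        return members.get(h % members.size());
    }
}
```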
Re: [infinispan-dev] Design of Remote Hot Rod events - round 2
On Fri, Dec 20, 2013 at 12:20 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Fri 2013-12-20 12:09, Dan Berindei wrote: On Thu, Dec 19, 2013 at 8:43 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Thu 2013-12-19 19:57, Dan Berindei wrote: On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard emman...@hibernate.orgwrote: On Thu 2013-12-19 9:46, Galder Zamarreño wrote: == Example of continuous query atop remote listeners Thinking about how to implement continuous query atop this infrastructure I am missing a few things. The primary problem is that I don't want to enlist a filter id per continuous query I want to run. Not only that but I'd love to be able to add a continuous query on the fly and disable it on the fly as well per client. For that filters and converters are not flexible enough. What is missing is the ability to pass parameters from the client to the remote filter and remote converter. Parameters should be provided *per client*. Say Client 1 registers the continuous query listener with where age > 19 and client 2 registers the CQ listener with where name = emmanuel. Since the filter knows which client it filters for, it will be able to return only the keys that match the query. This all sounds a bit like remote code execution to me? You're asking for the client to pass some kind of executable thing that acts as a filter. That's a separate feature IMO, which I believe @Tristan is looking into. Once that's in place, I'm happy to enhance stuff in the remote event side to support it. I don't think you are correct. This is not remote execution in the sense of arbitrary code driven by the client. Remote execution will likely be triggered, render a result and stop. It will not send matching events in a continuous fashion. Plus remote execution will likely involve dynamic languages and I'm not sure we want to go that route for things like continuous query. To be clear, this is exactly the same as the filter parameters that Radim was asking for, right? 
From Infinispan's point of view, the filter just takes a String parameter, and the fact that that string can be parsed by the filter in a particular language is irrelevant. Not sure what string you are talking about. The filterid? I was referring to the condition string: where age > 19, or where name = emmanuel. OK. I am not sure that the fact that the parameter is a string in both cases and that there is only one parameter and not multiple is particularly relevant to the general problem at hand. I don't think so either, I was just asking if you think there is any difference between what you're asking for and what Radim was asking for... ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] help with Infinispan OSGi
On Tue, Jan 7, 2014 at 9:14 PM, Brett Meyer brme...@redhat.com wrote: Apologies for the delay -- things have been nuts. Here's the route I've taken so far. I created OsgiClassLoader that searches available Bundles for classes and resources. A new (non-static) AggregateClassLoader replaces FileLookup, CL-related methods in Util, and parts of ReflectionUtil. AggregateClassLoader extends/overrides ClassLoader and searches over a prioritized list of user/app CL, OsgiCL, System CL, TCCL, etc. This is easily wired into GlobalConfiguration concepts. However, I'm not exactly sure how this will play into Configuration and Cache. Configuration#classLoader is deprecated. Is that in favor of CacheImpl#getClassLoader / CacheImpl#with(ClassLoader)? Actually, I think the idea was to only set the classloader in the GlobalConfiguration and use a separate CacheManager for each deployment, removing the need for AdvancedCache.with(ClassLoader) as well. But I'm not sure if we'll ever get to that... AdvancedCache.with(ClassLoader) is also very limited in scope. The classloader can't be sent remotely so AFAICT it's only really useful for unmarshalling return values for get operations with storeAsBinary enabled. Can someone describe the scoping of CL concepts between GlobalConfiguration and the Configuration/Cache? Should both have their own instance of AggregateClassLoader? Cache's instance would put the user/app provided CL on the top of the queue? Sounds about right. For marshalling, ATM we use an EmbeddedContextClassResolver, which uses the InvocationContextContainer to obtain the classloader set by the user with AdvancedCache.with(ClassLoader) (unlike JBoss Marshalling's ContextClassResolver, which uses the TCCL). You will probably want to implement your own ClassResolver based on AggregateClassLoader, but using InvocationContextContainer as well. I'm not sure about other places where we need to load classes. 
While working on https://issues.jboss.org/browse/ISPN-3836 I did find some instances of ServiceLoader.load(Class), which use the TCCL implicitly, but I'm trying to change that to use the global/cache configuration's classloader. Hopefully that all makes sense... Brett Meyer Red Hat, Hibernate ORM - Original Message - From: Galder Zamarreño gal...@redhat.com To: Eric Wittmann eric.wittm...@redhat.com Cc: infinispan -Dev List infinispan-dev@lists.jboss.org, Brett Meyer brme...@redhat.com, Sanne Grinovero sa...@hibernate.org, Pete Muir pm...@redhat.com, Randall Hauch rha...@redhat.com, Steve Jacobs sjac...@redhat.com Sent: Wednesday, December 18, 2013 8:22:13 AM Subject: Re: [infinispan-dev] help with Infinispan OSGi On Dec 16, 2013, at 3:00 PM, Eric Wittmann eric.wittm...@redhat.com wrote: I wanted to add that in the Overlord group we're also looking into using ISPN in OSGi. Our directive is to get our projects running in Fuse 6.1. To that end I've been working on getting Overlord:S-RAMP up and running, which requires both ModeShape and ISPN. Additionally, Gary Brown uses ISPN in Overlord:RTGov and so will need to get it working directly (no ModeShape) in Fuse 6.1. I've made some progress on my end but have run into some of the same issues as Brett. An additional issue I hit was the use of Java's ServiceLoader for org.infinispan.configuration.parsing.ConfigurationParser. None of the parsers get loaded because ServiceLoader doesn't work particularly well in OSGi. We had this same issue in S-RAMP (we use ServiceLoader in a few places). I solved it by using the OSGi Service Registry when running in an OSGi container, but continuing to use ServiceLoader otherwise. ^ Can you add a JIRA for this so that we can abstract this away? I'm not sure how exactly we'd decide on the impl to use. By default it'd be SL impl. When used on OSGI though, an alternative service loading impl would need to be configured specifically by the user? 
Or would Infinispan itself detect that it's in OSGi and hence used the corresponding impl? I've no idea about OSGI. In any case - I was wondering if anyone thought it might be a good idea to create a git repo where we can create some test OSGi applications that use ISPN and can be deployed (e.g. to Fuse). This would be for testing purposes only - to shake out problems. Might be useful for collaboration? A quickstart on [1] would be the perfect place for something like that, i.e. fuse + infinispan or something like that. [1] http://www.jboss.org/jdf/quickstarts/get-started/ -Eric On 12/12/2013 12:55 PM, Brett Meyer wrote: I finally had a chance to start working with this, a bit, today. Here's what I've found so far. In general, I'm seeing 2 types of CL issues come up when testing w/ hibernate-infinispan: 1.) Reliance on the client bundle's CL. Take the following stack as an example: https://gist.github.com/brmeyer/c8aaa1157a4a951a462c. Hibernate's
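The AggregateClassLoader idea described in this thread — searching a prioritized list of classloaders (user/app CL, OSGi CL, system CL, TCCL, ...) — could be sketched roughly as follows. The class shape and constructor are illustrative assumptions, not the actual Infinispan or Overlord code:

```java
import java.util.List;

// Rough sketch of an aggregate classloader: try each delegate in priority
// order and return the first hit. A real implementation would also override
// getResource()/getResources() the same way.
public class AggregateClassLoader extends ClassLoader {
    private final List<ClassLoader> delegates;

    public AggregateClassLoader(List<ClassLoader> delegates) {
        super(null); // no single parent; we search the delegates ourselves
        this.delegates = delegates;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader cl : delegates) {
            try {
                return cl.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // fall through to the next loader in the list
            }
        }
        throw new ClassNotFoundException(name);
    }
}
```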
Re: [infinispan-dev] Integration between HotRod and OGM
On Tue, Jan 21, 2014 at 4:07 PM, Mircea Markus mmar...@redhat.com wrote: Hi Emmanuel, Just had a good chat with Davide on this and one solution to overcome the shortcoming you mentioned in the above email would be to enhance the hotrod client to support grouping: RemoteClient.put(G g, K k, V v); //first param is the group RemoteClient.getGroup(G g) : Map<K,V>; I think you'd also need RemoteClient.get(G g, K k), as in embedded mode the group is included in the key. It requires an enhancement on our local grouping API: EmbeddedCache.getGroup(G). This is something useful for us in a broader context, as it is the step needed to be able to deprecate AtomicMaps and suggest replacing them with Grouping. It would also require us to keep a Set<K> for each group, with the keys associated with that group. As such, I'm not sure it would be a lot easier to implement (correctly) than FineGrainedAtomicMap. This approach still has some limitations compared to the current embedded integration: - performance caused by the lack of transactions: this means increased TCP chattiness between the Hot Rod client and the server. - you'd have to handle atomicity, potentially by retrying an operation What do you think? On Dec 3, 2013, at 3:10 AM, Mircea Markus mmar...@redhat.com wrote: On Nov 19, 2013, at 10:22 AM, Emmanuel Bernard emman...@hibernate.org wrote: It's an interesting approach that would work fine-ish for entities assuming the Hot Rod client is multi threaded and assuming the client uses Future to parallelize the calls. The Java Hotrod client is both multithreaded and exposes an Async API. But it won't work for associations as we have them designed today. Each association - or more precisely the query results to go from an entity A1 to the list of entities B associated to it - is represented by an AtomicMap. Each entry in this map does correspond to an entry in the association. 
While we can guess the column names and build from the metadata the list of composed keys for entities, we cannot do the same for associations as the key is literally the (composite) id of the association and we cannot guess that most of the time (we can in very pathological cases). We could imagine that we list the association row keys in a special entry to work around that but this approach is just as problematic and is conceptually the same. The only solution would be to lock the whole association for each operation and I guess impose some versioning / optimistic lock. That is not a pattern that scales sufficiently from my experience. I think so too :-) That's the problem with interconnected data :) Emmanuel On Mon 2013-11-18 23:05, Mircea Markus wrote: Neither the grouping API nor the AtomicMap work over hotrod. Between the grouping API and AtomicMap, I think the one that would make more sense migrating is the grouping API. One way or the other, I think the hotrod protocol would require an enhancement - mind raising a JIRA for that? For now I guess you can sacrifice performance and always sending the entire object across on every update instead of only the deltas? On Nov 18, 2013, at 9:56 AM, Emmanuel Bernard emman...@hibernate.org wrote: Someone mentioned the grouping API as some sort of alternative to AtomicMap. Maybe we should use that? Note that if we don't have a fine-grained approach we will need to make sure we *copy* the complex data structure upon reads to mimic proper transaction isolation. On Tue 2013-11-12 15:14, Sanne Grinovero wrote: On 12 November 2013 14:54, Emmanuel Bernard emman...@hibernate.org wrote: On the transaction side, we can start without them. +1 on omitting transactions for now. And on the missing AtomicMaps, I hope the Infinispan will want to implement it? Would be good to eventually converge on similar featuresets on remote vs embedded APIs. 
I know the embedded version relies on batching/transactions, but I guess we could obtain a similar effect with some ad-hoc commands in Hot Rod? Sanne On Tue 2013-11-12 14:34, Davide D'Alto wrote: Hi, I'm working on the integration between HotRod and OGM. We already have a dialect for Infinispan and I'm trying to follow the same logic. At the moment I'm having two problems: 1) In the Infinispan dialect we are using the AtomicMap and the AtomicMapLookup but these classes don't work with the RemoteCache. Is there an equivalent for HotRod? 2) As far as I know HotRod does not support transactions. I've found a link to a branch on Mircea's repository: https://github.com/mmarkus/ops_over_hotrod/wiki/Usage-guide Is this something I could/should use? Any help is appreciated. Thanks, Davide ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Integration between HotRod and OGM
On 22 Jan 2014 16:10, Pedro Ruivo pe...@infinispan.org wrote: On 01/22/2014 01:58 PM, Dan Berindei wrote: It would also require us to keep a Set<K> for each group, with the keys associated with that group. As such, I'm not sure it would be a lot easier to implement (correctly) than FineGrainedAtomicMap. Dan, I didn't understand why we need to keep a Set<K>. Can you elaborate? We'd need some way to keep track of the keys that are part of the group, iterating over the entire cache for every getGroup() call would be way too slow. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
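The Set<K>-per-group bookkeeping discussed above could be sketched as follows. Without such an index, getGroup() would have to iterate the whole data container (and any cache store). The class and method names (`GroupedMap`, `getGroup`) are illustrative, not Infinispan API, and this toy version ignores the synchronization-with-the-data-container concerns Pedro raises later in the thread:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Toy sketch of grouping with a per-group key index: put() records the key
// under its group, and getGroup() reads only the indexed keys instead of
// scanning the entire data container.
public class GroupedMap<G, K, V> {
    private final Map<K, V> data = new ConcurrentHashMap<>();
    private final Map<G, Set<K>> groupIndex = new ConcurrentHashMap<>();

    public void put(G group, K key, V value) {
        data.put(key, value);
        groupIndex.computeIfAbsent(group, g -> ConcurrentHashMap.newKeySet())
                  .add(key);
    }

    public Map<K, V> getGroup(G group) {
        Map<K, V> result = new ConcurrentHashMap<>();
        for (K key : groupIndex.getOrDefault(group, Set.of())) {
            V v = data.get(key);
            if (v != null) {
                result.put(key, v); // skip keys removed since indexing
            }
        }
        return result;
    }
}
```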
Re: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap
I think it's way too early to discuss removing FineGrainedAtomicMap and AtomicMap, as long as we don't have a concrete alternative with similar properties. Cache.getGroup(groupName) is just a method name at this point, we don't have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a transaction isolation or performance perspective. BTW, do we really need the group name to be a String? A good way to prove that the grouping API is a proper replacement for the atomic maps would be to replace the usage of atomic maps in the Tree module with the grouping API. Unless we plan to drop the Tree module completely... Cheers Dan On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus mmar...@redhat.com wrote: On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic vblag...@redhat.com wrote: I agree with Erik here. Deltas are used in M/R and I've never detected any problems so far. On 1/21/2014, 1:39 PM, Erik Salter wrote: Please don't remove the Delta stuff. That's quite useful, especially for large collections. +1 to keep DeltaAware. Thanks for the feedback Erik ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Integration between HotRod and OGM
On Mon, Jan 27, 2014 at 2:43 PM, Pedro Ruivo pe...@infinispan.org wrote: On 01/27/2014 12:26 PM, Emmanuel Bernard wrote: I'd be curious to see performance tests on Pedro's approach (ie walk through the entire data key set to find the matching elements of a given group). That might be fast enough but that looks quite scary compared to a single lookup. I would prefer to have a performance hit than a map of sets (group name = set of keys). I also think that keeping this map synchronized with the keys in the data container will not be easy... Sure, I would prefer the simpler implementation as well. But if changing an application to use groups instead of atomic maps will change the processing time of a request from 1ms to 1s, I'm pretty sure users will prefer to keep using the atomic maps :) Any doc explaining how FGAM is broken in transactions, out of curiosity? well, the map is not isolated, so you can see the updates from other transactions immediately (https://issues.jboss.org/browse/ISPN-3932) Do you know if AtomicMap is affected, too? It also does not work when you enable write skew check with optimistic transactions (we have a JIRA somewhere) I assume you mean https://issues.jboss.org/browse/ISPN-3939 ? This looks like it also affects AtomicMap, so the only workaround is to use pessimistic locking. On Mon 2014-01-27 11:54, Pedro Ruivo wrote: On 01/27/2014 09:52 AM, Sanne Grinovero wrote: On 27 January 2014 09:38, Pedro Ruivo pe...@infinispan.org wrote: On 01/27/2014 09:20 AM, Sanne Grinovero wrote: On 23 January 2014 18:03, Dan Berindei dan.berin...@gmail.com wrote: On 22 Jan 2014 16:10, Pedro Ruivo pe...@infinispan.org wrote: On 01/22/2014 01:58 PM, Dan Berindei wrote: It would also require us to keep a Set<K> for each group, with the keys associated with that group. As such, I'm not sure it would be a lot easier to implement (correctly) than FineGrainedAtomicMap. Dan, I didn't understand why we need to keep a Set<K>. Can you elaborate? 
We'd need some way to keep track of the keys that are part of the group, iterating over the entire cache for every getGroup() call would be way too slow. Right, and load all entries from any CacheStore too :-/ IMO, I prefer to iterate over the data container and cache loader when it is needed than keep the Set<K> for each group. I think the memory will thank you Of course. I'm just highlighting how important Dan's comment is, because we obviously don't want to load everything from CacheStore. So, tracking which entries are part of the group is essential: failing this, I'm still skeptical about why the Grouping API is a better replacement than FGAM. I have one reason: FGAM does not work inside transactions... Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] L1OnRehash Discussion
On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño gal...@redhat.com wrote: On 28 Jan 2014, at 15:29, William Burns mudokon...@gmail.com wrote: Hello everyone, I wanted to discuss what I would call the dubious benefit of L1OnRehash, especially compared to the complexity it brings. L1OnRehash is used to retain a value by moving a previously owned value into the L1 when a rehash occurs and this node no longer owns that value. Also, any current L1 values are removed when a rehash occurs. Therefore it can only save a single remote get for only a few keys when a rehash occurs. This by itself is fine, however L1OnRehash has many edge cases to guarantee consistency, as can be seen from https://issues.jboss.org/browse/ISPN-3838. This can get quite complicated for a feature that gives marginal performance increases (especially given that this value may never have been read recently - at least normal L1 usage guarantees this). My first suggestion is instead to deprecate the L1OnRehash configuration option and to remove this logic. +1 +1 from me as well My second suggestion is a new implementation of L1OnRehash that is always enabled when L1 threshold is configured to 0. For those not familiar, L1 threshold controls whether invalidations are broadcast instead of individual messages. A value of 0 means to always broadcast. This would allow for some benefits that we can't currently do: 1. L1 values would never have to be invalidated on a rehash event (guarantee locality reads under rehash) 2. L1 requestors would not have to be tracked any longer However every write would be required to send an invalidation which could slow write performance in additional cases (since we currently only send invalidations when requestors are found). The difference would be lessened with UDP, which is the transport I would assume someone would use when configuring L1 threshold to 0. Sounds good to me, but I think you could go even beyond this and maybe get rid of the threshold configuration option too? 
If the transport is UDP and multicast is configured, invalidations are broadcast (and apply the two benefits you mention). If UDP w/ unicast or TCP is used, track invalidations and send them as unicasts. Do we really need to expose these configuration options to the user? I think the idea was that even with UDP, sending 2 unicasts and waiting for only 2 responses may be faster than sending a multicast and waiting for 10 responses. However, I'm not sure that's the case if we send 1 unicast invalidation from each owner instead of a single multicast invalidation from the primary owner/originator [1]. Maybe if each owner would return a list of requestors and the originator would do the invalidation at the end... One tangible benefit of having the setting is that we can run the test suite with TCP only, and still cover every path in L1Manager. If we removed it completely, it would still be possible to change the toggle in L1ManagerImpl via reflection, but it would be a little hacky. What do you guys think? I am thinking that no one minds the removal of L1OnRehash that we have currently (if so let me know). I am quite curious what others think about the changes for the L1 threshold value of 0, maybe this configuration value is never used? Since we don't give any guidance as to what a good threshold value would be, I doubt many people use it. My alternative proposal would be to replace the invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false setting. 1. If traceRequestors == false, don't keep track of requestors, only send the invalidation from the originator, and enable l1OnRehash. This means we can keep the entries that are in L1 after a rehash as well. 2. If traceRequestors == true, track requestors, send unicast/multicast invalidations depending on the transport, and disable l1OnRehash. [1] https://issues.jboss.org/browse/ISPN-186 Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
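The invalidationThreshold decision discussed in this thread can be sketched as a small policy: broadcast the L1 invalidation when the number of tracked requestors exceeds the threshold, always broadcast at 0, never at a negative value. This is a toy illustration of the semantics as described above (the class name `L1InvalidationPolicy` and the exact treatment of -1 are assumptions, not the actual L1ManagerImpl logic):

```java
import java.util.Set;

// Toy sketch of the L1 invalidation threshold decision: returns true when a
// single multicast invalidation should be sent, false when per-requestor
// unicasts should be used instead.
public class L1InvalidationPolicy {
    private final int invalidationThreshold;

    public L1InvalidationPolicy(int invalidationThreshold) {
        this.invalidationThreshold = invalidationThreshold;
    }

    public boolean useMulticast(Set<String> requestors) {
        if (invalidationThreshold < 0) {
            return false; // assumed: never broadcast
        }
        if (invalidationThreshold == 0) {
            return true;  // always broadcast, per the thread
        }
        // broadcast only once the requestor count passes the threshold
        return requestors.size() > invalidationThreshold;
    }
}
```

Under these semantics, Will's proposal amounts to hard-wiring the 0 case (always broadcast, no requestor tracking) and Dan's traceRequestors=false maps to the same branch.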
Re: [infinispan-dev] [jboss-as7-dev] Module jars disappearing leaving empty classes/ folders and errors
On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño gal...@redhat.com wrote: Narrowing down the list now, since this is a problem of how our CI is doing builds. These logs are retrieved from [1]. Dunno how our CI is configured, but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1 It's about time we did the following:
1) Any test failures should halt the build there and then. IOW, do not continue the build at all.
Will having 100 tests in one run and 2000 tests in another really help?
2) Any tests that fail randomly should be disabled.
Let's go ahead and disable all the server tests then? ;) Cheers, [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log On 04 Feb 2014, at 13:30, Galder Zamarreño gal...@redhat.com wrote: On 04 Feb 2014, at 10:38, Stuart Douglas stuart.w.doug...@gmail.com wrote: It is almost certainly something to do with this:

<module-def name="org.infinispan.server.rest" src="${infinispan.server.modules.dir}">
    <maven-resource-with-classifier group="org.infinispan" artifact="infinispan-server-rest" classifier="classes"/>
</module-def>

I guess sometimes the classes artefact is being attached as a reference to the classes directory, rather than a reference to a jar, which causes the issue. Here's a gist with a subset of the build log [1]. When it works fine, it's copying a jar; when it's not, it's copying an empty folder. However, this is not only happening for the org.infinispan.server.rest module; others show the same issue [2]. What seems to be a pattern is that it only happens with modules that are built by us; it's not happening for modules coming with the base AS/WF instance.
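For context, the -fn flag mentioned above is one of Maven's three standard reactor failure modes (plain Maven CLI; the actual goals and profiles of this CI job are not shown here):

```shell
# -ff  (--fail-fast):   the default — stop the reactor at the first failing module.
# -fae (--fail-at-end): build every module regardless, but mark the build failed at the end.
# -fn  (--fail-never):  never fail the build, whatever happens — which is why a CI job
#                       using it can keep packaging a distribution from broken modules.
mvn -ff clean install   # roughly what point 1) above asks the CI to do instead
```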
I've traced it back, and this might be due to build failures that are not producing the right jars [3]. @Stuart, this is really our problem. Sorry for the inconvenience! [1] https://gist.github.com/galderz/b9286f385aad1316df51 [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323 [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c Stuart On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño gal...@redhat.com wrote: On 04 Feb 2014, at 10:01, Stuart Douglas stuart.w.doug...@gmail.com wrote: On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas stuart.w.doug...@gmail.com wrote: Yes, there is nothing in the server code that modifies the modules directory. Well, except for the new patching stuff, but that is not really relevant here. The testsuite AS/WF builds are built out of the distribution build, which shows the same problem. The distribution we build uses the scripts we got from AS [1]. Do you see anything in there that could be causing this? We are using maven-antrun-plugin version 1.3; also take into account the lib.xml in [2]. Finally, do you have any suggestions on changes we could make to these files to further debug the issue? Thanks a lot for your help! [1] https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml [2] https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml Stuart On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño gal...@redhat.com wrote: On 04 Feb 2014, at 09:37, Stuart Douglas stuart.w.doug...@gmail.com wrote: This looks like an issue with your environment. The modules directory is static. Wildfly does not contain any code that messes with it. I would say the culprit is probably something in either your build process or your test suite. Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used somewhere else). I guess your answer still applies?
Cheers, Stuart On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarreño gal...@redhat.com wrote: Hi all, We're having issues with our Infinispan Server integration tests, which run within Wildfly 8.0.0.Beta1 (as I'm typing, I'm wondering if we should just upgrade it to see if this goes away...?). Quite often some of the runs fail with error message [1]. Having looked at the build environment when a run fails, you see this:

--
$ ls modules/system/layers/base/org/infinispan/server/rest/main
drwxrwxr-x 2 g staff   68B Feb 3 18:41 classes (<-- a directory??)
-rw-r--r-- 1 g staff    1B Feb 3 18:41 classes.index
-rw-r--r-- 1 g staff  2.1K Feb 3 18:41 module.xml

$ ls modules/system/layers/base/org/infinispan/server/rest/main/classes
drwxrwxr-x 2 g staff   68B Feb 3 18:41 .
drwxrwxr-x 5 g staff  170B Feb 3 18:41 ..

$ more modules/system/layers/base/org/infinispan/server/rest/main/module.xml
module
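The symptom above — a classes/ directory where a jar should be — is easy to hunt for across the whole modules tree. Here is a hypothetical standalone helper (not part of the build scripts) that walks a JBoss-style modules directory and reports empty classes entries:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical diagnostic: scan a modules tree for "classes" entries that
// ended up as empty directories instead of jars (the failure shown above).
public class EmptyClassesFinder {
    // True iff the path is a directory with no entries at all.
    static boolean isEmptyDir(Path p) {
        if (!Files.isDirectory(p)) return false;
        try (Stream<Path> entries = Files.list(p)) {
            return entries.findAny().isEmpty();
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "modules");
        try (Stream<Path> walk = Files.walk(root)) {
            walk.filter(p -> p.getFileName().toString().equals("classes"))
                .filter(EmptyClassesFinder::isEmptyDir)
                .forEach(p -> System.out.println("empty: " + p));
        }
    }
}
```

Running it against the tree from the failed build would flag modules/system/layers/base/org/infinispan/server/rest/main/classes, pointing straight at the modules whose artifacts were mis-built.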