Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-14 Thread Dan Berindei
On Mon, May 13, 2013 at 8:44 PM, Sanne Grinovero sa...@infinispan.org wrote:

 On 13 May 2013 18:32, Manik Surtani msurt...@redhat.com wrote:
 
  On 13 May 2013, at 16:25, Mircea Markus mmar...@redhat.com wrote:
 
 
  On 13 May 2013, at 15:05, Manik Surtani wrote:
 
  100% agree, most users will have to interact with AdvancedCache at
 some point - if only because of lock() and withFlags().
 
  I've seen quite a bit of end-user code that doesn't touch
 AdvancedCache.
   I'm on Dan's side here, I think it's pretty popular among users
  and should be considered a public API. A note along the same lines: we also
  recommend that all our users use Flag.IGNORE_RETURN_VALUES, which again goes
  through AdvancedCache.
 
  So you're saying getTimeService() should be in EmbeddedCacheManager?
  That's Dan's argument... I really don't think this should be accessible by
 end-user applications.

 +1 to keep it hidden, but SPI kind of API wouldn't be too bad.


If we want to keep it hidden, then I think it would be best to leave the
getTimeService() method only in ComponentRegistry/GlobalComponentRegistry
and remove it from the AdvancedCache interface.

We might want to remove it from the configuration, too.
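(A small illustrative sketch of the "keep it internal" option: component code
would look the TimeService up in the ComponentRegistry rather than AdvancedCache
exposing it. The getComponentRegistry()/getComponent() lookup and the
org.infinispan.util.TimeService location are assumptions based on the 5.3-era
API, not something decided in this thread.)

  import org.infinispan.AdvancedCache;
  import org.infinispan.factories.ComponentRegistry;
  import org.infinispan.util.TimeService;

  public class TimeServiceLookup {
     // Internal/SPI-style access; not something an end-user application would normally do.
     public static TimeService timeService(AdvancedCache<?, ?> cache) {
        ComponentRegistry registry = cache.getComponentRegistry();
        return registry.getComponent(TimeService.class);
     }
  }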



 More importantly, I'd design it in such a way that different Caches
 could be using a different one. Doesn't have to be supported in the
 current code implementation, I just mean API-wise this should not be
 on a global component but on a Cache-specific one.


Any particular usage in mind for having a different time service in each
cache?

We definitely need a global component, because JGroupsTransport uses it, so
we'd have to support both.

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] commit not failing but transaction reported as in-doubt

2013-05-16 Thread Dan Berindei
Mircea, I think I'm missing something here. Why would the originator send a
TxCompletionNotificationCommand at all if the commit command was
asynchronous?

I don't think recovery should require the originator to send a
TxCompletionNotificationCommand. Our commit commands can't fail anyway, so
the only way for a transaction to become in-doubt would be if the cache
crashed before sending the command. (Or maybe if another resource's commit
phase failed.)

Cheers
Dan


On Wed, May 15, 2013 at 5:06 PM, Mircea Markus mmar...@redhat.com wrote:

 Thanks again for the nice explanation, Jonathan!
 @Pedro - seems like you're doing the right thing by encouraging people to
 be properly paranoid :-)
 Otherwise we'd leak tx logs (in infinispan parlance the PrepareCommands in
 the recovery cache) which would not be nice.

 On 15 May 2013, at 13:32, Jonathan Halliday jonathan.halli...@redhat.com
 wrote:

 
  No, it's out of scope for the TM, at least as far as the JTA/XA specs
  are concerned. The TM would not retain any txlog information to allow it
  to perform useful recovery anyhow.   Usually you just log it in the hope
  a human notices and sorts out the mess.  Of course properly paranoid
  humans don't use async commit in the first place.
 
  There has been various talk around making JTA TM.commit() support an
  async callback, such that the business logic thread can continue as soon
  as the prepare phase is successful, whilst still receiving a callback
  handler invocation if the commit phase subsequently fails. Extending
  that to the XA protocol would be nice, but won't happen as there is no
  upward (RM-TM) communication in XA - it's all driven top down.  So as
  you point out, adding the failed tx to the in-doubt list is the only way
  of signalling a problem. That's bad, since you'd also need a positive
  'it worked' callback in the proto to allow GC of the txlog, otherwise
  you have to throw away the log eagerly and can't then do anything useful
  with the subsequent error callback anyhow.
 
   Associated with that discussion is the expectation around the semantics
   of afterCompletion, which may mean 'after successful prepare' or 'after
   successful commit' in such a case, the latter effectively removing the
   need for a new JTA callback API in the first place.
 
  If you don't need a callback at all, then there is already an async
  commit option in the TM config, it's just non-standard and marginally
  dangerous. It simply logs commit phase failures and hopes a human
 notices.
 
  Jonathan.
 
  On 05/15/2013 01:13 PM, Mircea Markus wrote:
  Hi Jonathan,
 
  In the scope of ISPN-3063 [1] we came to a problem we need some advice
 on :-)
 
   Would a transaction manager expect/handle this situation: the commit of a
  transaction is successful, but at a later point the same transaction is
  reported as in-doubt to the recovery process? In our case this can happen
  when we send the commit async, as it might only fail after the commit has
  been acknowledged to the TM.
 
  [1] https://issues.jboss.org/browse/ISPN-3063
 
  Cheers,
 
 
  --
  Registered in England and Wales under Company Registration No. 03798903
  Directors: Michael Cunningham (USA), Mark Hegarty (Ireland), Matt Parson
  (USA), Charlie Peters (USA)
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-16 Thread Dan Berindei
Hi guys

I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I've
come across what I think is underspecified behaviour in AtomicHashMap.

Say we have two transactions, tx1 and tx2, and they both work with the same
atomic map in a pessimistic cache:

1. tx1: am1 = AtomicMapLookup.get(cache, key)
2. tx2: am2 = AtomicMapLookup.get(cache, key)
3. tx1: am1.put(subkey1, value1) // locks the map
4. tx2: am2.get(subkey1) // returns null
5. tx1: commit // the map is now {subkey1=value1}
6. tx2: am2.put(subkey2, value2) // locks the map
7. tx2: commit // the map is now {subkey2=value2}

It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is ok
or if it's a bug...
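(For illustration, a minimal sketch of the interleaving above on a pessimistic,
transactional cache, driven from one thread via JTA suspend/resume; it omits
step 4 and assumes the standard AtomicMapLookup.getAtomicMap() API, so treat it
as a reading of the scenario rather than the actual test.)

  import javax.transaction.Transaction;
  import javax.transaction.TransactionManager;
  import org.infinispan.Cache;
  import org.infinispan.atomic.AtomicMap;
  import org.infinispan.atomic.AtomicMapLookup;

  public class AtomicMapInterleaving {
     public static void run(Cache<String, Object> cache) throws Exception {
        TransactionManager tm = cache.getAdvancedCache().getTransactionManager();

        tm.begin();                                                        // tx1
        AtomicMap<String, String> am1 = AtomicMapLookup.getAtomicMap(cache, "key");
        Transaction tx1 = tm.suspend();

        tm.begin();                                                        // tx2
        AtomicMap<String, String> am2 = AtomicMapLookup.getAtomicMap(cache, "key");
        Transaction tx2 = tm.suspend();

        tm.resume(tx1);
        am1.put("subkey1", "value1");   // tx1 locks the map
        tm.commit();                    // the map is now {subkey1=value1}

        tm.resume(tx2);
        am2.put("subkey2", "value2");   // tx2 locks the map
        tm.commit();                    // the map is now {subkey2=value2}: tx1's entry is gone
     }
  }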

Note that today the map is overwritten by tx2 even without step 4 (tx2:
am2.get(subkey1)). I'm pretty sure that's a bug and I fixed it locally by
using the FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite.

However, when the Tree API moves a node it first checks for the existence
of the destination node, which means NodeMoveAPIPessimisticTest is still
failing. I'm not sure if I should fix that by forcing a write lock for all
AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move().

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-17 Thread Dan Berindei
On Fri, May 17, 2013 at 1:59 PM, Mircea Markus mmar...@redhat.com wrote:


 On 17 May 2013, at 07:35, Dan Berindei dan.berin...@gmail.com wrote:

 
 
 
  On Thu, May 16, 2013 at 8:27 PM, Mircea Markus mmar...@redhat.com
 wrote:
 
  On 16 May 2013, at 15:04, Dan Berindei dan.berin...@gmail.com wrote:
 
   Hi guys
  
    I'm working on an intermittent failure in NodeMoveAPIPessimisticTest
  and I've come across what I think is underspecified behaviour in
  AtomicHashMap.
  
   Say we have two transactions, tx1 and tx2, and they both work with the
 same atomic map in a pessimistic cache:
  
   1. tx1: am1 = AtomicMapLookup.get(cache, key)
   2. tx2: am2 = AtomicMapLookup.get(cache, key)
   3. tx1: am1.put(subkey1, value1) // locks the map
   4. tx2: am2.get(subkey1) // returns null
   5. tx1: commit // the map is now {subkey1=value1}
   6. tx2: am2.put(subkey2, value2) // locks the map
   7. tx2: commit // the map is now {subkey2=value2}
  
   It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this
 is ok or if it's a bug...
   As a user I find that a bit confusing, so I think tx2 should merge stuff
  in the AtomicMap.
   I'd be curious to hear Manik's (author) and Sanne's (user) opinion on this.
 
 
  Merging should work with pessimistic locking, but I don't think we could
 do it with optimistic locking and write skew check enabled: we only do the
 write skew check for the whole map.
  If the WSC is enabled, then the 2nd transaction should fail: tx2 reads the
  version at step 2, and at step 7 the WSC should forbid it to commit, so we
  shouldn't have this problem at all.


Right, the 2nd transaction must fail with WSC enabled, so we can't
implement merging.



  Would it be worth making this change if it meant making the behaviour of
 AtomicHashMap more complex?
  How much more complex? If it's not a quick fix (2h) I'd say no, as this is more
  of a nice-to-have; no user requires this functionality ATM.


The behaviour of AtomicMap will be more complex because we're adding a bit
of functionality that only works with pessimistic locking. Or maybe with
optimistic locking as well, only not when write skew check is enabled.

This is definitely not a 2h fix. As you can see, it's taking more than 2h
just to figure out what needs to change :)
What other options do we have? Leave it as it is and document the
limitation?



 
  On the other hand, I believe FineGrainedAtomicHashMap doesn't do
 separate write skew checks for each key in the map either, so users
 probably have to deal with this difference between pessimistic and
 optimistic locking already.
  For FGAM I think the WSC should be performed on a per-key basis,
  and not for the whole map.


I agree, but I think implementing fine-grained WSC will be tricky. I'll
create a feature request in JIRA.



 
  
   Note that today the map is overwritten by tx2 even without step 4
 (tx2: am2.get(subkey1)). I'm pretty sure that's a bug and I fixed it
 locally by using the FORCE_WRITE_LOCK in
 AtomicHashMapProxy.getDeltaMapForWrite.
  
   However, when the Tree API moves a node it first checks for the
 existence of the destination node, which means NodeMoveAPIPessimisticTest
 is still failing. I'm not sure if I should fix that by forcing a write lock
 for all AtomicHashMap reads, for all TreeCache reads, or only in
 TreeCache.move().
  
 
  I tried using the FORCE_WRITE_LOCK flag for all TreeCache reads. This
  seems to work fine, and move() doesn't throw any exceptions in pessimistic
  mode any more. In optimistic mode it doesn't change anything, and
  concurrent moves still fail with WriteSkewException. The only downside is
  performance: acquiring extra locks will certainly slow things down.
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-20 Thread Dan Berindei
On Mon, May 20, 2013 at 1:57 PM, Manik Surtani msurt...@redhat.com wrote:


 On 16 May 2013, at 15:04, Dan Berindei dan.berin...@gmail.com wrote:

 Hi guys

  I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I've
  come across what I think is underspecified behaviour in
  AtomicHashMap.

 Say we have two transactions, tx1 and tx2, and they both work with the
 same atomic map in a pessimistic cache:

 1. tx1: am1 = AtomicMapLookup.get(cache, key)
 2. tx2: am2 = AtomicMapLookup.get(cache, key)
 3. tx1: am1.put(subkey1, value1) // locks the map
 4. tx2: am2.get(subkey1) // returns null
 5. tx1: commit // the map is now {subkey1=value1}
 6. tx2: am2.put(subkey2, value2) // locks the map
 7. tx2: commit // the map is now {subkey2=value2}

 It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is
 ok or if it's a bug...


 If optimistic, step 7 should fail with a write skew check.  If
 pessimistic, step 2 would *usually* block assuming that another thread is
 updating the map, but since neither tx1 or tx2 has started updating the map
 yet, neither has a write lock on the map.  So that succeeds.  I'm not sure
 if this is any different from not using an atomic map:

 1.  tx1: v = cache.get(k); // reads into tx context
 2.  tx2: v = cache.get(k);
 3.  tx1: cache.put(k, v + 1);
 4.  tx1: commit
 5.  tx2: cache.put(k, v + 1);
 6.  tx2: commit

 here as well, if using optimistic, step 6 will fail with a WSC but if
 pessimistic this will work (since tx2 only requested a write lock after tx1
 committed/released its write lock).


The difference is that in your scenario, you see in the code that tx2
writes to key k, so it's not surprising to find that tx2 overwrote the
value written by tx1. But it would be surprising if tx2 also overwrote an
unrelated key k2.

With an atomic map, you only see in the code map.put(subkey2, value). Tx2
doesn't touch subkey1, so it's not that obvious that it should remove it.
It is clear to me why it behaves the way it does now, after reading the
implementation, but I don't think it's what most users would expect. (The
proof, I guess, is in the current implementation of TreeCache.move()).

With a FineGrainedAtomicMap in optimistic mode, it's not obvious why tx1
writing to subkey1 should cause tx2's write to subkey2 to fail, either (see
https://issues.jboss.org/browse/ISPN-3123).


 Note that today the map is overwritten by tx2 even without step 4 (tx2:
 am2.get(subkey1)). I'm pretty sure that's a bug and I fixed it locally by
 using the FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite.

 However, when the Tree API moves a node it first checks for the existence
 of the destination node, which means NodeMoveAPIPessimisticTest is still
 failing. I'm not sure if I should fix that by forcing a write lock for all
 AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move().


 I think only in TreeCache.move()


I tend to disagree. I think it's way too easy to introduce a read of a
node's structure into a transaction and start losing data without knowing it.
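(A minimal sketch, assuming the standard Flag.FORCE_WRITE_LOCK behaviour in a
transactional cache: the read that precedes a structural update takes the write
lock up front, so a concurrent transaction blocks instead of reading data it is
about to overwrite. The helper name is illustrative, not a patch to TreeCache.)

  import org.infinispan.AdvancedCache;
  import org.infinispan.context.Flag;

  public class LockingReads {
     // Roughly a SELECT ... FOR UPDATE: in pessimistic mode the write lock is
     // acquired by the read itself, inside the caller's transaction.
     public static Object readForUpdate(AdvancedCache<String, Object> cache, String key) {
        return cache.withFlags(Flag.FORCE_WRITE_LOCK).get(key);
     }
  }
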
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] configuring fetchInMemoryState for topology caches

2013-05-21 Thread Dan Berindei
I wouldn't want to deprecate CCL, I think it definitely has a purpose - at
least in invalidation mode.

Even in replication mode, having a lazy alternative to state transfer may
be useful. Maybe not for the topology cache, but it might make sense for
large caches.
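(To make the two options concrete, a hedged sketch using the standard 5.x fluent
API: eager state transfer via fetchInMemoryState, as opposed to fetching entries
lazily through a ClusterCacheLoader; whether CCL survives the 6.0 cache store
redesign is exactly what is debated below.)

  import org.infinispan.configuration.cache.CacheMode;
  import org.infinispan.configuration.cache.Configuration;
  import org.infinispan.configuration.cache.ConfigurationBuilder;

  public class StateTransferConfig {
     public static Configuration eagerStateTransfer() {
        return new ConfigurationBuilder()
              .clustering().cacheMode(CacheMode.REPL_SYNC)
              .stateTransfer().fetchInMemoryState(true)   // joiners pull the full state eagerly
              .build();
     }
  }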


On Tue, May 21, 2013 at 4:36 PM, Mircea Markus mmar...@redhat.com wrote:


 On 21 May 2013, at 08:30, Tristan Tarrant ttarr...@redhat.com wrote:

  On 05/21/2013 08:58 AM, Galder Zamarreño wrote:
  Shouldn't it be enabled by default/enforced?
   ^ Either that, or the cluster cache loader is used, both of which
  serve the same purpose.
 
   I think what Mircea is getting at is that there is an intention to
  deprecate / remove the CCL. I think that we can do that in 6.0 (with the
  CacheStore redesign) and remove all potential users of CCL (including
  the lazy topology transfer).
 Mind reader :-)

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] How to get Grouper<T>#computeGroup(key) return value to map to physical Node?

2013-05-21 Thread Dan Berindei
I guess the grouper could use a KeyAffinityService (or something similar)
to generate a key local to each node and return that instead of 0 or 1.
However, you won't have any guarantee that the keys will stay on the same
node if the cache topology changes (e.g. another node joins).
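(A rough sketch of that idea. It follows the documented newLocalKeyAffinityService()
usage; pinning keys to an arbitrary node would use one of the cluster-wide factory
overloads instead. The exact overloads and packages are assumptions to verify against
the Infinispan version in use, and the keys-move-on-rebalance caveat above still applies.)

  import java.util.concurrent.Executors;
  import org.infinispan.Cache;
  import org.infinispan.affinity.KeyAffinityService;
  import org.infinispan.affinity.KeyAffinityServiceFactory;
  import org.infinispan.affinity.RndKeyGenerator;
  import org.infinispan.manager.EmbeddedCacheManager;

  public class AffinityGroupExample {
     public static Object localGroupKey(EmbeddedCacheManager cacheManager,
                                        Cache<Object, Object> cache) {
        KeyAffinityService<Object> service = KeyAffinityServiceFactory.newLocalKeyAffinityService(
              cache, new RndKeyGenerator(), Executors.newSingleThreadExecutor(), 100);
        // The returned key hashes to this node; a Grouper could return its toString()
        // as the group instead of a hard-coded "0" or "1".
        return service.getKeyForAddress(cacheManager.getAddress());
     }
  }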

It used to be that the actual address of the node would (almost) always be
mapped to that node, but that was just an implementation detail, and it
never worked 100%.

Ben, do you think being able to pin a key permanently to a node would be
useful?

Cheers
Dan




On Tue, May 21, 2013 at 6:31 PM, Galder Zamarreño gal...@redhat.com wrote:

 Hmmm, one hacky way might be to hold on to the grouper instance passed in
 via configuration, and once the cache manager has been started, set it in
  the grouper and use it to query either the address or the physical address
 (via EmbeddedCacheManager.getTransport…)?

 On May 14, 2013, at 6:34 PM, cotton-ben ben.cot...@alumni.rutgers.edu
 wrote:

  I am playing with the Infinispan 5.3 quick-start package to exercise my
 usage
  of the Grouping API.  As we know the quick start package is made up of
  AbstractNode.java, Node0.java, Node1.java and Node2.java (plus a
  util/listener).
 
  My ambition is to demonstrate
 
   1.  that any Cache<K,V>.put("DIMENSION.xxx", v) will flow through my Grouper
   and pin that key in the Cache at @Node=0.
   2.  that any Cache<K,V>.put("POSITION.xxx", v) will flow through my Grouper
   and pin that key in the Cache at either @Node=1 or @Node=2.
 
  Here is my AbstractNode#createCacheManagerProgramatically() config:
 
   private static EmbeddedCacheManager createCacheManagerProgramatically() {
       return new DefaultCacheManager(
             GlobalConfigurationBuilder.defaultClusteredBuilder()
                   .transport().addProperty("configurationFile", "jgroups.xml")
                   .build(),
             new org.infinispan.configuration.cache.ConfigurationBuilder()
                   .clustering()
                   .cacheMode(CacheMode.DIST_SYNC)
                   .hash().numOwners(1).groups().enabled(Boolean.TRUE)
                   .addGrouper(new com.jpmorgan.ct.lri.cs.ae.test.DimensionGrouper<String>())
                   .build()
       );
   }
 
   And here is my Grouper<T> implementation

   public class DimensionGrouper<T> implements Grouper<String> {
       public String computeGroup(String key, String group) {
           if (key.indexOf("DIMENSION.") == 0) {
               String groupPinned = "0";
               System.out.println("Pinning Key=[" + key + "] @Node=[" + groupPinned + "]");
               // node = exactly 0
               return groupPinned;
           } else if (key.indexOf("POSITION.") == 0) {
               String groupPinned = "" + (1 + (int) (Math.random() * 2));
               System.out.println("Pinning Key=[" + key + "] @Node=[" + groupPinned + "]");
               // node = {1, 2}
               return groupPinned;
           } else {
               return null;
           }
       }

       public Class<String> getKeyType() {
           return String.class;
       }
   }
 
  The logic is working correctly ... i.e. when from Node2.java I call
 
   for (int i = 0; i < 10; i++) {
       cacheDP.put("DIMENSION." + i, "DimensionValue." + i);
       cacheDP.put("POSITION." + i, "PositionValue." + i);
   }
 
   My DimensionGrouper is returning "0" from computeGroup().  My question is
   how in Infinispan I can map the computeGroup() return value to a physical
   Node?  I.e.  How can I make it so that when computeGroup() returns "0", I
   will *only* add that K,V entry to the Cache @Node 0?
 
 
 
 
 
  --
  View this message in context:
 http://infinispan-developer-list.980875.n3.nabble.com/Re-How-to-get-Grouper-T-computeGroup-key-return-value-to-map-to-physical-Node-tp4027134.html
  Sent from the Infinispan Developer List mailing list archive at
 Nabble.com.
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Baselining on Java 7

2013-05-23 Thread Dan Berindei
Oracle will support Java 7 at least until March 2015 [1]. And it's very
likely that some people will keep using 7 after that, just like they are
using 6 now. So unless we want to support Java 6 until 2016, I don't think
we can skip 7.

[1] http://www.oracle.com/technetwork/java/eol-135779.html


On Thu, May 23, 2013 at 4:51 PM, Bela Ban b...@redhat.com wrote:

 If the majority of users go directly to 8 from 6, then yes why not. But
 if that's not the case, then we can't do it, either...

 On 5/23/13 3:07 PM, Galder Zamarreño wrote:
  What about skipping 7 and waiting for 8?
 
  On May 23, 2013, at 12:38 PM, Manik Surtani msurt...@redhat.com wrote:
 
  Yup, it needs to happen sometime.  But the question is, when?
 
  Is there anyone out there who desperately needs Infinispan to work on
 Java 6?
 
  Cheers
  Manik

 --
 Bela Ban, JGroups lead (http://www.jgroups.org)
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Baselining on Java 7

2013-05-24 Thread Dan Berindei
* ForkJoinPool (assuming we can get AS to inject one)
* AutoCloseable
* Throwable.addSuppressed()
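(A small illustration, not from the thread, of two of these: try-with-resources on
an AutoCloseable, with the failing close() attached to the primary exception via
the compiler-generated Throwable.addSuppressed() call.)

  public class Java7Features {
     static class Resource implements AutoCloseable {
        @Override
        public void close() {
           throw new IllegalStateException("close failed");
        }
     }

     public static void main(String[] args) {
        try (Resource r = new Resource()) {
           throw new RuntimeException("work failed");
        } catch (RuntimeException e) {
           // Prints the primary failure plus the suppressed close() failure.
           System.out.println(e + " suppressed=" + java.util.Arrays.toString(e.getSuppressed()));
        }
     }
  }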



On Fri, May 24, 2013 at 2:01 AM, Mircea Markus mmar...@redhat.com wrote:



 Sent from my iPhone

 On 23 May 2013, at 12:06, Bela Ban b...@redhat.com wrote:

 
 
  On 5/23/13 12:38 PM, Manik Surtani wrote:
  Yup, it needs to happen sometime.  But the question is, when?
 
  Is there anyone out there who desperately needs Infinispan to work on
 Java 6?
 
  I got a lot of pushback when I moved the required JDK in JGroups to 1.5
  (people were still using 1.4), and the same for moving from 1.5 to 1.6,
  so I anticipate this won't be different for moving to 1.7 as baseline.
 
  I personally would not mind waiting for JGroups 5 to move to JDK 7, as
  this gives users a bit more time and I won't require JDK 7 before
  JGroups 5 anyway, as the NIO2 stuff won't be in JGroups 4, only in 5.
  +1 to stick to 6 at least till JGroups 5.
 Besides NIO2 stuff, is there anything from 7 we want badly?
 
  But if you guys need to move to JDK 7 in Infinispan 6 already, then
  that's fine with me, too.
 
  I guess the AS has some requirements, too, so if they decide to move to
  JDK 7, then we'll have to make that move too
 
  --
  Bela Ban, JGroups lead (http://www.jgroups.org)
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] CommandAwareRpcDispatcher desired logic

2013-05-29 Thread Dan Berindei
On Wed, May 29, 2013 at 4:50 PM, Pedro Ruivo pe...@infinispan.org wrote:

 Hi all,

 //related to https://issues.jboss.org/browse/ISPN-3100

 In here [1] I've added three simple tests to the CARD logic with the
 ResponseFilter.

 #1 testResponses() = tests if all the responses are returned. This test
  is failing because the FutureCollator only returns the last response.

 *question* is this the desired behaviour or is it a bug (and this test
 is incorrect)?


I think it was intentional, because it only really had to work with
ClusteredGetResponseValidityFilter at the time. But I'm all for changing it
to return all the (validated) responses.


#2 testTimeout() = tests the timeout when a ResponseFilter is used. It
 sets the timeout to 1000 milliseconds and sends a command that sleeps
  for 5000 milliseconds before replying. This test is failing because no
 TimeoutException was thrown (as expected). That's why I've created
 ISPN-3100.

 *question* is the desired behaviour to wait infinitely? If yes, then
 ISPN-3100 is invalid.


I'm pretty sure that's not intentional. FutureCollator.futureDone() should
be called when a UnicastRequest finishes with a TimeoutException and it
should decrement expectedResponses. When all the UnicastRequests are done,
it should notify() the caller thread, which should throw a TimeoutException.



 #3 testTimeoutWithoutFilter() = tests the timeout without a filter.
 It's working correctly!

 Thanks!

 Cheers,
 Pedro

 [1]

 https://github.com/pruivo/infinispan/commit/1fc26d740b773bf3864298615f61ca57b964d9cf#diff-0
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Suppressing state transfer via JMX

2013-05-31 Thread Dan Berindei
If we only want to deal with full cluster shutdown, then I think stopping
all application requests, calling Cache.clear() on one node, and then
shutting down all the nodes should be simpler. On start, assuming no cache
store, the caches will start empty, so starting all the nodes at once and
only allowing application requests when they've all joined should also work
without extra work.
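(Roughly, the full-shutdown sequence described above, assuming no cache store and
that application traffic has already been stopped; clear() is replicated, so one
call wipes the whole cluster.)

  import org.infinispan.Cache;
  import org.infinispan.manager.EmbeddedCacheManager;

  public class ClusterShutdown {
     public static void shutDown(Cache<?, ?> anyCache, Iterable<EmbeddedCacheManager> allManagers) {
        anyCache.clear();                  // wipe the data once, cluster-wide
        for (EmbeddedCacheManager manager : allManagers) {
           manager.stop();                 // then bring every node down
        }
     }
  }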

If we only want to stop a part of the cluster, suppressing rebalancing
would be better, because we wouldn't lose all the data. But we'd still lose
the keys whose owners are all among the nodes we want to stop. I've
discussed this with Adrian, and we think if we want to stop a part of the
cluster without losing data we need a JMX operation on the coordinator that
will atomically remove a set of nodes from the CH. After the operation
completes, the user will know it's safe to stop those nodes without losing
data.

When it comes to starting a part of the cluster, a pause rebalancing
option would probably be better - but again, on the coordinator, not on
each joining node. And clearly, if more than numOwner nodes leave while
rebalancing is suspended, data will be lost.

Cheers
Dan



On Fri, May 31, 2013 at 12:17 PM, Manik Surtani msurt...@redhat.com wrote:

 Guys

 We've discussed ISPN-3140 elsewhere before, I'm brining it to this forum
 now.

 https://issues.jboss.org/browse/ISPN-3140

 Any thoughts/concerns?  Particularly looking to hear from Dan or Adrian
 about viability, complexity, ease of implementation.

 Thanks
 Manik
 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Platform Architect, JBoss Data Grid
 http://red.ht/data-grid


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-03 Thread Dan Berindei
Hi guys

CacheLoaderInterceptor and DistributionInterceptor both honour the
IGNORE_RETURN_VALUES flag for get commands, but I think it would be more
useful if they ignored it - just like they ignore it for conditional
commands.

That would make it possible for users to only keep a reference to a
cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for
both read and write operations.
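(A minimal sketch of that usage pattern, assuming the standard
AdvancedCache.withFlags() API; with the current behaviour the get() below also
skips the remote/store lookup, which is exactly what this message questions.)

  import org.infinispan.AdvancedCache;
  import org.infinispan.Cache;
  import org.infinispan.context.Flag;

  public class FlaggedCacheHolder {
     private final AdvancedCache<String, String> cache;

     public FlaggedCacheHolder(Cache<String, String> baseCache) {
        // One decorated reference, kept for the lifetime of the application.
        this.cache = baseCache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
     }

     public void write(String key, String value) {
        cache.put(key, value);        // the previous value is not needed, so the flag helps
     }

     public String read(String key) {
        return cache.get(key);        // today the flag also affects this read
     }
  }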

What do you think?

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-03 Thread Dan Berindei
Sanne, I'm only talking about get operations. I was thinking that if you
call cache.get(key), you want the value of that key, regardless of where it
is stored...

Obviously, write operations would still behave as they do now.


On Mon, Jun 3, 2013 at 12:59 PM, Sanne Grinovero sa...@infinispan.org wrote:

 Hi Dan,
 I'm not sure I understood this. How can I prevent it from returning values
 if you have the flag ignored? Note that in some cases it makes a huge
 performance difference.

 Sanne

 On 3 June 2013 10:52, Dan Berindei dan.berin...@gmail.com wrote:
  Hi guys
 
  CacheLoaderInterceptor and DistributionInterceptor both honour the
  IGNORE_RETURN_VALUES flag for get commands, but I think it would be more
  useful if they ignored it - just like they ignore it for conditional
  commands.
 
  That would make it possible for users to only keep a reference to a
  cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for
 both
  read and write operations.
 
  What do you think?
 
  Cheers
  Dan
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-03 Thread Dan Berindei
I agree, eventually we should have separate getAndXxx methods like JSR-107.
The Jokre is cool, but running with an instrumentation agent is never going
to be ok for everyone.

I was thinking about this in the context of AtomicHashMap. It uses
withFlags(SKIP_REMOTE_LOOKUP, DELTA_WRITE, FORCE_WRITE_LOCK) for write
operations, but it still has to keep a reference to the original cache for
read operations. Of course, after more digging, I don't think the
SKIP_REMOTE_LOOKUP flag is that useful for writes either (since we already
have a copy of the map in the invocation context at that point). But it
seemed like an interesting idea in the general case.



On Mon, Jun 3, 2013 at 1:15 PM, Sanne Grinovero sa...@infinispan.org wrote:

 ha got it, good point.
 but I'm not persuaded: doesn't it get even more confusing for users? Imho
 it would be more helpful to throw an exception.

 these flags are confusing and ideally we should evolve the API, as the JSR
 did, or push on The Jokre.
 On 3 Jun 2013 11:08, Dan Berindei dan.berin...@gmail.com wrote:

 Sanne, I'm only talking about get operations. I was thinking that if you
 call cache.get(key), you want the value of that key, regardless of where it
 is stored...

 Obviously, write operations would still behave as they do now.


  On Mon, Jun 3, 2013 at 12:59 PM, Sanne Grinovero sa...@infinispan.org wrote:

 Hi Dan,
  I'm not sure I understood this. How can I prevent it from returning values
 if you have the flag ignored? Note that in some cases it makes a huge
 performance difference.

 Sanne

 On 3 June 2013 10:52, Dan Berindei dan.berin...@gmail.com wrote:
  Hi guys
 
  CacheLoaderInterceptor and DistributionInterceptor both honour the
  IGNORE_RETURN_VALUES flag for get commands, but I think it would be
 more
  useful if they ignored it - just like they ignore it for conditional
  commands.
 
  That would make it possible for users to only keep a reference to a
  cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it
 for both
  read and write operations.
 
  What do you think?
 
  Cheers
  Dan
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev



 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-03 Thread Dan Berindei
Fair point... ok, let's leave it as it is now.


On Mon, Jun 3, 2013 at 5:23 PM, Galder Zamarreño gal...@redhat.com wrote:



 On Jun 3, 2013, at 11:52 AM, Dan Berindei dan.berin...@gmail.com wrote:

  Hi guys
 
  CacheLoaderInterceptor and DistributionInterceptor both honour the
 IGNORE_RETURN_VALUES flag for get commands, but I think it would be more
 useful if they ignored it - just like they ignore it for conditional
 commands.
 
  That would make it possible for users to only keep a reference to a
 cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for
 both read and write operations.
 
  What do you think?

 If I was to take the role of a colleague of the person who's written the
  Infinispan code, I'd be very confused to see a cache reference created
 with IGNORE_RETURN_VALUES being used for a get() operation… I can see
 myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES?

 
  Cheers
  Dan
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] mongodb cache store added in Infinispan 5.3 - courtesy of Guillaume Scheibel

2013-06-03 Thread Dan Berindei
There's a link 'Login as guest' at the bottom of the login form at
ci.infinispan.org, you can use that and you'll see all the builds. You can
also create a user for yourself.


On Mon, Jun 3, 2013 at 6:01 PM, Guillaume SCHEIBEL 
guillaume.schei...@gmail.com wrote:

 I don't think I have access to the CI platform. If I do, what kind of
 credentials do I have to use?

 Guillaume


 2013/6/3 Galder Zamarreño gal...@redhat.com

 Btw, it seems that since the introduction of this, 21 failures have come in
 from the MongoDBCacheStoreTest [1].

 Guillaume, can you please look into this?

 Cheers,

 [1]
 http://ci.infinispan.org/viewLog.html?buildId=1650&tab=buildResultsDiv&buildTypeId=bt2

 On Jun 3, 2013, at 4:02 PM, Guillaume SCHEIBEL 
 guillaume.schei...@gmail.com wrote:

  Thanks guys, I'm glad you like it :)
  BTW, if you would like to have other implementation, ... (I like doing
 this translation work)
 
  Guillaume
 
 
  2013/6/3 Galder Zamarreño gal...@redhat.com
  Indeed great stuff Guillaume!! :)
 
  On May 31, 2013, at 2:18 PM, Sanne Grinovero sa...@infinispan.org
 wrote:
 
   Looks great, thanks Guillaume!
  
   On 31 May 2013 10:27, Guillaume SCHEIBEL 
 guillaume.schei...@gmail.com wrote:
   I have updated the main page but I haven't found how to delete the
 child
   page.
   The blog post has been updated you can publish whenever you want to.
  
   Guillaume
  
  
   2013/5/30 Mircea Markus mmar...@redhat.com
  
   Also better place the content in the parent document directly
 instead of
   making it a separate child document (see the rest of cache stores):
   https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores
  
   Thanks!
  
   On 30 May 2013, at 21:49, Mircea Markus mmar...@redhat.com wrote:
  
   Thanks, looks good!
   I think it would be nice to add a code snippet with configuring the
   store using the fluent API as well.
   And release the blog :-)
  
   On 30 May 2013, at 17:05, Guillaume SCHEIBEL
   guillaume.schei...@gmail.com wrote:
  
   Hello,
  
    I have written a small documentation page here:
   https://docs.jboss.org/author/display/ISPN/MongoDB+CacheStore
  
   WDYT ?
  
    About the blog post: a gist (script tag) is already there, but it's not
    visible in the editor view.
  
  
   Guillaume
  
  
   2013/5/27 Mircea Markus mmar...@redhat.com
  
   On 22 May 2013, at 11:01, Guillaume SCHEIBEL
   guillaume.schei...@gmail.com wrote:
  
   Mircea,
  
    I have just created a quick blog post titled "Using MongoDB as a cache
    store". Could you have a look at it? If it suits you, the documentation will
    follow soon.
  
   Thank you Guillaume. Looks good to me once the link to
 documentation
   and the sample configuration snippet is in.
  
  
   Cheers
   Guillaume
  
  
   2013/5/22 Randall Hauch rha...@redhat.com
   There is a way to download (via Maven) and run MongoDB locally
 from
   within Java, via Flapdoodle's Embedded MongoDB:
  
   https://github.com/flapdoodle-oss/embedmongo.flapdoodle.de
  
   ModeShape uses this in our builds in support of our storage of
 binary
   values inside MongoDB. The relevant Maven POM parts and JUnit
 test case are:
  
  
  
 https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/pom.xml#L147
  
  
 https://github.com/ModeShape/modeshape/blob/master/modeshape-jcr/src/test/java/org/modeshape/jcr/value/binary/MongodbBinaryStoreTest.java
  
  
  
   On May 21, 2013, at 1:04 PM, Mircea Markus mmar...@redhat.com
 wrote:
  
    Thanks to Guillaume Scheibel, Infinispan now has a MongoDB cache
    store that will be shipped as part of 5.3.0.CR1.
  
    The tests for the MongoDB cache store are not run by default. In order
    to be able to run them you need to:
    - install MongoDB locally
    - run the mongodb profile
    The cache store was added to the CI build on all 5.3 configs (together
    with a running instance of MongoDB).
  
   Guillaume, would you mind adding a blog entry describing this
 new
   functionality? (I've invited you to be a member of the
    infinispan.blogspot.com team.)
   Also can you please update the user doc:
  
 https://docs.jboss.org/author/display/ISPN/Cache+Loaders+and+Stores
  
  
   Cheers,
   --
   Mircea Markus
   Infinispan lead (www.infinispan.org)
  
  
  
  
  
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   https://lists.jboss.org/mailman/listinfo/infinispan-dev
  
  
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   https://lists.jboss.org/mailman/listinfo/infinispan-dev
  
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   https://lists.jboss.org/mailman/listinfo/infinispan-dev
  
   Cheers,
   --
   Mircea Markus
   Infinispan lead (www.infinispan.org)
  
  
  
  
  
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-04 Thread Dan Berindei
On Tue, Jun 4, 2013 at 12:27 PM, Mircea Markus mmar...@redhat.com wrote:


 On 3 Jun 2013, at 19:01, Dan Berindei dan.berin...@gmail.com wrote:

  Fair point... ok, let's leave it as it is now.
 
 
  On Mon, Jun 3, 2013 at 5:23 PM, Galder Zamarreño gal...@redhat.com
 wrote:
 
 
  On Jun 3, 2013, at 11:52 AM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  Hi guys
 
  CacheLoaderInterceptor and DistributionInterceptor both honour the
 IGNORE_RETURN_VALUES flag for get commands, but I think it would be more
 useful if they ignored it - just like they ignore it for conditional
 commands.
 
  That would make it possible for users to only keep a reference to a
 cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for
 both read and write operations.
 
  What do you think?
 
  If I was to take the role of a colleague of the person who's written the
  Infinispan code, I'd be very confused to see a cache reference created
 with IGNORE_RETURN_VALUES being used for a get() operation… I can see
 myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES?

 Isn't Galder's point not to allow invoking get with IGNORE_RETURN_VALUES?
 As both of you pointed out, Get + IGNORE_RETURN_VALUES doesn't make any
 sense :-)


You'd think conditional operations with IGNORE_RETURN_VALUES don't make
sense either, yet we have a special case to handle those as if the flag
wasn't present :)
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Using infinispan as quorum-based nosql

2013-06-05 Thread Dan Berindei
On Mon, Jun 3, 2013 at 4:23 PM, vitalii.tymchys...@ubs.com wrote:

 Hello.

 Thanks for your information. I will subscribe and vote for the issues
 noted.
  In the meantime I've implemented a hacky JgroupsTransport that downgrades
  all (but CacheViewControlCommand and StateTransferControlCommand)
  SYNCHRONOUS invokeRemotely calls to SYNCHRONOUS_IGNORE_LEAVERS and checks
  with a filter if the required number of answers was received (I've tried to use
  the original invokeRemotely return value but it often returns some strange
  value, like an empty map). It seems to do the trick for me. But I am still not
  sure if this has any side effects.


Indeed, I started working on a solution, but I over-engineered it and then
I got side-tracked with other stuff. Sorry about that.

The problem with using SYNCHRONOUS_IGNORE_LEAVERS everywhere, as I found
out, is that you don't want to ignore the primary owner of a key leaving
during a prepare/lock command (or the coordinator, in REPL mode prior to
5.3.0.CR1/ISPN-2772). If that happens, you have to retry on the new primary
owner, otherwise you can't know if the prepare command has locked the key
or not.

A similar problem appears in non-transactional caches with
supportsConcurrentUpdates=true: there the primary owner can ignore any of
the backup owners leaving, but the originator can't ignore the primary
owner leaving.



  For now I can see a merge problem in my test: different values are picked
  during the merge. I am going to dig a little deeper and follow up. But it's
  already a little strange for me, since the test algorithm is:
  1) Assign the old value to the full cluster (it's REPL_SYNC mode)
  2) Block the coordinator
  3) Write the new value to one of the two remaining nodes. It's synchronized to
  the second remaining node
  4) Unblock the coordinator
  5) Wait (I could not find a good way to wait for state transfer, so I just wait
  in this case).
  6) Check the value on the coordinator

  And in my test I am randomly getting the old or the new value in the assert. I
  am now going to check why. Maybe I will need to reinitialize the smaller cluster
  part to ensure data is taken from the quorum part of the cluster.


We don't handle merges properly. See
https://issues.jboss.org/browse/ISPN-263 and the discussion at
http://markmail.org/message/meyczotzobuva7js

What happens right now is that after a merge, all the caches are assumed to
have up-to-date data, so there is no state transfer. We had several ideas
floating around on how we could force the smaller partition to receive data
from the quorum partition, but I think with the public API your best option
is to stop all the caches in the smaller partition after the split and
start them back up after the merge.

Cheers
Dan



 Best regards, Vitalii Tymchyshyn

 -Original Message-
 From: infinispan-dev-boun...@lists.jboss.org [mailto:
 infinispan-dev-boun...@lists.jboss.org] On Behalf Of Galder Zamarreno
 Sent: Monday, June 03, 2013 9:04 AM
 To: infinispan -Dev List
 Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql


 On May 30, 2013, at 5:10 PM, vitalii.tymchys...@ubs.com wrote:

  Hello.
 
  We are going to use Infinispan in our project as NoSQL solution. It
  performs quite well for us, but currently we've faced next problem.
   Note: We are using Infinispan 5.1.6 in SYNC_REPL mode in a small cluster.
   The problem is that when any node fails, any running transactions wait
   for JGroups to decide whether it has really failed or not, and roll back
   because of a SuspectException after that. While we can live with a
   delay, we'd really like to skip rolling back. As for me, I actually
   don't see a reason for the rollback, because transactions started after
   the leave will succeed. So, as for me, previously running transactions
   could do the same.

 We're aware of the problem (https://issues.jboss.org/browse/ISPN-2402).

 @Dan, has there been any updates on this?

   The question for me is whether the node that left will synchronize its state
   after the merge (even if the merge was done without an Infinispan restart). As
   for me, it should, or it won't work correctly at all.

 This is not in yet: https://issues.jboss.org/browse/ISPN-263

   So, I've found RpcManager's ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS
   and am thinking of switching to it for RpcManager calls that don't specify
   a ResponseMode explicitly. As for me, it should do the trick. Also, I am
   going to enforce a quorum number of responses, but that's another story.
   So, what do you think, would it work?

 ^ Not sure if that'll work. @Dan?

   P.S. Another Q from me: how does it work now, when a SuspectException is
   thrown from CommitCommand broadcasting? As far as I can see, the commit is
   still done on some remote nodes (that are still in the cluster), but
   rolled back on the local node because of this exception. Am I correct?

 ^ How Infinispan reacts in these situations depends a lot on the type of
 communications (synchronous or asynchronous) and the transaction
 configuration. Mircea can provide more details on this.

 Cheers,

  This
  can cause inconsistencies, but 

Re: [infinispan-dev] Using infinispan as quorum-based nosql

2013-06-06 Thread Dan Berindei
Say you have two transactions, tx1 and tx2. They both send a
LockControlCommand(k1) to the primary owner of k1 (let's call it B).
If the lock commands use SYNCHRONOUS_IGNORE_LEAVERS and B dies while
processing the commands, both tx1 and tx2 will think they have succeeded in
locking k1.

So you're right, everything should be locked before prepare in pessimistic
mode, but LockControlCommands are also susceptible to SuspectExceptions. On
the other hand, you can use SYNCHRONOUS mode for LockControlCommands and
you can just retry the transaction in case of a SuspectException.

Unfortunately, you can't retry the transaction if the PrepareCommand fails
(in pessimistic mode; or the CommitCommand in optimistic mode), because it
is executed in the commit phase. The transaction manager swallows all the
exceptions in the commit phase, making it impossible to see if it failed
because of a node leaving. I guess this means I should increase the
priority of https://issues.jboss.org/browse/ISPN-2402 ...



On Thu, Jun 6, 2013 at 11:49 AM, vitalii.tymchys...@ubs.com wrote:

 **
 Hello.

  We are using pessimistic transaction mode. In this case everything's
  already locked by the time of the prepare, isn't it?
  As for merge, for quorum mode it's simple - take the data from the quorum. I
  think I will try to simply suppress sending data from non-quorum members on
  merge. Because currently everyone sends its data, and that creates a complete
  mess with unsynchronized data after the merge (depending on the timing).

 Best regards, Vitalii Tymchyshyn

  --
 *From:* infinispan-dev-boun...@lists.jboss.org [mailto:
 infinispan-dev-boun...@lists.jboss.org] *On Behalf Of *Dan Berindei
 *Sent:* Wednesday, June 05, 2013 12:04 PM

 *To:* infinispan -Dev List
 *Subject:* Re: [infinispan-dev] Using infinispan as quorum-based nosql



 On Mon, Jun 3, 2013 at 4:23 PM, vitalii.tymchys...@ubs.com wrote:

 Hello.

 Thanks for your information. I will subscribe and vote for the issues
 noted.
  In the meantime I've implemented a hacky JgroupsTransport that downgrades
  all (but CacheViewControlCommand and StateTransferControlCommand)
  SYNCHRONOUS invokeRemotely calls to SYNCHRONOUS_IGNORE_LEAVERS and checks
  with a filter if the required number of answers was received (I've tried to use
  the original invokeRemotely return value but it often returns some strange
  value, like an empty map). It seems to do the trick for me. But I am still not
  sure if this has any side effects.


 Indeed, I started working on a solution, but I over-engineered it and then
 I got side-tracked with other stuff. Sorry about that.

 The problem with using SYNCHRONOUS_IGNORE_LEAVERS everywhere, as I found
 out, is that you don't want to ignore the primary owner of a key leaving
 during a prepare/lock command (or the coordinator, in REPL mode prior to
 5.3.0.CR1/ISPN-2772). If that happens, you have to retry on the new primary
 owner, otherwise you can't know if the prepare command has locked the key
 or not.

 A similar problem appears in non-transactional caches with
 supportsConcurrentUpdates=true: there the primary owner can ignore any of
 the backup owners leaving, but the originator can't ignore the primary
 owner leaving.



  For now I can see a merge problem in my test: different values are picked
  during the merge. I am going to dig a little deeper and follow up. But it's
  already a little strange for me, since the test algorithm is:
  1) Assign the old value to the full cluster (it's REPL_SYNC mode)
  2) Block the coordinator
  3) Write the new value to one of the two remaining nodes. It's synchronized to
  the second remaining node
  4) Unblock the coordinator
  5) Wait (I could not find a good way to wait for state transfer, so I just wait
  in this case).
  6) Check the value on the coordinator

  And in my test I am randomly getting the old or the new value in the assert. I
  am now going to check why. Maybe I will need to reinitialize the smaller cluster
  part to ensure data is taken from the quorum part of the cluster.


 We don't handle merges properly. See
 https://issues.jboss.org/browse/ISPN-263 and the discussion at
 http://markmail.org/message/meyczotzobuva7js

 What happens right now is that after a merge, all the caches are assumed
 to have up-to-date data, so there is no state transfer. We had several
 ideas floating around on how we could force the smaller partition to
 receive data from the quorum partition, but I think with the public API
 your best option is to stop all the caches in the smaller partition after
 the split and start them back up after the merge.

 Cheers
 Dan



 Best regards, Vitalii Tymchyshyn

 -Original Message-
 From: infinispan-dev-boun...@lists.jboss.org [mailto:
 infinispan-dev-boun...@lists.jboss.org] On Behalf Of Galder Zamarreno
 Sent: Monday, June 03, 2013 9:04 AM
 To: infinispan -Dev List
 Subject: Re: [infinispan-dev] Using infinispan as quorum-based nosql


 On May 30, 2013, at 5:10 PM, vitalii.tymchys...@ubs.com wrote:

  Hello.
 
  We are going to use Infinispan in our project as NoSQL

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-10 Thread Dan Berindei
On Thu, Jun 6, 2013 at 9:08 PM, Ray Tsang saturn...@gmail.com wrote:

 On Jun 6, 2013, at 13:26, Mircea Markus mmar...@redhat.com wrote:

 
  On 4 Jun 2013, at 13:55, Dan Berindei dan.berin...@gmail.com wrote:
 
  CacheLoaderInterceptor and DistributionInterceptor both honour the
 IGNORE_RETURN_VALUES flag for get commands, but I think it would be more
 useful if they ignored it - just like they ignore it for conditional
 commands.
 
  That would make it possible for users to only keep a reference to a
 cache.getAdvancedCache().withFlags(IGNORE_RETURN_VALUES) and use it for
 both read and write operations.
 
  What do you think?
 
  If I was to take the role of a colleague of the person who's written
  the Infinispan code, I'd be very confused to see a cache reference created
 with IGNORE_RETURN_VALUES being used for a get() operation… I can see
 myself thinking: Why on earth do you call get with IGNORE_RETURN_VALUES?
 
  Isn't Galder's point not to allow invoking get with
 IGNORE_RETURN_VALUES? As both of you pointed out, Get +
 IGNORE_RETURN_VALUES doesn't make any sense :-)
 
 
  You'd think conditional operations with IGNORE_RETURN_VALUES don't make
 sense either, yet we have a special case to handle those as if the flag
 wasn't present :)
 
  I guess you're referring to ISPN-3141?


Exactly. Does it make sense to call
cache.withFlags(IGNORE_RETURN_VALUES).putIfAbsent(k, v)? What should it
return?



  Still I think Get + IGNORE_RETURN_VALUES doesn't make any sense :-)

 +1. It definitely threw me off...


Ok, maybe IGNORE_RETURN_VALUES wouldn't be the best flag name for what I
had in mind... I was thinking of a scenario where the application needs to
do both reads and writes, but for writes it never needs to know the
previous value. In that scenario it would make sense to call something like

cache =
cacheManager.getCache().getAdvancedCache().withFlags(IGNORE_RETURN_VALUES_ON_WRITES)

at the beginning and only ever use that reference in the application. I
agree that using the existing IGNORE_RETURN_VALUES flag for that would be a
bit misleading, though.

Should we change anything about the IGNORE_RETURN_VALUES, then? I guess it
would be relatively simple to make it so that get() operations with the
flag throw an exception and (optionally) put() operations always return
null. Should I create an issue in JIRA for that?

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [infinispan-internal] PutMapCommand is ineffective

2013-06-10 Thread Dan Berindei
Yes, putAll is really heavy in non-tx (concurrent) mode, because the same
PutMapCommand is forwarded from each primary owner to all the backup owners
of the keys it primary-owns.

However, in non-tx mode locks are owned by threads. A separate lock command
would acquire a lock and associate it with its execution thread, making it
impossible for a following write command to use the same lock. Changing
putAll to implement Radim's proposal would indeed make it very similar to a
transactional putAll: you'd need a pseudo-transaction object to associate
the locks with, and a reaper to clean up the pseudo-transaction objects
when the originator leaves the cluster.




On Mon, Jun 10, 2013 at 1:33 PM, Manik Surtani msurt...@redhat.com wrote:

 Agreed.  It does sound pretty heavy.  We should investigate a better
 implementation - the two approaches you suggest both sound good, could you
 create a JIRA for this?

 Adding infinispan-dev, that's the correct place to discuss this.

 Cheers
 Manik

 On 7 Jun 2013, at 13:39, Radim Vansa rva...@redhat.com wrote:

  Hi,
 
  recently I was looking into the performance of PutMapCommand and what's
 in fact going on under the hood. From what I've seen (not from the code but
 from message flow analysis), in non-transactional synchronous mode this
 happens:
 
  A wants to execute PutMapCommand with many keys - let's assume that in
 fact the keys span all nodes in the cluster.
 
  1. A locks all local keys and sends via unicast a message to each
 primary owner of some of the keys in the map
  2. A sends unicast message to each node, requesting the operation
  3. Each node locks its keys and sends multicast message to ALL other
 nodes in the cluster


I don't think that's right... Each primary owner only sends this message to
all the backup owners of the keys for which that node is the primary owner.
So it will only send the message to all the other nodes (optimized as a
multicast) if every other node is a backup owner for one of its primary
keys.



  This happens N - 1 times:
   4. Each node receives the multicast message (updates the non-primary
  segments) and sends a reply back to the sender of the mcast message.
  5. The primary owners send confirmation back to A.
 
  Let's compute how many messages are here received - it's
  N - 1 // A's request
  (N - 1) * (N - 1) // multicast message received
  (N - 1) * (N - 1) // reply to the multicast message received
  N - 1 // response to A
   That's 2*N^2 - 2*N messages, assuming nobody needs flow control
  replenishments, nothing is lost, etc. I don't like that ^2 exponent - it does
  not look like the cluster is really scaling. It could be fun to see it executed
  on a 64-node cluster, spawning thousands of messages just for one putAll
  (with, say, 100 key-value pairs - I don't want to compute the exact
  probability of how many nodes such a set of keys would have primary segments on).
 
   Could the requestor orchestrate the whole operation? The idea is that
  all messages are sent only between the requestor and the other nodes, never
  between the other nodes. The requestor would lock the primary keys with one set
  of messages (waiting for the replies), update the non-primaries with another set
  of messages and then unlock all primaries with a last message.
   The set of messages could be either unicast with only the keys relevant to
  the recipient, or multicast with the whole map - which one is actually better
  is subject to a performance test.
   This results in 6*N - 6 messages (or 5*N - 5 if the last message
  didn't require a reply). You can easily see when 5*(N - 1) is better
  than 2*N*(N - 1).
   Or is this too similar to transactions with multiple keys?
 
   I think that with the current implementation, the putAll operation should be
 discouraged as it does not provide better performance than multiple put
 (and in terms of atomicity it's probably not much better either).
 
  WDYT?
 
  Radim
 
  ---
  Radim Vansa
  Quality Assurance Engineer
  JBoss Datagrid
  tel. +420532294559 ext. 62559
 
  Red Hat Czech, s.r.o.
  Brno, Purkyňova 99/71, PSČ 612 45
  Czech Republic
 
 

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Platform Architect, JBoss Data Grid
 http://red.ht/data-grid


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Suppressing state transfer via JMX

2013-06-11 Thread Dan Berindei
On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani msurt...@redhat.com wrote:


 On 10 Jun 2013, at 15:12, Dan Berindei dan.berin...@gmail.com wrote:

 Erik, I think in your case you'd be better served by a
 ConsistentHashFactory that always assigns at most one owner from each
 machine for each segment.

 I guess the fix for ISPN-3140 should work as well, but it wouldn't be very
 straightforward: you'd have to keep the rebalancingEnabled attribute set to
 false by default, and you'd have to enable it temporarily every time you
 have a topology change that you do want to process.


 Why?  Does the workflow detailed in ISPN-3140 not work?


ISPN-3140 is geared toward planned shutdowns, my understanding was that
Erik's scenario involves an unexpected failure.

Say we have a cluster with 4 nodes spread on 2 machines: A(m1), B(m1),
C(m2), D(m2).
If m2 fails, rebalancing will start automatically and m1 will have 2 copies
of each entry (one on A and one on B).
Trying to suspend rebalancing after m2 has already failed won't have any
effect - if state transfer is already in progress it won't be cancelled.
In order to avoid the unnecessary transfers, rebalancing would have to be
suspended before the failure - i.e. rebalancing should be suspended by
default.
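
For reference, once the ISPN-3140 attribute is in place a monitoring script
could toggle it like this - the ObjectName and attribute name below are my
assumptions about how it would be exposed, so treat this as a sketch only:

import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class RebalancingToggle {
   // Sketch: suspend/resume rebalancing via the coordinator's LocalTopologyManager
   // MBean. Assumes the default JMX domain and cache manager name.
   public static void setRebalancingEnabled(boolean enabled) throws Exception {
      MBeanServer server = ManagementFactory.getPlatformMBeanServer();
      ObjectName ltm = new ObjectName(
            "org.infinispan:type=CacheManager,name=\"DefaultCacheManager\",component=LocalTopologyManager");
      server.setAttribute(ltm, new Attribute("rebalancingEnabled", enabled));
   }
}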


 It's certainly possible to do this automatically from your app or from a
 monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would
 be a better fit.


 Do explain.


A custom ConsistentHashFactory could distribute segments so that a machine
never has more than 1 copy of each segment. If m2 failed, there would be
just one machine in the cluster, and just one copy of each segment. The
factory would not change the consistent hash, and there wouldn't be any
state transfer.

It could be even simpler - the existing
TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory
implementations already ensure just one copy per machine if the number of
machines is >= numOwners. So a custom ConsistentHashFactory could just
extend one of these and skip calling super.rebalance() when the number of
machines in the cluster is < numOwners.
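
Something along these lines (untested sketch - it assumes the
ConsistentHashFactory/ConsistentHash SPI from 5.2+ with method names from
memory, plus a topology-aware transport; the class name is made up):

import java.util.HashSet;
import java.util.Set;
import org.infinispan.distribution.ch.DefaultConsistentHash;
import org.infinispan.distribution.ch.TopologyAwareSyncConsistentHashFactory;
import org.infinispan.remoting.transport.Address;
import org.infinispan.remoting.transport.TopologyAwareAddress;

// Keep the current owners (i.e. skip rebalancing) while there are fewer
// distinct machines than numOwners, so a machine failure doesn't trigger
// extra copies on the surviving machine.
public class MachineFailureTolerantCHF extends TopologyAwareSyncConsistentHashFactory {
   @Override
   public DefaultConsistentHash rebalance(DefaultConsistentHash baseCH) {
      Set<String> machines = new HashSet<String>();
      for (Address member : baseCH.getMembers()) {
         machines.add(((TopologyAwareAddress) member).getMachineId());
      }
      if (machines.size() < baseCH.getNumOwners()) {
         return baseCH; // don't create extra copies on the same machine
      }
      return super.rebalance(baseCH);
   }
}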






 On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter an1...@hotmail.com wrote:

 I'd like something similar.  If I have equal keys on two machines (given
 an
 orthogonal setup and a TACH), I'd like to suppress state transfer and run
 with only one copy until I can recover my machines.  The business case is
 that in a degraded scenario, additional replicas aren't going to buy me
 anything, as a failure will most likely be at the machine level and will
 cause me to lose data.  Once I've recovered the other machine, I can turn
 back on state transfer to get my data redundancy.

 Erik

 -Original Message-
 From: infinispan-dev-boun...@lists.jboss.org
 [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Mircea
 Markus
 Sent: Tuesday, June 04, 2013 5:44 AM
 To: infinispan -Dev List
 Subject: Re: [infinispan-dev] Suppressing state transfer via JMX

 Manik, what's wrong with Dan's suggestion with clearing the cache before
 shutdown?

 On 31 May 2013, at 14:20, Manik Surtani msurt...@redhat.com wrote:

 
  If we only want to deal with full cluster shutdown, then I think
 stopping
 all application requests, calling Cache.clear() on one node, and then
 shutting down all the nodes should be simpler. On start, assuming no cache
 store, the caches will start empty, so starting all the nodes at once and
 only allowing application requests when they've all joined should also
 work
 without extra work.
 
  If we only want to stop a part of the cluster, suppressing rebalancing
 would be better, because we wouldn't lose all the data. But we'd still
 lose
 the keys whose owners are all among the nodes we want to stop. I've
 discussed this with Adrian, and we think if we want to stop a part of the
 cluster without losing data we need a JMX operation on the coordinator
 that
 will atomically remove a set of nodes from the CH. After the operation
 completes, the user will know it's safe to stop those nodes without losing
 data.
 
  I think the no-data-loss option is bigger scope, perhaps part of
 ISPN-1394.  And that's not what I am asking about.
 
  When it comes to starting a part of the cluster, a pause rebalancing
 option would probably be better - but again, on the coordinator, not on
 each
 joining node. And clearly, if more than numOwner nodes leave while
 rebalancing is suspended, data will be lost.
 
  Yup.  This sort of option would only be used where data loss isn't an
 issue (such as a distributed cache).  Where data loss is an issue, we'd
 need
 more control - ISPN-1394.
 

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 ___
 infinispan-dev mailing list
 infinispan-dev

Re: [infinispan-dev] Suppressing state transfer via JMX

2013-06-12 Thread Dan Berindei
On Wed, Jun 12, 2013 at 2:57 PM, Manik Surtani msurt...@redhat.com wrote:


 On 11 Jun 2013, at 16:27, Dan Berindei dan.berin...@gmail.com wrote:




 On Tue, Jun 11, 2013 at 2:01 PM, Manik Surtani msurt...@redhat.comwrote:


 On 10 Jun 2013, at 15:12, Dan Berindei dan.berin...@gmail.com wrote:

 Erik, I think in your case you'd be better served by a
 ConsistentHashFactory that always assigns at most one owner from each
 machine for each segment.

 I guess the fix for ISPN-3140 should work as well, but it wouldn't be
 very straightforward: you'd have to keep the rebalancingEnabled attribute
 set to false by default, and you'd have to enable it temporarily every time
 you have a topology change that you do want to process.


 Why?  Does the workflow detailed in ISPN-3140 not work?


 ISPN-3140 is geared toward planned shutdowns, my understanding was that
 Erik's scenario involves an unexpected failure.

 Say we have a cluster with 4 nodes spread on 2 machines: A(m1), B(m1),
 C(m2), D(m2).
 If m2 fails, rebalancing will start automatically and m1 will have 2
 copies of each entry (one on A and one on B).
 Trying to suspend rebalancing after m2 has already failed won't have any
 effect - if state transfer is already in progress it won't be cancelled.
 In order to avoid the unnecessary transfers, rebalancing would have to be
 suspended before the failure - i.e. rebalancing should be suspended by
 default.


 It's certainly possible to do this automatically from your app or from a
 monitoring daemon, but I'm pretty sure an enhanced topology-aware CHF would
 be a better fit.


 Do explain.


 A custom ConsistentHashFactory could distribute segments so that a machine
 never has more than 1 copy of each segment. If m2 failed, there would be
 just one machine in the cluster, and just one copy of each segment. The
 factory would not change the consistent hash, and there wouldn't be any
 state transfer.


 But that's bad for unplanned failures, as you lose data in that case.


Erik's assumption was that machine failures were much more likely than
individual node failures, and having a second copy on m1 wouldn't help if
m1 failed as well.



 It could be even simpler - the existing
 TopologyAwareConsistentHashFactory/TopologyAwareSyncConsistentHashFactory
 implementations already ensure just one copy per machine if the number of
 machines is >= numOwners. So a custom ConsistentHashFactory could just
 extend one of these and skip calling super.rebalance() when the number of
 machines in the cluster is < numOwners.






 On Fri, Jun 7, 2013 at 1:45 PM, Erik Salter an1...@hotmail.com wrote:

 I'd like something similar.  If I have equal keys on two machines (given
 an
 orthogonal setup and a TACH), I'd like to suppress state transfer and run
 with only one copy until I can recover my machines.  The business case is
 that in a degraded scenario, additional replicas aren't going to buy me
 anything, as a failure will most likely be at the machine level and will
 cause me to lose data.  Once I've recovered the other machine, I can turn
 back on state transfer to get my data redundancy.

 Erik

 -Original Message-
 From: infinispan-dev-boun...@lists.jboss.org
 [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Mircea
 Markus
 Sent: Tuesday, June 04, 2013 5:44 AM
 To: infinispan -Dev List
 Subject: Re: [infinispan-dev] Suppressing state transfer via JMX

 Manik, what's wrong with Dan's suggestion with clearing the cache before
 shutdown?

 On 31 May 2013, at 14:20, Manik Surtani msurt...@redhat.com wrote:

 
  If we only want to deal with full cluster shutdown, then I think
 stopping
 all application requests, calling Cache.clear() on one node, and then
 shutting down all the nodes should be simpler. On start, assuming no
 cache
 store, the caches will start empty, so starting all the nodes at once and
 only allowing application requests when they've all joined should also
 work
 without extra work.
 
  If we only want to stop a part of the cluster, suppressing rebalancing
 would be better, because we wouldn't lose all the data. But we'd still
 lose
 the keys whose owners are all among the nodes we want to stop. I've
 discussed this with Adrian, and we think if we want to stop a part of the
 cluster without losing data we need a JMX operation on the coordinator
 that
 will atomically remove a set of nodes from the CH. After the operation
 completes, the user will know it's safe to stop those nodes without
 losing
 data.
 
  I think the no-data-loss option is bigger scope, perhaps part of
 ISPN-1394.  And that's not what I am asking about.
 
  When it comes to starting a part of the cluster, a pause rebalancing
 option would probably be better - but again, on the coordinator, not on
 each
 joining node. And clearly, if more than numOwner nodes leave while
 rebalancing is suspended, data will be lost.
 
  Yup.  This sort of option would only be used where data loss isn't an
 issue (such as a distributed cache

Re: [infinispan-dev] New bundler performance

2013-06-12 Thread Dan Berindei
On Wed, Jun 12, 2013 at 5:39 PM, Radim Vansa rva...@redhat.com wrote:



 - Original Message -
 | From: Bela Ban b...@redhat.com
 | To: infinispan-dev@lists.jboss.org
 | Sent: Wednesday, June 12, 2013 3:16:58 PM
 | Subject: Re: [infinispan-dev] New bundler performance
 |
 |
 |
 | On 06/12/2013 04:54 AM, Radim Vansa wrote:
 |  Hi,
 | 
 |  I was going through the commits (running tests on each of them) to
 |  seek the performance regression we've recently discovered and it
 |  seems that our test (replicated udp non-transactional stress test on
 |  4 nodes) experiences a serious regression on the commit
 | 
 |  ISPN-2848 Use the new bundling mechanism from JGroups 3.3.0
 |  (73da108cdcf9db4f3edbcd6dbda6938d6e45d148)
 | 
 |  The performance drops from about 7800 writes/s to 4800 writes/s, and
 |  from 1.5M reads/s to 1.2M reads/s (having slower reads in replicated
 |  mode is really odd).
 |
 |
 | Is this using sync or async replication ?

 Sync

 |
 | You could set UDP.bundler_type=old to see if the old bundler makes a
 | difference.

 Yeah, that does not work at all, performance drops very low as everything
 is bundled

 |
 | Yes, reads should have *nothing* to do with the bundler as there should
 | not be any communication on reads !
 |
 | Maybe you should instrument the reads with byteman and see if they cause
 | any communication.

 Maybe that's really just about overall system behaviour. If we waited for
  some communication, there wouldn't be a million reads per second :)


Higher CPU usage for the write commands will slow down everything else on
the same machine. So as long as the test does reads and writes in parallel,
there will be some interference. I suggest running a read-only test and a
write-only test to get a more reliable picture of the performance for each
of them.



 |
 |
 |  It seems that the bundler is not really as good as we hoped for - it
 |  may be the bottleneck. I have tried to create another bundler which
 |  shares the queue between 4 instances of TransportQueueBundler (so, 4
 |  threads are actually sending the messages which go into one queue)
 |  and the performance mildly improved - to 5200 writes/s, but that's
 |  not enough.
 |
 | I'm currently at the summit, but do you have a test that I could run
 | when I'm back to reproduce the performance issue ?

  Would the RadarGun benchmark be enough for you? See attachments.
  You have to compile Infinispan from snapshot
 (113842c8cf91cbb5a1bbd26e05fab7024fdec081 is the last OK,
 73da108cdcf9db4f3edbcd6dbda6938d6e45d148 is slower) and compile RadarGun
 from snapshot with

 mvn install -Pinfinispan53

 (this creates also the plugin using infinispan-core-5.3.0-SNAPSHOT)

 Radim

 |
 |
 | 
 |  Radim
 | 
 |  Note: you may have seen my conversation with Pedro Ruivo on IRC about
 |  the bundler several days ago, in that time our configuration had old
 |  bundler. This was fixed, but as I have not built Infinispan properly
 |  (something got cached), I have not noticed the regression between
 |  these builds.
 |
 | --
 | Bela Ban
 | Lead JGroups / Clustering Team
 | JBoss
 | ___
 | infinispan-dev mailing list
 | infinispan-dev@lists.jboss.org
 | https://lists.jboss.org/mailman/listinfo/infinispan-dev
 |

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Retrieval operations with the IGNORE_RETURN_VALUES flag

2013-06-13 Thread Dan Berindei
On Wed, Jun 12, 2013 at 3:39 PM, Mircea Markus mmar...@redhat.com wrote:


 On 12 Jun 2013, at 13:14, Galder Zamarreño gal...@redhat.com wrote:

 
 
  On Jun 10, 2013, at 12:01 PM, Adrian Nistor anis...@redhat.com wrote:
 
  Maybe we could just clarify the javadoc of IGNORE_RETURN_VALUES and say
 that it only applies to write operations and is ignored for everything
 else? Why punish the user with an exception when doing a 'get'?
 
  We already document there's a (very common-sense) exception for
  conditional writes where the flag is ignored (ISPN-3141).
 
  I wonder if anyone noticed my reply earlier...
 
  The flag business does need a big re-think. Not only to separate
 internal from external flags (we have a jira for that [1]), but also to
 have a way to define which flags can be passed to a particular operation,
 in a way that's type-safe, and without resulting in a runtime error of the
  likes of "X flag cannot be used with Y operation". IOW, any error on which
 flag can be used with what operation should ideally be caught at
 compilation time. I don't have specific ideas on this right now, but I
 think it'd be good to achieve this.
 
  IOW, I suggest we leave it as it is. We need to re-think it anyway. So
 let's tackle it in 6.0 so that a get operation can never be passed
 IGNORE_RETURN_VALUES flag, and this being something that's caught at
 **compilation time**.

 this would be the elegant way of doing it.


An even more elegant solution would be to make put return void by default,
and force the user to state explicitly that he wants a return value - a la
JSR-107. I know that would break backwards compatibility, so it may not be
an option even in 6.0, but still I think we should focus on that.
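
In JSR-107 terms the split looks like this (just to illustrate the idea, not a
proposal for the exact Infinispan signatures):

import javax.cache.Cache;

class PutExample {
   // put() returns void, so the implementation never needs to fetch the old value;
   // the caller has to ask for it explicitly via getAndPut().
   void example(Cache<String, String> cache) {
      cache.put("k", "v1");                          // no return value
      String previous = cache.getAndPut("k", "v2");  // old value only on request
   }
}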

Maybe it's worth making other flags type-safe, but I don't think it's
worth doing it for IGNORE_RETURN_VALUES.



  
  I'm just about to add another internal flag to Flag as a result of the
 JCache 0.7 upgrade…, so need to tackle ISPN-2201 to avoid causing more
 confusion, and alongside avoid the issues that have been highlighted WRT
 which operations are allowed which flags. I'm happy to do this for 6.0.
 
  [1] https://issues.jboss.org/browse/ISPN-2201
  I've updated the JIRA to track the fact that IGNORE_RETURN_VALUES + get
 should not be possible.

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] partial shutdown problem when shutting down the coordinator

2013-06-13 Thread Dan Berindei
Done: https://github.com/infinispan/infinispan/pull/1900



On Thu, Jun 13, 2013 at 4:15 PM, Mircea Markus mmar...@redhat.com wrote:

 Hi Dan,

 snip
 But unfortunately the partial shutdown case does NOT work, if the
 coordinator node is a member of partial shutdown nodes.
  When the coordinator goes down while rebalancing is suspended, one of the
  remaining servers becomes the new coordinator with rebalanceEnabled=true, and
  after that the rebalance process starts.
  Once this happens, it causes unexpected cache data accumulation
  on the nodes that are awaiting the shutdown request.
 /snip

 Seems like this scenario is not covered. I guess we should broadcast (to
 all the members) the clean shutdown request so that the new coordinator
  would pick it up and not start the rebalance?

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] KeyAffinityService

2013-06-14 Thread Dan Berindei
Ben, I'm pretty sure it's not possible to define getKeyForAddress as
returning the same type as its input key.

Let's say f1(address)(x) = getKeyForAddress(address, x)

f1(address) has to be injective for each address, and the ranges of f1 for
different addresses have to be disjoint (each key maps to a single primary
owner). So if the size of the key domain is size(K) and the number of addresses
is N, getKeyForAddress would need N * size(K) distinct output values, but only
size(K) keys exist - which is clearly not possible.

It's probably possible to define something like this instead:

PinnedKey<K> getKeyForAddress(address, key)


Even so, keeping the result key pinned to the same address after a
rebalance would be tricky - probably requiring some changes to the
ConsistentHash implementation. Of course, the node to which the key is
pinned may also die, and I'm not sure where the keys should be located
then.

For the existing KeyAffinityService methods it's simple: they give you a
key that is currently mapped to a specific node, but they don't make any
guarantees about where that key will be located in the future.

Speaking of which, how do you know which keys should map to which node in
your application?

If the rules are simple enough, you may be able to create multiple caches,
each of them with a custom ConsistentHashFactory that locates *all* the
keys to the same primary node (with random backup owners).

Cheers
Dan


On Fri, Jun 14, 2013 at 3:43 PM, cotton-ben
ben.cot...@alumni.rutgers.eduwrote:

 Thanks for this reply, Mircea.

 Very interesting approach.  By dispatching a distributed executor back to
 the node (node 0) that produced the pinned key affinity for the object's
 natural key, we could indeed do an iterative search (from node 2) to find
 the original 'myObject' in the pinned cache (on node 0).  As you point out,
 however, this approach is prohibitively costly -- we are effectively doing
 an iterative search across nodes.

 Again, we see the best approach as being to use the enhancement to the Key
 Affinity Service API suggested in
 https://issues.jboss.org/browse/ISPN-3112.
 Once

 *K getKeyForAddress(Address pinnedNodeAddress, K objectNaturalKey);*

 is provided, we will be able to proceed as simply as follows:

 //given that all nodes know that all birthday Entry(s) are pinned to Node
 0

 @t=0, from node=0, pin Entry to cache at Node 0

    Object pinnedKey = getKeyForAddress(getAddress("Node0"), "1962-08-10");
    cache.put(pinnedKey, "Ben D. Cotton III birthday");


 Later @t=1, from node =2, we can use the improved API (suggested in the
 JIRA) from a 'foreign' node without any need to do a distributed, iterative
 search, i.e.

    Object pinnedKey = getKeyForAddress(getAddress("Node0"), "1962-08-10");
    Object birthdayCelebrant = cache.get(pinnedKey);  //returns "Ben D. Cotton III birthday"


 Of course, for us to be able to do this, the ISPN team would have to
 deliver
 to us the API recommended in the JIRA
 https://issues.jboss.org/browse/ISPN-3112.  Don't you agree such an API
 enhancement would be an awesome capability improvement for ISPN users
 wishing to operate in a distributed fashion on pinned Caches? :-)

 As always, thanks to you and this forum for its fine support,
 Ben






 --
 View this message in context:
 http://infinispan-developer-list.980875.n3.nabble.com/KeyAffinityService-nice-2-recommendations-tp4027152p4027401.html
 Sent from the Infinispan Developer List mailing list archive at Nabble.com.
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] KeyAffinityService

2013-06-14 Thread Dan Berindei
On Fri, Jun 14, 2013 at 5:46 PM, cotton-ben
ben.cot...@alumni.rutgers.eduwrote:



 /It's probably possible to define something like this instead:

  PinnedKey<K> getKeyForAddress(address, key)
 /

 This is fine.  Do it this way.

 /Even so, keeping the result key pinned to the same address after a
 rebalance would be tricky - probably requiring some changes to the
 ConsistentHash implementation. Of course, the node to which the key is
 pinned may also die, and I'm not sure where the keys should be located
 then.
 /

  But, shouldn't some combination of my usage of the ISPN Grouper<T> and the
  ISPN KeyAffinityService capabilities cover me?   In my understanding, the
  Grouper capability can allow me to group my keys to a specific node
  (however node identity is anonymous).  Grouper capability's ambition is to
  transparently survive topology changes of nodes participating in my grid.
  In my understanding, the KAS capability can allow me to pin my keys to a
  specific node (by node identity).   A potentially great tutorial document
  for ISPN users might be "How to harmoniously use the ISPN Grouper and ISPN
  KeyAffinityService capabilities".  How to use these 2 meritorious capabilities
  together (and in harmony) is still unclear to me.


Just to be clear, KAS doesn't really allow you to pin a key to a certain
node. It only gives you a key that maps to the given node the moment you
called KAS.getKeyForAddress(Address) - by the time you call cache.put()
with the pinned key, it may have already moved to a different node.
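
I.e. the existing contract is only this (sketch from memory - double-check the
factory method signature):

import java.util.concurrent.Executors;
import org.infinispan.Cache;
import org.infinispan.affinity.KeyAffinityService;
import org.infinispan.affinity.KeyAffinityServiceFactory;
import org.infinispan.affinity.RndKeyGenerator;
import org.infinispan.remoting.transport.Address;

class AffinityExample {
   // The generated key maps to 'target' *at the time of the call*; a later
   // topology change is free to move it to another node.
   void example(Cache<Object, Object> cache, Address target) {
      KeyAffinityService<Object> kas = KeyAffinityServiceFactory.newKeyAffinityService(
            cache, Executors.newSingleThreadExecutor(), new RndKeyGenerator(), 100);
      Object key = kas.getKeyForAddress(target); // mapped to 'target' right now
      cache.put(key, "value");                   // ...but the owner may have changed already
   }
}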

Would that be enough for you? If yes, adding the new method should be
relatively easy to accomplish with what we have now (although the fact that
a grouping key can only be a String does impose some limitations).

But it has been my impression that you want something that would keep the
key pinned to the same node as long as that node was alive. If that's the
case, then implementing it would require significant additional work.



 /Speaking of which, how do you know which keys should map to which node in
 your application?/

 We just know.  :-)   Seriously, we do have a mechanism for resolving
 pinnedKey identity to Node identity custody.


I'm just wondering, wouldn't it be simpler if you created a custom
ConsistentHashFactory to map keys to nodes using the same mechanism you
already have? Not as easy as the API you proposed, but it wouldn't require
a separate cache either...


/If the rules are simple enough, you may be able to create multiple caches,
 each of them with a custom ConsistentHashFactory that locates *all* the
 keys
 to the same primary node (with random backup owners)./

 Very true.  But we don't think that is as nice as extending the API in the
 way recommend.  To be blunt, we could just do something as simple as this:

 @t=0, from node=0, pin Entry to cache at Node 0

   Object pinnedKey =
  getKeyForAddress(getAddress("Node0"), "1962-08-10");


I suppose you meant pinnedKey = getKeyForAddress(getAddress("Node0")) here.

With this solution, pinned keys could also move from the initial target
node to another node as the cache topology changes (and not all keys will
move to the same node).


   globalReplicatedPinnedKeyCache.put("1962-08-10", pinnedKey); //cache
  the pinnedKeys globally
    pinnedCache.put(pinnedKey, "Ben D. Cotton III birthday");

 Later @t=1, from node =2

 Object pinnedKey = globalReplicatedPinnedKeyCache.get("1962-08-10");
 Object birthdayCelebrant = pinnedCache.get(pinnedKey);  //returns "Ben D. Cotton III birthday"

  But that is effectively forcing us to do 2 explicit Cache<K,V>#get()
 operations in application code.  First to get the pinnedKey (given the
 naturalKey).  Second to get the pinnedValue (given the pinnedKey).  Not
 preferred -- but DOABLE, FOR SURE.


What do you think?  Just for the niceness of it, do you think  we might be
 able to have the API suggested at
 https://issues.jboss.org/browse/ISPN-3112
 ?


My main concern is that you actually want something more than the
KeyAffinityService is able to give you ATM - even if we did add the
niceness in the API, the keys still won't be pinned to a certain node for
as long as that node is alive.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation

2013-06-17 Thread Dan Berindei
On Mon, Jun 17, 2013 at 3:58 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 06/17/2013 12:56 PM, Mircea Markus wrote:
 
  On 17 Jun 2013, at 11:52, Pedro Ruivo pe...@infinispan.org wrote:
 
  I've been looking at TxDistributionInterceptor and I have a couple of
  questions (assuming REPEATABLE_READ isolation level):
 
  #1. why are we doing a remote get each time we write on a key? (huge
   performance impact if the key was previously read)
  indeed this is suboptimal for transactions that write the same key
 repeatedly and repeatable read. Can you please create a JIRA for this?

 created: https://issues.jboss.org/browse/ISPN-3235


Oops... when I fixed https://issues.jboss.org/browse/ISPN-3124 I removed
the SKIP_REMOTE_LOOKUP, thinking that the map is already in the invocation
context so there shouldn't be any perf penalty. I can't put the
SKIP_REMOTE_LOOKUP flag back, otherwise delta writes won't have the
previous value during state transfer, so +1 to fixing ISPN-3235.



 
  #2. why are we doing a dataContainer.get() if the remote get returns a
  null value? Shouldn't the interactions with data container be performed
  only in the (Versioned)EntryWrappingInterceptor?
  This was added in the scope of ISPN-2688 and covers the scenario in
 which a state transfer is in progress, the remote get returns null as the
 remote value was dropped (no longer owner) and this node has become the
 owner in between.
 

 ok :)


Yeah, this should be correct as long as we check if we already have the key
in the invocation context before doing the remote + local get.



 
   #3. (I didn't verify this) why are we acquiring the lock if the remote get
   is performed for a write? This looks correct for pessimistic locking but
  not for optimistic...
  I think that, given that the local node is not owner, the lock
 acquisition is redundant even for pessimistic caches.
  Mind creating a test to check if dropping that lock acquisition doesn't
 break things?

 I created a JIRA with low priority since it does not affect the
 transaction outcome/isolation and I believe the performance impact
 should be lower (you can increase the priority if you want).

 https://issues.jboss.org/browse/ISPN-3237


If we don't lock the L1 entry, I think something like this could happen:

tx1@A: remote get(k1) from B - stores k1=v1 in invocation context
tx2@A: write(k1, v2)
tx2@A: commit - writes k1=v2 in L1
tx1@A: commit - overwrites k1=v1 in L1



  After this analysis, it is possible to break the isolation between
   transactions if I do a get on a key that does not exist:
 
  tm.begin()
  cache.get(k) //returns null
  //in the meanwhile a transaction writes on k and commits
  cache.get(k) //return the new value. IMO, this is not valid for
  REPEATABLE_READ isolation level!
 
  Indeed sounds like a bug, well spotted.
  Can you please add a UT to confirm it and raise a JIRA?

 created: https://issues.jboss.org/browse/ISPN-3236

 IMO, this should be the correct behaviour (I'm going to add the test
 cases later):

 tm.begin()
 cache.get(k) //returns null (op#1)
 //in the meanwhile a transaction writes on k and commits
 write operation performed:
 * put: must return the same value as op#1
 * conditional put //if op#1 returns null the operation should be always
 successful (i.e. the key is updated, return true). Otherwise, the key
 remains unchanged (return false)
 * replace: must return the same value as op#1
 * conditional replace: replace should be successful if checked with the
 op#1 return value (return true). Otherwise, the key must remain
 unchanged (return false).
  * remove: must return the same value as op#1
 * conditional remove: the key should be removed if checked with the op#1
 return value (return true). Otherwise, the key must remain unchanged
 (return false)

 Also, the description above should be valid after a removal of a key.

 
  Cheers,
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation

2013-06-18 Thread Dan Berindei
On Mon, Jun 17, 2013 at 7:00 PM, Mircea Markus mmar...@redhat.com wrote:


 On 17 Jun 2013, at 16:11, Dan Berindei dan.berin...@gmail.com wrote:

   I think that, given that the local node is not owner, the lock
 acquisition is redundant even for pessimistic caches.
   Mind creating a test to check if dropping that lock acquisition
 doesn't break things?
 
  I created a JIRA with low priority since it does not affect the
  transaction outcome/isolation and I believe the performance impact
  should be lower (you can increase the priority if you want).
 
  https://issues.jboss.org/browse/ISPN-3237
 
  If we don't lock the L1 entry, I think something like this could happen:

 There is a lock happening *without* L1 enabled.


Nope, tx1 doesn't lock k1 on B because it doesn't do a put(k1, v3) - it
only reads the value from B. So even if tx2 does lock k1 on B, it doesn't
add any synchronization between tx1 and tx2.

But tx1 does write the entry to L1 on A, so it should acquire an L1 lock
on A - and tx2 should also acquire the same lock.



  
  tx1@A: remote get(k1) from B - stores k1=v1 in invocation context
  tx2@A: write(k1, v2)
  tx2@A: commit - writes k1=v2 in L1
  tx1@A: commit - overwrites k1=v1 in L1
 
 

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Doubts about TxDistributionInterceptor and possible break in transaction isolation

2013-06-18 Thread Dan Berindei
On Mon, Jun 17, 2013 at 6:35 PM, William Burns mudokon...@gmail.com wrote:




 On Mon, Jun 17, 2013 at 11:11 AM, Dan Berindei dan.berin...@gmail.comwrote:




 On Mon, Jun 17, 2013 at 3:58 PM, Pedro Ruivo pe...@infinispan.orgwrote:



 On 06/17/2013 12:56 PM, Mircea Markus wrote:
 
  On 17 Jun 2013, at 11:52, Pedro Ruivo pe...@infinispan.org wrote:
 
  I've been looking at TxDistributionInterceptor and I have a couple of
  questions (assuming REPEATABLE_READ isolation level):
 
  #1. why are we doing a remote get each time we write on a key? (huge
   performance impact if the key was previously read)
  indeed this is suboptimal for transactions that write the same key
 repeatedly and repeatable read. Can you please create a JIRA for this?

 created: https://issues.jboss.org/browse/ISPN-3235


 Oops... when I fixed https://issues.jboss.org/browse/ISPN-3124 I removed
 the SKIP_REMOTE_LOOKUP, thinking that the map is already in the invocation
 context so there shouldn't be any perf penalty. I can't put the
 SKIP_REMOTE_LOOKUP flag back, otherwise delta writes won't have the
 previous value during state transfer, so +1 to fixing ISPN-3235.



 
  #2. why are we doing a dataContainer.get() if the remote get returns a
  null value? Shouldn't the interactions with data container be
 performed
  only in the (Versioned)EntryWrappingInterceptor?
  This was added in the scope of ISPN-2688 and covers the scenario in
 which a state transfer is in progress, the remote get returns null as the
 remote value was dropped (no longer owner) and this node has become the
 owner in between.
 

 ok :)


 Yeah, this should be correct as long as we check if we already have the
 key in the invocation context before doing the remote + local get.



 
   #3. (I didn't verify this) why are we acquiring the lock if the remote
 get
  is performed for a write? This looks correct for pessimistic locking
 but
  not for optimistic...
  I think that, given that the local node is not owner, the lock
 acquisition is redundant even for pessimistic caches.
  Mind creating a test to check if dropping that lock acquisition
 doesn't break things?

 I created a JIRA with low priority since it does not affect the
 transaction outcome/isolation and I believe the performance impact
 should be lower (you can increase the priority if you want).

 https://issues.jboss.org/browse/ISPN-3237


 If we don't lock the L1 entry, I think something like this could happen:

 tx1@A: remote get(k1) from B - stores k1=v1 in invocation context
 tx2@A: write(k1, v2)
 tx2@A: commit - writes k1=v2 in L1
 tx1@A: commit - overwrites k1=v1 in L1

  This one is just like the one referenced in
  https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780


Yep, it's the same thing.


 And even locking doesn't help in this case since it doesn't lock the key
  for a plain remote get, only for a remote get in the context of a write - which means
  the L1 could be updated concurrently in either order - possibly causing an
 inconsistency.  This will be solved when I port the same fix I have for
 https://issues.jboss.org/browse/ISPN-3197 for tx caches.


I thought the locking happened for all remote gets, and that's how I think
it should work.

We don't have to keep the lock for the entire duration of the transaction,
though. If we write the L1 entry to the data container during the remote
get, like you suggested in your comment, then we could release the L1 lock
immediately and remote invalidation commands would be free to remove the
entry.




 
  After this analysis, it is possible to break the isolation between
   transactions if I do a get on a key that does not exist:
 
  tm.begin()
  cache.get(k) //returns null
  //in the meanwhile a transaction writes on k and commits
  cache.get(k) //return the new value. IMO, this is not valid for
  REPEATABLE_READ isolation level!
 
  Indeed sounds like a bug, well spotted.
  Can you please add a UT to confirm it and raise a JIRA?

 created: https://issues.jboss.org/browse/ISPN-3236

 IMO, this should be the correct behaviour (I'm going to add the test
 cases later):

 tm.begin()
 cache.get(k) //returns null (op#1)
 //in the meanwhile a transaction writes on k and commits
 write operation performed:
 * put: must return the same value as op#1
 * conditional put //if op#1 returns null the operation should be always
 successful (i.e. the key is updated, return true). Otherwise, the key
 remains unchanged (return false)
 * replace: must return the same value as op#1
 * conditional replace: replace should be successful if checked with the
 op#1 return value (return true). Otherwise, the key must remain
 unchanged (return false).
  * remove: must return the same value as op#1
 * conditional remove: the key should be removed if checked with the op#1
 return value (return true). Otherwise, the key must remain unchanged
 (return false)

 Also, the description above should

Re: [infinispan-dev] FAO Infinispan Developers: Must set JAVA_HOME_7 environment variable

2013-06-19 Thread Dan Berindei
If I run the whole test suite with OpenJDK 1.7, do I still need to set the
JAVA_HOME_7 variable?


On Wed, Jun 19, 2013 at 11:47 AM, Galder Zamarreño gal...@redhat.comwrote:

 Hi all,

 Starting with Infinispan 5.3.0.CR2, anyone who builds Infinispan requires
 an environment variable called JAVA_HOME_7 to be set and point to a JDK7
 installation. This is required in order to run the JCache TCK, which
  requires Java 7, and runs automatically as part of the Infinispan build
 (when testing...).

 @Mircea, the TCK is not running on CI cos the environment variable is not
 set, see [1] log snippet taken from [2]. Can you sort out this in the CI
 machine?

 The TCK run build failure was ignored cos test failures are ignored, and
 this property is only used when running the TCK per se.

 Cheers,

 [1] https://gist.github.com/galderz/39851953ca4d6b4af899
 [2]
  http://ci.infinispan.org/viewLog.html?buildId=2072&buildTypeId=bt2&tab=buildLog#_
 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 Data Container

2013-06-19 Thread Dan Berindei
This would only affect the Cache internals (use two DataContainers instead
of one). Users still won't have direct access to the L1 cache.
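
Conceptually it's just a routing decision in the entry commit path, something
like this (purely illustrative - the names are invented and the real code
would use two DataContainer instances, not maps):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class TwoContainerSketch {
   final ConcurrentMap<Object, Object> ownedContainer = new ConcurrentHashMap<Object, Object>();
   final ConcurrentMap<Object, Object> l1Container = new ConcurrentHashMap<Object, Object>();

   // The split stays inside the Cache internals: callers never see which
   // container an entry went to.
   void commit(Object key, Object value, boolean localNodeIsOwner) {
      (localNodeIsOwner ? ownedContainer : l1Container).put(key, value);
   }
}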



On Wed, Jun 19, 2013 at 4:24 PM, cotton-ben
ben.cot...@alumni.rutgers.eduwrote:


 Instinctively, this is a very attractive idea.

 Would the L1 data container be a separate, but equal container?  i.e.
 would it be  a separate instance from the DIST_SYNC cache but the exact
 same
 data structure?

 or

 Would the L1 data container be separate, and different container?  i.e.
 would it be a completely different data structure (and API) from the
 DIST_SYNC cache?





 --
 View this message in context:
 http://infinispan-developer-list.980875.n3.nabble.com/infinispan-dev-L1-Data-Container-tp4027447p4027449.html
 Sent from the Infinispan Developer List mailing list archive at Nabble.com.
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 Data Container

2013-06-19 Thread Dan Berindei
On Wed, Jun 19, 2013 at 5:19 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 On 19 June 2013 13:44, William Burns mudokon...@gmail.com wrote:
  All the L1 data for a DIST cache is stored in the same data container as
 the
  actual distributed data itself.  I wanted to propose breaking this out so
  there is a separate data container for the L1 cache as compared to the
  distributed data.
 
  I thought of a few quick benefits/drawbacks:
 
  Benefits:
  1. L1 cache can be separately tuned - L1 maxEntries for example

 -1!
    I don't think that's a benefit actually, from the point of view of a user:
 as a user I only know I have a certain amount of memory available on
 each node, and the application is going to use certain data way more
 often than others.
 The eviction strategy should be put in condition to be able to make an
 optimal choice about which entries - among all - are better kept in
 memory vs. passivated.
 I don't see a specific reason to favour keeping in memory owned
 entries over an L1 entry: the L1 entry might be very hot, and the
 owned entry might be almost-never read.

Considering that even serving a Get operation to another node (as
 owners of the entry) makes the entry less likely to be passivated (it
 counts as a hit), the current design naturally provides an optimal
 balance for memory usage.

  On the opposite side, I don't see how - as a user - I could optimally
 tune a separate container.

  2. L1 values will not cause eviction of real data

 -1
 That's not a benefit, as I explained above. Real Data is not less
 important, especially if it's never used.
 Granted I'm making some assumptions about the application having some
 hot-data and some less hot data, and not being able to take advantage
 of node pinning or affinity strategies.. but that is another way to
 say that I'm assuming the user needs L1: if it was possible to apply
 these more advanced strategies I'd disable L1 altogether.


You're also assuming that data can always be recreated from another source,
but I don't think that's always the case. If Infinispan is the canonical
data store and there is no cache store, you can't enable eviction for the
real data and with a single container you can't enable eviction for the
L1 entries either.
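
Today the eviction settings are per cache, not per container, so a
configuration like this (sketch) applies the same maxEntries budget to owned
entries and L1 entries alike:

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.eviction.EvictionStrategy;

class SingleContainerEvictionExample {
   Configuration build() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.DIST_SYNC).l1().enable();
      // One budget for everything in the data container - there is no way to say
      // "evict L1 entries but never evict owned data".
      builder.eviction().strategy(EvictionStrategy.LIRS).maxEntries(10000);
      return builder.build();
   }
}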


  3. Would make https://issues.jboss.org/browse/ISPN-3229 an easy fix
  4. Could add a new DataContainer implementation specific to L1 with
  additional optimizations

 You have some example of what you have in mind?
 Considering you would need to consider the optimal balance usage of
 the available heap space, I suspect that would be quite hard.

  5. Help with some concurrency issues with L1 without requiring wider
 locking
  (such as locking a key for an entire ClusteredGet rpc call) -
  https://issues.jboss.org/browse/ISPN-3197.

 I don't understand this. L1 entries require the same level of
 consistency than any other entry so I suspect you would need the same
 locking patterns replicated. The downside would be that you get
 duplication of the same logic.
  Remember also that L1 has some similarities with entries still
  hanging around when they were previously stored in this node after
  a state transfer.. today these are considered L1-active entries, if
 you change the storage you would need to design a migration of state
 from one container to the other; migration of state might not be too
 hard, doing it while guaranteeing consistent locking is going to be I
 guess as hard as considering the L1 problem today.

 
  Drawbacks:
  1. Would require, depending on configuration, an additional thread for
  eviction
  2. Users upgrading could have double memory used up due to 2 data
 containers

 This drawback specifically is to be considered very seriously. I don't
  think people would be happy to buy and maintain a datacenter twice as large
  as what they actually need.

 Sanne

 
  Both?:
  1. Additional configuration available
  a. Add maxEntries just like the normal data container (use data
 container
  size if not configured?)
 b. Eviction wakeup timer?  We could just reuse the task cleanup
  frequency?
 c. Eviction strategy?  I would think the default data container's
 would
  be sufficient.
 
  I was wondering what you guys thought.
 
  Thanks,
 
   - Will
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] Infinispan 5.3.0.Final is out!

2013-06-26 Thread Dan Berindei
Dear Infinispan community,

We're proud to announce the final release of Infinispan 5.3.0 Tactical
Nuclear Penguin. Besides increased stability (130+ bug fixes) this release
also brings some highly demanded features:

   - Total Order transaction protocol
     (http://infinispan.blogspot.co.uk/2013/04/faster-transaction-protocols-in.html)
     developed within the scope of the CloudTM project, yielding significant
     performance improvements under concurrent access
   - Support for JSR-107 (Java Caching API) implementation
     (http://infinispan.blogspot.co.uk/2013/04/give-java-caching-standard-api-go-using.html)
   - A new implementation of the Lucene Directory for Infinispan based on
     Lucene 4
   - A new packaging for the Infinispan server modules, based on the JBoss AS
     (http://infinispan.blogspot.co.uk/2013/04/infinispan-server-530alpha1.html)
   - SSL access to Hot Rod
     (http://infinispan.blogspot.co.uk/2013/05/infinispan-server-remote-protocols.html)
   - Interoperability between Hot Rod, Memcached, REST and embedded mode
     (http://infinispan.blogspot.co.uk/2013/05/interoperability-between-embedded-and.html)
   - Storing arrays in Infinispan as keys and values
     (https://docs.jboss.org/author/display/ISPN/Storing+objects+%28e.g.+arrays%29+with+custom+Equivalence+functions)
   - Several new cache stores: a MongoDB cache store
     (http://infinispan.blogspot.co.uk/2013/06/using-mongodb-as-cache-store.html)
     (courtesy of Guillaume Scheibel), a JPA based cache store
     (http://infinispan.blogspot.co.uk/2013/05/introducing-jpa-cache-store.html)
     and a LevelDB cache store (https://issues.jboss.org/browse/ISPN-2657)
     (courtesy of Ray Tsang)

 For a complete list of features included in this release refer to the release
notes (https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12320550).
 Visit our downloads section (http://www.jboss.org/infinispan/downloads) to
find the latest release, and if you have any questions please check our
forums (http://www.jboss.org/infinispan/forums), our mailing lists
(https://lists.jboss.org/mailman/listinfo/infinispan-dev) or ping us directly
on IRC.

 Time to move forward now: Infinispan 6.0 will bring some significant
improvements (https://community.jboss.org/en/infinispan?view=documents) to
the Infinispan ecosystem and also a shift in licence to the Apache Software
Licence (http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html).

 Stay tuned!

Cheers,
Dan Berindei
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Introducing builder subpackages

2013-06-26 Thread Dan Berindei
-1 as well, as a user you only use ConfigurationBuilder and
GlobalConfigurationBuilder directly, and moving them to a subpackage would
make them less accessible.

If anything, I'd move these two classes to the parent package
(org.infinispan.configuration), but like Sanne said there's no point in
breaking the configuration again.


On Wed, Jun 26, 2013 at 10:02 AM, Mircea Markus mmar...@redhat.com wrote:


 On 26 Jun 2013, at 09:54, Sanne Grinovero sa...@infinispan.org wrote:

  Please don't break the configuration API again :-(
 
  On 26 June 2013 06:29, Navin Surtani nsurt...@redhat.com wrote:
  While working through ISPN-2463, and the sub-tasks I was wondering
 about the
  organisation of the ConfigurationBuilder classes.
 
  Currently, they are located in org.infinispan.configuration.cache.etc or
  org.infinispan.configuration.global.etc. The actual Configuration
 classes
  are within the same package already as well. To me, this sounds a
 little bit
  cluttered and perhaps not very intuitive and I was wondering if it
 might be
  a better idea to have something like:
 
  org.infinispan.configuration.cache.builders.ConfigurationBuilder (and
  others)
  org.infinispan.configuration.global.builders.GlobalConfigurationBuilder
 (etc
  etc)
 
  Another suggestion could be:
 
  org.infinispan.configuration.builders.cache.etc
   org.infinispan.configuration.builders.global.etc
 
  The only problem with that would be breaking backward compatibility, but
  from ISPN 6.x onwards I think that there are a fair few classes and
 packages
  being moved around anyway. It's just an idea that might make the API
 seem a
  little bit cleaner as to where it's located.
 
  Thoughts?

  It does sound like a good suggestion, but as Sanne mentioned this would break
  backward compatibility, which is something we want to avoid.

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TeamCity posting a message in pull req if any test failed?

2013-06-27 Thread Dan Berindei
On Thu, Jun 27, 2013 at 2:52 PM, Galder Zamarreño gal...@redhat.com wrote:


 On Jun 27, 2013, at 12:09 PM, Manik Surtani msurt...@redhat.com wrote:

 
  On 27 Jun 2013, at 09:57, Galder Zamarreño gal...@redhat.com wrote:
 
 
  On Jun 26, 2013, at 12:28 PM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  TeamCity also updates the status of the PR (right below the
 description).
 
  ^ Sure, it updates the status, but we get no notification. IOW, it's up
 to you to go and check the status which is not ideal. I'd like to see some
 kind of mail notification, or comment when the test finishes. The AS guys
 did this in their extension...
 
  Unfortunately the status disappears the moment the PR is updated OR
 the merge branch is updated (because the status is set on the commit that
 is the merge result).
 
  I also receive emails from TeamCity and notifications in IntelliJ for
 all my PRs with failing tests - if you don't, make sure you have set up the
 VCS user name in your TeamCity settings.
 
  ^ Emails from TeamCity? Notifications in IntelliJ? I wasn't aware of
 either… Any docu on this?
 
  http://www.jetbrains.com/idea/features/teamcity_integration.html

 Oh, there's a special plugin for this… I was looking for it in the
 preferences :)

 Thanks!



You can also configure your email notifications here:
http://ci.infinispan.org/profile.html?tab=userGeneralSettings
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-06-27 Thread Dan Berindei
On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com wrote:

 First off I apologize for the length.

 There have been a few Jiras recently that have identified L1 consistency
 issues with both TX and non TX sync caches.  Async caches with L1 have
 their own issues as well, but I only wanted to talk about sync caches.

 https://issues.jboss.org/browse/ISPN-3197
 https://issues.jboss.org/browse/ISPN-2965
 https://issues.jboss.org/browse/ISPN-2990

 I have proposed a solution in
 https://github.com/infinispan/infinispan/pull/1922 which should start L1
 consistency down the right track.  There are quite a few comments on it if
 you want to look into it more, but because of that I am moving this to the
 dev mailing list.

 The key changes in the PR are the following (non-tx):

 1. Concurrent reads for a key that can retrieve a remote value are
 corralled into a single thread of execution for that given key.  This
 would reduce network traffic with concurrent gets for the same key.  Note
 the corralling only happens on a per key basis.


Get commands on owners should not be serialized. Get commands on non-owners
should not be serialized either, if the key already exists in L1. So I'd
say L1ReadSynchronizer should be L1WriteSynchronizer instead :)


 2. The single thread that is doing the remote get would update the L1 if
 able (without locking) and make available the value to all the requests
 waiting on the get.


Well, L1ReadSynchronizer does prevent other threads from modifying the same
key, so we are locking the key - just not using LockManager.
It would also require StateTransferLock.acquireSharedTopologyLock() to make
sure it doesn't write an L1 entry after the node became a proper owner.


 3. Invalidations that are received would first check to see if there is a
  current remote get occurring for its keys.  If there is, it will attempt to
 cancel the L1 write(s) before it occurs.  If it cannot cancel the L1 write,
 then it must also wait on the current remote get completion and
 subsequently run the invalidation.  Note the cancellation would fail when
 the remote get was done and it is in the middle of updating the L1, so this
 would be very small window.


I think it would be clearer to describe this as the L1 invalidation
cancelling the remote get, not the L1 update, because the actual L1 update
can't be cancelled.

We also have to remove the logic in AbstractLockingInterceptor that skips
L1 invalidation for a key if it can't acquire a lock with a 0 timeout.


 4. Local writes will also do the same thing as the invalidation with
 cancelling or waiting.  Note that non tx local writes only do L1
  invalidations and don't write the value to the data container.  The reasons
  why can be found at https://issues.jboss.org/browse/ISPN-3214


I didn't know about ISPN-3214 or that non-tx writes don't write to L1, but
it sounds fair.


  5. Writes that require the previous value and don't have it in the L1
  would also do its get operations using the same corralling method.


The remoteGetBeforeWrites are a bit different - they don't happen on
non-owners, they only happen on writeCH-owners that didn't receive that
entry via state transfer yet. They put the value in the InvocationContext,
but they don't write it to the data container - nor do they invalidate the
L1 entry, if it exists.



 4/5 are not currently implemented in PR.

 This approach would use no locking for non tx caches for all L1
 operations.  The synchronization point would be done through the
 corralling method and invalidations/writes communicating to it.

 Transactional caches would do almost the same thing as non-tx.  Note these
 changes are not done in any way yet.

 1. Gets would now update the L1 immediately after retrieving the value
 without locking, but still using the corralling technique that non-tx
 does.  Previously the L1 update from a get was transactional.  This
 actually would remedy issue [1]

2. Writes currently acquire the remote lock when committing, which is why
 tx caches are able to update the L1 with the value.  Writes would do the
 same cancellation/wait method as non-tx.

3. Writes that require the previous value and don't have it in the L1 would
  also do its get operations using the same method.


Just like for non-tx caches, I don't think these remote gets have to be
stored in L1.


 4. For tx cache [2] would also have to be done.

 [1] -
  https://issues.jboss.org/browse/ISPN-2965?focusedCommentId=12779780&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12779780
 [2] - https://issues.jboss.org/browse/ISPN-1540

 Also rehashing is another issue, but we should be able to acquire the
 state transfer lock before updating the L1 on a get, just like when an
 entry is committed to the data container.


The same for L1 invalidations - we don't want to remove real entries from
the data container after the local node became an owner.


 Any comments/concerns would be appreciated.

 Thanks,

  - 

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-06-27 Thread Dan Berindei
On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com wrote:

 Comments that were outstanding on PR:

 @danberindei:

   +1 to move the discussion to the mailing list, could you summarize
 your changes (preferably for both non-tx and tx cases) and send an
 email to the list?
   And now to add some more to this already unwieldy discussion :)

1. Many interceptors check ownership, I don't think that would be
 a problem. Besides, I think creating a new L1ReadSynchronizer for
 every read is just as bad for performance as locking the key on every
 read, so you'd need that check either way.
 
 We can't use a try lock approach for L1 invalidation when gets are
 locked without possibly dropping a L1 update
 We can't use a timed lock approach for L1 invalidation when writes
 lock the key as we could get into a timed deadlock situation when
 another node concurrently writes to a value/stripe

 I still don't see how we can get away with locking on a get.  What are
 you proposing?


I'm proposing something like this:

1. For get commands: acquire local lock + remote get + store in L1 +
release local lock
2. For invalidate commands: acquire local lock + remove entry from L1 +
release local lock
3. For write commands: invoke on primary owner, and the primary owner sends
invalidation back to originator; alternatively, skip the invalidation
command to the originator and do invoke on primary owner + acquire local
lock + remove entry from L1 + release local lock

All the lock acquisitions would use a regular timeout, not a try lock
approach.
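In rough code, using plain JDK locks only to illustrate the ordering (the
real implementation would of course go through the LockManager and the data
container, so treat every name below as illustrative):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

public class L1LockingSketch<K, V> {
   private final ConcurrentMap<K, Lock> locks = new ConcurrentHashMap<>();
   private final ConcurrentMap<K, V> l1 = new ConcurrentHashMap<>();
   private final Function<K, V> remoteGet;   // stands in for the remote get RPC
   private static final long LOCK_TIMEOUT_MILLIS = 10000;

   public L1LockingSketch(Function<K, V> remoteGet) {
      this.remoteGet = remoteGet;
   }

   private Lock lockFor(K key) {
      return locks.computeIfAbsent(key, k -> new ReentrantLock());
   }

   // 1. get: acquire local lock + remote get + store in L1 + release local lock
   public V get(K key) throws InterruptedException {
      V cached = l1.get(key);
      if (cached != null) {
         return cached;                            // L1 hit, no locking needed
      }
      Lock lock = lockFor(key);
      // regular timeout, not a 0-timeout try lock
      if (!lock.tryLock(LOCK_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
         throw new IllegalStateException("Timed out acquiring local L1 lock for " + key);
      }
      try {
         V value = remoteGet.apply(key);
         if (value != null) {
            l1.put(key, value);                    // the L1 write happens under the local lock
         }
         return value;
      } finally {
         lock.unlock();
      }
   }

   // 2. invalidate: acquire local lock + remove entry from L1 + release local lock
   public void invalidate(K key) throws InterruptedException {
      Lock lock = lockFor(key);
      if (!lock.tryLock(LOCK_TIMEOUT_MILLIS, TimeUnit.MILLISECONDS)) {
         throw new IllegalStateException("Timed out acquiring local L1 lock for " + key);
      }
      try {
         l1.remove(key);
      } finally {
         lock.unlock();
      }
   }
}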


2. By default L1 invalidations are sent as multicasts, so I'm not sure
 ISPN-3273 really matters here. BTW, I wonder if we have a check to only
 send L1 invalidations from one node if the threshold is 0...
 I agree that is the default, but we should support the operation,
 although it doesn't matter for this discussion.  Also I am curious as
 to why multicast for L1 isn't set to say 2 by default?  It seems
 wasteful to send a multicast to all members that they process when
 only 1 would do anything about it.  Do you know why this is like that?


I suppose it's because we don't have good perf numbers for different L1
invalidation threshold numbers...

The problem is, we don't have a way to count all the requestors of a key in
the cluster, so it's reasonably likely that with a threshold of 2 you'd get
1 unicast invalidation from one owner + 1 multicast invalidation from the
other owner, making it less efficient than a single multicast invalidation.

 
   3a. Right, for put commands we can't hold the local lock while
 executing the remote put, or we'll have a deadlock. But I think a shorter
 lock, held only after the remote put completed (or after the lock on the
 primary owner was acquired, with txs) should work.
 Same point under 1


I don't see how we could get a deadlock if we don't hold the local lock
during the remote write invocation.


 
   3b. We'd also have an ownership check before, so we'd only serialize
 the get commands that need to go remotely for the same key. I think it
 would be almost the same as your solution (although it does have one ?
 disadvantage - if the key doesn't exist in the cache, all the get commands
 will go remotely). The number of L1 writes should be very small compared to
 the number of L1 reads anyway, otherwise it would be more efficient to get
 the key from the owner every time.
 You are saying an optimization for owner nodes so they don't do the
 corralling for keys they own?  I like that.  Also I don't think it
 has the disadvantage, it only does remotes it if isn't an owner.


I meant that with your corralling strategy, if you have 2 concurrent get
commands and one of them retrieves a null from the entry owners, the other
command will return null directly. With regular locking, the other command
wouldn't find anything in L1, so it would do another remote get.

I don't think there's any disadvantage in skipping the corralling for key
owners, in fact I think we need to skip it if the key already exists in L1,
too.


 
  It would be nice to agree on what guarantees we want to provide for L1
 invalidation in non-tx caches, I'm not sure if we can do anything to
 prevent this scenario:
 Actually this scenario doesn't occur with non-tx since writes don't
 update the L1 with their value, they just invalidate.  Tx caches are
 fine with this because they acquire the primary owner lock for the
 duration of the write including the L1 update so you can't have this
 ordering.


Sounds good.


 
  A initiates a put(k, v1) to the primary owner B
  B performs the put(k, v1), invalidates every non-owner and returns
  B performs another put(k, v2), invalidating every non-owner
  A receives the result from B and puts k=v1 in its L1

 @pruivo:

  The invalidation does not need to wait for the remote get. When you
 receive an invalidation, you can mark the current remote get invalid. The
 invalidation command can return immediately and the remote get 

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-06-28 Thread Dan Berindei
On Fri, Jun 28, 2013 at 12:17 AM, William Burns mudokon...@gmail.comwrote:

 Trying to leave my points that would most likely have responses to
 second email so we can try to get back to a single thread :)


No such luck :)

Sorry for sending 2 replies in the first place, but it seemed more natural
- I meant to comment on your proposal in one email and to describe my
alternative proposal in the second email.


 On Thu, Jun 27, 2013 at 4:12 PM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com
 wrote:
 
  First off I apologize for the length.
 
  There have been a few Jiras recently that have identified L1 consistency
  issues with both TX and non TX sync caches.  Async caches with L1 have
 their
  own issues as well, but I only wanted to talk about sync caches.
 
  https://issues.jboss.org/browse/ISPN-3197
  https://issues.jboss.org/browse/ISPN-2965
  https://issues.jboss.org/browse/ISPN-2990
 
  I have proposed a solution in
  https://github.com/infinispan/infinispan/pull/1922 which should start
 L1
  consistency down the right track.  There are quite a few comments on it
 if
  you want to look into it more, but because of that I am moving this to
 the
  dev mailing list.
 
  The key changes in the PR are the following (non-tx):
 
  1. Concurrent reads for a key that can retrieve a remote value are
  corralled into a single thread of execution for that given key.  This
  would reduce network traffic with concurrent gets for the same key.
  Note
  the corralling only happens on a per key basis.
 
 
  Get commands on owners should not be serialized. Get commands on
 non-owners
  should not be serialized either, if the key already exists in L1. So I'd
 say
  L1ReadSynchronizer should be L1WriteSynchronizer instead :)

 You are suggesting to check the context to see if the key is present
 before attempting the synchronizer right?  Reading your second email
 that seems the case :)


Nope, I meant we should check the data container (aka the L1 cache). But
obviously we have to check the invocation context first in a tx cache: if
the tx read the key before, it should see the same value.


 
 
  2. The single thread that is doing the remote get would update the L1 if
  able (without locking) and make available the value to all the requests
  waiting on the get.
 
 
  Well, L1ReadSynchronizer does prevent other threads from modifying the
 same
  key, so we are locking the key - just not using LockManager.
  It would also require StateTransferLock.acquireSharedTopologyLock() to
 make
  sure it doesn't write an L1 entry after the node became a proper owner.

 Agree, when I was saying locking I was meaning through the use of the
 lock manager.

 
 
  3. Invalidations that are received would first check to see if there is
 a
  current remote get occurring for it's keys.  If there is it will
 attempt to
  cancel the L1 write(s) before it occurs.  If it cannot cancel the L1
 write,
  then it must also wait on the current remote get completion and
 subsequently
  run the invalidation.  Note the cancellation would fail when the remote
 get
  was done and it is in the middle of updating the L1, so this would be
 very
  small window.
 
 
  I think it would be clearer to describe this as the L1 invalidation
  cancelling the remote get, not the L1 update, because the actual L1
 update
  can't be cancelled.

 When I say L1 update I meant the write to the data container after the
 remote get.  The invalidation can't stop the remote get, all it does
 is tell the caller that Hey don't write the remote value you
 retrieved into the L1.


Oh right, the get command will still use the value it got from the remote
node, it just won't write it.
That makes me wonder, though, if something like this can happen:

1. A invokes get(k), starts a L1ReadSynchronizer and a remote get to B
2. B invokes put(k, v) and sends an invalidation command to A
3. The invalidation command cancels the L1 put on A
4. A invokes get(k) again, finds the L1ReadSynchronizer from step 1) and
queues on it
5. Both get(k) commands return the same value, even though the value has
changed on the owner(s).



 
  We also have to remove the logic in AbstractLockingInterceptor that
 skips L1
  invalidation for a key if it can't acquire a lock with a 0 timeout.
 
 
  4. Local writes will also do the same thing as the invalidation with
  cancelling or waiting.  Note that non tx local writes only do L1
  invalidations and don't write the value to the data container.  Reasons
 why
  I found at https://issues.jboss.org/browse/ISPN-3214
 
 
  I didn't know about ISPN-3214 or that non-tx writes don't write to L1,
 but
  it sounds fair.

 Yeah I really wanted that to work, but without some additional checks
 such as versioned data, I don't see a way to do this without locking
 at the primary node like tx caches.


In theory, the primary owner could send a synchronous RPC back to the
originator while it is holding

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-06-28 Thread Dan Berindei
On Fri, Jun 28, 2013 at 12:51 AM, William Burns mudokon...@gmail.comwrote:

 On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
 
  On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com
 wrote:
 
  Comments that were outstanding on PR:
 
  @danberindei:
 
+1 to move the discussion to the mailing list, could you summarize
  your changes (preferably for both non-tx and tx cases) and send an
  email to the list?
And now to add some more to this already unwieldy discussion :)
 
 1. Many interceptors check ownership, I don't think that would be
  a problem. Besides, I think creating a new L1ReadSynchronizer for
  every read is just as bad for performance as locking the key on every
  read, so you'd need that check either way.
  
  We can't use a try lock approach for L1 invalidation when gets are
  locked without possibly dropping a L1 update
  We can't use a timed lock approach for L1 invalidation when writes
  lock the key as we could get into a timed deadlock situation when
  another node concurrently writes to a value/stripe
 
  I still don't see how we can get away with locking on a get.  What are
  you proposing?
 
 
  I'm proposing something like this:
 
  1. For get commands: acquire local lock + remote get + store in L1 +
 release
  local lock
  2. For invalidate commands: acquire local lock + remove entry from L1 +
  release local lock
  3. For write commands: invoke on primary owner, and the primary owner
 sends
  invalidation back to originator; alternatively, skip the invalidation
  command to the originator and do invoke on primary owner + acquire local
  lock + remove entry from L1 + release local lock

 I do like this and think it would be simpler for non-tx.  However this
 will still not be as performant I wouldn't think (just a gut feeling).


It would indeed be less performant in some cases, e.g. if we have two get
commands for a remote key at the same time and the key doesn't exist, my
proposal would request the key twice.

The writes to L1 *should* be rare enough that the extra lock contention
there would be minimal. There is a chance that an invalidation will be
stuck waiting for a remote get, so it will again be slower than your
approach.

 However I am more than certain it will be slower when collisions
 occur, which is more likely with lock striping.


I'm not that worried about lock striping - the user can always increase the
number of stripes to get less contention. And I hope they know better than
to use lock striping and transactions with multiple keys, or they'll get
deadlocks with or without L1 enabled.


 I don't think this however is feasible for tx caches.

 Optimistic - we don't want to have gets being blocked when the tx is
 being committed - that could be quite a bit of a performance hit
 especially if it has to do a 2 phase commit.  Updating the L1 on a
 remoteGetBeforeWrites  could help remedy this (since get wouldn't need
 to lock), however then you would be almost implementing what I have
 and still have locking overhead.  Maybe this would occur so
 infrequently it wouldn't matter though?


Yes, I would propose removing the L1 commit in EntryWrappingInterceptor as
well, and moving it to the L1 interceptor - immediately after the value was
retrieved from the remote owner. The local lock would also be released
immediately.

The part I wanted to avoid from your proposal was
L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to
understand. I guess I'd find it more approachable if all the synchronization
(the map of L1ReadSynchronizers and the interaction with it) were in a
separate class, with its own tests, and the L1 interceptor only knew about
the high-level stuff. Much like the LockManager hides its map of locks - not
that the LockManager unit tests are a model to follow, because they're kind
of lacking, but the LockManager has the advantage that almost all the tests
touch it indirectly.
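Something along these lines is what I have in mind - a tiny component that
owns the per-key map and the "don't write to L1" handshake, so the
interceptor only ever calls three or four methods on it (all names below
are made up for illustration, not the ones in the PR):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class L1RemoteGetRegistry<K, V> {

   public static final class PendingGet<V> {
      private final CountDownLatch done = new CountDownLatch(1);
      private final AtomicBoolean writeToL1 = new AtomicBoolean(true);
      private volatile V value;

      public void complete(V v) { value = v; done.countDown(); }

      public V await(long timeout, TimeUnit unit) throws InterruptedException {
         if (!done.await(timeout, unit)) {
            throw new IllegalStateException("Remote get did not complete in time");
         }
         return value;
      }

      // Called by an L1 invalidation; returns false if the L1 write already happened.
      public boolean cancelL1Write() { return writeToL1.compareAndSet(true, false); }

      public boolean shouldWriteToL1() { return writeToL1.get(); }
   }

   private final ConcurrentMap<K, PendingGet<V>> pending = new ConcurrentHashMap<>();

   // Returns the PendingGet this thread owns if it won the race (it must then
   // perform the remote get and call complete()), or null if another thread is
   // already doing the remote get and we should just look it up and await() it.
   public PendingGet<V> registerRemoteGet(K key) {
      PendingGet<V> mine = new PendingGet<>();
      return pending.putIfAbsent(key, mine) == null ? mine : null;
   }

   public PendingGet<V> getPending(K key) {
      return pending.get(key);
   }

   public void deregister(K key, PendingGet<V> pendingGet) {
      pending.remove(key, pendingGet);
   }
}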


 Pessimistic - I haven't looked into pessimistic closely yet, but in
 that case I don't think it ever acquires the local locks so it would
 have to acquire a remote lock on a get, which would be pretty
 disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK
 enabled).  Making this to acquire local locks might work as well, but
 then would still have the same issue as optimistic when committing - I
 really am not as familiar with Pessimistic to say for sure.


Nope, acquiring a local lock should be enough. We don't want to prevent
other txs from modifying the key at all, we just want to delay the
invalidation commands triggered by those modifications on the local node.
And the lock would only be held for the duration of the remote get + L1
update, not all the way to the commit.


 
  All the lock acquisitions would use a regular timeout, not a try lock
  approach.
 In this case they could, yes.
 
 
 2. By default L1 invalidations are sent as multicasts, so I'm not
 sure
   ISPN-3273 really matters here. BTW, I wonder if we have

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-06-28 Thread Dan Berindei
On Fri, Jun 28, 2013 at 3:41 PM, William Burns mudokon...@gmail.com wrote:

 On Fri, Jun 28, 2013 at 5:53 AM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
 
  On Fri, Jun 28, 2013 at 12:51 AM, William Burns mudokon...@gmail.com
  wrote:
 
  On Thu, Jun 27, 2013 at 4:40 PM, Dan Berindei dan.berin...@gmail.com
  wrote:
  
  
   On Thu, Jun 27, 2013 at 4:40 PM, William Burns mudokon...@gmail.com
   wrote:
  
   Comments that were outstanding on PR:
  
   @danberindei:
  
 +1 to move the discussion to the mailing list, could you summarize
   your changes (preferably for both non-tx and tx cases) and send an
   email to the list?
 And now to add some more to this already unwieldy discussion :)
  
  1. Many interceptors check ownership, I don't think that would be
   a problem. Besides, I think creating a new L1ReadSynchronizer for
   every read is just as bad for performance as locking the key on every
   read, so you'd need that check either way.
   
   We can't use a try lock approach for L1 invalidation when gets are
   locked without possibly dropping a L1 update
   We can't use a timed lock approach for L1 invalidation when writes
   lock the key as we could get into a timed deadlock situation when
   another node concurrently writes to a value/stripe
  
   I still don't see how we can get away with locking on a get.  What
 are
   you proposing?
  
  
   I'm proposing something like this:
  
   1. For get commands: acquire local lock + remote get + store in L1 +
   release
   local lock
   2. For invalidate commands: acquire local lock + remove entry from L1
 +
   release local lock
   3. For write commands: invoke on primary owner, and the primary owner
   sends
   invalidation back to originator; alternatively, skip the invalidation
   command to the originator and do invoke on primary owner + acquire
 local
   lock + remove entry from L1 + release local lock
 
  I do like this and think it would be simpler for non-tx.  However this
  will still not be as performant I wouldn't think (just a gut feeling).
 
 
  It would indeed be less performant in some cases, e.g. if we have two get
  commands for a remote key at the same time and the key doesn't exist, my
  proposal would request the key twice.
 
  The writes to L1 *should* be rare enough that the extra lock contention
  there would be minimal. There is a chance that an invalidation will be
 stuck
  waiting for a remote get, so it will again be slower than your approach.
 
   However I am more than certain it will be slower when collisions
  occur, which is more likely with lock striping.
 
 
  I'm not that worried about lock striping - the user can always increase
 the
  number of stripes to get less contention. And I hope they know better
 than
  to use lock striping and transactions with multiple keys, or they'll get
  deadlocks with or without L1 enabled.

 I didn't think of the deadlock problem with tx and lock striping.  I
 am wondering should we log a warning message so that user's get an
 explicit warning?


Yeah, I think a warning would be nice - especially in the documentation, it
looks way too innocuous.


 
  I don't think this however is feasible for tx caches.
 
  Optimistic - we don't want to have gets being blocked when the tx is
  being committed - that could be quite a bit of a performance hit
  especially if it has to do a 2 phase commit.  Updating the L1 on a
  remoteGetBeforeWrites  could help remedy this (since get wouldn't need
  to lock), however then you would be almost implementing what I have
  and still have locking overhead.  Maybe this would occur so
  infrequently it wouldn't matter though?
 
 
  Yes, I would propose removing the L1 commit in EntryWrappingInterceptor
 as
  well, and moving it to the L1 interceptor - immediately after the value
 was
  retrieved from the remote owner. The local lock would also be released
  immediately.
 
  The part I wanted to avoid from your proposal was
  L1ReadSync/L1ReadSynchronizer, because I found it kind of hard to
  understand. I guess I'd find it more approachable all the synchronization
  (the map of L1ReadSynchronizers and the interaction with it) would be in
 a
  separate class, with its own tests, and the L1 interceptor would only
 know
  about the high-level stuff. Like the LockManager also hides a map of
 locks;
  not like the LockManager unit tests, because they're kind of lacking, but
  LockManager has the advantage that almost all the tests touch it
 indirectly.
 

 True, I agree I like it as a separate class better - much better
 separation.  Maybe I can move that over and add tests and see if that
 makes it more clear?  If not we can still always use the locking
 approach.


+1


 
  Pessimistic - I haven't looked into pessimistic closely yet, but in
  that case I don't think it ever acquires the local locks so it would
  have to acquire a remote lock on a get, which would be pretty
  disastrous (gets would indirectly have a weak FORCE_WRITE_LOCK
  enabled

Re: [infinispan-dev] L1 Consistency with Sync Caches

2013-07-02 Thread Dan Berindei
On Fri, Jun 28, 2013 at 4:39 PM, William Burns mudokon...@gmail.com wrote:

 On Fri, Jun 28, 2013 at 5:14 AM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  On Fri, Jun 28, 2013 at 12:17 AM, William Burns mudokon...@gmail.com
  wrote:
 
  Trying to leave my points that would most likely have responses to
  second email so we can try to get back to a single thread :)
 
 
  No such luck :)
 
  Sorry for sending 2 replies in the first place, but it seemed more
 natural -
  I meant to comment on your proposal in one email and to describe my
  alternative proposal in the second email.
 
 
  On Thu, Jun 27, 2013 at 4:12 PM, Dan Berindei dan.berin...@gmail.com
  wrote:
  
   On Thu, Jun 27, 2013 at 4:18 PM, William Burns mudokon...@gmail.com
   wrote:
  
   First off I apologize for the length.
  
   There have been a few Jiras recently that have identified L1
   consistency
   issues with both TX and non TX sync caches.  Async caches with L1
 have
   their
   own issues as well, but I only wanted to talk about sync caches.
  
   https://issues.jboss.org/browse/ISPN-3197
   https://issues.jboss.org/browse/ISPN-2965
   https://issues.jboss.org/browse/ISPN-2990
  
   I have proposed a solution in
   https://github.com/infinispan/infinispan/pull/1922 which should
 start
   L1
   consistency down the right track.  There are quite a few comments on
 it
   if
   you want to look into it more, but because of that I am moving this
 to
   the
   dev mailing list.
  
   The key changes in the PR are the following (non-tx):
  
   1. Concurrent reads for a key that can retrieve a remote value are
   corralled into a single thread of execution for that given key.
  This
   would reduce network traffic with concurrent gets for the same key.
   Note
   the corralling only happens on a per key basis.
  
  
   Get commands on owners should not be serialized. Get commands on
   non-owners
   should not be serialized either, if the key already exists in L1. So
 I'd
   say
   L1ReadSynchronizer should be L1WriteSynchronizer instead :)
 
  You are suggesting to check the context to see if the key is present
  before attempting the synchronizer right?  Reading your second email
  that seems the case :)
 
 
  Nope, I meant we should check the data container (aka the L1 cache). But
  obviously we have to check the invocation context first in a tx cache, if
  the tx read the key before it should see the same value.
 
 I was thinking because the non-tx wraps the value if it is in the data
 container before hand.  But actually for non-tx it seems I should
 check the data container only and tx I should only check the ctx (to
 guarantee read consistency)


For tx you probably have to check the context first, and the data container
only if you don't find the entry in the context (non-existent entries that
were previously read in the tx should have a NullMarkerEntry in the
context).
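i.e. something along these lines in the wrapping/read path (sketch only;
wrapForRead() and remoteGetAndWrap() are hypothetical helpers, not actual
methods):

// Repeatable read: reuse whatever the tx already read, including a null
// marker for a key that was read as non-existent earlier in the tx.
CacheEntry contextEntry = ctx.lookupEntry(key);
if (contextEntry != null) {
   return contextEntry;
}
// Not touched by this tx yet: fall back to the local data container (owned
// copy or L1), and only then to a remote get.
InternalCacheEntry localEntry = dataContainer.get(key);
if (localEntry != null) {
   return wrapForRead(ctx, localEntry);
}
return remoteGetAndWrap(ctx, key);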


  
  
   2. The single thread that is doing the remote get would update the L1
   if
   able (without locking) and make available the value to all the
 requests
   waiting on the get.
  
  
   Well, L1ReadSynchronizer does prevent other threads from modifying the
   same
   key, so we are locking the key - just not using LockManager.
   It would also require StateTransferLock.acquireSharedTopologyLock() to
   make
   sure it doesn't write an L1 entry after the node became a proper
 owner.
 
  Agree, when I was saying locking I was meaning through the use of the
  lock manager.
 
  
  
   3. Invalidations that are received would first check to see if there
 is
   a
   current remote get occurring for it's keys.  If there is it will
   attempt to
   cancel the L1 write(s) before it occurs.  If it cannot cancel the L1
   write,
   then it must also wait on the current remote get completion and
   subsequently
   run the invalidation.  Note the cancellation would fail when the
 remote
   get
   was done and it is in the middle of updating the L1, so this would be
   very
   small window.
  
  
   I think it would be clearer to describe this as the L1 invalidation
   cancelling the remote get, not the L1 update, because the actual L1
   update
   can't be cancelled.
 
  When I say L1 update I meant the write to the data container after the
  remote get.  The invalidation can't stop the remote get, all it does
  is tell the caller that Hey don't write the remote value you
  retrieved into the L1.
 
 
  Oh right, the get command will still use the value it got from the remote
  node, it just won't write it.
  That makes me wonder, though, if something like this can happen:
 
  1. A invokes get(k), starts a L1ReadSynchronizer and a remote get to B
  2. B invokes put(k, v) and sends an invalidation command to A
  3. The invalidation command cancels the L1 put on A
  4. A invokes get(k) again, finds the L1ReadSynchronizer from step 1) and
  queues on it
  5. Both get(k) commands return the same value, even though the value has
  changed on the owner(s).
 

 Yeah

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-02 Thread Dan Berindei
On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org wrote:

 Hi all,

 simple question: What are the consistency guarantees that are supposed to
 be ensured?

 I have the following scenario (happened in a test case):

 NonOwner: remote get key
 BackupOwner: receives the remote get and replies (with the correct value)
 BackupOwner: put in L1 the value


I assume you meant NonOwner here?


 PrimaryOwner: [at the same time] is committing a transaction that will
 update the key.
 PrimaryOwer: receives the remote get after sending the commit. The
 invalidation for L1 is not sent to NonOwner.


At some point, BackupOwner has to execute the commit as well, and it should
send an InvalidateL1Command(key) to NonOwner. However, one of the bugs that
Will is working on could prevent that invalidation from working (maybe
https://issues.jboss.org/browse/ISPN-2965).



 The test finishes and I perform a check for the key value in all the
 caches. The NonOwner returns the L1 cached value (==test fail).

 IMO, this is a bug (or not) depending on what guarantees we provide.

 wdyt?


It's a bug!

IMO, at least in DIST_SYNC mode with sync commit, we should guarantee that
stale entries are removed from non-owners before the TM.commit() call
returns on the originator.

With other configurations we probably can't guarantee that; instead we
should guarantee that stale entries are removed from non-owners soon
after the TM.commit() call returns on the originator.

I don't think it's ok to say that a stale entry can stay in L1 indefinitely
in any configuration - otherwise why have L1 invalidation at all?
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-02 Thread Dan Berindei
On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 07/02/2013 04:21 PM, Sanne Grinovero wrote:
  +1 for considering it a BUG
 
  Didn't we decide a year ago that GET operations should be sent to a
  single node only (the primary) ?

 +1 :)


Manik had a patch for staggering remote GET calls, but it was slowing down
reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w


 
  On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote:
  Hi all,
 
  simple question: What are the consistency guaranties that is supposed to
  be ensured?
 
  I have the following scenario (happened in a test case):
 
  NonOwner: remote get key
  BackupOwner: receives the remote get and replies (with the correct
 value)
  BackupOwner: put in L1 the value
  PrimaryOwner: [at the same time] is committing a transaction that will
  update the key.
  PrimaryOwer: receives the remote get after sending the commit. The
  invalidation for L1 is not sent to NonOwner.
 
  The test finishes and I perform a check for the key value in all the
  caches. The NonOwner returns the L1 cached value (==test fail).
 
  IMO, this is bug (or not) depending what guaranties we provide.
 
  wdyt?
 
  Pedro
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-02 Thread Dan Berindei
On Tue, Jul 2, 2013 at 6:35 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 07/02/2013 04:29 PM, Dan Berindei wrote:
 
 
 
  On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org
  mailto:pe...@infinispan.org wrote:
 
  Hi all,
 
  simple question: What are the consistency guaranties that is
 supposed to
  be ensured?
 
  I have the following scenario (happened in a test case):
 
  NonOwner: remote get key
  BackupOwner: receives the remote get and replies (with the correct
  value)
  BackupOwner: put in L1 the value
 
 
  I assume you meant NonOwner here?

 yes

 
  PrimaryOwner: [at the same time] is committing a transaction that
 will
  update the key.
  PrimaryOwer: receives the remote get after sending the commit. The
  invalidation for L1 is not sent to NonOwner.
 
 
  At some point, BackupOwner has to execute the commit as well, and it
  should send an InvalidateL1Command(key) to NonOwner. However, one of the
  bugs that Will is working on could prevent that invalidation from
  working (maybe https://issues.jboss.org/browse/ISPN-2965).

 only the primary owner is sending the invalidation command.


Oops, you're right! And I'm pretty sure I made the same assumptions in my
replies to Will's L1 thread...

I guess we could make it work either by sending the invalidations from all the
owners (slowing down writes, because most of the time we'd send the same
commands twice), or by sending remote get commands ONLY to the primary
owner (which will slow down remote reads).

Staggering remote GET commands won't work, because with staggering you
still have the possibility of the first request reaching the primary owner
only after the second request returned and the entry was written to L1.

Sending the invalidations from the tx originator after the commit might
work as well, but only if L1.invalidationThreshold == 0 and all the L1
invalidations are sent as multicasts.


  
 
  The test finishes and I perform a check for the key value in all the
  caches. The NonOwner returns the L1 cached value (==test fail).
 
  IMO, this is bug (or not) depending what guaranties we provide.
 
  wdyt?
 
 
  It's a bug!
 
  IMO, at least in DIST_SYNC mode with sync commit, we should guarantee
  that stale entries are removed from non-owners before the TM.commit()
  call returns on the originator.
 
  With other configurations we probably can't guarantee that, instead we
  should guarantee that stale entries are removed from non-owners soon
  after the TM.commit() call returns on the originator.
 
  I don't think it's ok to say that a stale entry can stay in L1
  indefinitely in any configuration - otherwise why have L1 invalidation
  at all?
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-02 Thread Dan Berindei
It's not wrong, sending the invalidation only from the primary owner is
wrong :)


On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 I see, so we keep the wrong implementation because it's faster?

 :D

 On 2 July 2013 16:38, Dan Berindei dan.berin...@gmail.com wrote:
 
 
 
  On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org
 wrote:
 
 
 
  On 07/02/2013 04:21 PM, Sanne Grinovero wrote:
   +1 for considering it a BUG
  
   Didn't we decide a year ago that GET operations should be sent to a
   single node only (the primary) ?
 
  +1 :)
 
 
  Manik had a patch for staggering remote GET calls, but it was slowing
 down
  reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w
 
 
  
   On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote:
   Hi all,
  
   simple question: What are the consistency guaranties that is supposed
   to
   be ensured?
  
   I have the following scenario (happened in a test case):
  
   NonOwner: remote get key
   BackupOwner: receives the remote get and replies (with the correct
   value)
   BackupOwner: put in L1 the value
   PrimaryOwner: [at the same time] is committing a transaction that
 will
   update the key.
   PrimaryOwer: receives the remote get after sending the commit. The
   invalidation for L1 is not sent to NonOwner.
  
   The test finishes and I perform a check for the key value in all the
   caches. The NonOwner returns the L1 cached value (==test fail).
  
   IMO, this is bug (or not) depending what guaranties we provide.
  
   wdyt?
  
   Pedro
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   https://lists.jboss.org/mailman/listinfo/infinispan-dev
   ___
   infinispan-dev mailing list
   infinispan-dev@lists.jboss.org
   https://lists.jboss.org/mailman/listinfo/infinispan-dev
  
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-03 Thread Dan Berindei
On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 On 2 July 2013 17:24, Dan Berindei dan.berin...@gmail.com wrote:
  It's not wrong, sending the invalidation only from the primary owner is
  wrong :)

 Agreed, sending a GET operation to multiple nodes might not be wrong
 per-se but is the root cause of such race conditions, and other subtle
 complexities we might not even be aware of yet.

 I don't know why it was slower, but since the result doesn't make
 sense we should look at it a second time rather than throwing the code
 away.


It does make sense: statistically, the backup owner will sometimes reply
faster than the primary owner.

http://markmail.org/message/qmpn7yueym4tbnve

http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/


 Sending invalidations from a non-primary owner is an interesting
 approach, but then we're having each owner to maintain an independent
 list of nodes who have read the value.
 For each write, the primary node would send an invalidation to each
 registered node, plus the copy to the secondary nodes, which in turn
 sends more L1 invalidation nodes to each of their registered nodes..
 what's the likelihood of duplication of invalidation messages here?
 Sounds like a big network traffic amplifier, lots of network traffic
 triggered for each write.


The likelihood of duplication is very near to 100%, indeed, and in non-tx
caches it would add another RPC to the critical path.

As always, it's a compromise: if we do something to speed up writes, it
will slow down reads. Perhaps we could send the request to the primary
owners only when L1 is enabled, as the number of remote gets should be
smaller, and send the request to all the owners when L1 is disabled, and
the number of remote gets is higher.

Pedro's suggestion to send the request to all the owners, but only write
the value to L1 if the first reply was from the primary owner, sounds like
it should work just as well. It would make L1 slightly less efficient, but
it wouldn't have latency spikes caused by a delay on the primary owner.
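A self-contained sketch of that rule, with a plain completion service
standing in for the parallel remote gets (none of the names below are the
real RpcManager API, they are just for illustration):

import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

public class FirstReplyL1Sketch<K, V> {

   interface RemoteGet<K, V> {
      V get(String owner, K key) throws Exception;   // stands in for the per-owner remote get
   }

   static final class OwnerReply<V> {
      final String owner;
      final V value;
      OwnerReply(String owner, V value) { this.owner = owner; this.value = value; }
   }

   private final ExecutorService executor;
   private final ConcurrentMap<K, V> l1;

   FirstReplyL1Sketch(ExecutorService executor, ConcurrentMap<K, V> l1) {
      this.executor = executor;
      this.l1 = l1;
   }

   V get(K key, List<String> owners, String primaryOwner, RemoteGet<K, V> remoteGet) throws Exception {
      ExecutorCompletionService<OwnerReply<V>> replies = new ExecutorCompletionService<>(executor);
      for (String owner : owners) {
         replies.submit(() -> new OwnerReply<>(owner, remoteGet.get(owner, key)));
      }
      OwnerReply<V> first = replies.take().get();      // first owner to answer wins
      if (first.owner.equals(primaryOwner) && first.value != null) {
         // Only cache when the primary answered first: the primary has registered
         // us as a requestor, so a later write is guaranteed to invalidate us.
         l1.put(key, first.value);
      }
      return first.value;                              // the caller gets the value either way
   }
}

The cost is exactly the one mentioned above: when a backup happens to answer
first we still return the value, we just skip the L1 write.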


 It also implies that we don't have reliability on the list of
 registered nodes, as each owner will be maintaining a different set.
 In this case we should also have each node invalidate its L1 stored
 entries when the node from which they got these entries has left the
 cluster.


Right now we invalidate from L1 all the keys for which the list of owners
changed, whether they're still alive or not, because we don't keep track of
the node we got each entry from.

If we only sent remote get commands to the primary owner, we'd have to
invalidate from L1 all the keys for which the primary owner changed.

One thing that we don't do at the moment, but we should do whether we send
the invalidations from the primary owner or from all the owners, is to
clean up the requestor lists for the keys that a node no longer owns.


 Having it all dealt by the primary owner makes for a much simpler
 design and also makes it more likely that a single L1 invalidate
 message is sent via multicast, or at least with less duplication.


The simplest design would be to never keep track of requestors and always
send a multicast from the originator. In fact, the default configuration is
to always send multicasts (but we still keep track of requestors and we
send the invalidation from the primary owner).

Intuitively, unicasts would be preferable for keys that have a low
read:write ratio, as in a write-intensive scenario, but I wonder if
disabling L1 wouldn't be even better for that scenario.

Cheers
Dan



 Cheers,
 Sanne




 
 
 
  On Tue, Jul 2, 2013 at 7:14 PM, Sanne Grinovero sa...@infinispan.org
  wrote:
 
  I see, so we keep the wrong implementation because it's faster?
 
  :D
 
  On 2 July 2013 16:38, Dan Berindei dan.berin...@gmail.com wrote:
  
  
  
   On Tue, Jul 2, 2013 at 6:36 PM, Pedro Ruivo pe...@infinispan.org
   wrote:
  
  
  
   On 07/02/2013 04:21 PM, Sanne Grinovero wrote:
+1 for considering it a BUG
   
Didn't we decide a year ago that GET operations should be sent to a
single node only (the primary) ?
  
   +1 :)
  
  
   Manik had a patch for staggering remote GET calls, but it was slowing
   down
   reads by 25%: http://markmail.org/message/vsx46qbfzzxkkl4w
  
  
   
On 2 July 2013 15:59, Pedro Ruivo pe...@infinispan.org wrote:
Hi all,
   
simple question: What are the consistency guaranties that is
supposed
to
be ensured?
   
I have the following scenario (happened in a test case):
   
NonOwner: remote get key
BackupOwner: receives the remote get and replies (with the correct
value)
BackupOwner: put in L1 the value
PrimaryOwner: [at the same time] is committing a transaction that
will
update the key.
PrimaryOwer: receives the remote get after sending the commit. The
invalidation for L1 is not sent to NonOwner.
   
The test finishes and I perform a check

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-03 Thread Dan Berindei
On Tue, Jul 2, 2013 at 9:51 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 07/02/2013 04:55 PM, Dan Berindei wrote:
 
 
 
  On Tue, Jul 2, 2013 at 6:35 PM, Pedro Ruivo pe...@infinispan.org
  mailto:pe...@infinispan.org wrote:
 
 
 
  On 07/02/2013 04:29 PM, Dan Berindei wrote:
   
   
   
On Tue, Jul 2, 2013 at 5:59 PM, Pedro Ruivo pe...@infinispan.org
  mailto:pe...@infinispan.org
mailto:pe...@infinispan.org mailto:pe...@infinispan.org
 wrote:
   
Hi all,
   
simple question: What are the consistency guaranties that is
  supposed to
be ensured?
   
I have the following scenario (happened in a test case):
   
NonOwner: remote get key
BackupOwner: receives the remote get and replies (with the
  correct
value)
BackupOwner: put in L1 the value
   
   
I assume you meant NonOwner here?
 
  yes
 
   
PrimaryOwner: [at the same time] is committing a transaction
  that will
update the key.
PrimaryOwer: receives the remote get after sending the
  commit. The
invalidation for L1 is not sent to NonOwner.
   
   
At some point, BackupOwner has to execute the commit as well, and
 it
should send an InvalidateL1Command(key) to NonOwner. However, one
  of the
bugs that Will is working on could prevent that invalidation from
working (maybe https://issues.jboss.org/browse/ISPN-2965).
 
  only the primary owner is sending the invalidation command.
 
 
  Oops, you're right! And I'm pretty sure I made the same assumptions in
  my replies to Will's L1 thread...
 
  I guess we could make it work either sending the invalidations from all
  the owners (slowing down writes, because most of the time we'd send the
  same commands twice), or by sending remote get commands ONLY to the
  primary owner (which will slow down remote reads).
 
  Staggering remote GET commands won't work, because with staggering you
  still have the possibility of the first request reaching the primary
  owner only after the second request returned and the entry was written
  to L1.

 This may be a stupid idea, but we only store the entry in L1 if it is a
 reply from the primary owner? Of course this will reduce the L1 hit
 ratio... :(


I like that... writing the entry to L1 only if the primary owner replied
first would also allow for staggering get requests.


  
  Sending the invalidations from the tx originator after the commit might
  work as well, but only if L1.invalidationTreshold == 0 and all the L1
  invalidations are sent as multicasts.
 
   
   
The test finishes and I perform a check for the key value in
  all the
caches. The NonOwner returns the L1 cached value (==test
 fail).
   
IMO, this is bug (or not) depending what guaranties we
 provide.
   
wdyt?
   
   
It's a bug!
   
IMO, at least in DIST_SYNC mode with sync commit, we should
 guarantee
that stale entries are removed from non-owners before the
 TM.commit()
call returns on the originator.
   
With other configurations we probably can't guarantee that,
  instead we
should guarantee that stale entries are removed from non-owners
  soon
after the TM.commit() call returns on the originator.
   
I don't think it's ok to say that a stale entry can stay in L1
indefinitely in any configuration - otherwise why have L1
  invalidation
at all?
   
   
   
   
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
  mailto:infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
   
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org mailto:
 infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1 consistency for transactional caches.

2013-07-09 Thread Dan Berindei
On Mon, Jul 8, 2013 at 12:19 AM, Sanne Grinovero sa...@infinispan.orgwrote:

 On 3 July 2013 10:26, Dan Berindei dan.berin...@gmail.com wrote:
 
 
 
  On Tue, Jul 2, 2013 at 8:41 PM, Sanne Grinovero sa...@infinispan.org
  wrote:
 
  On 2 July 2013 17:24, Dan Berindei dan.berin...@gmail.com wrote:
   It's not wrong, sending the invalidation only from the primary owner
 is
   wrong :)
 
  Agreed, sending a GET operation to multiple nodes might not be wrong
  per-se but is the root cause of such race conditions, and other subtle
  complexities we might not even be aware of yet.
 
  I don't know why it was slower, but since the result doesn't make
  sense we should look at it a second time rather than throwing the code
  away.
 
 
  It does make sense: statistically, the backup owner will sometimes reply
  faster than the primary owner.
 
  http://markmail.org/message/qmpn7yueym4tbnve
 
 
 http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/

 Of course, I remember the discussion but you can put may questions
 marks on this decision. First off, this is doubling the load on
 network, which is supposedly our most precious resource, so I highly
 question how we're measuring the benefit of the second request.
 If you read the articles you linked to you'll see google applies such
 a strategy to improve the tail latency, but only sends the second
 request when the first one is not getting a fast answer, and in their
 tests this seems to pay off to a mere 5% of increased network usage..
 I would say that's a significantly different level of trade off.
 Also, opposed to Google's BigTable and Apache Cassandra which use
 these techniques, Infinispan is not supporting an eventually
 consistent model which makes it far more dangerous to read the
 slightly out of date value from the non-owner... sure we can resolve
 those things, but it gets hairy.


 In this case specifically, reading from the primary owner only seems a
 much cleaner solution, as IMHO it's a move towards simplification
 rather than adding yet another special case in the codebase.


It's cleaner, but it slows down reads, as we've seen in our own tests.

Also, the consistency problems with staggered remote gets are the same as
the consistency problems with simultaneous remote gets. So it would be much
harder to go back and experiment with staggered gets once we simplify the
code on the assumption that we should only ever read from the primary owner.


 
 
  Sending invalidations from a non-primary owner is an interesting
  approach, but then we're having each owner to maintain an independent
  list of nodes who have read the value.
  For each write, the primary node would send an invalidation to each
  registered node, plus the copy to the secondary nodes, which in turn
  sends more L1 invalidation nodes to each of their registered nodes..
  what's the likelihood of duplication of invalidation messages here?
  Sounds like a big network traffic amplifier, lots of network traffic
  triggered for each write.
 
 
  The likelihood of duplication is very near to 100%, indeed, and in non-tx
  caches it would add another RPC to the critical path.
 
  As always, it's a compromise: if we do something to speed up writes, it
 will
  slow down reads. Perhaps we could send the request to the primary owners
  only when L1 is enabled, as the number of remote gets should be smaller,
 and
  send the request to all the owners when L1 is disabled, and the number of
  remote gets is higher.
 
  Pedro's suggestion to send the request to all the owners, but only write
 the
  value to L1 if the first reply was from the primary owner, sounds like it
  should work just as well. It would make L1 slightly less efficient, but
 it
  wouldn't have latency spikes caused by a delay on the primary owner.

 That's far from an ideal solution; we don't have a clue on how to
 measure what slightly less efficient means: that might reveal to be
 unbearably worse for some usage pattern.


True, in a worst case scenario, the primary owner could be replying
consistently slower than the others, and the entry may never be stored in
L1. And it wouldn't make sense to try this with numOwners = 10, as the
chances of the primary owner replying first would be slim.

I think my "slightly less efficient" description would be more accurate if
we had staggered gets, if we ever get that patch in...



 While we have no clue how worse it can be, it will definitely always
 provide a worse cache hit/miss ratio, so it's easily proven that it's
 going to be sub optimal in all cases.


If numOwners = 1 there is only one way to read the value, so it's clearly
optimal in one case :)


 If you really go for something like that, at least take the value from
 the primary owners when it arrives (second, third, .. whatever but at
 some point you might get it) and then store it in L1: will cost you a
 second unmarshalling operation but that's far better than causing
 (several?) cache misses

Re: [infinispan-dev] Isolation level in Repeatable Read + remote get

2013-07-09 Thread Dan Berindei
Pedro, I'll integrate the PR as it is, and then you can experiment with my
proposal in another branch/PR.


On Tue, Jul 9, 2013 at 1:48 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 07/09/2013 11:46 AM, Galder Zamarreño wrote:
 
  On Jul 8, 2013, at 11:02 AM, Pedro Ruivo pe...@infinispan.org wrote:
 
  Hi guys,
 
  re: ISPN-2840, ISPN-3235, ISPN-3236
  short: transaction isolation in repeatable read
 
  Dan came up with an idea (a good idea IMO) to change a little the logic
  how entries are put in the context for transactional caches.
 
  One of the Repeatable Read properties is after a key is accessed, the
  transaction should never see other concurrent transaction values, even
  if they commit first. In result, we can optimize the distributed mode by
  only do a remote get on the first time a transaction access a key.
 
  My idea (and the one implemented in the Pull Request)
 
  ^ Which pull req, link? You mean [1]?

 yes

 
  [1] https://github.com/infinispan/infinispan/pull/1937
 
  was adding a flag
  to mark the entry as accessed. All future accesses to that key will not
  try to fetch the data from the container nor from a remote location (we
  have a small bug with the last one).
 
  The Dan's idea is more simple but require some change in the
  EntryFactory logic.
 
  ^ Which is Dan's idea exactly?

 is the description below:

 
  At this stage, we only put an entry in the
  transaction context if the following conditions are respected:
 
  * the entry exists in data container (i.e. the node is owner or it is
 in L1)
  * we put a RepeatableReadEntry with null value in the transaction
 context if
  ** the entry does not exist in the container but the node is owner
  ** the entry does not exist in the data container but the command has
  flags to skip the remote fetch (like IGNORE_RETURN_VALUE or
  SKIP_REMOTE_LOOKUP). Of course the conditional commands needs special
  attention here.
 
  Note: as usual, if the entry exists in the context, then nothing is
 done.
 
  At the TxDistributionInterceptor level, the check to see if the remote
  get should be done is as simple as checking if lookupEntries(k)==null.
 
  Dan, if I said something wrong let me know. If I was not clear at some
  point let me know too.
 
  Anyone see any issue with this approach?
  Any other suggestions?
 
  Cheers,
  Pedro Ruivo
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  --
  Galder Zamarreño
  gal...@redhat.com
  twitter.com/galderz
 
  Project Lead, Escalante
  http://escalante.io
 
  Engineer, Infinispan
  http://infinispan.org
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] MongoDB cachestore 5.3.0.Final now in jboss maven repo

2013-07-10 Thread Dan Berindei
Do we need to change something in pom.xml as well so that it's included
automatically in the next release?


On Mon, Jul 8, 2013 at 10:30 PM, Mircea Markus mmar...@redhat.com wrote:

 Thanks for updating the release doc :-)

 On 5 Jul 2013, at 10:02, Tristan Tarrant ttarr...@redhat.com wrote:

  Dear all,
 
  I have released to maven the missing MongoDB cachestore for 5.3.0.Final.
  I have not re-released the zip distributions to avoid confusion, but
  let's make sure it is included in future releases.
 
  Tristan
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] configuring fetchInMemoryState for topology caches

2013-07-10 Thread Dan Berindei
On Tue, Jul 9, 2013 at 12:48 PM, Mircea Markus mmar...@redhat.com wrote:


 On 21 May 2013, at 17:09, Dan Berindei dan.berin...@gmail.com wrote:

  I wouldn't want to deprecate CCL, I think it definitely has a purpose -
 at least in invalidation mode.
 The only use case I'm aware of for invalidation is 2nd level cache and I
 don't think that needs a remote cache, Galder wdyt?


I don't know if 2nd level cache actually uses ClusterCacheLoader, but it
could definitely use it without exposing the cache through HotRod.


  Even in replication mode, having a lazy alternative to state transfer
 may be useful. Maybe not for the topology cache, but it might make sense
 for large caches.

 this makes me wonder: what would be the case in which one doesn't want to
 use state transfer for a distributed/replicated cache.


In replication mode, if the cache has a huge (non-shared) cache store, some
users may prefer to disable fetchPersistentState and rely on
ClusterCacheLoader instead for less frequently accessed keys.

I don't know if it ever makes sense to use ClusterCacheLoader in
distribution mode, since it always sends the ClusteredGetCommand to the
entire cluster.


Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] Infinispan 6.0.0.Alpha2 is out!

2013-08-05 Thread Dan Berindei
Dear Infinispan community,

We're proud to announce the second Alpha release of Infinispan 6.0.0, and
also the second release using the Apache Software Licence.

New features in this release:

   - A new query DSL https://issues.jboss.org/browse/ISPN-3169 that is
   usable both in embedded/library mode and over HotRod.


   - Support for JCache 0.8 https://issues.jboss.org/browse/ISPN-3234,
   the Public Review Draft version of the JSR-107 specification.


   - A slimmer distribution, thanks to moving non-core cache store
    implementations to separate repositories
    (https://issues.jboss.org/browse/ISPN-3377).

For a complete list of features and fixes included in this release please
refer to the release notes
(https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12321854).
Visit our downloads section (http://www.jboss.org/infinispan/downloads) to
find the latest release and if you have any questions please check our
forums (http://www.jboss.org/infinispan/forums), our mailing lists
(https://lists.jboss.org/mailman/listinfo/infinispan-dev) or ping us
directly on IRC.

Thanks to everyone for their involvement and contribution!

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] loaders/shared

2013-08-09 Thread Dan Berindei
+1 to make it per loader


On Thu, Aug 8, 2013 at 6:30 PM, Galder Zamarreño gal...@redhat.com wrote:


 On Aug 8, 2013, at 11:43 AM, Mircea Markus mmar...@redhat.com wrote:

  Currently the shared attribute is configured at *loaders* level, so
 all the defined cache loaders inherit this attribute.
  It should be configured on a per loader basis. Does anyone see any
 problem with that?

 No. I think you're right in that it should be per loader. You could have a
 shared JDBC store and a non-shared local FS.

 Cheers,

 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Eventual Consistency in Infinispan

2013-08-23 Thread Dan Berindei
Are you using explicit transactions or only implicit transactions?

As of 5.2, non-transactional caches use a scheme where the modification is
sent to the primary owner, which then sends the modification to the backup
owner(s). The primary owner waits for responses from all the backup owners,
and the originator waits for a response from the primary owner.

In optimistic caches, one-phase transactions are much simpler: the
originator just sends the modification to all the owners, and they all
apply it. Execution order is preserved for transactions originating from
the same node, but transactions originating from different nodes can be
applied by the owners in any order.

So implicit transactions with use1PcForAutoCommitTransactions=true can
certainly be faster than non-transactional operations, but I wouldn't
recommend using them because they aren't atomic. (To be fair, there are
also some problems with atomicity of non-tx operations during state
transfer that we're just fixing now - e.g.
https://issues.jboss.org/browse/ISPN-3357)
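
For reference, here is a rough sketch of the configuration being discussed,
using the programmatic builder (treat it as an illustration of the attributes
quoted below rather than a copy-paste recipe):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;

public class TxConfigSketch {
   // Optimistic, synchronization-based transactions in sync replication mode,
   // with implicit (auto-commit) transactions using a single commit phase.
   public static Configuration replicatedOptimistic1Pc() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.REPL_SYNC);
      builder.transaction()
             .transactionMode(TransactionMode.TRANSACTIONAL)
             .lockingMode(LockingMode.OPTIMISTIC)
             .useSynchronization(true)
             .autoCommit(true)
             // fast, but as noted above the commit is not atomic across owners
             .use1PcForAutoCommitTransactions(true);
      return builder.build();
   }
}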

Cheers
Dan



On Mon, Aug 19, 2013 at 4:47 PM, Faseela K faseel...@ericsson.com wrote:

 Hi Tristan,

   I was just comparing the performance of synchronous transactional and
 synchronous non-transactional writes in replication mode.
   And I see transactional writes were taking less time than
 non-transactional.
   Is it expected behaviour? My transactional config has the following
 properties :

 syncRollbackPhase=true
 syncCommitPhase=true
 cacheStopTimeout=3
 use1PcForAutoCommitTransactions=true
 autoCommit=true
 lockingMode=OPTIMISTIC
 useSynchronization=true
 transactionMode=TRANSACTIONAL

   Or am I using some wrong configuration?

 Thanks,
 Faseela

 -Original Message-
 From: Tristan Tarrant [mailto:ttarr...@redhat.com]
 Sent: Monday, August 19, 2013 6:28 PM
 To: infinispan -Dev List
 Cc: Faseela K
 Subject: Re: [infinispan-dev] Eventual Consistency in Infinispan

 Infinispan does not implement eventual consistency yet.

 Tristan

 On 08/19/2013 02:39 PM, Faseela K wrote:
  Hi,
 I am using Infinispan - 5.3.0 in clustering mode.
 I have a transactional configuration, in replication-synchronous mode.
 I want to know, whether eventual consistency is supported for
  synchronous replication in 5.3.
 Could someone please brief how eventual consistency works in
  infinispan context?
  Thanks,
  Faseela
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] ISPN-3051 configuration

2013-09-09 Thread Dan Berindei
Hi guys

As you know, I'm working on ISPN-3051, allowing each node to take a higher
or lower proportion of the entries in the cache. I've implemented this by
adding a float loadFactor setting in each node's configuration, with 1
being the default and any positive value being accepted (including 0).

There are two questions I wanted to ask you about the configuration:

1. What do you think about the loadFactor name? I started having doubts
about it, since it has a very different meaning in HashMap. I have come up
with a couple alternatives, but I don't love any of them:
proportionalLoad and proportionalCapacity.

2. Where should we put this setting? I have added it as
CacheConfiguration.clustering().hash().loadFactor(), but I can't think of a
reason for having different values for each cache, so we might as well put
it in the global configuration.
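
To make the proposal concrete, this is roughly how I picture it being used.
Note that loadFactor() is just the working name from this thread, not an
existing method:

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class LoadFactorSketch {
   public static Configuration biggerNode() {
      ConfigurationBuilder builder = new ConfigurationBuilder();
      builder.clustering().cacheMode(CacheMode.DIST_SYNC)
             .hash()
                .numOwners(2)
                // proposed setting: this node volunteers to hold twice the
                // average share of the cache's entries (1 is the default)
                .loadFactor(2.0f);
      return builder.build();
   }
}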

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-3051 configuration

2013-09-09 Thread Dan Berindei
On Mon, Sep 9, 2013 at 12:34 PM, Tristan Tarrant ttarr...@redhat.comwrote:

 On 09/09/2013 11:18 AM, Dan Berindei wrote:
  Hi guys
 
  As you know, I'm working on ISPN-3051, allowing each node to take a
  higher or lower proportion of the entries in the cache. I've
  implemented this by adding a float loadFactor setting in each node's
  configuration, with 1 being the default and any positive value being
  accepted (including 0).
 
  There are two questions I wanted to ask you about the configuration:
 
  1. What do you think about the loadFactor name? I started having
  doubts about it, since it has a very different meaning in HashMap. I
  have come up with a couple alternatives, but I don't love any of them:
  proportionalLoad and proportionalCapacity.
 Since this is per-node, you want to use the node word in there, so
 nodeCapacity would be good. Can this value change at runtime ?


nodeCapacity by itself would sound like we're limiting the actual number of
keys held on this node, so I feel it would be misleading. You do have a
point about having node in the name though... how about nodeScale?

The value can't change at runtime. TBH, I haven't even considered it.



  2. Where should we put this setting? I have added it as
  CacheConfiguration.clustering().hash().loadFactor(), but I can't think
  of a reason for having different values for each cache, so we might as
  well put it in the global configuration.
 Yes, global sounds good. And don't forget server :)


I guess the others disagree, so I'll keep it at the cache level.

I'll try to remember about server ;)

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction

2013-10-03 Thread Dan Berindei
On Wed, Oct 2, 2013 at 12:48 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 10/02/2013 12:03 AM, Sanne Grinovero wrote:
  I'd love to brainstorm about the clear() operation and what it means
  on Infinispan.
 
  I'm not sure to what extent, but it seems that clear() is designed to
  work in a TX, or even create an implicit transaction if needed, but
  I'm not understanding how that can work.


I think it's pretty straight-forward to define the semantics of operations
after clear(): they should assume that every key in the cache was removed.

True, like Pedro mentioned, the current implementation allows subsequent
operations in the same tx to see keys inserted by other txs. But I don't
think fixing that would be harder than ensuring that if a tx inserts key k
owned by nodes A and B, clear() from outside the tx either deletes k on A
and B or doesn't delete k on either node.


 
  Obviously a clear() operation isn't listing all keys explicitly.

 (offtopic) like entrySet(), keySet(), values() and iterator() when
 executed in a transactional context. however, I think that these
 operations are more difficult to handle, because they have to return
 consistent data...


I agree, defining how entrySet() and the others should work in a tx seems
harder than defining clear().

Will, I'm guessing your fix for ISPN-3560 still allows entrySet() after
clear() to see entries inserted by another tx in the meantime?


  Which
  implies that it's undefined on which keys it's going to operate when
  it's fired.. that seems like terribly wrong in a distributed key/value
  store as we can't possibly freeze the global state and somehow define
  a set of keys which are going to be affected, while an explicit
  enumeration is needed to acquire the needed locks.

 you may not need to explicitly acquire the lock for the affected key. you
 can use a global lock that is acquired in shared mode for write
 transactions without a clear operation and exclusively for transactions
 with a clear operation (exactly what I use for Total Order). also, the
 clear transactions (i.e. the transactions with a clear operation) can
 acquire only the global lock.


+1 for the global lock. It may slow things down a bit in the general case,
because every tx will have to acquire it in shared mode, but it will
guarantee consistency for clear().

It still won't help entrySet() and its brethren...
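
For anyone skimming the thread, here is a minimal sketch of the
shared/exclusive scheme Pedro describes, using a plain ReadWriteLock
(illustration only, not the actual Total Order code):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ClearLockSketch {
   private final ReadWriteLock clearLock = new ReentrantReadWriteLock();

   // ordinary write transactions share the lock, so they still run in parallel
   void beforeWriteTx() { clearLock.readLock().lock(); }
   void afterWriteTx()  { clearLock.readLock().unlock(); }

   // a transaction containing clear() takes the lock exclusively, so the clear
   // is ordered against every in-flight write transaction
   void beforeClearTx() { clearLock.writeLock().lock(); }
   void afterClearTx()  { clearLock.writeLock().unlock(); }
}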


 
  It might give a nice safe feeling that, when invoking a clear()
  operation in a transaction, I can still abort the transaction to make
  it cancel the operation; that's the only good part I can think of: we
  can cancel it.

 yep


Wouldn't it be harder to integrate it with Total Order if it didn't have a
tx?

But most of all, I think we have to keep it because it's a part of the Map
interface and a part of JCache's Cache interface.



 
  I don't think it has anything to do with consistency though? To make
  sure you're effectively involving all replicas of all entries in a
  consistent way, a lock would need to be acquired on each affected key,
  which again implies a need to enumerate all keys, including the
  unknown keys which might be hiding in a CacheStore: it's not enough to
  broadcast the clear() operation to all nodes and have them simply wipe
  their local state as that's never going to deal correctly
  (consistently) with in-flight transactions working on different nodes
  at different times (I guess enabling Total Order could help but you'd
  need to make it mandatory).
 

 see the comment above about the locks... may not be the most efficient,
 but handles the problem.


Note that we need to iterate the keys anyway, in order to fire the
CacheEntryRemoved listeners. Even if we optimized it to skip the iteration
when there are no registered listeners, it would still have to handle the
general case.


  So let's step back a second and consider what is the use case for
  clear() ? I suspect it's primarily a method needed during testing, or
  maybe cleanup before a backup is restored (operations), maybe a
  manually activated JMX operation to clear the cache in exceptional
  cases.

 I'm not aware of the use cases of a clear() operation, but using it as a
 JMX operation can be a problem in transactional caches (depending on the
 semantics you have in mind).

 If it is going to clear all the caches (i.e. from all replicas/nodes),
 then we have to find a way to set an order with the current transactions,
 or we will have some nodes delivering a transaction (say tx1)
 before the clear() and other nodes delivering tx1 after the clear(),
 leading to an inconsistent state.

 (offtopic) I'm wondering if we have that problem in the current
 implementation if tx1 is only insert new keys... in that case, we have
 no lock to serialize an order between the clear() and tx1...


I'm pretty sure we do have this problem. We definitely need to add some
tests for clear concurrent with other write operations.

In my view, as long as we're going to expose a clear() 

Re: [infinispan-dev] Warnings vs. Fail Fast

2013-10-04 Thread Dan Berindei
+1 to apply it, but maybe we should use the occasion to move the error
message to the Log interface as well.
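
The change itself should be tiny - something along these lines (illustrative
only, not the exact patch attached to ISPN-3583):

import org.infinispan.commons.CacheConfigurationException;
import org.infinispan.configuration.cache.Configuration;

public class IndexingValidationSketch {
   // Instead of logging a warning, fail configuration validation when indexing
   // is enabled but the query module is missing from the classpath.
   static void validateIndexing(Configuration cfg) {
      if (cfg.indexing().enabled() && !isPresent("org.infinispan.query.Search")) {
         throw new CacheConfigurationException(
               "Indexing is enabled but the infinispan-query module is not on the classpath");
      }
   }

   private static boolean isPresent(String className) {
      try {
         Class.forName(className);
         return true;
      } catch (ClassNotFoundException e) {
         return false;
      }
   }
}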


On Fri, Oct 4, 2013 at 2:43 AM, Sanne Grinovero sa...@infinispan.orgwrote:

 Currently if a cache is configured with indexing enabled, but the
 Query module isn't on classpath, you get a simple warning.

 I think this should fail with a configuration validation error; it's
 not just safer but also consistent with many other validations.

 I've created ISPN-3583 and patch is ready.. any good reason to not apply
 it?

 Sanne
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-3557: interactions between a clear() operation and a Transaction

2013-10-09 Thread Dan Berindei
On Thu, Oct 3, 2013 at 1:57 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 On 3 October 2013 10:29, Dan Berindei dan.berin...@gmail.com wrote:
 
  On Wed, Oct 2, 2013 at 12:48 PM, Pedro Ruivo pe...@infinispan.org
 wrote:
 
 
 
  On 10/02/2013 12:03 AM, Sanne Grinovero wrote:
   I'd love to brainstorm about the clear() operation and what it means
   on Infinispan.
  
   I'm not sure to what extent, but it seems that clear() is designed to
   work in a TX, or even create an implicit transaction if needed, but
   I'm not understanding how that can work.
 
 
  I think it's pretty straight-forward to define the semantics of
 operations
  after clear(): they should assume that every key in the cache was
 removed.
 
  True, like Pedro mentioned, the current implementation allows subsequent
  operations in the same tx to see keys inserted by other txs. But I don't
  think fixing that would be harder than ensuring that if a tx inserts key
 k
  owned by nodes A and B, clear() from outside the tx either deletes k on A
  and B or doesn't delete k on either node.
 
 
 
  
   Obviously a clear() operation isn't listing all keys explicitly.
 
  (offtopic) like entrySet(), keySet(), values() and iterator() when
  executed in a transactional context. however, I think that these
  operations are more difficult to handle, because they have to return
  consistent data...
 
 
  I agree, defining how entrySet() and the others should work in a tx seems
  harder than defining clear().

 I agree, those other nasty beasts are next on my list: see below..
 But we're not arguing about which method deserves the gold medal for
 complex design ;-)


  Will, I'm guessing your fix for ISPN-3560 still allows entrySet() after
  clear() to see entries inserted by another tx in the meantime?
 
 
   Which
   implies that it's undefined on which keys it's going to operate when
   it's fired.. that seems like terribly wrong in a distributed key/value
   store as we can't possibly freeze the global state and somehow define
   a set of keys which are going to be affected, while an explicit
   enumeration is needed to acquire the needed locks.
 
  you may not need to explicitly acquire the lock for the affected key. you
  can use a global lock that is acquired in shared mode for write
  transactions without a clear operation and exclusively for transactions
  with a clear operation (exactly what I use for Total Order). also, the
  clear transactions (i.e. the transactions with a clear operation) can
  acquire only the global lock.
 
 
  +1 for the global lock. It may slow things down a bit in the general
 case,
  because every tx will have to acquire it in shared mode, but it will
  guarantee consistency for clear().

 I had thought of that too, but it will slow down all operations, at
 which benefit?
 To have a nice transactional clear() operation which guarantees
 consistency?


For me, the most important reason is to support the JCache API.


 The goal of this thread is to challenge if that makes sense for real use
 cases,
 it seems in fact we would highly prefer a non transactional version,
 and considering that we can't run a single Cache in both modes anymore
 that's a problem.


 Side note on the implementation complexity:
 even the AsyncCacheStore decorator code is made extremely more tricky
 just to handle clear() appropriately, to keep it in the correct order
 of execution
 compared to the other operations, while still allowing
 coalescing of other operations, which essentially implies we allow
 reordering of operations as long as they affect independent keys.. where
 clear()
 is of course a jolly which requires its own execution path, its own
 global lock
 and generally makes it far more complex.


Off-topic: I'm not very familiar with the async cache store code, but I
really don't think coalescing writes is good for consistency. In case of a
crash, you'd have lots of incomplete transactions on disk.


 
  It still won't help entrySet() and its brethren...
 
 
  
   It might give a nice safe feeling that, when invoking a clear()
   operation in a transaction, I can still abort the transaction to make
   it cancel the operation; that's the only good part I can think of: we
   can cancel it.
 
  yep
 
 
  Wouldn't it be harder to integrate it with Total Order if it didn't have
 a
  tx?
 
  But most of all, I think we have to keep it because it's a part of the
 Map
  interface and a part of JCache's Cache interface.
 
 
 
  
   I don't think it has anything to do with consistency though? To make
   sure you're effectively involving all replicas of all entries in a
   consistent way, a lock would need to be acquired on each affected key,
   which again implies a need to enumerate all keys, including the
   unknown keys which might be hiding in a CacheStore: it's not enough to
   broadcast the clear() operation to all nodes and have them simply wipe
   their local state as that's never going to deal correctly
   (consistently) with in-flight

Re: [infinispan-dev] Running stress tests on CI ?

2013-10-09 Thread Dan Berindei
We don't have a configured maximum duration for the stress tests. The
stress build takes about 13 minutes now, but enabling all the stress tests
will probably increase the duration.

I didn't want to delay regular builds, I've scheduled it to run every night
at 03:00 GMT instead of running after every commit.


On Mon, Oct 7, 2013 at 4:09 PM, Sanne Grinovero sa...@infinispan.orgwrote:

 Nice!
 Is that going to work out with our hardware?

 We'll need to be careful now with the configured duration of each such
 test.

 On 7 October 2013 13:48, Dan Berindei dan.berin...@gmail.com wrote:
  I've created a build in CI for the stress tests:
 
 
 http://ci.infinispan.org/viewType.html?buildTypeId=Infinispan_StressHotspotJdk6
 
  Cheers
  Dan
 
 
 
  On Sun, Oct 6, 2013 at 12:07 AM, Sanne Grinovero sa...@infinispan.org
  wrote:
 
  Hi all,
  the following change introduced a critical issue in the Lucene
 Directory:
 
  final Set<String> filesList = fileOps.getFileList();
  -  String[] array = filesList.toArray(new String[0]);
  -  return array;
  + return filesList.toArray(new String[filesList.size()]);
 
  I'll leave it as a puzzler to figure why the change is able to cause
  trouble ;-)
 
  This generates a NPE in just a single second of running one of the
  stress tests or performance tests, but I'm guilty of not being able to
  make a normal unit test for this case.
 
  That module contains such limited code, that in the very rare
  occasions in which I apply some changes I re-run the included
  benchmarks; I realize I can't expect that from all of you, so..
 
  Should we enable some stress tests on CI?
 
  As a side warning consequence of this, the Lucene Directory in release
  6.0.0.CR1 is very unreliable [ISPN-3592].
 
  --Sanne
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Is anyone else experiencing JGRP-1675

2013-10-11 Thread Dan Berindei
I've seen StateTransferLargeObjectTest hang on my machine with all OOB
threads waiting in FlowControl$Credit.decrementIfEnoughCredits, but I
thought that was because the JGroups internal thread pool wasn't enabled in
the test configuration. Now that I've seen it's enabled by default, I think
it could be a JGroups problem after all.

I'll enable TRACE logging for org.jgroups and hopefully I'll reproduce it
again. For the record, the test suite is running with TCP, and the internal
thread pool has queueing enabled by default, so I don't think any REPLENISH
message could be lost.

Cheers
Dan



On Fri, Oct 11, 2013 at 4:12 PM, Radim Vansa rva...@redhat.com wrote:

 I remember seeing that mostly on UDP, but I've had an issue with this
 over TCP as well on the cross-site link with multiple site masters
 (synchronous 1pc xs replication).

 Radim

 On 10/11/2013 02:51 PM, Ray Tsang wrote:
  Udp or tcp?
 
  On Oct 11, 2013, at 8:41, Radim Vansa rva...@redhat.com wrote:
 
  Hi,
 
  since Infinispan moved to JGroups 3.4, we're experiencing occasional
  deadlocks in some tests - most of threads that send anything over
  JGroups are waiting in JGroups' FlowControl.decrementCredits. The
  problem sometimes goes away after several seconds, but it produces some
  ugly spikes in our throughput/response time charts. Originally this
  affected just some RadarGun tests but this is appearing in some
  client-server tests as well (we've recently investigated an issue where
  this appeared in a regular soak test).
  I was looking into that [1] for some time but haven't really figured out
  the cause. The workaround is to set up MFC and UFC credits high enough
  (I use 10M) and stuff works then. I was trying to reproduce that on pure
  JGroups, but unsuccessfully.
  I am not asking anyone to dig into that, but I wanted to know whether QA
  is alone experiencing that or if there are more of us.
 
  Radim
 
  [1] https://issues.jboss.org/browse/JGRP-1675
 
  --
  Radim Vansa rva...@redhat.com
  JBoss DataGrid QA
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Radim Vansa rva...@redhat.com
 JBoss DataGrid QA

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Is anyone else experiencing JGRP-1675

2013-10-11 Thread Dan Berindei
I changed the default config some time ago, but I forgot to close ISPN-3334.


On Fri, Oct 11, 2013 at 4:35 PM, Ray Tsang saturn...@gmail.com wrote:

 also, don't forget this one: https://issues.jboss.org/browse/ISPN-3334


 On Fri, Oct 11, 2013 at 9:22 AM, Erik Salter an1...@hotmail.com wrote:

 Hi Radim,

 If you're using VMs, this may be of use.  I actually experienced a JGroups
 lockup in production with FC in decrementCredits.  What happened in my
 case
 is that virtualization tools had actually screwed with the system clock --
 it was reported that JBoss took ~1 year to start.

 http://www.vmware.com/pdf/vmware_timekeeping.pdf

 The upshot is that I had to take FC out of my stack going forward.

 Erik

 -Original Message-
 From: infinispan-dev-boun...@lists.jboss.org
 [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Radim Vansa
 Sent: Friday, October 11, 2013 8:41 AM
 To: infinispan-dev@lists.jboss.org
 Subject: [infinispan-dev] Is anyone else experiencing JGRP-1675

 Hi,

 since Infinispan moved to JGroups 3.4, we're experiencing occasional
 deadlocks in some tests - most of threads that send anything over JGroups
 are waiting in JGroups' FlowControl.decrementCredits. The problem
 sometimes
 goes away after several seconds, but it produces some ugly spikes in our
 througput/response time charts. Originally this affected just some
 RadarGun
 tests but this is appearing in some client-server tests as well (we've
 recently investigated an issue where this appeared in a regular soak
 test).
 I was looking into that [1] for some time but haven't really figured out
 the
 cause. The workaround is to set up MFC and UFC credits high enough (I use
 10M) and stuff works then. I was trying to reproduce that on pure JGroups,
 but unsuccessfully.
 I am not asking anyone to dig into that, but I wanted to know whether QA
 is
 alone experiencing that or if there are more of us.

 Radim

 [1] https://issues.jboss.org/browse/JGRP-1675

 --
 Radim Vansa rva...@redhat.com
 JBoss DataGrid QA

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev



 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] The windup of 6.0.0

2013-10-17 Thread Dan Berindei
Bela, do you really need to rename the thread? You already pass a name
argument to the Thread constructor, why not create the thread with the
correct name directly?
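
In code terms, the difference I mean (illustrative only):

public class ThreadNamingSketch {
   // create, then rename - the setName() call whose cost is being discussed
   static Thread createThenRename(Runnable task, String name) {
      Thread t = new Thread(task);
      t.setName(name);
      return t;
   }

   // alternative: pass the name straight to the constructor
   static Thread createWithName(Runnable task, String name) {
      return new Thread(task, name);
   }
}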


On Thu, Oct 17, 2013 at 5:48 PM, Dennis Reed der...@redhat.com wrote:

 On 10/17/2013 05:26 AM, Bela Ban wrote:
  The other thing to look at is the apparent cost of Thread.setName(): if
  we cannot avoid many thread creations, we could experiment with *not*
  naming threads, although this is IMO not a good idea.

 I agree that not naming the threads is a bad idea.

 The thread names are vital for debugging -- both in log messages and in
 thread dumps.
 Not naming the threads would lose a whole lot of information.

 -Dennis
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Improve WriteCommand processing code and [possibly] performance

2013-10-29 Thread Dan Berindei
On Tue, Oct 29, 2013 at 1:33 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 10/29/2013 07:58 AM, Radim Vansa wrote:
  On 10/25/2013 08:17 PM, Pedro Ruivo wrote:
  Hi guys.
 
  I've open a JIRA to tack this:
 https://issues.jboss.org/browse/ISPN-3664
 
  Suggestions/feedback is appreciated. This is probably be integrated in
  the next major (but no promises).
 
  I was not cleared just ping me.
 
  Have a nice weekend folks :)
 
  Regards,
  Pedro
 
  Description is the following:
 
  Major refactorization of the write command with the following goals:
 
  * Base WriteCommand: all the write command has the same workflow through
  the interceptor chain
  * Create a concrete WriteCommand for each operation (put, remove,
  replace, replaceIfEquals, removeIfEquals, putIfAbsent)
  * Extend the interceptor chain to process each one of the command and
  add a new visitWriteCommand, that is invoked by the default visitX
  methods.
  * (minor) change the GetKeyValueCommand to ReadCommand to make name
  compatible with WriteCommand.
 
  Note that most of the interceptor only needs to implement the
  visitWriteCommand because all the write command has the same execution
  flow. The basic flow of the write commands are: (non-tx) lock, fetch
  value (cachestore/remote), check condition and apply change. for tx
  mode, lock (if pessimistic), fetch value (cache loader, remote, etc),
  apply change and add it to the transaction (if successful)
  I've been wondering for a while why the fetch part is a separate message
  in the write flow. Is the return value of much use when it does not
  return really the replaced value but only some of the previous values?
  And this way you double* the latency. I think that returning the value
  from primary owner as the response for write would make more sense.
  Naturally, for optimistic txs you can only do the get, and for
  pessimistic ones you'd have to return the value together with the
  locking, but still, the operations would make more sense.

 Actually, I think you are right. It makes no sense to fetch the value
 before the write operation for non-tx caches. I also like your idea of
 fetching the value when the key is locked :)

 Do you mind to create a new thread for it (I'm not sure if everybody
 reads this thread)?


I don't think we actually fetch the value before the write operation for
non-tx caches, at most we do an extra lookup in the local data container
(in NonTxDistributionInterceptor.remoteGetBeforeWrite).

We do fetch the value explicitly in transactional caches, but we don't
invoke the write command remotely in that case.

There is an opportunity for optimization in pessimistic tx caches, because
we invoke both a remote get and a remote lock command for the same write
command. Perhaps we can make LockControlCommand return the previous value
(only if needed, of course).



 
  *: OK, it's not doubling as with the write the primary owner replicates
  the write to backups, but it's doubling the amount of communication
  originating from the requestor node.
 
 
  Also, another advantage is the simplification of the EntryFactory
  because if we think a little about it, independent of the write command
  we need to wrap the entry anyway.
 
  Suggested implementation
 
  class abstract WriteCommand,
  Object key, Object newValue
  boolean match(Object currentValue) //true by default
  boolean needsRemoteGetBeforeWrite() //true by default
  Object perform() //common implementation like: if
  (match(entry.getValue()) then entry.setValue(newValue);
  entry.setChanged(true); entry.setRemoved(newValue == null)}
 
  * Concrete implementations *
 
  {PutCommand|RemoveCommand} extends WriteCommand
  override needsRemoteGetBeforeWrite() {return
  !flags.contains(IGNORE_RETURN_VALUE)}
 
  ReplaceIfPresentCommand extends WriteCommand
  override match(Object currentValue) {return currentValue != null}
 
  PutIfAbsentCommand extends WriteCommand
  override match(Object currentValue) {return currentValue == null}
 
  * Special base class for operation with expected value to compare *
 
  class abstract AdvancedWriteCommand extends WriteCommand
  Object expectedValue
  match(Object currentValue) {return currentValue.equals(expectedValue)}
  I'd call that rather ConditionalWriteCommand - AdvancedWC sounds too
  general to me.


+1 for ConditionalWriteCommand
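
To make sure we are talking about the same shape, here is the hierarchy as I
read Pedro's proposal, with Radim's name for the conditional base class. This
is illustrative Java only, not the real command classes; EntrySketch stands in
for the wrapped cache entry:

interface EntrySketch {
   Object getValue();
   void setValue(Object value);
   void setChanged(boolean changed);
   void setRemoved(boolean removed);
}

abstract class WriteCommand {
   protected final Object key;
   protected final Object newValue;

   WriteCommand(Object key, Object newValue) {
      this.key = key;
      this.newValue = newValue;
   }

   boolean match(Object currentValue) { return true; }   // unconditional by default
   boolean needsRemoteGetBeforeWrite() { return true; }

   // the single flow shared by every write command
   Object perform(EntrySketch entry) {
      Object previous = entry.getValue();
      if (match(previous)) {
         entry.setValue(newValue);
         entry.setChanged(true);
         entry.setRemoved(newValue == null);
      }
      return previous;
   }
}

class PutIfAbsentCommand extends WriteCommand {
   PutIfAbsentCommand(Object key, Object newValue) { super(key, newValue); }

   @Override
   boolean match(Object currentValue) { return currentValue == null; }
}

abstract class ConditionalWriteCommand extends WriteCommand {
   private final Object expectedValue;

   ConditionalWriteCommand(Object key, Object newValue, Object expectedValue) {
      super(key, newValue);
      this.expectedValue = expectedValue;
   }

   @Override
   boolean match(Object currentValue) { return expectedValue.equals(currentValue); }
}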



  
  My 2c.
 
  Radim
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Retrieving value before write

2013-10-29 Thread Dan Berindei
On Tue, Oct 29, 2013 at 3:56 PM, Mircea Markus mmar...@redhat.com wrote:


 On Oct 29, 2013, at 3:08 PM, William Burns mudokon...@gmail.com wrote:

  I agree, and I could have sworn there was a JIRA that detailed this,
  but I can't seem to locate it now.  I was hoping to get to it next
  release.

 +1.

 
  This also is a very big improvement for non tx caches as well, since the
  updating cache only has to send a single message, possibly reducing
  the overall amount of required JGroups messages by a third.

 We're already doing this for non-tx caches, at least since the lock
 delegation was implemented.

   Although
  the amount of bytes reduced is only a fixed amount less.
 
  Also it would make the consistency more correct since right now there
  is technically a window where you could have 2 concurrent updates but
  only 1 sees the correct returned value.

 +1


I'm not sure about that... in non-tx caches we already return the previous
value from the write command, and in pessimistic caches we acquire the
remote lock *before* we retrieve the value.



 
  On Tue, Oct 29, 2013 at 9:42 AM, Radim Vansa rva...@redhat.com wrote:
  Hi,
 
  Pedro suggested moving the following idea to separate thread (from the
  'Improve WriteCommand processing code...'). Re:
 
  I've been wondering for a while why the fetch part is a separate message
  in the write flow. Is the return value of much use when it does not
  return really the replaced value but only some of the previous values?
  And this way you double* the latency. I think that returning the value
  from primary owner as the response for write would make more sense.
  Naturally, for optimistic txs you can only do the get, and for
  pessimistic ones you'd have to return the value together with the
  locking, but still, the operations would make more sense.
 
  *: OK, it's not doubling as with the write the primary owner replicates
  the write to backups, but it's doubling  the amount of communication
  originating from the requestor node.
 
  WDYT?

 +1
 Care to create a JIRA for this please?



 
  Radim
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] PutForExternalRead consistency

2013-11-18 Thread Dan Berindei
On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com wrote:


 On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote:

  Hi,
 
  Simple question: shouldn't PFER ensure some consistency?
 
  I know that PFER is asynchronous but (IMO) it can create inconsistencies
  in the data. The primary owner replicates the PFER followed by a PUT
  (PFER is sent async, so the lock is released immediately) for the same
  key; we have no way to be sure whether the PFER is delivered after or
  before the PUT on all the backup owners.
 
  comments?

 Assuming that PFER and PUT happen in the same thread, we're normally
 relying on the JGroups sequence of events to send the first, wait no
 response, and then send the second put. That should guarantee order in
 which puts are received in the other nodes, but after that yeah, there's a
 risk that it could happen. PFER and PUT for a given key normally happen in
 the same thread in cache heavy use cases such as Hibernate 2LC, but there's
 no guarantee.


I don't think that's correct. If the cache is synchronous, the PUT will be
sent as an OOB message, and as such it can be delivered on the target
before the previous PFER command. That's regardless of whether the PFER
command was sent as a regular or as an OOB message.
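
Concretely, the interleaving I have in mind is the usual 2LC-style sequence
(the cache reference and the values are placeholders):

import org.infinispan.Cache;

public class PferOrderingSketch {
   // Same thread, same key: the PFER is replicated asynchronously, the put
   // synchronously (as an OOB message), so a backup owner may apply them in
   // either order and can be left holding the stale PFER value.
   static void readThenUpdate(Cache<String, String> cache, String key) {
      cache.putForExternalRead(key, "value-loaded-from-db");
      cache.put(key, "value-written-by-application");
   }
}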


 Cheers,

 
  Pedro
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Running master: some failing test

2013-11-18 Thread Dan Berindei
I have already issued a PR for NonTxStateTransferOverwritingValue2Test:
https://github.com/infinispan/infinispan/pull/2228


On Mon, Nov 18, 2013 at 10:13 PM, Mircea Markus mmar...@redhat.com wrote:

 Your machines are notorious for the high number of intermittent failures, so
 this doesn't look that bad then :-)

 I've disabled AsyncReplExtendedStatisticTest [1] as it is not that
 critical.
 Dan, can you please look at NonTxStateTransferOverwritingValue2Test? We
 seem to be quite close to a stable test suite and it would not be ideal to
 lose the momentum.

 [1] https://issues.jboss.org/browse/ISPN-3727

 On Nov 18, 2013, at 4:10 PM, Sanne Grinovero sa...@infinispan.org wrote:

  Attempt 1#
 
  Failed tests:
 
 AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62
  expected [true] but found [false]
 
  Attempt 2#
 
  Failed tests:
 
 NonTxStateTransferOverwritingValue2Test.testBackupOwnerJoiningDuringReplace:78-doTest:170
  » Timeout
 
  Attempt 3#
 
  Failed tests:
 
 AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62
  expected [true] but found [false]
 
  Attempt 4#
 
  Failed tests:
 
 AsyncReplExtendedStatisticTestBaseClusteredExtendedStatisticTest.testReplaceWithOldVal:190-BaseClusteredExtendedStatisticTest.assertCacheValue:253-MultipleCacheManagersTest.assertEventuallyEquals:554-AbstractInfinispanTest.eventually:173-AbstractInfinispanTest.eventually:45-AbstractInfinispanTest.eventually:62
  expected [true] but found [false]
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)





 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] merging all the github projects back?

2013-11-19 Thread Dan Berindei
Mircea, what exactly are the problems that we want to solve by moving
everything back into a single repository?

To me, the biggest problem right now is that builds take way too long, and
if I integrate something in master it will take almost an entire day for
TeamCity to update the status on all the open pull requests. I don't think
moving everything to a single repository will fix that, TBH.

BTW, Maven rebuilds every module in the reactor every time you run a build.
On my machine, it takes 6 minutes to build everything in the Infinispan
reactor. Would you really want to wait 6 minutes every time you want to
test something in the server?

Cheers
Dan





On Mon, Nov 18, 2013 at 10:26 PM, Mircea Markus mmar...@redhat.com wrote:

 I've created ISPN-3728 to track this.

 On Nov 18, 2013, at 2:16 PM, Tristan Tarrant ttarr...@redhat.com wrote:

  +1, also only the gui-demo should be in the main repo. All other
  examples/demos should be moved to quickstarts or their own repos.
 
  Tristan
 
  On 11/18/2013 09:15 AM, Galder Zamarreño wrote:
  On Nov 15, 2013, at 2:02 PM, Sanne Grinovero sa...@infinispan.org
 wrote:
 
  +1
  (as I've always been, damn my pessimistic opinions)
 
  However there is an alternative. If I recall correctly, one of your
  motivations for the split was to not keep maintaining all the
  non-essential components.
  I really like that idea, I just think it was implemented the wrong way.
 
  If you identify which components are non-essential (and you don't need
  them on the release day), you should keep them out but unlock them to
  a different version numbering, and especially never ever depend on
  snapshot versions.
 
  Let's try define the requirements:
  - any project should compile successfully out of the box
  - last minute surprises before a release needs to be done is not
 manageable
 
  You could get this easily if you allow the projects which are in a
  different repository to lag behind: you can't require to release those
  other projects each time you release an Infinispan core version.
  Which inherently implies that if you have something which is essential
  for a core release, it needs to be moved back to catch failures early
  on and keep them in perfect sync.
 
  Other module maintainers would then catch up on compatibility breaking
  changes of Infinispan core as (and when) they can. We can of course
  have a goal of keeping aligned quickly, but it should be optional
  (i.e. non-blocking for innovation and progress on the core module).
  + 1 to all said above and Mircea's suggestion.
 
  The thing is that Infinispan Server, and a subset of the cache stores
 really need to be released at the same time as Infinispan core. All those
 modules would benefit from living in same repo, making the compilation, CI
 and release much easier. Some examples: Level DB store needs to be brought
 back, whereas Cassandra cache store can live on its own.
 
  It's perfectly doable, all our other projects depending on Infinispan
  have lived well so far, with the occasional pain to fix some backwards
  compatibility issue but that's part of the dance. For example Search
  depends on Hibernate ORM, JGroups and Infinispan.. we try to catch
  backwards compatibility with ad-hoc CI jobs but it's not going to
  catch all problems, nor it's our intention to block other projects
  from innovating: it's rather for us to be aware of possible problems
  asap.. but none of these problems are allowed to prevent us to release
  early and often.
 
  [BTW the reason the Search case isn't perfect is because we target
  specific platforms like AS/Wildfly, not necessarily the latest
  Infinispan]
 
  Cheers,
  Sanne
 
 
  On 15 November 2013 12:43, Mircea Markus mmar...@redhat.com wrote:
  Hi guys,
 
  Given all the compiling problems we had since we've split in multiple
 github repos (server, stores and embedded) makes me think that the split
 wasn't such a great idea after all( and that I was hmm, wrong). Shall we
 move everything back into a single repo? We can still keep different CI
 runs for cache stores, server etc, but at least all this builds will
 compile everything.
 
  wdyt?
 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
  --
  Galder Zamarreño
  gal...@redhat.com
  twitter.com/galderz
 
  Project Lead, Escalante
  http://escalante.io
 
  Engineer, Infinispan
  http://infinispan.org
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  ___
  infinispan-dev 

Re: [infinispan-dev] merging all the github projects back?

2013-11-19 Thread Dan Berindei
On Tue, Nov 19, 2013 at 11:21 AM, Martin Gencur mgen...@redhat.com wrote:

 On 19.11.2013 09:59, Dan Berindei wrote:

 BTW, Maven rebuilds every module in the reactor every time you run a
 build. On my machine, it takes 6 minutes to build everything in the
 Infinispan reactor. Would you really want to wait 6 minutes every time you
 want to test something in the server?


 Actually, we often end up doing that cos we first need to build ISPN core
 and then the server which is dependent on ISPN core. It happens quite often
 that ISPN server works only with latest version of ISPN core.

 So if I understand it correctly, the ISPN server would be integrated in
 the main ISPN repository. I guess it would be a specific module which you
 could build on its own (and also run its tests). This would cover the case
 when you don't need to build ISPN core in order to run the server. Or did I
 miss something?


I'm pretty sure it wouldn't be a single module...
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] rethinking ISPN transactions

2013-11-19 Thread Dan Berindei
Sorry, I was confusing things - the timeout I was thinking of happened in
the transaction manager itself, and it didn't matter whether we were
registering a synchronization or a full XA resource.

Replication timeout sounds like something that could be easily removed for
commit command RPCs, but there may be plenty of other things that could
fail without the caller knowing. So the question remains, should we give up on
synchronizations (except maybe with asynchronous commit/1PC, if we decide
to keep that)?
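
For context, these are the two enlistment options we are weighing, in plain
JTA terms (syncAdapter and xaAdapter are placeholders for whatever the cache
actually registers):

import javax.transaction.Synchronization;
import javax.transaction.Transaction;
import javax.transaction.TransactionManager;
import javax.transaction.xa.XAResource;

public class EnlistmentSketch {
   static void enlist(TransactionManager tm, Synchronization syncAdapter,
                      XAResource xaAdapter, boolean useSynchronization) throws Exception {
      Transaction tx = tm.getTransaction();
      if (useSynchronization) {
         // afterCompletion() failures are swallowed by the TM and never reach
         // the caller - the point raised elsewhere in this thread
         tx.registerSynchronization(syncAdapter);
      } else {
         // full XA: recoverable, but the TM does more bookkeeping per transaction
         tx.enlistResource(xaAdapter);
      }
   }
}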



On Tue, Nov 19, 2013 at 6:01 PM, Dan Berindei dan.berin...@gmail.comwrote:

 Maybe synchronization support is another thing that we should scrap in
 7.0, then?

 BTW, I've also seen the transaction timeout that Radim mentioned, but only
 recently. I wonder if we could do anything to increase it for the stress
 tests.


 On Tue, Nov 19, 2013 at 4:40 PM, Erik Salter an1...@hotmail.com wrote:


 - this is quite a subtle aspect of JTA transactions (i.e.
 Synchronization.afterCompletion failures are not propagated to the user) -
 you have a very good point here, thanks!

 This right here has burned me very badly.  I think this really needs to be
 called out, especially if we're using this mode rather than full-blown XA.

 Erik

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] merging all the github projects back?

2013-11-20 Thread Dan Berindei
When it comes to IDEs, I'd rather remove some more modules from the main
project in order to make it more responsive and easier to use. Launching a
test from IntelliJ takes minutes to build (or parse, or analyze, or
whatever) everything, when it works. Extra dependencies on old versions in
the compatibility modules broke the Analyze stacktrace feature, which
used to be very nice for analyzing logs.

I'd argue that having the cache store builds break in CI actually serves as
a warning that we are breaking the API/SPI for our users. I don't think
making API/SPI breakage easier should be the goal of our setup.




On Wed, Nov 20, 2013 at 1:43 AM, Sanne Grinovero sa...@infinispan.orgwrote:

 Hi Manik,
 the problem the team has been facing past week is that the other
 modules are depending on a -SNAPSHOT version, and after having done
 some final touches to the modules in the core repo, we then find out
 that the other modules don't build anymore.

 This happened several times, and while it's normal for dependant
 projects to occasionally break, it's not so nice that the current
 structure doesn't allow for it to be spotted, or managed time-wise: it
 is important to be able to release at any point in time, but having
 potential surprises in different modules makes it hard to predict
 how long it will take to get to a stable tag.

 My suggestion is not to necessarily return all external modules in the
 core repository, but rather to make sure the modules which are kept
 separately also have the benefit of a different release cycle, so that
 you can always release any module at any time. This could mean that
 they have different version numbers but not necessarily: we could make
 it a convention to release compatible optional modules using the same
 version. For example - the optional CacheStore implementation for
 Infinispan 6.0.0.Final are released also as 6.0.0.Final but not
 _necessarily_ on the same day of the core module. Probably works best
 to have a same major,minor number, and be flexible with micro
 releases.

 You'd never want to depend on a snapshot version unless it's a module
 in your same repo. As mentioned, if a new Infinispan improvement
 breaks the Hibernate Search build, I have the option to decide to not
 upgrade; you don't have this flexibility if you're depending on
 snapshots.

 Cheers,
 Sanne

 On 19 November 2013 23:17, Manik Surtani ma...@infinispan.org wrote:
  As in, for easy refactoring?  True that helps.  But it doesn't help you
 with
  modular releases.
 
 
  On 19 November 2013 15:00, Mircea Markus mmar...@redhat.com wrote:
 
  That only half-solves the problem: having everything in the same IDE at
  the same time is more preventive.
 
  On 19 Nov 2013, at 22:44, Manik Surtani ma...@infinispan.org wrote:
 
  Can't this be achieved by checking out and building all relevant repos?
  This could be scripted.
 
 
  On 15 November 2013 04:43, Mircea Markus mmar...@redhat.com wrote:
 
  Hi guys,
 
  Given all the compiling problems we had since we've split in multiple
  github repos (server, stores and embedded) makes me think that the
 split
  wasn't such a great idea after all( and that I was hmm, wrong). Shall
 we
  move everything back into a single repo? We can still keep different
 CI runs
  for cache stores, server etc, but at least all this builds will compile
  everything.
 
  wdyt?
 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] merging all the github projects back?

2013-11-20 Thread Dan Berindei
Sorry Sanne for ignoring your message, I meant to send the reply to Manik's
:)

I agree that depending on SNAPSHOTS from another repository is not a good
idea. After all, we couldn't even make the dependencies work in CI for the
jobs we have now - the REST cache store's build still fails because it
can't find the infinispan-persistence-parent-6.0.1-SNAPSHOT POM in the
repository.

In this case we did have an option: we already have a compatibility layer
that allows cache stores to use the old, stable SPI, so the cache stores
could have depended on the latest Alpha/Beta/CR version, and (at least in
theory) the server could have kept on using the 5.2.6.Final version of the
cache stores until the final version of the core was released.

The problem, I guess, is that we still want to release Infinispan Server on
the same day as Infinispan Embedded, and we also want it to use
the latest and greatest cache stores. If we don't change that policy, we
can't use your suggestion.

Cheers
Dan



On Wed, Nov 20, 2013 at 1:43 AM, Sanne Grinovero sa...@infinispan.orgwrote:

 Hi Manik,
 the problem the team has been facing past week is that the other
 modules are depending on a -SNAPSHOT version, and after having done
 some final touches to the modules in the core repo, we then find out
 that the other modules don't build anymore.

 This happened several times, and while it's normal for dependant
 projects to occasionally break, it's not so nice that the current
 structure doesn't allow for it to be spotted, or managed time-wise: it
 is important to be able to release at any point in time, but having
 potential surprises in different modules makes it hard to predict
 how long it will take to get to a stable tag.

 My suggestion is not to necessarily return all external modules in the
 core repository, but rather to make sure the modules which are kept
 separately also have the benefit of a different release cycle, so that
 you can always release any module at any time. This could mean that
 they have different version numbers but not necessarily: we could make
 it a convention to release compatible optional modules using the same
 version. For example - the optional CacheStore implementation for
 Infinispan 6.0.0.Final are released also as 6.0.0.Final but not
 _necessarily_ on the same day of the core module. Probably works best
 to have a same major,minor number, and be flexible with micro
 releases.

 You'd never want to depend on a snapshot version unless it's a module
 in your same repo. As mentioned, if a new Infinispan improvement
 breaks the Hibernate Search build, I have the option to decide to not
 upgrade; you don't have this flexibility if you're depending on
 snapshots.

 Cheers,
 Sanne

 On 19 November 2013 23:17, Manik Surtani ma...@infinispan.org wrote:
  As in, for easy refactoring?  True that helps.  But it doesn't help you
 with
  modular releases.
 
 
  On 19 November 2013 15:00, Mircea Markus mmar...@redhat.com wrote:
 
  That only half-solves the problem: having everything in the same IDE at
  the same time is more preventive.
 
  On 19 Nov 2013, at 22:44, Manik Surtani ma...@infinispan.org wrote:
 
  Can't this be achieved by checking out and building all relevant repos?
  This could be scripted.
 
 
  On 15 November 2013 04:43, Mircea Markus mmar...@redhat.com wrote:
 
  Hi guys,
 
  Given all the compiling problems we had since we've split in multiple
  github repos (server, stores and embedded) makes me think that the
 split
  wasn't such a great idea after all( and that I was hmm, wrong). Shall
 we
  move everything back into a single repo? We can still keep different
 CI runs
  for cache stores, server etc, but at least all this builds will compile
  everything.
 
  wdyt?
 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan 6.0.0.Final is out!

2013-11-20 Thread Dan Berindei
Yes, it was implemented in Beta1:
http://blog.infinispan.org/2013/09/heterogenous-clusters-with-infinispan.html

Cheers
Dan


On Wed, Nov 20, 2013 at 11:37 AM, Radim Vansa rva...@redhat.com wrote:

  Hi, the heterogeneous clusters link does not work. I also miss any
 related JIRA in release notes - is it really implemented?

 Radim


 On 11/19/2013 09:08 PM, Adrian Nistor wrote:

 Dear Infinispan community,

  We're pleased to announce the final release of Infinispan 6.0
 Infinium. As announced
 http://infinispan.blogspot.co.uk/2013/05/infinispan-to-adopt-apache-software.html,
 this is the first Infinispan stable version to be released under the terms
 of Apache License v2.0 http://www.apache.org/licenses/LICENSE-2.0.html.

  This release brings some highly demanded features besides many stability
 enhancements and bug fixes:


-  Support for remote query
 http://blog.infinispan.org/2013/09/embedded-and-remote-query-in-infinispan.html.
It is now possible for the HotRod clients to query an Infinispan grid using
a new expressive query DSL. This querying functionality is built on top of
Apache Lucene and Google Protobuf and lays the foundation for storing
information and querying an Infinispan server in a language neutral manner.
The Java HotRod client has already been enhanced to support this, the
soon-to-be announced C++ HotRod client will also contain this functionality
(initially for write/read, then full blown querying).
 -  C++ HotRod client.  Allows C++ applications to read and write
information from an Infinispan server. This is a fully fledged HotRod
client that is topology (level 2) and consistent hash aware (level 3) and
will be released in the following days. Some features (such as Remote Query
and SSL support) will be developed during the next iteration so that it
maintains feature parity with its Java counterpart.
 -  Better persistence integration. We’ve revisited the entire cache
loader API and we’re quite pleased with the result: the new Persistence

 API http://blog.infinispan.org/2013/09/new-persistence-api-in-infinispan.html
 brought by Infinispan 6.0 supports parallel iteration of the stored
entries, reduces the overall serialization overhead and also is aligned
with the JSR-107 http://jcp.org/en/jsr/detail?id=107 specification,
which makes implementations more portable.


-  A more efficient FileCacheStore implementation
 http://blog.infinispan.org/2013/07/faster-file-cache-store-no-extra.html.
This file store is built with efficiency in mind: it outperforms the
existing file store by up to 2 orders of magnitude. This comes at a cost
though, as keys need to be kept in memory. Thanks to Karsten Blees
 https://github.com/kblees for contributing this!
 -  Support for heterogeneous clusters
 http://blog.infinispan.org/2013/09/heterogenous-clusters-with-infinispan.htm.
Up to this release, every member of the cluster owns an equal share of the
cluster’s data. This doesn’t work well if one machine is more powerful than
the other cluster participants. This functionality allows specifying the
amount of data, compared with the average, held by a particular machine.
-  A new set of usage and performance statistics
 https://issues.jboss.org/browse/ISPN-2861 developed within the scope of
 the CloudTM project https://issues.jboss.org/browse/ISPN-3234.
 -  JCache https://issues.jboss.org/browse/ISPN-3234 (JSR-107)
implementation upgrade. First released in Infinispan 5.3.0, the standard
caching support is now upgraded to version 1.0.0-PFD.



  For a complete list of features included in this release please refer to
 the release notes
 https://issues.jboss.org/secure/ReleaseNote.jspa?projectId=12310799&version=12322480.
  The user documentation for this release has been revamped and migrated
 to the new website http://infinispan.org/documentation/ - we think it
 looks much better and hope you’ll like it too!
  This release has spread over a period of 5 months: a sustained effort
 from the core development team, QE team and our growing community - a BIG
 thanks to everybody involved! Please visit our 
 downloadshttp://infinispan.org/download/section to find the latest release. 
 Also if you have any questions please
 check our forums http://infinispan.org/community/, our mailing 
 listshttps://lists.jboss.org/mailman/listinfo/infinispan-devor ping us 
 directly on
 IRC.

  Cheers,
  Adrian


  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev



  --
  Radim Vansa rva...@redhat.com
  JBoss DataGrid QA


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list

Re: [infinispan-dev] rethinking ISPN transactions

2013-11-21 Thread Dan Berindei
On Thu, Nov 21, 2013 at 5:39 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 11/21/2013 03:18 PM, Dan Berindei wrote:
  Hmm, couldn't you just disable recovery in the TM to get the same
  performance with a XA resource as with a synchronization?

  If you are suggesting to mess around with the TM, I don't think that is
  a good idea. First, the TM is external to Infinispan and second you can
  have other XAResources associated with the TM.


I'm not suggesting that Infinispan should change the TM settings, I'm just
wondering if there's a difference between using synchronization in
Infinispan and disabling recovery completely (or using an in-memory
transaction log) from the user's point of view.

Actually, that might be irrelevant if using synchronization can lead to
stale locks in Infinispan. If the commit command fails and doesn't release
the locks, how will the user be able to find out that there are stale locks
and release them?
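For reference, the two modes being compared here are toggled in the cache
configuration, roughly as below (a sketch against the 6.0-era
ConfigurationBuilder API; worth double-checking the exact method names):

    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.transaction.TransactionMode;

    // Synchronization-based enlistment: the TM has one less recoverable
    // resource to manage, which is where the speed difference comes from.
    ConfigurationBuilder viaSync = new ConfigurationBuilder();
    viaSync.transaction()
          .transactionMode(TransactionMode.TRANSACTIONAL)
          .useSynchronization(true);

    // Full XAResource enlistment, with recovery enabled so in-doubt
    // transactions can be resolved after a crash - at the cost of the TM
    // keeping a transaction log.
    ConfigurationBuilder viaXa = new ConfigurationBuilder();
    viaXa.transaction()
          .transactionMode(TransactionMode.TRANSACTIONAL)
          .useSynchronization(false)
          .recovery().enable();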


 
 
  On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org
  mailto:pe...@infinispan.org wrote:
 
 
 
  On 11/21/2013 11:34 AM, Galder Zamarreño wrote:
   
   
It's way faster actually. The speed difference from all the extra
  work required by Transaction Manager to deal with multiple XA
  resources, make transactions recoverable..etc. We've done tests in
  the past (i.e. Hibernate 2LC) comparing both and the difference was
  quite big.
   
 
  you are right. I forgot the recovery mechanism :)
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org mailto:
 infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] PutForExternalRead consistency

2013-11-21 Thread Dan Berindei
On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.comwrote:


 On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com wrote:

 
 
 
  On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com
 wrote:
 
  On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org wrote:
 
   Hi,
  
   Simple question: shouldn't PFER ensure some consistency?
  
   I know that PFER is asynchronous but (IMO) it can create
 inconsistencies
   in the data. The primary owner replicates the PFER followed by a PUT (PFER
   is sent async and the lock is released immediately) for the same key; we
   have no way to be sure if the PFER is delivered after or before the PUT
   in all the backup owners.
  
   comments?
 
  Assuming that PFER and PUT happen in the same thread, we're normally
 relying on the JGroups sequence of events to send the first, wait no
 response, and then send the second put. That should guarantee order in
 which puts are received in the other nodes, but after that yeah, there's a
 risk that it could happen. PFER and PUT for a given key normally happen in
 the same thread in cache heavy use cases such as Hibernate 2LC, but there's
 no guarantee.
 
  I don't think that's correct. If the cache is synchronous, the PUT will
 be sent as an OOB message, and as such it can be delivered on the target
 before the previous PFER command. That's regardless of whether the PFER
 command was sent as a regular or as an OOB message.

 ^ Hmmm, that's definitely risky. I think we should make PFER local only.

 The fact that PFER is asynchronous is nice to have. IOW, if you read a
 value from a database and you want to store it in the cache for later read,
 the fact that it's replicated asynchronously is just so that other nodes
 can take advantage of the value being in the cache. Since it's asynchronous
 some nodes could fail to apply, but that's fine since you can go to the
 database and re-retrieve it from there. So, making PFER local only would be
 the degenerate case, where all nodes fail to apply except the local node,
 which is fine. This is better than having the reordering above.

 In a chat I had with Dan, he pointed out that having PFER local only would
 be problematic for DIST mode w/ L1 enabled, since the local write would not
 invalidate other nodes, but this is fine because PFER only really makes
 sense for situations where Infinispan is used as a cache. So, if the
 data is in the DB, you might as well go there (1 network trip), as opposed
 to asking the other nodes for data and the database in the worst case (2
 network trips).

 PFER is really designed for replication or invalidation use cases, which
 are precisely the ones configured for Hibernate 2LC.

 Thoughts?


+1 to make PFER local-only in replicated caches, but I now think we should
go all the way and disallow PFER completely in dist caches.

I still think having L1 enabled would be a problem, because a regular put()
won't invalidate the entry on all the nodes that did a PFER for that key
(there are no requestors, and even if we assume that we do a remote get
before the PFER we'd still have race conditions).

With L1 disabled, we have the problem that you mentioned: we're trying to
read the value from the proper owners, but we never write it to the proper
owners, so the hit ratio will be pretty bad. Using the SKIP_REMOTE_LOOKUP
flag on reads, we'll avoid the extra RPC in Infinispan, but that will make
the hit ratio even worse. E.g. in a 4-nodes cluster with numOwners=2, the
hit ratio will never go above 50%.

I don't think anyone would use a cache knowing that its hit ratio can never
get above 50%, so we should just save ourselves some effort and stop
supporting PFER in DIST mode.

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] rethinking ISPN transactions

2013-11-21 Thread Dan Berindei
Hmm, couldn't you just disable recovery in the TM to get the same
performance with an XA resource as with a synchronization?


On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 11/21/2013 11:34 AM, Galder Zamarreño wrote:
 
 
  It's way faster actually. The speed difference from all the extra work
 required by Transaction Manager to deal with multiple XA resources, make
 transactions recoverable..etc. We've done tests in the past (i.e. Hibernate
 2LC) comparing both and the difference was quite big.
 

 you are right. I forgot the recovery mechanism :)
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] merging all the github projects back?

2013-11-21 Thread Dan Berindei
On Thu, Nov 21, 2013 at 1:55 PM, Galder Zamarreño gal...@redhat.com wrote:


 On Nov 20, 2013, at 11:03 AM, Dan Berindei dan.berin...@gmail.com wrote:

  When it comes to IDEs, I'd rather remove some more modules from the main
 project in order to make it more responsive and easier to use. Launching a
 test from IntelliJ takes minutes to build (or parse, or analyze, or
 whatever) everything, when it works. Extra dependencies on old versions in
 the compatibility modules broke the Analyze stacktrace feature, which
 used to be very nice for analyzing logs.

  ^ Isn't that a problem with the IDE, which is messing up the code that's
  really in use?


I guess the IDE could be smarter here, but I think it's pretty niche, so
it's not going to be high on their priority list.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] PutForExternalRead consistency

2013-11-22 Thread Dan Berindei
That doesn't sound right, we don't keep any lock for the duration of the
replication. In non-tx mode, we have to do an RPC to the primary owner
before we acquire the lock on any key. So there's nothing stopping the PFER
from writing its value after a regular (sync) put, when the put was
initiated after the PFER.
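For reference, Will's suggestion (quoted below) boils down to something like
the following - a sketch only, and as noted above it's not obvious that it
gives the ordering guarantees he expects in non-tx mode:

    import java.util.concurrent.Future;

    import org.infinispan.Cache;

    // PFER expressed via the regular async API instead of the
    // FORCE_ASYNCHRONOUS flag: the caller gets control back immediately,
    // while the write itself goes through the normal putIfAbsent code path.
    static void putForExternalRead(Cache<String, String> cache, String key, String value) {
       Future<String> previousValue = cache.putIfAbsentAsync(key, value);
       // fire-and-forget is fine for PFER; a caller that cares can block on previousValue.get()
    }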


On Fri, Nov 22, 2013 at 2:49 PM, William Burns mudokon...@gmail.com wrote:

 I wonder if we are over analyzing this.  It seems the main issue is
 that the replication is done asynchronously.  Infinispan has many ways
 to make something asynchronous.  In my opinion we just chose the
 wrong way.  Wouldn't it be easier to just change the PFER to instead
 of passing along the FORCE_ASYNCHRONOUS flag we instead just have the
 operation performed asynchronous using putIfAbsentAsync ?  This way
 the lock is held during the duration of the replication and should be
 consistent with other operations.  Also the user can regain control
 back faster as it doesn't even have to process the local interceptor
 chain.  We could also change the putForExternalRead method declaration
 to also return a NotifiableFuture<Void> or something so they know when
 the operation is completed (if they want).

  - Will

 On Thu, Nov 21, 2013 at 9:54 AM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
 
 
  On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.com
  wrote:
 
 
  On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  
  
  
   On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com
   wrote:
  
   On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org
 wrote:
  
Hi,
   
Simple question: shouldn't PFER ensure some consistency?
   
I know that PFER is asynchronous but (IMO) it can create
inconsistencies
in the data. the primary owner replicates the PFER follow by a PUT
(PFER
is sent async log the lock is released immediately) for the same
 key,
we
have no way to be sure if the PFER is delivered after or before in
 all
the backup owners.
   
comments?
  
   Assuming that PFER and PUT happen in the same thread, we're normally
   relying on the JGroups sequence of events to send the first, wait no
   response, and then send the second put. That should guarantee order
 in which
   puts are received in the other nodes, but after that yeah, there's a
 risk
   that it could happen. PFER and PUT for a given key normally happen in
 the
   same thread in cache heavy use cases such as Hibernate 2LC, but
 there's no
   guarantee.
  
   I don't think that's correct. If the cache is synchronous, the PUT
 will
   be sent as an OOB message, and as such it can be delivered on the
 target
   before the previous PFER command. That's regardless of whether the
 PFER
   command was sent as a regular or as an OOB message.
 
   ^ Hmmm, that's definitely risky. I think we should make PFER local
 only.
 
  The fact that PFER is asynchronous is nice to have. IOW, if you read a
  value from a database and you want to store it in the cache for later
 read,
  the fact that it's replicated asynchronously is just so that other
 nodes can
  take advantage of the value being in the cache. Since it's asynchronous
 some
  nodes could fail to apply, but that's fine since you can go to the
 database
  and re-retrieve it from there. So, making PFER local only would be the
  degenerate case, where all nodes fail to apply except the local node,
 which
  is fine. This is better than having the reordering above.
 
  In a chat I had with Dan, he pointed out that having PFER local only
 would
  be problematic for DIST mode w/ L1 enabled, since the local write would
 not
  invalidate other nodes, but this is fine because PFER only really makes
  sense for situations where the Infinispan is used as a cache. So, if the
  data is in the DB, you might as well go there (1 network trip), as
 opposed
  to asking the other nodes for data and the database in the worst case (2
  network trips).
 
  PFER is really designed for replication or invalidation use cases, which
  are precisely the ones configured for Hibernate 2LC.
 
  Thoughts?
 
 
  +1 to make PFER local-only in replicated caches, but I now think we
 should
  go all the way and disallow PFER completely in dist caches.
 
  I still think having L1 enabled would be a problem, because a regular
 put()
  won't invalidate the entry on all the nodes that did a PFER for that key
  (there are no requestors, and even if we assume that we do a remote get
  before the PFER we'd still have race conditions).
 
  With L1 disabled, we have the problem that you mentioned: we're trying to
  read the value from the proper owners, but we never write it to the
 proper
  owners, so the hit ratio will be pretty bad. Using the SKIP_REMOTE_LOOKUP
  flag on reads, we'll avoid the extra RPC in Infinispan, but that will
 make
  the hit ratio even worse. E.g. in a 4-nodes cluster with numOwners=2, the
  hit ratio will never go above 50%.
 
  I don't think

Re: [infinispan-dev] rethinking ISPN transactions

2013-11-22 Thread Dan Berindei
On Fri, Nov 22, 2013 at 10:26 AM, Radim Vansa rva...@redhat.com wrote:

  On 11/21/2013 05:09 PM, Dan Berindei wrote:



 On Thu, Nov 21, 2013 at 5:39 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 11/21/2013 03:18 PM, Dan Berindei wrote:
  Hmm, couldn't you just disable recovery in the TM to get the same
  performance with a XA resource as with a synchronization?

  If you are suggesting to mess around with the TM, I don't think that is
 a good idea. First, the TM is external to Infinispan and second you can
 have others XaResources associated to the TM.


  I'm not suggesting that Infinispan should change the TM settings, I'm
 just wondering if there's a difference between using synchronization in
 Infinispan and disabling recovery completely (or using an in-memory
 transaction log) from the user's point of view.

  Actually, that might be irrelevant if using synchronization can lead to
 stale locks in Infinispan. If the commit command fails and doesn't release
 the locks, how will the user be able to find out that there are stale locks
 and release them?


 Infinispan detects the failed commit and sends a rollback command. Only if
 this fails as well do the locks stay unreleased.
 The problem is just that the application thinks it has succeeded while the
 transaction may have been rolled back (in the Infinispan sense; the JTA
 transaction was already committed).


Ok, maybe I was overreacting a bit :)

But like you said, the rollback command can fail as well. True, the
rollback doesn't have to write anything to the cache stores, so it's less
likely to fail, but we should still consider the possibility.



 Radim




 
 
  On Thu, Nov 21, 2013 at 1:57 PM, Pedro Ruivo pe...@infinispan.org
   mailto:pe...@infinispan.org wrote:
 
 
 
  On 11/21/2013 11:34 AM, Galder Zamarreño wrote:
   
   
It's way faster actually. The speed difference from all the extra
  work required by Transaction Manager to deal with multiple XA
  resources, make transactions recoverable..etc. We've done tests in
  the past (i.e. Hibernate 2LC) comparing both and the difference was
  quite big.
   
 
  you are right. I forgot the recovery mechanism :)
  ___
  infinispan-dev mailing list
   infinispan-dev@lists.jboss.org mailto:
 infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev




 ___
 infinispan-dev mailing 
 listinfinispan-dev@lists.jboss.orghttps://lists.jboss.org/mailman/listinfo/infinispan-dev



 --
 Radim Vansa rva...@redhat.com
 JBoss DataGrid QA


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] PutForExternalRead consistency

2013-11-22 Thread Dan Berindei
I think I need to clarify my earlier email a bit: the problem I'm worried
about is that we could have a thread execute a putForExternalRead(k, v1),
then a put(k, v2), then a remove(k), and in the end leaving k = v1 in the
cache. (Without the remove(k) we'd be ok, because PFER uses putIfAbsent()
under the hood.)

This is quite different from the problem that Pedro raised, that of
different owners ending up with different values for the same key. Will's
suggestion to implement PFER as a regular put from a background thread does
fix that problem.

Writing the value only locally like Galder suggested would also address my
concern, but at the expense of extra misses from the cache - especially in
DIST mode. Hence my proposal to not support PFER in DIST mode at all.
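To make the interleaving concrete (a sketch; all three calls are issued from
the same application thread):

    import org.infinispan.Cache;

    static void raceSketch(Cache<String, String> cache, String k, String v1, String v2) {
       cache.putForExternalRead(k, v1); // replicated asynchronously, putIfAbsent semantics
       cache.put(k, v2);                // synchronous, but may reach a backup before the PFER
       cache.remove(k);                 // removes v2 everywhere...
       // ...then the delayed PFER finally arrives on the owners, finds the key
       // absent, and re-installs v1 - leaving k = v1 in the cache.
    }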



On Fri, Nov 22, 2013 at 3:45 PM, Dan Berindei dan.berin...@gmail.comwrote:

 That doesn't sound right, we don't keep any lock for the duration of the
 replication. In non-tx mode, we have to do a RPC to the primary owner
 before we acquire any key. So there's nothing stopping the PFER from
 writing its value after a regular (sync) put when the put was initiated
 after the PFER.


 On Fri, Nov 22, 2013 at 2:49 PM, William Burns mudokon...@gmail.comwrote:

 I wonder if we are over analyzing this.  It seems the main issue is
 that the replication is done asynchronously.  Infinispan has many ways
  to make something asynchronous.  In my opinion we just chose the
 wrong way.  Wouldn't it be easier to just change the PFER to instead
 of passing along the FORCE_ASYNCHRONOUS flag we instead just have the
 operation performed asynchronous using putIfAbsentAsync ?  This way
 the lock is held during the duration of the replication and should be
 consistent with other operations.  Also the user can regain control
 back faster as it doesn't even have to process the local interceptor
 chain.  We could also change the putForExternalRead method declaration
  to also return a NotifiableFuture<Void> or something so they know when
 the operation is completed (if they want).

  - Will

 On Thu, Nov 21, 2013 at 9:54 AM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
 
 
  On Thu, Nov 21, 2013 at 12:35 PM, Galder Zamarreño gal...@redhat.com
  wrote:
 
 
  On Nov 18, 2013, at 12:42 PM, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  
  
  
   On Mon, Nov 18, 2013 at 9:43 AM, Galder Zamarreño gal...@redhat.com
 
   wrote:
  
   On Nov 14, 2013, at 1:20 PM, Pedro Ruivo pe...@infinispan.org
 wrote:
  
Hi,
   
Simple question: shouldn't PFER ensure some consistency?
   
I know that PFER is asynchronous but (IMO) it can create
inconsistencies
in the data. the primary owner replicates the PFER follow by a PUT
(PFER
is sent async log the lock is released immediately) for the same
 key,
we
have no way to be sure if the PFER is delivered after or before in
 all
the backup owners.
   
comments?
  
   Assuming that PFER and PUT happen in the same thread, we're normally
   relying on the JGroups sequence of events to send the first, wait no
   response, and then send the second put. That should guarantee order
 in which
   puts are received in the other nodes, but after that yeah, there's a
 risk
   that it could happen. PFER and PUT for a given key normally happen
 in the
   same thread in cache heavy use cases such as Hibernate 2LC, but
 there's no
   guarantee.
  
   I don't think that's correct. If the cache is synchronous, the PUT
 will
   be sent as an OOB message, and as such it can be delivered on the
 target
   before the previous PFER command. That's regardless of whether the
 PFER
   command was sent as a regular or as an OOB message.
 
   ^ Hmmm, that's definitely risky. I think we should make PFER local
 only.
 
  The fact that PFER is asynchronous is nice to have. IOW, if you read a
  value from a database and you want to store it in the cache for later
 read,
  the fact that it's replicated asynchronously is just so that other
 nodes can
  take advantage of the value being in the cache. Since it's
 asynchronous some
  nodes could fail to apply, but that's fine since you can go to the
 database
  and re-retrieve it from there. So, making PFER local only would be the
  degenerate case, where all nodes fail to apply except the local node,
 which
  is fine. This is better than having the reordering above.
 
  In a chat I had with Dan, he pointed out that having PFER local only
 would
  be problematic for DIST mode w/ L1 enabled, since the local write
 would not
  invalidate other nodes, but this is fine because PFER only really makes
  sense for situations where the Infinispan is used as a cache. So, if
 the
  data is in the DB, you might as well go there (1 network trip), as
 opposed
  to asking the other nodes for data and the database in the worst case
 (2
  network trips).
 
  PFER is really designed for replication or invalidation use cases,
 which
  are precisely the ones configured for Hibernate 2LC.
 
  Thoughts

Re: [infinispan-dev] Today's topic: eviction

2013-11-25 Thread Dan Berindei
On Fri, Nov 22, 2013 at 6:15 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 11/22/2013 11:40 AM, Sanne Grinovero wrote:

 
  Do we still need to support the non-V8 implementations?
  Clearly there are cases in which we need it to be available:
 
 org.infinispan.util.concurrent.locks.containers.AbstractPerEntryLockContainer<L>
  refers explicitly to EquivalentConcurrentHashMapV8

 I don't think we need to support the non-v8 implementation but I'm not
 aware of any BoundedConcurrentHashMapV8. So, if I have to choose, for me
 it is simpler to change the current BoundedConcurrentHashMap than
 implement all the eviction policies on top of ECHM-V8.



To put it another way, we do need to support the non-v8 implementation
because that's the only implementation of size-based eviction we've got :)

AFAIK there are some lock-free implementations out there for LRU, and there
may be some for LIRS as well. But our current implementations require a
lock on the entire segment, so there is no simple way to make them work
with CHMV8.
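For context, this is the size-based eviction users configure today, and it is
what forces the non-V8 BoundedConcurrentHashMap underneath (a sketch of the
6.0-era configuration API):

    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.eviction.EvictionStrategy;

    // Any cache configured like this ends up on BoundedConcurrentHashMap,
    // since the LRU/LIRS eviction policies only exist there, not on the V8 map.
    Configuration cfg = new ConfigurationBuilder()
          .eviction().strategy(EvictionStrategy.LIRS).maxEntries(10000)
          .build();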
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Today's topic: eviction

2013-11-27 Thread Dan Berindei
On Wed, Nov 27, 2013 at 10:15 AM, Galder Zamarreño gal...@redhat.comwrote:


 On Nov 22, 2013, at 12:31 PM, Pedro Ruivo pe...@infinispan.org wrote:

 
 
  On 11/21/2013 10:08 PM, Sanne Grinovero wrote:
  Hi Pedro,
  great analysis.
  Conceptually I agree, and since I'm not too familiar with the core
  code, I actually expected this to be the case as we discussed the
  approach when we first discussed the single-lock-owner pattern.
 
  More specifically on the DataContainer: I don't think you should lock
  the segment, you need to lock the key entry only as the CacheStore
  could be quite slow so that is a long lock at least relatively to
  normal usage.
 
  Agree with you. I based my solution on the current
  BoundConcurrentHashMap [1] implementation that uses locks per segment.
  Also, the eviction is kicked inside this locks so we already have a
  long lock to write the entry in the cache writer.
 
  [1] Segment.put():
 
 https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/util/concurrent/BoundedConcurrentHashMap.java#L1554
 
 
  This should NOT be the normal key lock however, as you might remember
  I'm since long time advocating that there needs to be a split in
  logical lock as the user sees it (and as used by transactions) vs
  the locks we use internally to keep our very own data structure
  consistency.
 
  It's trivial to apply if you use
 
  EquivalentConcurrentHashMapV8.computeIfAbsent(K, Fun<? super K, ? extends V>)
 
  This is a great idea :)

 +1

 
  The EquivalentConcurrentHashMapV8 will solve the problem for unbounded
  data container but we still have to do more work with the bounded data
  container I'm afraid…

 We need a BoundedEquivalentConcurrentHashMapV8. The reason we haven't
 built one yet is because ConcurrentHashMapV8 has been in flux over past few
 months, so keeping it up to date required some effort. As it gets more
 stable, we can start looking into a BoundedEquivalentConcurrentHashMapV8.
 This might be a good time to do it.


+1 to write a BoundedEquivalentConcurrentHashMapV8, but changing the LRU
and LIRS implementations is going to be tricky business, so I'd rather we
do it separately from the DataContainer API change.
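For illustration, the computeIfAbsent-based activation Sanne suggests above
would look roughly like this - shown with the JDK 8 ConcurrentHashMap for
brevity, and loadAndActivate() is a made-up stand-in for whatever the
PersistenceManager call ends up being:

    import java.util.concurrent.ConcurrentHashMap;

    // The loading function runs while the map holds the entry's internal lock,
    // so the "check memory, then check store" window Pedro describes cannot be
    // interleaved with a concurrent write for the same key.
    ConcurrentHashMap<Object, Object> container = new ConcurrentHashMap<>();

    Object getOrActivate(Object key) {
       return container.computeIfAbsent(key, k -> loadAndActivate(k));
    }

    Object loadAndActivate(Object key) {
       // placeholder: load from the cache store and, if passivation is on,
       // remove the entry from the store once it is back in memory
       return null;
    }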


 
 
  Doesn't sound like a patch which requires a major release?
 
  Sanne
 
 
 
 
  On 21 November 2013 18:48, Pedro Ruivo pe...@infinispan.org wrote:
  (re: https://issues.jboss.org/browse/ISPN-3048)
  (ps. sorry for the long mail. I hope that is clear)
 
  * Automatic eviction:
 
   I've been writing some scenarios involving eviction and concurrent
   operations. These tests show very reliable eviction, but we have a
   problem when passivation is enabled. The passivation phase is ok, but
   when we need to activate a key we have a huge window in which the
   read/write operation finds the key neither in the in-memory data
   container nor in the cache loader. This happens because the check on the
   in-memory data container is not synchronized with the check on the cache
  loader.
 
  Passivation phase works fine, as I said, because it happens inside the
  data container, i.e., the bounded concurrent hash map (BCHM) evict the
  key under the segment lock.
 
  * Manual eviction
 
   I haven't finished my test suite but so far it does not work as expected.
  If you are a backup owner, you are unable to evict any keys (because
 the
  EvictCommand is handled as a RemoveCommand and the key is not wrapped
 in
  EntryWrappingInterceptor). In addition, I believe that it has the same
  problems with activation as above.
 
   Also, I believe the passivation phase does not work well because it
   first passivates and then removes from the DataContainer. Nothing
   prevents passivating an old value: a put occurs and modifies the data
   in the DataContainer, and finally the EvictCommand removes the new value
   (which is lost).
 
  * New implementation
 
   Since we are starting a new major release, I would like to propose the
   following improvements. One aspect of the implementation is the dropping
   of the *Cache[Writer or Loader]Interceptor and the EvictCommand. Also,
   it would add a new method to DataContainer (void evict(K key)) and
   possibly change the interface of PersistenceManager.
 
   My idea is based on the current implementation of the BCHM. The BCHM
   implementation is aware of the persistence and it performs the
   passivate() and activate() operations internally. I would like to extend
   this and make it fully aware of the PersistenceManager. Also, it would be
   required to have a ConcurrentHashMap (CHM) with the same features.

   In more detail, the (B)CHM.get() will be responsible for loading the
   key from persistence (and activating it) if it is not found in memory.
   Also, it stores the entry in memory. On the other hand, the put() stores
   the entry in memory and in the persistence (if passivation==false), and
   we add an evict(k) to the CHM interface to remove a single key from
   memory and passivate it to persistence.
 
  * 

Re: [infinispan-dev] Design of Remote Hot Rod events

2013-12-02 Thread Dan Berindei
On Mon, Dec 2, 2013 at 11:44 AM, Galder Zamarreño gal...@redhat.com wrote:


 On Nov 27, 2013, at 3:06 PM, Mircea Markus mmar...@redhat.com wrote:

 
 
  - how does the server know that a request originated from a certain
 client in order not to send it to that client again? There's no clientId in
 the request…
 
  Well spotted :). There are two ways to solve this.
 
  First one is by adding the source id to each cache operation sent from
 the client. This would require a change in the way the header is parsed for
 all operations. This is the simplest solution, with a little addition to
 the header.
 
  The second option is a bit more complicated but avoids the need to send
 the source id per request. At least in the Java client, each connection
 opened sends a ping at the start. You could add source id to the ping
 command, and then the server could track all incoming connections that send
 a particular id. There could be multiple in the case of clients pooling
 connections. The server can track disconnections and keep this collection
 up to date, but it'd be quite a bit of work on top of the rest of stuff.
 
  I'd prefer the first option.
 
  +1, for simplicity. We also don't enforce the client connections to
 start with a ping either.

 ^ Indeed we don't. It's something the java client does by default but it's
 not mandatory at all.

  How would you generate the client id? ip+port perhaps? or something the
 server would issue (shared server counter) when a client asks for it?

 The client or source id should ideally be composed of two parts:
 1. Something the Hot Rod client provides via configuration.
 2. Something that's dynamically generated whenever the RemoteCacheManager
 is started.

 The former could be anything from a simple application id, to an
 application id alongside client host and port. This is the static part of
 the source or client id. The one that's always the same for a
 RemoteCacheManager unless the configuration changes. The second part, which
 is dynamic, should be created by the Hot Rod client implementation in order
 to avoid client resurrection issues (a similar method to what JGroups does).

 Regardless, the source or client id will be a variable length byte array.

 I think this is easier than relying on some kind of server-side state, and
 having to synch that. You could have many clients connecting, so having to
 produce something different for each, cluster wide, could be challenging.

 Thoughts?


Why not a UUID?
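Something like this, roughly - the static part from configuration plus a UUID
regenerated on every RemoteCacheManager start (a sketch, not an API proposal):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    // Variable-length byte[] source id: configured application id + per-start UUID.
    static byte[] newSourceId(String configuredAppId) {
       byte[] staticPart = configuredAppId.getBytes(StandardCharsets.UTF_8);
       UUID dynamicPart = UUID.randomUUID();
       ByteBuffer buf = ByteBuffer.allocate(staticPart.length + 16);
       buf.put(staticPart);
       buf.putLong(dynamicPart.getMostSignificantBits());
       buf.putLong(dynamicPart.getLeastSignificantBits());
       return buf.array();
    }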




 
 
  Cheers,
 
 
 
  On Nov 12, 2013, at 3:17 PM, Galder Zamarreño gal...@redhat.com
 wrote:
 
  Hi all,
 
  Re:
 https://github.com/infinispan/infinispan/wiki/Remote-Hot-Rod-Events
 
  I've just finished writing up the Hot Rod remote events design
 document. Amongst many other use cases, this will enable near caching use
 cases with the help of Hot Rod client callbacks.
 
  Cheers,
  --
  Galder Zamarreño
  gal...@redhat.com
  twitter.com/galderz
 
  Project Lead, Escalante
  http://escalante.io
 
  Engineer, Infinispan
  http://infinispan.org
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 
  --
  Galder Zamarreño
  gal...@redhat.com
  twitter.com/galderz
 
  Project Lead, Escalante
  http://escalante.io
 
  Engineer, Infinispan
  http://infinispan.org
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
  Cheers,
  --
  Mircea Markus
  Infinispan lead (www.infinispan.org)
 
 
 
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev


 --
 Galder Zamarreño
 gal...@redhat.com
 twitter.com/galderz

 Project Lead, Escalante
 http://escalante.io

 Engineer, Infinispan
 http://infinispan.org


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Denormalizing hashes

2013-12-11 Thread Dan Berindei
Hi Radim

Actually, it's me that wrote the denormalization code :)

It was meant as a stop-gap measure before we upgraded the HotRod protocol
to support the segment-based consistent hash, but the denormalization
worked well enough (or so we thought) that we didn't get to changing the
protocol yet.

That's not a big change in itself, but we also wanted to make the
consistent hash per-cache on the client (it's now per-cache manager), and
that's a bit more complicated to do. And it's not like it would have been a
good idea to change this before starting the C++ client, the client would
still have to support the current style of consistent hash.


On Tue, Dec 10, 2013 at 8:17 PM, Radim Vansa rva...@redhat.com wrote:

 Hi Galder,

 as I am trying to debug some problem in C++ client, I was looking into
 the server code. And I am not sure whether I understand the code
 correctly, but it seems to me that the server denormalizes the
 consistent hash for each client anew (after each topology change or
 client joining). Is this true? Looking into trace logs, I can see stuff
 like

 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$]
 (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222

  From denormalizeSegmentHashIds() method I see that this means that we
 have executed the hash function 639768 times just to notify one client.
 Is my understanding correct?


Yes, this happens every time a client joins and/or every time the cache
topology changes.

We could easily cache the result of denormalizeSegmentHashIds, as it only
depends on the number of segments. It's just that I wasn't expecting it to
take so many iterations.
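A sketch of that caching: the denormalized hash ids only depend on the
segment count, so they can be computed once and shared across clients and
topology updates (denormalizeSegmentHashIds() stands in for the existing
server-side method, and the value type here is only illustrative):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    private final ConcurrentMap<Integer, int[]> denormalizedCache = new ConcurrentHashMap<>();

    int[] denormalizedHashIdsFor(int numSegments) {
       int[] cached = denormalizedCache.get(numSegments);
       if (cached == null) {
          int[] computed = denormalizeSegmentHashIds(numSegments); // the expensive loop
          int[] raced = denormalizedCache.putIfAbsent(numSegments, computed);
          cached = raced != null ? raced : computed;
       }
       return cached;
    }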




 Also, there is nothing like the concept of primary owner, is this right?


The client CH doesn't have a concept of backup owners. But for each (hash
id, server) pair that gets sent to the client, it means all the hash codes
between the previous hash id and this hash id have this server as the
primary owner. The server in the next (hash id, server) pair is the first
backup, and so on.

For each segment, the server generates numOwners (hash id, server) pairs.
That means, for most of the hash codes in the segment, the list of owners
on the client will be the same as the list of owners on the server. But for
0.0002 (leewayFraction) of the hash codes, the client primary owner will be
indeed one of the server backup owners.
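In code form, the client-side routing described above amounts to something
like this (a simplified sketch; the real client also normalizes the key's
hash code into the same hash space, which is elided here):

    import java.net.SocketAddress;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Each (hash id, server) pair marks the end of an interval on the wheel;
    // a key is routed to the server owning the first hash id at or after the
    // key's normalized hash, wrapping around to the first entry at the end.
    static SocketAddress serverFor(int normalizedHash, TreeMap<Integer, SocketAddress> wheel) {
       SortedMap<Integer, SocketAddress> tail = wheel.tailMap(normalizedHash);
       Integer hashId = tail.isEmpty() ? wheel.firstKey() : tail.firstKey();
       return wheel.get(hashId);
    }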



 I thought that every first request in HotRod will go to primary owner,
 so that the PUT does not have to do the first hop and is executed
 directly on the primary. But it seems to me that it goes to any of the
 owners (practically random one, as you are only looking for the numOwner
 ids in leeway = on the beginning of the range - then, 99.98% or more
 requests should go to the server with last position in the leeway). This
 looks pretty suboptimal for writes, isn't it?


I'm not sure what you mean here, but I'm pretty sure the request goes to
the correct server because we have a test for it:
ConsistentHashV1IntegrationTest

Cheers
Dan




 Cheers

 Radim

 PS: for every line of code you write in Scala, God kills a kitten

 --
 Radim Vansa rva...@redhat.com
 JBoss DataGrid QA

 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Denormalizing hashes

2013-12-11 Thread Dan Berindei
On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa rva...@redhat.com wrote:

  Hi Dan

 I am not speaking about changing something for the C++ client, I
 understand that the client code cannot be changed in order to keep the
 backward compatibility.


Sure, I was just trying to give some background information on what we
discussed and why we still have the wheel-based CH in the client.



 The current hash-wheel approach is working well, but there are few flaws
 that could be fixed keeping the client code untouched. Please, correct me
 if I am wrong.

 1) The denormalization is executed for every client for every topology
 change/client join. I don't have any numbers, but calling the hashing
 algorithm million times per every such occasion sounds as wasting computing
 power. - cache the denormalized stuff on server


+1, like I said it would be easy to do but it never came up as a problem
before.




 2) The server is sending numOwners hashIds per segment, one for each
 owner. What's the reason for that? I think that only primary owners should
 be inserted there. This would:


The main reason is to support clients from Infinispan 5.1, which pick a
random owner instead of always choosing the primary (
https://issues.jboss.org/browse/ISPN-2655).



 a) target all PUT requests to primary owner, reducing PUT latency and
 lowering the general load in cluster


Nope, it wouldn't. The same fraction of requests would go to the primary
owner as before, because we won't find the exact denormalized hash id
that maps to the segment border when normalized.


b) reduce the routing information


For 7.0, I guess we could say that 5.1 clients are no longer supported and
we could switch to sending only the primary owners to the clients. But I'm
not sure whether the loss of backwards compatibility is worth a couple
hundred bytes sent once for every client.



 And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather
 orthogonal to the segment vs. hash wheel approach and its details.


Agree. Could you create issues in JIRA for both your proposals?



  Radim



 On 12/11/2013 09:18 AM, Dan Berindei wrote:

  Hi Radim

  Actually, it's me that wrote the denormalization code :)

  It was meant as a stop-gap measure before we upgraded the HotRod protocol
 to support the segment-based consistent hash, but the denormalization
 worked well enough (or so we thought) that we didn't get to changing the
 protocol yet.

 That's not a big change in itself, but we also wanted to make the
 consistent hash per-cache on the client (it's now per-cache manager), and
 that's a bit more complicated to do. And it's not like it would have been a
 good idea to change this before starting the C++ client, the client would
 still have to support the current style of consistent hash.


 On Tue, Dec 10, 2013 at 8:17 PM, Radim Vansa rva...@redhat.com wrote:

 Hi Galder,

 as I am trying to debug some problem in C++ client, I was looking into
 the server code. And I am not sure whether I understand the code
 correctly, but it seems to me that the server denormalizes the
 consistent hash for each client anew (after each topology change or
 client joining). Is this true? Looking into trace logs, I can see stuff
 like

 18:15:17,339 TRACE [org.infinispan.server.hotrod.Encoders$Encoder12$]
 (HotRodServerWorker-12) Writing hash id 639767 for 192.168.11.101:11222

  From denormalizeSegmentHashIds() method I see that this means that we
 have executed the hash function 639768 times just to notify one client.
 Is my understanding correct?


  Yes, this happens every time a client joins and/or every time the cache
 topology changes.

 We could easily cache the result of denormalizeSegmentHashIds, as it only
 depends on the number of segments. It's just that I wasn't expecting it to
 take so many iterations.




 Also, there is nothing like the concept of primary owner, is this right?


  The client CH doesn't have a concept of backup owners. But for each
 (hash id, server) pair that gets sent to the client, it means all the hash
 codes between the previous hash id and this hash id have this server as the
 primary owner. The server in the next (hash id, server) pair is the first
 backup, and so on.

  For each segment, the server generates numOwners (hash id, server)
 pairs. That means, for most of the hash codes in the segment, the list of
 owners on the client will be the same as the list of owners on the server.
 But for 0.0002 (leewayFraction) of the hash codes, the client primary owner
 will be indeed one of the server backup owners.



 I thought that every first request in HotRod will go to primary owner,
 so that the PUT does not have to do the first hop and is executed
 directly on the primary. But it seems to me that it goes to any of the
 owners (practically random one, as you are only looking for the numOwner
 ids in leeway = on the beginning of the range - then, 99.98% or more
 requests should go to the server with last position in the leeway

Re: [infinispan-dev] Denormalizing hashes

2013-12-11 Thread Dan Berindei
On Wed, Dec 11, 2013 at 11:37 AM, Radim Vansa rva...@redhat.com wrote:



 On Wed, Dec 11, 2013 at 10:38 AM, Radim Vansa rva...@redhat.com wrote:

  Hi Dan

 I am not speaking about changing something for the C++ client, I
 understand that the client code cannot be changed in order to keep the
 backward compatibility.


  Sure, I was just trying to give some background information on what we
 discussed and why we still have the wheel-based CH in the client.



 The current hash-wheel approach is working well, but there are few flaws
 that could be fixed keeping the client code untouched. Please, correct me
 if I am wrong.

 1) The denormalization is executed for every client for every topology
 change/client join. I don't have any numbers, but calling the hashing
 algorithm million times per every such occasion sounds as wasting computing
 power. - cache the denormalized stuff on server


  +1, like I said it would be easy to do but it never came up as a problem
 before.

Fair enough.




 2) The server is sending numOwners hashIds per segment, one for each
 owner. What's the reason for that? I think that only primary owners should
 be inserted there. This would:


  The main reason is to support clients from Infinispan 5.1, which pick a
 random owner instead of always choosing the primary (
 https://issues.jboss.org/browse/ISPN-2655).


 You can always report numKeyOwners=1 and the old code should handle that.


Yeah, it looks like it would work. I was thinking that when retrying a
failed operation, the pre-5.2 client would still try one of the key owners,
but I see now that it always chose a random server when retrying. Also, the
client doesn't expose numKeyOwners to the user, like I had assumed.






  a) target all PUT requests to primary owner, reducing PUT latency and
 lowering the general load in cluster


  Nope, it wouldn't. The same fraction of requests would go to the primary
 owner as before, because we won't find the exact denormalized hash id
 that maps to the segment border when normalized.


 Oh, only now have I noticed that the hashIds are sorted by the normalized
 ID, therefore, the primary owner always picks the first position (and most
 requests will hit the first). Mea culpa. Still, it makes no sense to
 include the backup owners into the routing table, as the probability that a
  read will hit them is negligible.



   b) reduce the routing information


  For 7.0, I guess we could say that 5.1 clients are no longer supported
 and we could switch to sending only the primary owners to the clients. But
 I'm not sure whether the loss of backwards compatibility is worth a couple
 hundred bytes sent once for every client.



 And yes, ISPN-3530 and ISPN-3701 are pretty serious, but IMO rather
 orthogonal to the segment vs. hash wheel approach and its details.


  Agree. Could you create issues in JIRA for both your proposals?


 OK. By the way, shouldn't we tag the features that should be included for
 hotrod protocol v1.4 with some tag, such as hotrod14?
 But as I said, if we fake the numOwners to be always 1, it should be
 backwards compatible even now.
 Nevertheless, as you in fact do route most requests to the primary owner,
 it won't provide much performance gain and it's not as hot as I first
 thought.

 Radim


 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Clean ThreadLocals

2013-12-12 Thread Dan Berindei
On Thu, Dec 12, 2013 at 12:52 AM, David M. Lloyd david.ll...@redhat.comwrote:

 On 12/11/2013 04:47 PM, Pedro Ruivo wrote:
  Hi,
 
  I've created a method to clean a specific ThreadLocal variable from all
  live threads [1].
 
  My goal is to clean the ThreadLocal variables after a cache stops. It's
   kind of an expensive method (it uses reflection) but I think it is fine.
 
  Currently, I'm using it to clear the Marshaller ThreadLocal in here [2].
 
  Does anyone see any problem with this approach? Maybe it can be used in
  other possible leaking ThreadLocals?

 It's an interesting idea (I've done something similar in the past to
 simply drop all thread locals).  I would recommend that you check that
 all the JDKs you want to support use the same technique for thread
 locals though.  Because these fields are not part of the JDK, they may
 not always exist in all environments.


Yeah, the implementation of ThreadLocal has changed in the past and is
likely to change again in the future. If that happens and we have to
support different JDKs with different methods for clearing ThreadLocals, it
won't be pretty.


 Also, it is important to only ever clean the thread locals of your
 current thread, or you're inviting all kinds of bizarre concurrency
 problems (maybe even locking threads into infinite loops), especially if
 the thread is running and actively using thread locals at the time.


Indeed, ThreadLocalMap doesn't look to be thread-safe. I'm not sure if a
 remove() from another thread can cause an infinite loop like in HashMap
because it uses open addressing instead of chaining, but it looks like it
may cause a get() for a different ThreadLocal to return the wrong instance.
(I would be ok with it if it would only cause breakage for threads using
the cache that's being stopped, but it looks like it can touch more than
one entry.)

I think we shouldn't concern ourselves with actually removing the
ThreadLocal from the map, we just have to keep track of the ThreadLocal
instances and clean any references to other objects they might have (a la
ExternalizerTableProxy). Or maybe give up on ThreadLocals and use a
 ConcurrentWeakKeyHashMap<Thread, PerThreadInstanceHolder> instead.
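A sketch of that "clear the references" idea (inspired by, but not identical
to, the ExternalizerTableProxy mentioned above): the per-thread objects only
hold a proxy, and stop() nulls the proxy's delegate, so whatever is still
stuck in a ThreadLocalMap no longer pins the stopped cache's internals.

    // Generic holder: threads cache this proxy in their ThreadLocal; the heavy
    // component behind it can be dropped from any thread without touching
    // other threads' ThreadLocalMaps.
    static final class ComponentProxy<T> {
       private volatile T delegate;

       ComponentProxy(T delegate) {
          this.delegate = delegate;
       }

       T get() {
          T current = delegate;
          if (current == null) {
             throw new IllegalStateException("Component was stopped");
          }
          return current;
       }

       void clear() {
          delegate = null; // called from stop(); safe from any thread
       }
    }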

Dan



 --
 - DML
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Design of Remote Hot Rod events - round 2

2013-12-19 Thread Dan Berindei
On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard emman...@hibernate.orgwrote:

 On Thu 2013-12-19  9:46, Galder Zamarreño wrote:
   == Example of continuous query atop remote listeners
  
   Thinking about how to implement continuous query atop this
   infrastructure I am missing a few things.
  
   The primary problem is that I don't want to enlist a filter id per
   continuous query I want to run. Not only that but I'd love to be able
 to
   add a continuous query on the fly and disable it on the fly as well per
   client. For that filters and converters are not flexible enough.
  
   What is missing is the ability to pass parameters from the client to
   the remote filter and remote converter. Parameters should be provided
   *per client*. Say Client 1 register the continuous query listener with
    where age > 19 and client 2 registers the CQ listener with where
 name
   = emmanuel. The filter knowing for which client it filters, it will
 be able to only
   return the keys that match the query.
 
   This all sounds a bit like remote code execution to me? You're asking
 for the client to pass some kind of executable thing that acts as a filter.
 That's a separate feature IMO, which I believe @Tristan is looking into.
 Once that's in place, I'm happy to enhance stuff in the remote event side
 to support it.

 I don't think you are correct.
 This is not remote execution in the sense of arbitrary code driven by
 the client. Remote execution will likely be triggered, render a
 result and stop. It will not send matching events in a continuous fashion.
 Plus remote execution will likely involve dynamic languages and I'm not
 sure we want to go that route for things like continuous query.


To be clear, this is exactly the same as the filter parameters that Radim
was asking for, right? From Infinispan's point of view, the filter just
takes a String parameter, and the fact that that string can be parsed by
the filter in a particular language is irrelevant.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Design of Remote Hot Rod events - round 2

2013-12-20 Thread Dan Berindei
On Thu, Dec 19, 2013 at 8:43 PM, Emmanuel Bernard emman...@hibernate.orgwrote:

 On Thu 2013-12-19 19:57, Dan Berindei wrote:
  On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard 
 emman...@hibernate.orgwrote:
 
   On Thu 2013-12-19  9:46, Galder Zamarreño wrote:
 == Example of continuous query atop remote listeners

 Thinking about how to implement continuous query atop this
 infrastructure I am missing a few things.

 The primary problem is that I don't want to enlist a filter id per
 continuous query I want to run. Not only that but I'd love to be
 able
   to
 add a continuous query on the fly and disable it on the fly as
 well per
 client. For that filters and converters are not flexible enough.

 What is missing is the ability to pass parameters from the client
 to
 the remote filter and remote converter. Parameters should be
 provided
 *per client*. Say Client 1 register the continuous query listener
 with
 where age  19 and client 2 registers the CQ listener with where
   name
 = emmanuel. The filter knowing for which client it filters, it
 will
   be able to only
 return the keys that match the query.
   
This all sounds a bit like remote code exectution to me? You're
 asking
   for the client to pass some kind of executable thing that acts as a
 filter.
   That's a separate feature IMO, which I believe @Tristan is looking
 into.
   Once that's in place, I'm happy to enhance stuff in the remote event
 side
   to support it.
  
   I don't think you are correct.
   This is not remote execution in the sense of arbitrary code driven by
   the client. Remote execution will likely be triggered, render a
   result and stop. It will not send matching events in a continuous
 fashion.
   Plus remote execution will likely involve dynamic languages and I'm not
   sure we want to go that route for things like continuous query.
  
 
  To be clear, this is exactly the same as the filter parameters that Radim
  was asking for, right? From Infinispan's point of view, the filter just
  takes a String parameter, and the fact that that string can be parsed by
  the filter in a particular language is irrelevant.

 Not sure what string you are talking about. The filterid?


I was referring to the condition string: where age > 19, or where name
= emmanuel.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Design of Remote Hot Rod events - round 2

2013-12-20 Thread Dan Berindei
On Fri, Dec 13, 2013 at 4:11 PM, Radim Vansa rva...@redhat.com wrote:

 On 12/13/2013 02:44 PM, Galder Zamarreño wrote:
  On Dec 6, 2013, at 10:45 AM, Radim Vansa rva...@redhat.com wrote:
 
  Hi,
 
  1) IMO, filtering for specific key is a very important use case.
 Registering a filterId is a very powerful feature, but as long as you don't
  provide a runtime parameter for this filter, you cannot implement one-key
 filtering.
  What do you mean by runtime parameter exactly? Can you give a concrete
 example of what you want to achieve that is not possible with what I've
 written up?

 As I stressed, if the client wants to listen for events on key_123456,
 then you can deploy a filter matching key_{number} (and additional
 constraints) but the 123456 is not known at deployment time.

 
  2) setting ack/no ack in listener, and then configuring server-wise
 whether you should ack each / only last event sounds weird. I'd replace the
 boolean with enum { NO_ACK, ACK_EACH, ACK_LAST }.
  Makes a lot of sense, +1.
 
  3) should the client provide source id when registering listener or
 when starting RemoteCacheManager? No API for that.
  Every operation will require a source ID from now on, so clients must
 provide it from first operation sent to the server. From a Java client
 perspective, you'd have this from the start via the configuration.
 
  4) clustered events design does not specify any means to replicating
 the clustered event listener - all it does is that you register the
 listener on one node and the other nodes then route events to this node,
 until the node dies/deregisters the listener. No replication. Please
 specify, how should it piggyback on clustered events, and how should the
 listener list be replicated.
  In clustered listeners, the other nodes you talk about are gonna need to
 know about the clustered listeners so that they route events. Some kind of
 information about these clustered listeners will need to be sent around the
 cluster. The exact details are probably implementation details but we have
 a clustered registry already in place for this kind of things. In any case,
 it'd make a lot of sense that both use cases reuse as much as logic in this
 area.

 OK, this is probably the desired behaviour, it just is not covered by
 the Clustered Events design draft. Probably something to add - I'll ping
 Mircea about that. And you're right that it would make a lot of sense to
 have shared structure for the listeners, and two implementations of the
 delivery boy (one to the node where a clustered event has been
 registered and second to local component handling HotRod clients).

 
  5) non-acked events: how exactly do you expect the ack data to be
 replicated, and updated? I see three options:
  A) Let non-acked list be a part of the listener record in replicated
 cache, and the primary owner which executes the event should update these
 via delta messages. I guess for proper reliability it should add operation
 record synchronously before confirming the operation to the originator, and
 then it might asynchronously remove it after the ack from client. When a
 node becomes primary owner, it should send events to client for all
 non-acked events.
  B) Having the non-acked list attached directly to cache entry (updating
 it together with regular backup), and then asynchronously updating the
 non-ack list after ack comes
  C) Separate cache for acks by entry keys, similar to B, consistent hash
 synced with the main entry cache
  Definitely not B. I don't wanna tie the internal cache entry to the
 ACKs. The two should be independent. Either C or A. For C, you'd wished to
 have a single cache for all listeners+caches, but you'd have to think about
 the keys and to have the same consistent hash, you'd have to have same
 keys. A might be better, but you certainly don't want this ACK info in a
 replicated structure. You'd want ACKs in a distributed cache preferably,
 and clustered listener info in the clustered replicated registry.
 There already is some CH implementation which aims at sharing the same
 distribution for all caches, SyncConsistentHash. Is there some problem
 with C and forcing this for the caches? Dan?


I'm not sure what the exact requirements would be here.

SyncConsistentHashFactory does ensure that the same key is mapped to the
same owners in all the caches using it, most of the time. However, it
requires both caches to have the same members, and since topologies aren't
applied at exactly the same time there will be periods when the owners in
the two caches won't match.
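For reference, opting a cache into that shared key-to-owner mapping looks
roughly like this (a sketch; the factory class lived in
org.infinispan.distribution.ch at the time, and as noted above the guarantee
only holds while the caches have identical members):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.ConfigurationBuilder;
    import org.infinispan.distribution.ch.SyncConsistentHashFactory;

    // Both the data cache and the ack cache would need this factory (and the
    // same numSegments/numOwners) for their owners to line up.
    ConfigurationBuilder builder = new ConfigurationBuilder();
    builder.clustering().cacheMode(CacheMode.DIST_SYNC)
          .hash()
          .consistentHashFactory(new SyncConsistentHashFactory())
          .numSegments(60);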
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Design of Remote Hot Rod events - round 2

2013-12-20 Thread Dan Berindei
On Fri, Dec 20, 2013 at 12:20 PM, Emmanuel Bernard
emman...@hibernate.org wrote:

 On Fri 2013-12-20 12:09, Dan Berindei wrote:
  On Thu, Dec 19, 2013 at 8:43 PM, Emmanuel Bernard 
 emman...@hibernate.org wrote:
 
   On Thu 2013-12-19 19:57, Dan Berindei wrote:
On Thu, Dec 19, 2013 at 2:15 PM, Emmanuel Bernard 
   emman...@hibernate.org wrote:
   
 On Thu 2013-12-19  9:46, Galder Zamarreño wrote:
   == Example of continuous query atop remote listeners
  
   Thinking about how to implement continuous query atop this
   infrastructure I am missing a few things.
  
   The primary problem is that I don't want to enlist a filter id
 per
   continuous query I want to run. Not only that but I'd love to
 be
   able
 to
   add a continuous query on the fly and disable it on the fly as
   well per
   client. For that filters and converters are not flexible
 enough.
  
   What is missing is the ability to pass parameters from the
 client
   to
   the remote filter and remote converter. Parameters should be
   provided
   *per client*. Say Client 1 register the continuous query
 listener
   with
   where age > 19 and client 2 registers the CQ listener with
 where
 name
   = emmanuel. Since the filter knows which client it filters for, it
   will be able to only return the keys that match the query.
 
  This all sounds a bit like remote code execution to me? You're
   asking
 for the client to pass some kind of executable thing that acts as a
   filter.
 That's a separate feature IMO, which I believe @Tristan is looking
   into.
 Once that's in place, I'm happy to enhance stuff in the remote
 event
   side
 to support it.

 I don't think you are correct.
 This is not remote execution in the sense of arbitrary code driven
 by
 the client. Remote execution will likely be triggered, render a
 result and stop. It will not send matching events in a continuous
   fashion.
 Plus remote execution will likely involve dynamic languages and
 I'm not
 sure we want to go that route for things like continuous query.

   
To be clear, this is exactly the same as the filter parameters that
 Radim
was asking for, right? From Infinispan's point of view, the filter
 just
takes a String parameter, and the fact that that string can be
 parsed by
the filter in a particular language is irrelevant.
  
   Not sure what string you are talking about. The filterid?
  
  
  I was referring to the condition string: where age > 19, or where name
  = emmanuel.

 OK. I am not sure that the fact that the parameter is a string in both
 cases and that there is only one parameter and not multiple is
 particularly relevant
 to the general problem at hand.


I don't think so either, I was just asking if you think there is any
difference between what you're asking for and what Radim was asking for...
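
Just to make the parameter idea concrete, a rough sketch of a per-client
parameterized filter on the server side could look like the code below. The
interfaces and names here are hypothetical - they are not the actual remote
events SPI, only an illustration of passing a per-client condition string as
an opaque parameter and letting the server-side filter own the parsing:

import java.util.function.Predicate;

// Hypothetical SPI: one filter instance per client registration, created with
// whatever parameters that client supplied.
interface ParamFilterFactory {
   ParamFilter createFilter(Object[] clientParams);
}

interface ParamFilter {
   boolean accept(Object key, Object value);
}

// Interprets its single parameter as a condition string, e.g. "where age > 19"
// for client 1 and "where name = 'emmanuel'" for client 2.
class ConditionFilterFactory implements ParamFilterFactory {
   @Override
   public ParamFilter createFilter(Object[] clientParams) {
      String condition = (String) clientParams[0];
      Predicate<Object> predicate = parseCondition(condition);
      return (key, value) -> predicate.test(value);
   }

   // Parsing the query language is out of scope for this sketch.
   private Predicate<Object> parseCondition(String condition) {
      return value -> true;
   }
}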
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] help with Infinispan OSGi

2014-01-09 Thread Dan Berindei
On Tue, Jan 7, 2014 at 9:14 PM, Brett Meyer brme...@redhat.com wrote:

 Apologies for the delay -- things have been nuts.

 Here's the route I've taken so far.  I created OsgiClassLoader that
 searches available Bundles for classes and resources.  A new (non-static)
 AggregateClassLoader replaces FileLookup, CL-related methods in Util, and
 parts of ReflectionUtil.  AggregateClassLoader extends/overrides
 ClassLoader and searches over a prioritized list of user/app CL, OsgiCL,
 System CL, TCCL, etc.

 This is easily wired into GlobalConfiguration concepts.  However, I'm not
 exactly sure how this will play into Configuration and Cache.
  Configuration#classLoader is deprecated.  Is that in favor of
 CacheImpl#getClassLoader & CacheImpl#with(ClassLoader)?


Actually, I think the idea was to only set the classloader in the
GlobalConfiguration and use a separate CacheManager for each deployment,
removing the need for AdvancedCache.with(ClassLoader) as well. But I'm not
sure if we'll ever get to that...

AdvancedCache.with(ClassLoader) is also very limited in scope. The
classloader can't be sent remotely so AFAICT it's only really useful for
unmarshalling return values for get operations with storeAsBinary enabled.



 Can someone describe the scoping of CL concepts between
 GlobalConfiguration and the Configuration/Cache?  Should both have their
 own instance of AggregateClassLoader?  Cache's instance would put the
 user/app provided CL on the top of the queue?


Sounds about right.

For marshalling, ATM we use an EmbeddedContextClassResolver, which uses the
InvocationContextContainer to obtain the classloader set by the user with
AdvancedCache.with(ClassLoader) (unlike JBoss Marshalling's
ContextClassResolver, which uses the TCCL). You will probably want to
implement your own ClassResolver based on AggregateClassLoader, but using
InvocationContextContainer as well.
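
Something along these lines, for example - a sketch only, assuming the
AggregateClassLoader from your patch and JBoss Marshalling's
AbstractClassResolver as the base class:

import org.jboss.marshalling.AbstractClassResolver;

// Resolves classes during unmarshalling through the aggregate loader
// (user/app CL, OSGi CL, system CL, TCCL, ...) instead of the TCCL alone.
public class AggregateClassResolver extends AbstractClassResolver {

   private final ClassLoader aggregateClassLoader;

   public AggregateClassResolver(ClassLoader aggregateClassLoader) {
      this.aggregateClassLoader = aggregateClassLoader;
   }

   @Override
   protected ClassLoader getClassLoader() {
      // To also honour AdvancedCache.with(ClassLoader), consult the
      // InvocationContextContainer here first and fall back to the
      // aggregate loader when no classloader was set for the invocation.
      return aggregateClassLoader;
   }
}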

I'm not sure about other places where we need to load classes. While
working on https://issues.jboss.org/browse/ISPN-3836 I did find some
instances of ServiceLoader.load(Class), which use the TCCL implicitly, but
I'm trying to change that to use the global/cache configuration's
classloader.
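
As a rough illustration of the direction I mean (ServiceFinder is a made-up
name; only ServiceLoader and FrameworkUtil are real APIs here, and the OSGi
check needs the OSGi API jar on the classpath):

import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;
import org.osgi.framework.Bundle;
import org.osgi.framework.FrameworkUtil;

public class ServiceFinder {

   // Load service implementations with an explicit classloader instead of
   // relying on the TCCL, which is what trips up plain ServiceLoader in OSGi.
   public static <T> List<T> load(Class<T> serviceType, ClassLoader classLoader) {
      List<T> services = new ArrayList<T>();
      for (T service : ServiceLoader.load(serviceType, classLoader)) {
         services.add(service);
      }
      return services;
   }

   // One way to detect an OSGi environment: if this class was loaded by an
   // OSGi bundle classloader, FrameworkUtil can find its bundle.
   public static boolean runningInOsgi() {
      Bundle bundle = FrameworkUtil.getBundle(ServiceFinder.class);
      return bundle != null;
   }
}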


 Hopefully that all makes sense...

 Brett Meyer
 Red Hat, Hibernate ORM

 - Original Message -
 From: Galder Zamarreño gal...@redhat.com
 To: Eric Wittmann eric.wittm...@redhat.com
 Cc: infinispan -Dev List infinispan-dev@lists.jboss.org, Brett
 Meyer brme...@redhat.com, Sanne Grinovero sa...@hibernate.org,
 Pete Muir pm...@redhat.com, Randall Hauch rha...@redhat.com,
 Steve Jacobs sjac...@redhat.com
 Sent: Wednesday, December 18, 2013 8:22:13 AM
 Subject: Re: [infinispan-dev] help with Infinispan OSGi


 On Dec 16, 2013, at 3:00 PM, Eric Wittmann eric.wittm...@redhat.com
 wrote:

  I wanted to add that in the Overlord group we're also looking into using
 ISPN in OSGi.  Our directive is to get our projects running in Fuse 6.1.
 
  To that end I've been working on getting Overlord:S-RAMP up and running,
 which requires both ModeShape and ISPN.
 
  Additionally, Gary Brown uses ISPN in Overlord:RTGov and so will need to
 get it working directly (no ModeShape) in Fuse 6.1.
 
  I've made some progress on my end but have run into some of the same
 issues as Brett.
 
  An additional issue I hit was the use of Java's ServiceLoader for
 org.infinispan.configuration.parsing.ConfigurationParser.  None of the
 parsers get loaded because ServiceLoader doesn't work particularly well in
 OSGi.  We had this same issue in S-RAMP (we use ServiceLoader in a few
 places).  I solved it by using the OSGi Service Registry when running in an
 OSGi container, but continuing to use ServiceLoader otherwise.

 ^ Can you add a JIRA for this so that we can abstract this away? I'm not
 sure how exactly we'd decide on the impl to use. By default it'd be the SL
 impl. When used on OSGi though, an alternative service loading impl would
 need to be configured specifically by the user? Or would Infinispan itself
 detect that it's in OSGi and hence use the corresponding impl? I've no
 idea about OSGi.

  In any case - I was wondering if anyone thought it might be a good idea
 to create a git repo where we can create some test OSGi applications that
 use ISPN and can be deployed (e.g. to Fuse).  This would be for testing
 purposes only - to shake out problems.  Might be useful for collaboration?

 A quickstart on [1] would be the perfect place for something like that,
 i.e. Fuse + Infinispan or something similar.

 [1] http://www.jboss.org/jdf/quickstarts/get-started/

 
  -Eric
 
 
  On 12/12/2013 12:55 PM, Brett Meyer wrote:
  I finally had a chance to start working with this, a bit, today.
  Here's what I've found so far.
 
  In general, I'm seeing 2 types of CL issues come up when testing w/
 hibernate-infinispan:
 
  1.) Reliance on the client bundle's CL.  Take the following stack as an
 example: https://gist.github.com/brmeyer/c8aaa1157a4a951a462c.
  Hibernate's 

Re: [infinispan-dev] Integration between HotRod and OGM

2014-01-22 Thread Dan Berindei
On Tue, Jan 21, 2014 at 4:07 PM, Mircea Markus mmar...@redhat.com wrote:

 Hi Emmanuel,

 Just had a good chat with Davide on this and one solution to overcome the
 shortcoming you mentioned in the above email would be to enhance the hotrod
 client to support grouping:

 RemoteClient.put(G g, K k, V v); // first param is the group
 RemoteClient.getGroup(G g) : Map<K,V>;


I think you'd also need RemoteClient.get(G g, K k), as in embedded mode the
group is included in the key.
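
To make it concrete, a hypothetical shape for the remote grouping operations
could be something like the interface below - none of these methods exist in
the Hot Rod client today, it's just a sketch of the proposal:

import java.util.Map;

// Hypothetical grouping extension for the Hot Rod client: G is the group,
// K/V are the usual key and value types.
public interface GroupingRemoteCache<G, K, V> {

   // Store an entry under a group; the group determines which node owns it.
   void put(G group, K key, V value);

   // Read a single entry; needed because in embedded mode the group is
   // derived from the key.
   V get(G group, K key);

   // Read every entry belonging to the group in one round trip.
   Map<K, V> getGroup(G group);
}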




  It requires an enhancement on our local grouping API:
 EmbeddedCache.getGroup(G). This is something useful for us in a broader
 context, as it is the step needed to be able to deprecate AtomicMaps and
 suggest they be replaced with Grouping.


It would also require us to keep a Set<K> for each group, with the keys
associated with that group. As such, I'm not sure it would be a lot easier
to implement (correctly) than FineGrainedAtomicMap.


  This approach still has some limitations compared to the current embedded
 integration:
 - reduced performance caused by the lack of transactions: this means increased
 TCP chattiness between the Hot Rod client and the server.
 - you'd have to handle atomicity yourself, potentially by retrying an operation

 What do you think?


 On Dec 3, 2013, at 3:10 AM, Mircea Markus mmar...@redhat.com wrote:

 
  On Nov 19, 2013, at 10:22 AM, Emmanuel Bernard emman...@hibernate.org
 wrote:
 
  It's an interesting approach that would work fine-ish for entities
  assuming the Hot Rod client is multi threaded and assuming the client
  uses Future to parallelize the calls.
 
  The Java Hotrod client is both multithreaded and exposes an Async API.
 
 
  But it won't work for associations as we have them designed today.
  Each association - or more precisely the query results to go from an
  entity A1 to the list of entities B associated to it - is represented by
  an AtomicMap.
  Each entry in this map does correspond to an entry in the association.
 
  While we can guess the column names and build from the metadata the
  list of composed keys for entities, we cannot do the same for
  associations as the key is literally the (composite) id of the
  association and we cannot guess that most of the time (we can in very
  pathological cases).
  We could imagine that we list the association row keys in a special
  entry to work around that but this approach is just as problematic and
  is conceptually the same.
  The only solution would be to lock the whole association for each
  operation and I guess impose some versioning / optimistic lock.
 
  That is not a pattern that scales sufficiently from my experience.
 
  I think so too :-)
 
  That's the problem with interconnected data :)
 
  Emmanuel
 
  On Mon 2013-11-18 23:05, Mircea Markus wrote:
  Neither the grouping API nor the AtomicMap work over hotrod.
  Between the grouping API and AtomicMap, I think the one that would
 make more sense to migrate is the grouping API.
  One way or the other, I think the Hot Rod protocol would require an
 enhancement - mind raising a JIRA for that?
  For now I guess you can sacrifice performance and always send the
 entire object across on every update instead of only the deltas?
 
  On Nov 18, 2013, at 9:56 AM, Emmanuel Bernard emman...@hibernate.org
 wrote:
 
  Someone mentioned the grouping API as some sort of alternative to
  AtomicMap. Maybe we should use that?
  Note that if we don't have a fine-grained approach we will need to
  make sure we *copy* the complex data structure upon reads to mimic
  proper transaction isolation.
 
  On Tue 2013-11-12 15:14, Sanne Grinovero wrote:
  On 12 November 2013 14:54, Emmanuel Bernard emman...@hibernate.org
 wrote:
  On the transaction side, we can start without them.
 
  +1 on omitting transactions for now.
 
  And on the missing AtomicMaps, I hope the Infinispan team will want to
 implement it?
  Would be good to eventually converge on similar feature sets on remote
  vs embedded APIs.
 
  I know the embedded version relies on batching/transactions, but I
  guess we could obtain a similar effect with some ad-hoc commands in
  Hot Rod?
 
  Sanne
 
 
  On Tue 2013-11-12 14:34, Davide D'Alto wrote:
  Hi,
  I'm working on the integration between HotRod and OGM.
 
  We already have a dialect for Infinispan and I'm trying to follow
 the same
  logic.
  At the moment I'm having two problems:
 
  1) In the Infinispan dialect we are using the AtomicMap and the
  AtomicMapLookup but these classes don't work with the RemoteCache.
 Is there
  an equivalent for HotRod?
 
  2) As far as I know HotRod does not support transactions. I've
 found a link
  to a branch on Mircea repository:
  https://github.com/mmarkus/ops_over_hotrod/wiki/Usage-guide
  Is this something I could/should use?
 
  Any help is appreciated.
 
  Thanks,
  Davide
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
  

Re: [infinispan-dev] Integration between HotRod and OGM

2014-01-23 Thread Dan Berindei
On 22 Jan 2014 16:10, Pedro Ruivo pe...@infinispan.org wrote:



 On 01/22/2014 01:58 PM, Dan Berindei wrote:
 
 
  It would also require us to keep a Set<K> for each group, with the keys
  associated with that group. As such, I'm not sure it would be a lot
  easier to implement (correctly) than FineGrainedAtomicMap.
 
 

 Dan, I didn't understand why we need to keep a Set<K>. Can you
 elaborate?


We'd need some way to keep track of the keys that are part of the group;
iterating over the entire cache for every getGroup() call would be way too
slow.
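
A minimal sketch of the bookkeeping I have in mind - kept deliberately naive,
since making it consistent with the data container, cache stores and
transactions is exactly the hard part:

import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Tracks which keys belong to which group so that getGroup() doesn't have to
// scan the whole cache (or the cache store) on every call.
public class GroupKeyIndex<G, K> {

   private final Map<G, Set<K>> keysByGroup = new ConcurrentHashMap<G, Set<K>>();

   public void onPut(G group, K key) {
      keysByGroup.computeIfAbsent(group, g -> ConcurrentHashMap.<K>newKeySet()).add(key);
   }

   public void onRemove(G group, K key) {
      Set<K> keys = keysByGroup.get(group);
      if (keys != null) {
         keys.remove(key);
      }
   }

   public Set<K> keysInGroup(G group) {
      Set<K> keys = keysByGroup.get(group);
      return keys != null ? keys : Collections.<K>emptySet();
   }
}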
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Dropping AtomicMap/FineGrainedAtomicMap

2014-01-27 Thread Dan Berindei
I think it's way too early to discuss removing FineGrainedAtomicMap and
AtomicMap, as long as we don't have a concrete alternative with similar
properties.

Cache.getGroup(groupName) is just a method name at this point, we don't
have any idea how it will compare to AtomicMap/FineGrainedAtomicMap from a
transaction isolation or performance perspective. BTW, do we really need
the group name to be a String?

A good way to prove that the grouping API is a proper replacement for the
atomic maps would be to replace the usage of atomic maps in the Tree module
with the grouping API. Unless we plan to drop the Tree module completely...

Cheers
Dan



On Wed, Jan 22, 2014 at 2:45 PM, Mircea Markus mmar...@redhat.com wrote:


 On Jan 21, 2014, at 8:42 PM, Vladimir Blagojevic vblag...@redhat.com
 wrote:

  I agree with Erik here. Deltas are used in M/R and I've never detected
  any problems so far.
  On 1/21/2014, 1:39 PM, Erik Salter wrote:
  Please don't remove the Delta stuff.  That's quite useful, especially
 for
  large collections.

 +1 to keep DeltaAware. Thanks for the feedback

 
  Erik
 
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev

 Cheers,
 --
 Mircea Markus
 Infinispan lead (www.infinispan.org)




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Integration between HotRod and OGM

2014-01-27 Thread Dan Berindei
On Mon, Jan 27, 2014 at 2:43 PM, Pedro Ruivo pe...@infinispan.org wrote:



 On 01/27/2014 12:26 PM, Emmanuel Bernard wrote:
  I'd be curious to see performance tests on Pedro's approach (ie walk
  through the entire data key set to find the matching elements of a given
  group). That might be fast enough but that looks quite scary compared to
  a single lookup.

 I would prefer to take a performance hit than keep a map of sets (group name
 -> set of keys). I also think that keeping this map synchronized with the
 keys in the data container will not be easy...


Sure, I would prefer the simpler implementation as well. But if changing an
application to use groups instead of atomic maps will change the processing
time of a request from 1ms to 1s, I'm pretty sure users will prefer to keep
using the atomic maps :)


 
  Is there any doc explaining how FGAM is broken in transactions, out of
  curiosity?
 

 well, the map is not isolated, so you can see the updates from other
 transactions immediately (https://issues.jboss.org/browse/ISPN-3932)


Do you know if AtomicMap is affected, too?



 It also does not work when you enable write skew check with optimistic
 transactions (we have a JIRA somewhere)


I assume you mean https://issues.jboss.org/browse/ISPN-3939 ?
This looks like it also affects AtomicMap, so the only workaround is to use
pessimistic locking.
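
For reference, a sketch of the pessimistic-locking configuration (using the
programmatic API as I remember it - double-check the exact builder methods
against your version):

import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.transaction.LockingMode;
import org.infinispan.transaction.TransactionMode;

public class PessimisticTxConfig {
   public static Configuration build() {
      // Transactional cache with pessimistic locking: locks are acquired
      // eagerly, so the write skew problem that bites FGAM/AtomicMap under
      // optimistic transactions doesn't apply.
      return new ConfigurationBuilder()
            .transaction()
               .transactionMode(TransactionMode.TRANSACTIONAL)
               .lockingMode(LockingMode.PESSIMISTIC)
            .build();
   }
}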




  On Mon 2014-01-27 11:54, Pedro Ruivo wrote:
 
 
  On 01/27/2014 09:52 AM, Sanne Grinovero wrote:
  On 27 January 2014 09:38, Pedro Ruivo pe...@infinispan.org wrote:
 
 
  On 01/27/2014 09:20 AM, Sanne Grinovero wrote:
  On 23 January 2014 18:03, Dan Berindei dan.berin...@gmail.com
 wrote:
 
  On 22 Jan 2014 16:10, Pedro Ruivo pe...@infinispan.org wrote:
 
 
 
  On 01/22/2014 01:58 PM, Dan Berindei wrote:
 
 
  It would also require us to keep a Set<K> for each group, with
 the keys
  associated with that group. As such, I'm not sure it would be a
 lot
  easier to implement (correctly) than FineGrainedAtomicMap.
 
 
 
  Dan, I didn't understand why we need to keep a Set<K>. Can you
  elaborate?
 
 
  We'd need some way to keep track of the keys that are part of the
 group,
  iterating over the entire cache for every getGroup() call would be
 way too
  slow.
 
  Right, and load all entries from any CacheStore too :-/
 
  IMO, I prefer to iterate over the data container and cache loader when
  it is needed than keep the Set<K> for each group. I think the memory
  will thank you
 
  Of course. I'm just highlighting how important Dan's comment is,
  because we obviously don't want to load everything from CacheStore.
  So, tracking which entries are part of the group is essential:
  failing this, I'm still skeptical about why the Grouping API is a
  better replacement than FGAM.
 
  I have one reason: FGAM does not work inside transactions...
 
 
  Sanne
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
  ___
  infinispan-dev mailing list
  infinispan-dev@lists.jboss.org
  https://lists.jboss.org/mailman/listinfo/infinispan-dev
 
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] L1OnRehash Discussion

2014-02-04 Thread Dan Berindei
On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño gal...@redhat.com wrote:


 On 28 Jan 2014, at 15:29, William Burns mudokon...@gmail.com wrote:

  Hello everyone,
 
  I wanted to discuss what I would say as dubious benefit of L1OnRehash
  especially compared to the benefits it provide.
 
  L1OnRehash is used to retain a value by moving a previously owned
  value into the L1 when a rehash occurs and this node no longer owns
  that value.  Also any current L1 values are removed when a rehash
  occurs.  Therefore it can only save a single remote get for only a few
  keys when a rehash occurs.
 
  This by itself is fine; however, L1OnRehash has many edge cases to
  guarantee consistency as can be seen from
  https://issues.jboss.org/browse/ISPN-3838.  This can get quite
  complicated for a feature that gives marginal performance increases
  (especially given that this value may never have been read recently -
  at least normal L1 usage guarantees this).
 
  My first suggestion is instead to deprecate the L1OnRehash
  configuration option and to remove this logic.

 +1


+1 from me as well



  My second suggestion is a new implementation of L1OnRehash that is
  always enabled when the L1 threshold is configured to 0.  For those not
  familiar, the L1 threshold controls whether invalidations are broadcast
  instead of sent as individual messages.  A value of 0 means to always
  broadcast.  This would allow for some benefits that we can't currently
  do:
 
  1. L1 values would never have to be invalidated on a rehash event
  (guarantee locality reads under rehash)
  2. L1 requestors would not have to be tracked any longer
 
  However every write would be required to send an invalidation which
  could slow write performance in additional cases (since we currently
  only send invalidations when requestors are found).  The difference
  would be lessened with UDP, which is the transport I would assume
  someone would use when configuring the L1 threshold to 0.

 Sounds good to me, but I think you could go even beyond this and maybe get
 rid of threshold configuration option too?

 If the transport is UDP and multicast is configured, invalidations are
 broadcasted (and apply the two benefits you mention).
 If UDP w/ unicast or TCP used, track invalidations and send them as
 unicasts.

 Do we really need to expose these configuration options to the user?


I think the idea was that even with UDP, sending 2 unicasts and waiting for
only 2 responses may be faster than sending a multicast and waiting for 10
responses. However, I'm not sure that's the case if we send 1 unicast
invalidation from each owner instead of a single multicast invalidation
from the primary owner/originator [1]. Maybe if each owner would return a
list of requestors and the originator would do the invalidation at the
end...

One tangible benefit of having the setting is that we can run the test
suite with TCP only, and still cover every path in L1Manager. If we removed it
completely, it would still be possible to change the toggle in
L1ManagerImpl via reflection, but it would be a little hacky.
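
For context, this is roughly how the knobs being discussed look in the
programmatic configuration today (method names from memory, so verify against
your version):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

public class L1Config {
   public static Configuration build() {
      return new ConfigurationBuilder()
            .clustering()
               .cacheMode(CacheMode.DIST_SYNC)
               .l1()
                  .enable()
                  // 0 = always broadcast invalidations instead of tracking
                  // requestors and sending individual invalidation messages.
                  .invalidationThreshold(0)
            .build();
   }
}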


  What do you guys think?  I am thinking that no one minds the removal
  of L1OnRehash that we have currently (if so let me know).  I am quite
  curious what others think about the changes for L1 threshold value of
  0, maybe this configuration value is never used?
 


Since we don't give any guidance as to what a good threshold value would
be, I doubt many people use it.

My alternative proposal would be to replace the
invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false
setting.
1. If traceRequestors == false, don't keep track of requestors, only send
the invalidation from the originator, and enable l1OnRehash.
This means we can keep the entries that are in L1 after a rehash as
well.
2. If traceRequestors == true, track requestors, send unicast/multicast
invalidations depending on the transport, and disable l1OnRehash.


[1] https://issues.jboss.org/browse/ISPN-186

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors

2014-02-04 Thread Dan Berindei
On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño gal...@redhat.com wrote:

 Narrowing down the list now, since this is a problem of how our CI is
 doing builds.

 These logs are retrieved from [1].

 Dunno how our CI is configured but this is odd. Seems like the build is
 halted due to test failures, but it continues somehow? I mean, the jars are
 not being produced properly, but the build is not halting.


We run the build with -fn (fail-never), so the build should never be halted
because of a test failure. The configuration is here:
http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8&runnerId=RUNNER_1




 It's about time we did the following:
 1) Any test failures should halt the build there and then. IOW, do not
 continue the build at all.


Will having 100 tests in one run and 2000 tests in another really help?



 2) Any tests that fail randomly should be disabled.


Let's go ahead and disable all the server tests then? ;)




 Cheers,

 [1] https://dl.dropboxusercontent.com/u/6148072/does-not-work.log

 On 04 Feb 2014, at 13:30, Galder Zamarreño gal...@redhat.com wrote:

 
  On 04 Feb 2014, at 10:38, Stuart Douglas stuart.w.doug...@gmail.com
 wrote:
 
  It is almost certainly something to do with this:
 
  <module-def name="org.infinispan.server.rest"
 src="${infinispan.server.modules.dir}">
 
  <maven-resource-with-classifier group="org.infinispan"
 artifact="infinispan-server-rest" classifier="classes" />
 
  </module-def>
 
  I guess sometimes the classes artefact is being attached as a reference
 to the classes directory, rather than a reference to a jar, which causes
 the issue.
 
  Here's a gist with a subset of the build log [1]. When it works fine,
 it's copying a jar; when it's not, it's copying an empty folder.
 
  However, this is not only happening for the org.infinispan.server.rest
 module; others show the same issue [2]. What seems to be a pattern is that
 it only happens with modules that are built by us; it's not happening for
 modules coming with the base AS/WF instance.
 
  I've traced it back and this might be due to build failures that are not
 producing the right jars [3].
 
  @Stuart, this is really our problem. Sorry for the inconvenience!
 
  [1] https://gist.github.com/galderz/b9286f385aad1316df51
  [2] https://gist.github.com/galderz/9e6a9bd9b18b805db323
  [3] https://gist.github.com/galderz/6ab662a1027cd96cbd8c
 
 
  Stuart
 
 
  On Tue, Feb 4, 2014 at 11:14 AM, Galder Zamarreño gal...@redhat.com
 wrote:
 
  On 04 Feb 2014, at 10:01, Stuart Douglas stuart.w.doug...@gmail.com
 wrote:
 
 
 
 
  On Tue, Feb 4, 2014 at 11:00 AM, Stuart Douglas 
 stuart.w.doug...@gmail.com wrote:
  Yes, there is nothing in the server code that modifies the modules
 directory.
 
  Well, except for the new patching stuff, but that is not really
 relevant here.
 
  The testsuite AS/WF builds are built out of the distribution build,
 which shows the same problem. The distribution we build uses the scripts we
 got from AS [1].
 
  Do you see anything in there that could be causing this? We are using
 maven-antrun-plugin version 1.3, and take into account the lib.xml in [2].
 
  Finally, do you have any suggestions on changes we could make to these
 files to further debug the issue?
 
  Thanks a lot for your help!
 
  [1]
 https://github.com/infinispan/infinispan/blob/master/server/integration/build/build.xml
  [2]
 https://github.com/infinispan/infinispan/blob/master/server/integration/build/lib.xml
 
 
  Stuart
 
 
  Stuart
 
 
  On Tue, Feb 4, 2014 at 10:56 AM, Galder Zamarreño gal...@redhat.com
 wrote:
 
  On 04 Feb 2014, at 09:37, Stuart Douglas stuart.w.doug...@gmail.com
 wrote:
 
  This looks like an issue with your environment. The modules directory
 is static. Wildfly does not contain any code that messes with it. I would
 say the culprit is probably something in either your build process or your
 test suite.
 
  Correction, this is happening with AS 7.2.0.Final (Wildfly 8 used
 somewhere else). I guess your answer still applies?
 
  Cheers,
 
 
  Stuart
 
 
  On Tue, Feb 4, 2014 at 10:21 AM, Galder Zamarreño gal...@redhat.com
 wrote:
  Hi all,
 
  We're having issues with our Infinispan Server integration tests,
 which run within Wildfly 8.0.0.Beta1 (as I'm typing I'm wondering if we
 should just upgrade it to see if this goes away...?).
 
  Quite often some of the runs fail with error message [1].
 
  Having looked at the build environment when a run fails, you see this:
 
  --
  $ ls modules/system/layers/base/org/infinispan/server/rest/main
  drwxrwxr-x  2 g  staff    68B Feb  3 18:41 classes (<-- a directory??)
  -rw-r--r--  1 g  staff 1B Feb  3 18:41 classes.index
  -rw-r--r--  1 g  staff   2.1K Feb  3 18:41 module.xml
 
  $ ls
 modules/system/layers/base/org/infinispan/server/rest/main/classes
  drwxrwxr-x  2 g  staff    68B Feb  3 18:41 .
  drwxrwxr-x  5 g  staff   170B Feb  3 18:41 ..
 
  $ more
 modules/system/layers/base/org/infinispan/server/rest/main/module.xml
  module 
