Re: [infinispan-dev] [jboss-as7-dev] Module jars dissapearing leaving empty classes/ folders and errors
On Feb 4, 2014, at 12:50 PM, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Feb 4, 2014 at 2:36 PM, Galder Zamarreño gal...@redhat.com wrote: Narrowing down the list now, since this is a problem of how our CI is doing builds. These logs are retrieved from [1]. Dunno how our CI is configured but this is odd. Seems like the build is halted due to test failures, but it continues somehow? I mean, the jars are not being produced properly, but the build is not halting. We run the build with -fn (fail-never), so the build should never be halted because of a test failure. The configuration is here: http://ci.infinispan.org/admin/editRunType.html?id=buildType:bt8runnerId=RUNNER_1 It’s about time we did the following: 1) Any test failures should halt the build there and then. IOW, do not continue the build at all. Will having 100 tests in one run and 2000 tests in another really help? 2) Any tests that fail randomly should be disabled. Doing this in the past didn't seem to help: tests were disabled and never re-enabled again. IMO we should fight to get the suite green and then any intermittent failure should be considered a blocker and treated as the highest priority. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] L1OnRehash Discussion
On Feb 4, 2014, at 11:04 AM, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño gal...@redhat.com wrote: On 28 Jan 2014, at 15:29, William Burns mudokon...@gmail.com wrote: Hello everyone, I wanted to discuss what I would call the dubious benefit of L1OnRehash, especially compared to the benefits it provides. L1OnRehash is used to retain a value by moving a previously owned value into the L1 when a rehash occurs and this node no longer owns that value. Also, any current L1 values are removed when a rehash occurs. Therefore it can only save a single remote get for only a few keys when a rehash occurs. This by itself is fine; however, L1OnRehash has many edge cases to guarantee consistency, as can be seen from https://issues.jboss.org/browse/ISPN-3838. This can get quite complicated for a feature that gives marginal performance increases (especially given that this value may never have been read recently - at least normal L1 usage guarantees this). My first suggestion is instead to deprecate the L1OnRehash configuration option and to remove this logic. +1 +1 from me as well +1 Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] Store as binary
On Feb 4, 2014, at 7:14 AM, Galder Zamarreño gal...@redhat.com wrote: On 21 Jan 2014, at 17:45, Mircea Markus mmar...@redhat.com wrote: On Jan 21, 2014, at 2:13 PM, Sanne Grinovero sa...@infinispan.org wrote: On 21 January 2014 13:37, Mircea Markus mmar...@redhat.com wrote: On Jan 21, 2014, at 1:21 PM, Galder Zamarreño gal...@redhat.com wrote: What's the point for these tests? +1 To validate whether storing the data in binary format yields better performance than storing it as a POJO. That will highly depend on the scenarios you want to test for. AFAIK this started after Paul described how session replication works in WildFly, and we already know that both strategies are suboptimal with the current options available: in his case the active node will always write on the POJO, while the backup node will essentially only need to store the buffer just in case it might need to take over. Indeed as it is today, it doesn't make sense for WildFly's session replication. Sure, one will be slower, but if you want to make a suggestion to him about which configuration he should be using, we should measure his use case, not a different one. Even then, as discussed in Palma, an in-memory String representation might be way more compact because of pooling of strings and a very high likelihood of repeated headers (as common in web frameworks). Pooling like in String.intern()? Even so, if most of your access to the String is to serialize it and send it remotely, then you have a serialization cost (CPU) to pay for the reduced size. Serialization has a cost, but nothing compared with the transport itself, and you don’t have to go very far to see the impact of transport. Just recently we were chasing some performance regression and even though there were some changes in serialization, the impact of my improvements was minimal, max 2-3%. Optimal network and transport configuration is more important IMO, and once again, misconfiguration in that layer is what was causing us to be ~20% slower. 
yes, I didn't expect huge improvements from storeAsBinary, but at least some improvement caused by the fact that lots of serialization shouldn't happen in the tested scenario. 2-3% improvement wouldn't hurt, though :-) Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
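The trade-off Mircea and Galder are weighing here (pay the serialization CPU cost up front versus on every remote send) is roughly the idea behind storeAsBinary. A minimal sketch, assuming plain java.io serialization and a made-up BinaryValue class (this is not Infinispan's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class BinaryValue {
    private final byte[] bytes;   // canonical, wire-ready form
    private Object cached;        // lazily materialized POJO

    public BinaryValue(Serializable pojo) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(pojo);   // serialization cost paid once, up front
        }
        this.bytes = bos.toByteArray();
    }

    // What replication would ship: a byte-array copy, no further CPU cost.
    public byte[] wireForm() {
        return bytes;
    }

    // Local reads deserialize lazily, and only on the first access.
    public synchronized Object get() throws IOException, ClassNotFoundException {
        if (cached == null) {
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
                cached = ois.readObject();
            }
        }
        return cached;
    }

    public static void main(String[] args) throws Exception {
        BinaryValue v = new BinaryValue("session-data");
        System.out.println(v.wireForm().length > 0); // true
        System.out.println(v.get());                 // session-data
    }
}
```

This also shows why the backup node in the WildFly scenario benefits: it only ever touches wireForm(), while the active node, which works on the POJO, pays the deserialization on every read.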
Re: [infinispan-dev] L1OnRehash Discussion
On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño gal...@redhat.com wrote: On 28 Jan 2014, at 15:29, William Burns mudokon...@gmail.com wrote: Hello everyone, I wanted to discuss what I would say as dubious benefit of L1OnRehash especially compared to the benefits it provide. L1OnRehash is used to retain a value by moving a previously owned value into the L1 when a rehash occurs and this node no longer owns that value Also any current L1 values are removed when a rehash occurs. Therefore it can only save a single remote get for only a few keys when a rehash occurs. This by itself is fine however L1OnRehash has many edge cases to guarantee consistency as can be seen from https://issues.jboss.org/browse/ISPN-3838. This can get quite complicated for a feature that gives marginal performance increases (especially given that this value may never have been read recently - at least normal L1 usage guarantees this). My first suggestion is instead to deprecate the L1OnRehash configuration option and to remove this logic. +1 +1 from me as well My second suggestion is a new implementation of L1OnRehash that is always enabled when L1 threshold is configured to 0. For those not familiar L1 threshold controls whether invalidations are broadcasted instead of individual messages. A value of 0 means to always broadcast. This would allow for some benefits that we can't currently do: 1. L1 values would never have to be invalidated on a rehash event (guarantee locality reads under rehash) 2. L1 requestors would not have to be tracked any longer However every write would be required to send an invalidation which could slow write performance in additional cases (since we currently only send invalidations when requestors are found). The difference would be lessened with udp, which is the transport I would assume someone would use when configuring L1 threshold to 0. 
Sounds good to me, but I think you could go even beyond this and maybe get rid of threshold configuration option too? If the transport is UDP and multicast is configured, invalidations are broadcasted (and apply the two benefits you mention). If UDP w/ unicast or TCP used, track invalidations and send them as unicasts. Do we really need to expose these configuration options to the user? I think the idea was that even with UDP, sending 2 unicasts and waiting for only 2 responses may be faster than sending a multicast and waiting for 10 responses. However, I'm not sure that's the case if we send 1 unicast invalidation from each owner instead of a single multicast invalidation from the primary owner/originator [1]. Maybe if each owner would return a list of requestors and the originator would do the invalidation at the end... I totally agree since we currently have to send invalidations from the primary owner and all backup owners to guarantee consistency if we have a response from the backup owner [2]. By moving to this route we only ever have to send a single multicast invalidation instead of N unicast invalidations. However this also brings up another change where we only L1 cache the primary owner response [3] :) Actually that would tilt the performance discussion the other way. Makes me think deprecating current L1OnRehash and adding primary owner L1 caching should be first and then reevaluate if the new L1OnRehash support is even needed. The originator firing the invalidations is interesting, but don't think it is feasible. With async transport this is not doable at all. Also if the originator goes down and the value is persisted we will have invalid L1 values cached still. The latter could be fixed with txs but non tx would still be broken. One tangible benefit of having the setting is that we can run the test suite with TCP only, and still cover every path in L1Manager. 
If we removed it completely, it would still be possible to change the toggle in L1ManagerImpl via reflection, but it would be a little hacky. What do you guys think? I am thinking that no one minds the removal of L1OnRehash that we have currently (if so let me know). I am quite curious what others think about the changes for an L1 threshold value of 0; maybe this configuration value is never used? Since we don't give any guidance as to what a good threshold value would be, I doubt many people use it. My alternative proposal would be to replace the invalidationThreshold=-1|0|>0 setting with a traceRequestors=true|false setting.
1. If traceRequestors == false, don't keep track of requestors, only send the invalidation from the originator, and enable l1OnRehash. This means we can keep the entries that are in L1 after a rehash as well.
2. If traceRequestors == true, track requestors, send unicast/multicast invalidations depending on the transport, and disable l1OnRehash.
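For readers who haven't touched this option: a hedged model of the invalidationThreshold semantics as described in this thread (-1 never multicasts, 0 always multicasts, a positive value multicasts once the tracked requestors reach the threshold). InvalidationPolicy is an illustrative name, not Infinispan's L1ManagerImpl:

```java
import java.util.Set;

public class InvalidationPolicy {
    // -1: never multicast (unicast to tracked requestors only)
    //  0: always multicast, so requestor tracking is unnecessary
    // >0: multicast once the number of tracked requestors reaches the threshold
    static boolean shouldMulticast(int invalidationThreshold, Set<String> requestors) {
        if (invalidationThreshold < 0) {
            return false;
        }
        if (invalidationThreshold == 0) {
            return true;
        }
        return requestors.size() >= invalidationThreshold;
    }

    public static void main(String[] args) {
        System.out.println(shouldMulticast(0, Set.of()));          // true: always broadcast
        System.out.println(shouldMulticast(-1, Set.of("a", "b"))); // false: unicast only
        System.out.println(shouldMulticast(3, Set.of("a", "b")));  // false: under threshold
    }
}
```

The traceRequestors proposal effectively collapses this to the first two branches: false maps to the threshold-0 behaviour with no tracking, true maps to the tracked unicast/multicast path.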
Re: [infinispan-dev] reusing infinispan's marshalling
One way to do it is to use a distributed cache with two different marshallers: JBMAR and protostream. Admittedly this won't measure only the serialisation performance, but include other stuff as well, such as network time (I guess you can remove this from the result though). This way we would get a better understanding of how the two marshallers affect the performance of the system as a whole. Also, if using radargun, you could get more info around how much CPU time is used by each scenario. On Jan 30, 2014, at 12:13 PM, Adrian Nistor anis...@redhat.com wrote: I've been pondering about re-using the marshalling machinery of Infinispan in another project, specifically in ProtoStream, where I'm planning to add it as a test scoped dependency so I can create a benchmark to compare marshalling performance. I'm basically interested in comparing ProtoStream and Infinispan's JBoss Marshalling based mechanism. Comparing against plain JBMAR, without using the ExternalizerTable and Externalizers introduced by Infinispan, is not going to get me accurate results. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] reusing infinispan's marshalling
On Feb 3, 2014, at 6:24 PM, Galder Zamarreño gal...@redhat.com wrote: Not sure I understand the need to compare this. JBMAR and ProtoStream are solving different problems. The former is focused on getting the best out of Java persistence. The latter is focused on serializing stuff in a platform independent way. IMO, it’s not an apples to apples comparison. AFAIK the only thing JBMAR does and proto doesn't is tracking circular references: e.g. person has a reference to address which has a reference to the same person instance. That comes at a performance cost (I guess an identity map lookup per serialized object), though, and for many users tracking circular dependencies is not needed, because of their data model. My expectation is that ISPN+protostream will be faster than ISPN+JBMAR because: - protostream doesn't track circular references (AFAIK this is something that can be disabled in JBMAR as well) - protostream allows for partial deserialization, that is, only deserialize a specific attribute of a class. On top of that, it is platform independent, so if you start using it as the default serialization format, it will be easier for you to use ISPN from multiple platforms. The drawback protostream has over JBMAR is that it requires one to define, besides the serializer, a protofile. Last time we discussed, Adrian had some ideas on how that can be circumvented, though. IMO, in certain deployments it makes sense to use protostream over JBMAR even when serializing only Java objects, and this benchmark would be a good tool to validate that. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
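A sketch of the kind of harness Adrian's benchmark could use: marshallers plugged in as Function&lt;Object, byte[]&gt;, so a JBMAR-backed and a ProtoStream-backed implementation could be swapped in and timed identically. Plain java.io serialization stands in here to keep the sketch self-contained; all names are illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class MarshallerBench {
    // Times any marshaller over a number of iterations; a JBMAR-backed and a
    // ProtoStream-backed Function<Object, byte[]> would be passed in here.
    static long timeNanos(Function<Object, byte[]> marshaller, Object payload, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            marshaller.apply(payload);
        }
        return System.nanoTime() - start;
    }

    // Stand-in marshaller so the sketch runs without external dependencies.
    static byte[] jdkSerialize(Object o) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Object payload = "a modestly sized payload";
        long nanos = timeNanos(MarshallerBench::jdkSerialize, payload, 10_000);
        System.out.println("bytes=" + jdkSerialize(payload).length + " nanos=" + nanos);
    }
}
```

A real comparison would need warm-up iterations and proper statistics (JMH-style), which is partly why the thread suggests radargun rather than a hand-rolled loop.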
Re: [infinispan-dev] New Cache Entry Notifications
On Feb 3, 2014, at 4:07 PM, Galder Zamarreño gal...@redhat.com wrote: On Jan 23, 2014, at 5:48 PM, William Burns mudokon...@gmail.com wrote: Hello all, I have been working with notifications and most recently I have come to look into events generated when a new entry is created. Now normally I would just expect a CacheEntryCreatedEvent to be raised. However we currently raise a CacheEntryModifiedEvent and then a CacheEntryCreatedEvent. I notice that there are comments around the code saying that tests require both to be fired. it doesn't sound right to me: modified is different than created. I’ve lost count of the number of times I’ve raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. I am wondering if anyone has an objection to only raising a CacheEntryCreatedEvent on a new cache entry being created. It’d break expectations of existing applications that expect certain events. It’s a very difficult one to swallow. we're at a major now, so we should break compatibility if it makes sense. Plus, there’s the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Not sure I understand: JCache raises both a created and a modified event when an entry is created? or just created events? 
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] New Cache Entry Notifications
On 03 Feb 2014, at 17:29, William Burns mudokon...@gmail.com wrote: On Mon, Feb 3, 2014 at 11:07 AM, Galder Zamarreño gal...@redhat.com wrote: On 23 Jan 2014, at 18:54, Mircea Markus mmar...@redhat.com wrote: On Jan 23, 2014, at 5:48 PM, William Burns mudokon...@gmail.com wrote: Hello all, I have been working with notifications and most recently I have come to look into events generated when a new entry is created. Now normally I would just expect a CacheEntryCreatedEvent to be raised. However we currently raise a CacheEntryModifiedEvent event and then a CacheEntryCreatedEvent. I notice that there are comments around the code saying that tests require both to be fired. it doesn't sound right to me: modified is different than created. I've lost count the number of times I've raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Ah nice I didn't even notice the method until you pointed it out. I am wondering if anyone has an objection to only raising a CacheEntryCreatedEvent on a new cache entry being created. It'd break expectations of existing applications that expect certain events. It's a very difficult one to swallow. I agree. Maybe I should change to if anyone minds if Cluster Listeners only raise the CacheEntryModifiedEvent on an entry creation for cluster listeners instead? This wouldn't break existing assumptions since we don't currently support Cluster Listeners. The only thing is it wouldn't be consistent with regular listeners… Yeah, it’s a tricky one. You don’t wanna raise both cos that’d be expensive to ship it around for no extra gain. If you are going to choose one that’d be CacheEntryModifiedEvent indeed. 
I think we can break off here for clustered listeners, specifying it clearly. I don’t think there’s much point in creating a new set of listeners/events/annotations for the clustered option, since eventually we should move towards JCache listeners and only have custom ones for the extra stuff we provide callbacks for. Plus, there's the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Just to be clear, you are saying that JCache only raises a single event for change and create, right? Yeah, see the JCacheListenerAdapter class. Does anyone know why we raise both currently? Legacy really. Was it just so the PutKeyValueCommand could more ignorantly just raise the CacheEntryModified pre Event? Any input would be appreciated, Thanks. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
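The single-event behaviour discussed in this thread (one callback per write, with an isCreated flag telling creations from updates, in the spirit of CacheEntryModifiedEvent.isCreated() and of JCache) can be modelled with a toy cache. All names here are illustrative, not Infinispan's listener API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyNotifyingCache<K, V> {
    public interface ModifiedListener<K, V> {
        void onModified(K key, V value, boolean isCreated);
    }

    private final Map<K, V> data = new HashMap<>();
    private final List<ModifiedListener<K, V>> listeners = new ArrayList<>();

    public void addListener(ModifiedListener<K, V> listener) {
        listeners.add(listener);
    }

    public void put(K key, V value) {
        boolean created = !data.containsKey(key); // first write for this key => creation
        data.put(key, value);
        // One event per write; the flag distinguishes create from update,
        // so no second "created" event needs to be shipped.
        for (ModifiedListener<K, V> listener : listeners) {
            listener.onModified(key, value, created);
        }
    }

    public static void main(String[] args) {
        TinyNotifyingCache<String, String> cache = new TinyNotifyingCache<>();
        List<String> log = new ArrayList<>();
        cache.addListener((k, v, created) -> log.add(k + (created ? ":created" : ":modified")));
        cache.put("a", "1");
        cache.put("a", "2");
        System.out.println(log); // [a:created, a:modified]
    }
}
```

For cluster listeners this matters because each event is shipped over the network: one event with a flag is strictly cheaper than raising both a modified and a created event per insert.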
Re: [infinispan-dev] L1OnRehash Discussion
I'm all for simplification, assuming that this will deliver better reliability and easier maintenance, but let's not forget that some entries might be actually large. Saving a couple of transfers might be a pointless complexity for our usual small-key tests but maybe it's an interesting feature when you store gigabytes per value. Also, performance hiccups are not desirable even in small-key scenarios: an often read key should stay where it is rather than needing an occasional RPC. I haven't looked into the details of your problem, so if you think it's too complex I'm not against ditching this, I'm just trying to make sure we evaluate the full picture. I think you made a great point when specifying that the entry remaining in place might actually not get any hit - so being pointless - but that should be a decision the eviction strategy should be able to handle? Cheers, Sanne On 5 February 2014 13:19, William Burns mudokon...@gmail.com wrote: On Tue, Feb 4, 2014 at 6:04 AM, Dan Berindei dan.berin...@gmail.com wrote: On Tue, Feb 4, 2014 at 10:07 AM, Galder Zamarreño gal...@redhat.com wrote: On 28 Jan 2014, at 15:29, William Burns mudokon...@gmail.com wrote: Hello everyone, I wanted to discuss what I would say as dubious benefit of L1OnRehash especially compared to the benefits it provide. L1OnRehash is used to retain a value by moving a previously owned value into the L1 when a rehash occurs and this node no longer owns that value Also any current L1 values are removed when a rehash occurs. Therefore it can only save a single remote get for only a few keys when a rehash occurs. This by itself is fine however L1OnRehash has many edge cases to guarantee consistency as can be seen from https://issues.jboss.org/browse/ISPN-3838. This can get quite complicated for a feature that gives marginal performance increases (especially given that this value may never have been read recently - at least normal L1 usage guarantees this). 
My first suggestion is instead to deprecate the L1OnRehash configuration option and to remove this logic. +1 +1 from me as well My second suggestion is a new implementation of L1OnRehash that is always enabled when L1 threshold is configured to 0. For those not familiar L1 threshold controls whether invalidations are broadcasted instead of individual messages. A value of 0 means to always broadcast. This would allow for some benefits that we can't currently do: 1. L1 values would never have to be invalidated on a rehash event (guarantee locality reads under rehash) 2. L1 requestors would not have to be tracked any longer However every write would be required to send an invalidation which could slow write performance in additional cases (since we currently only send invalidations when requestors are found). The difference would be lessened with udp, which is the transport I would assume someone would use when configuring L1 threshold to 0. Sounds good to me, but I think you could go even beyond this and maybe get rid of threshold configuration option too? If the transport is UDP and multicast is configured, invalidations are broadcasted (and apply the two benefits you mention). If UDP w/ unicast or TCP used, track invalidations and send them as unicasts. Do we really need to expose these configuration options to the user? I think the idea was that even with UDP, sending 2 unicasts and waiting for only 2 responses may be faster than sending a multicast and waiting for 10 responses. However, I'm not sure that's the case if we send 1 unicast invalidation from each owner instead of a single multicast invalidation from the primary owner/originator [1]. Maybe if each owner would return a list of requestors and the originator would do the invalidation at the end... I totally agree since we currently have to send invalidations from the primary owner and all backup owners to guarantee consistency if we have a response from the backup owner [2]. 
By moving to this route we only ever have to send a single multicast invalidation instead of N unicast invalidations. However this also brings up another change where we only L1 cache the primary owner response [3] :) Actually that would tilt the performance discussion the other way. Makes me think deprecating current L1OnRehash and adding primary owner L1 caching should be first and then reevaluate if the new L1OnRehash support is even needed. The originator firing the invalidations is interesting, but don't think it is feasible. With async transport this is not doable at all. Also if the originator goes down and the value is persisted we will have invalid L1 values cached still. The latter could be fixed with txs but non tx would still be broken. One tangible benefit of having the setting is that we can run the test suite with TCP only, and still cover every path in L1Manager. If removed it completely, it would
Re: [infinispan-dev] New Cache Entry Notifications
On 05 Feb 2014, at 15:38, Mircea Markus mmar...@redhat.com wrote: On Feb 3, 2014, at 4:07 PM, Galder Zamarreño gal...@redhat.com wrote: On Jan 23, 2014, at 5:48 PM, William Burns mudokon...@gmail.com wrote: Hello all, I have been working with notifications and most recently I have come to look into events generated when a new entry is created. Now normally I would just expect a CacheEntryCreatedEvent to be raised. However we currently raise a CacheEntryModifiedEvent and then a CacheEntryCreatedEvent. I notice that there are comments around the code saying that tests require both to be fired. it doesn't sound right to me: modified is different than created. I’ve lost count of the number of times I’ve raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Sorry for missing this till now :-) If it was raised that frequently though, it must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. -1. As already mentioned, the reason why we’ve never tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. You’d be doing some work to fix something people should stop using in the near-medium future. I am wondering if anyone has an objection to only raising a CacheEntryCreatedEvent on a new cache entry being created. It’d break expectations of existing applications that expect certain events. It’s a very difficult one to swallow. 
we're at a major now, so we should break compatibility if it makes sense. Plus, there’s the JCache specification, which adds listeners, and gets them right. Eventually everyone should move towards that. Not sure I understand: JCache raises both a created and a modified event when an entry is created? or just created events? Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) -- Galder Zamarreño gal...@redhat.com twitter.com/galderz Project Lead, Escalante http://escalante.io Engineer, Infinispan http://infinispan.org
Re: [infinispan-dev] New Cache Entry Notifications
On Feb 5, 2014, at 3:03 PM, Galder Zamarreño gal...@redhat.com wrote: On 05 Feb 2014, at 15:38, Mircea Markus mmar...@redhat.com wrote: On Feb 3, 2014, at 4:07 PM, Galder Zamarreño gal...@redhat.com wrote: On Jan 23, 2014, at 5:48 PM, William Burns mudokon...@gmail.com wrote: Hello all, I have been working with notifications and most recently I have come to look into events generated when a new entry is created. Now normally I would just expect a CacheEntryCreatedEvent to be raised. However we currently raise a CacheEntryModifiedEvent event and then a CacheEntryCreatedEvent. I notice that there are comments around the code saying that tests require both to be fired. it doesn't sound right to me: modified is different than created. I’ve lost count the number of times I’ve raised this up in the dev mailing list :| And, if CacheEntryModifiedEvent has a method called isCreated(), is cos I added it in order to differentiate between a modify and a create without breaking backwards compatibility and expectations of events to be received. Just need to trace back the jira issue number, and associated forum threads ;) :p Sorry for missing this till now :-) If it was raised that frequently through, must be because people are confused about it. As we're having a major release next, I think we should get this right, even if breaking backward compatibility, and document it as such in the migration guide. -1. As already mentioned, the reason why we’ve never this tackled this problem is cos of JCache, which gets listeners right in this area. JCache is about to go final and people should start moving towards that. Redoing our listeners would be a waste of time IMO. The effort here is minimum, pretty much adding an if statement. The good thing though is that you won't have to raise this on the mailing list again :-) You’d be doing some work to fix something people should stop using in near-medium future. 
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] Design change in Infinispan Query
On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-) I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. sad because of the increased index size? I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? BTW, this discussion should be in the open. +1 On 31 Jan 2014, at 18:04, Adrian Nistor anis...@gmail.com wrote: I think it conceptually makes sense to have one entity type per cache but this should be a good practice rather than an enforced constraint. It would be a bit late and difficult to add such a constraint now. The design change we are talking about is being able to search across caches. That can easily be implemented regardless of this. We can move the SearchManager from Cache scope to CacheManager scope. Indexes are bound to types not to caches anyway, so same-type entities from multiple caches can end up in the same index, we just need to store an extra hidden field: the name of the originating cache. This move would also allow us to share some lucene/hsearch resources. We can easily continue to support Search.getSearchManager(cache) so old api usages continue to work. This would return a delegating/decorating SearchManager that creates queries that are automatically restricted to the scope of the given cache. Piece of cake? 
:) On Thu, Jan 30, 2014 at 9:56 PM, Mircea Markus mmar...@redhat.com wrote: curious to see your thoughts on this: it is a recurring topic and will affect the way we design things in future in a significant way. E.g. if we think (recommend) that a distinct cache should be used for each entity, then we'll need querying to work between caches. Also some cache stores can be built along these lines (e.g. for the JPA cache store we only need it to support a single entity type). Begin forwarded message: On Jan 30, 2014, at 9:42 AM, Galder Zamarreño gal...@redhat.com wrote: On Jan 21, 2014, at 11:52 PM, Mircea Markus mmar...@redhat.com wrote: On Jan 15, 2014, at 1:42 PM, Emmanuel Bernard emman...@hibernate.org wrote: By the way, people looking for that feature are also asking for a unified Cache API accessing these several caches right? Otherwise I am not fully understanding why they ask for a unified query. Do you have written detailed use cases somewhere for me to better understand what is really requested? IMO from a user perspective, being able to run queries spreading several caches simplifies the programming model: each cache corresponding to a single entity type, with potentially different configuration. Not sure if it simplifies things TBH if the configuration is the same. IMO, it just adds clutter. Not sure I follow: having a cache that contains both Cars and Persons sounds more cluttered to me. I think it's cumbersome to write any kind of querying with a heterogeneous cache, e.g. Map/Reduce tasks that need to count all the green Cars would need to be aware of Persons and ignore them. Not only is it harder to write, but it discourages code reuse and makes it hard to maintain (if you'll add Pets in the same cache in the future you need to update the M/R code as well). 
And of course there are also different cache-based configuration options that are not immediately obvious (at design time) but will be in the future (there are more Persons than Cars, they live longer / have different expiry, etc.): mixing everything together in the same cache from the beginning is a design decision that might bite you in the future. The way I see it - and I am very curious to see your opinion on this - following a database analogy, the CacheManager corresponds to a Database and the Cache to a Table. Hence my thought that queries spanning multiple caches are both useful and needed (same as queries spanning multiple tables). Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org) ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
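Adrian's proposal above (a CacheManager-scoped SearchManager, one shared index per type with a hidden originating-cache field, and a delegating cache-scoped view) can be emulated with plain collections. This is a toy sketch, not the real Lucene/Hibernate Search API; the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class ScopedSearch {
    // A "document": the indexed fields plus the hidden originating-cache name.
    record Doc(String cacheName, Map<String, String> fields) {}

    static class ManagerSearch {
        private final List<Doc> index = new ArrayList<>();

        void index(String cacheName, Map<String, String> fields) {
            index.add(new Doc(cacheName, fields));
        }

        // CacheManager scope: a query matches documents from every cache.
        List<Doc> query(Predicate<Doc> q) {
            return index.stream().filter(q).toList();
        }

        // Cache scope: same shared index, but the query is silently AND-ed
        // with a filter on the hidden cache-name field - the role played by
        // the delegating SearchManager in the proposal.
        List<Doc> query(String cacheName, Predicate<Doc> q) {
            return query(q.and(d -> d.cacheName().equals(cacheName)));
        }
    }

    public static void main(String[] args) {
        ManagerSearch sm = new ManagerSearch();
        sm.index("people", Map.of("name", "Alice"));
        sm.index("archive", Map.of("name", "Alice"));
        Predicate<Doc> byName = d -> "Alice".equals(d.fields().get("name"));
        System.out.println(sm.query(byName).size());           // 2 (all caches)
        System.out.println(sm.query("people", byName).size()); // 1 (scoped)
    }
}
```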
Re: [infinispan-dev] Design change in Infinispan Query
On Wed 2014-02-05 15:53, Mircea Markus wrote: On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.
//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get("foo");
Buz buz = unifiedCache.get("baz");
//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get("foo");
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get("baz");
I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. sad because of the increased index size? It makes the index unnatural and less reusable using direct Lucene APIs. But that might be less of a concern for Infinispan. I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document?
Re: [infinispan-dev] Design change in Infinispan Query
On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: On Wed 2014-02-05 15:53, Mircea Markus wrote: On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.
//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get("foo");
Buz buz = unifiedCache.get("baz");
//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get("foo");
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get("baz");
cacheManager.getCache("foo").put("xxx", "yyy");
cacheManager.getCache("bar").put("xxx", "zzz");
String xxx = cacheManager.getMotherOfAllCaches().get("xxx");
System.out.println(xxx);
What should it print? Should an exception be thrown? Or should get on the mother of all caches return Map<Cache<String, String>, String>? Radim I have to admit that having to add a cache name to the stored elements of the index documents makes me a bit sad. sad because of the increased index size? It makes the index unnatural and less reusable using direct Lucene APIs. But that might be less of a concern for Infinispan. I was already unhappy when I had to do it for class names. Renaming a cache will be a heavy operation too. Sanne, if we know that we don't share the same index for different caches, can we avoid the need to store the cache name in each document? 
-- Radim Vansa rva...@redhat.com JBoss DataGrid QA
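Radim's ambiguity can be made concrete with a toy emulation in plain Java (not Infinispan API; the class and method names are invented). One of his options is shown: instead of guessing a single winner, the unified lookup returns every match keyed by cache name:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class UnifiedGet {
    // Named caches, emulated as plain maps held by a "cache manager".
    private final Map<String, Map<String, String>> caches = new LinkedHashMap<>();

    Map<String, String> cache(String name) {
        return caches.computeIfAbsent(name, n -> new HashMap<>());
    }

    // "Mother of all caches" lookup: returns cacheName -> value for every
    // cache that contains the key, rather than a single ambiguous value.
    Map<String, String> unifiedGet(String key) {
        Map<String, String> hits = new LinkedHashMap<>();
        caches.forEach((name, c) -> {
            if (c.containsKey(key)) hits.put(name, c.get(key));
        });
        return hits;
    }

    public static void main(String[] args) {
        UnifiedGet cm = new UnifiedGet();
        cm.cache("foo").put("xxx", "yyy");
        cm.cache("bar").put("xxx", "zzz");
        // Both caches hold "xxx": a single-valued get() could not choose.
        System.out.println(cm.unifiedGet("xxx")); // {foo=yyy, bar=zzz}
    }
}
```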
Re: [infinispan-dev] Design change in Infinispan Query
On Feb 5, 2014, at 4:30 PM, Emmanuel Bernard emman...@hibernate.org wrote: On Wed 2014-02-05 15:53, Mircea Markus wrote: On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.
//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get("foo");
Buz buz = unifiedCache.get("baz");
Can you please elaborate on the advantages the mother of all caches would bring? :-) It feels to me like querying a whole database by primary key without mentioning the table name :-) Also, it might get nasty if multiple caches have the same key.
//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get("foo");
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get("baz");
Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] Design change in Infinispan Query
On Wed 2014-02-05 17:44, Radim Vansa wrote: On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: On Wed 2014-02-05 15:53, Mircea Markus wrote: On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.
//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get("foo");
Buz buz = unifiedCache.get("baz");
//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get("foo");
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get("baz");
cacheManager.getCache("foo").put("xxx", "yyy");
cacheManager.getCache("bar").put("xxx", "zzz");
String xxx = cacheManager.getMotherOfAllCaches().get("xxx");
System.out.println(xxx);
What should it print? Should an exception be thrown? Or should get on the mother of all caches return Map<Cache<String, String>, String>? Yes, I'm aware of that. What I am saying is that the idea of search across caches, as appealing as it is, is not the whole story. People search, read, navigate and M/R their data in interleaved ways. You need to project and think about 100-200 lines of code that would use that feature in combination with other related features, to see if that will be useful in the end (or gimmicky) and if the user experience (API, mostly, in our case) will be good or make people kill themselves. The feeling I have is that we are too feature-focused and not enough use-case and experience-focused. 
Re: [infinispan-dev] Design change in Infinispan Query
On Feb 5, 2014, at 1:34 PM, Emmanuel Bernard emman...@hibernate.org wrote: What I am saying is that the idea of search across caches, as appealing as it is, is not the whole story. People search, read, navigate and M/R their data in interleaved ways. You need to project and think about 100-200 lines of code that would use that feature in combination with other related features, to see if that will be useful in the end (or gimmicky) and if the user experience (API, mostly, in our case) will be good or make people kill themselves. What is the plan for supporting joins across entity types?
Re: [infinispan-dev] Design change in Infinispan Query
On Feb 5, 2014, at 7:34 PM, Emmanuel Bernard emman...@hibernate.org wrote: On Wed 2014-02-05 17:44, Radim Vansa wrote: On 02/05/2014 05:30 PM, Emmanuel Bernard wrote: On Wed 2014-02-05 15:53, Mircea Markus wrote: On Feb 3, 2014, at 9:32 AM, Emmanuel Bernard emman...@hibernate.org wrote: Sure searching for any cache is useful. What I was advocating is that if you search for more than one cache transparently, then you probably need to CRUD for more than one cache transparently as well. And this is not being discussed. Not sure what you mean by CRUD over multiple caches? ATM one can run a TX over multiple caches, but I think there's something else you have in mind :-)
//some unified query giving me entries pointing by fk copy to bar and
//buz objects. So I need to manually load these references.
//happy emmanuel
Cache unifiedCache = cacheManager.getMotherOfAllCaches();
Bar bar = unifiedCache.get("foo");
Buz buz = unifiedCache.get("baz");
//not so happy emmanuel
Cache fooCache = cacheManager.getCache("foo");
Bar bar = fooCache.get("foo");
Cache bazCache = cacheManager.getCache("baz");
Buz buz = bazCache.get("baz");
cacheManager.getCache("foo").put("xxx", "yyy");
cacheManager.getCache("bar").put("xxx", "zzz");
String xxx = cacheManager.getMotherOfAllCaches().get("xxx");
System.out.println(xxx);
What should it print? Should an exception be thrown? Or should get on the mother of all caches return Map<Cache<String, String>, String>? Yes, I'm aware of that. What I am saying is that the idea of search across caches, as appealing as it is, is not the whole story. People search, read, navigate and M/R their data in interleaved ways. In all the non-trivial deployments I saw, people used multiple caches for different data, instead of one. That's why for me this came as the straightforward way of structuring data, and naturally I thought that querying multiple caches makes sense in this context: to allow querying to run over a model that is already in use, and not to change the model to accommodate querying. 
You need to project and think about 100-200 lines of code that would use that feature in combination with other related features, to see if that will be useful in the end (or gimmicky) and if the user experience (API, mostly, in our case) will be good or make people kill themselves. The feeling I have is that we are too feature-focused and not enough use-case and experience-focused. Cheers, -- Mircea Markus Infinispan lead (www.infinispan.org)