Re: [infinispan-dev] Separate ExecutorService for map/reduce tasks?
In 6.0 I would really like to move away from the current executor configuration (i.e. a specific element for every executor) and allow the creation of named executors (this is how the AS configuration works).

Tristan

On 11/27/2012 09:07 PM, Vladimir Blagojevic wrote:
> Hi,
>
> Although https://issues.jboss.org/browse/ISPN-2284 is targeted at 6.0, I would like to see whether it is possible to finish it for 5.2. I already did most of the parallel execution work this week and last week [1]. However, this change is not limited to the map/reduce package, as we might want a separate executor for map/reduce execution on each node. These changes affect the global configuration and are not confined to the map/reduce packages.
>
> Or should we simply use the transport executor to execute these tasks for now, and introduce a separate executor in a future release should the need arise?
>
> Regards,
> Vladimir
>
> [1] https://github.com/vblagoje/infinispan/tree/t_2284

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
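Vladimir's question is whether map/reduce tasks should get their own thread pool or share the transport executor. A minimal sketch of what that choice looks like, using only `java.util.concurrent`; the class `MapReduceExecution` and its method are hypothetical names, not Infinispan code. Map tasks over the local entries are submitted in parallel to whichever `ExecutorService` is passed in, so "dedicated pool vs. transport executor" becomes a configuration decision rather than a code change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not Infinispan code: applies a map function to the
// locally stored values in parallel on whatever ExecutorService it is given,
// e.g. a dedicated map/reduce pool or the shared transport executor.
class MapReduceExecution {
    private final ExecutorService executor;

    MapReduceExecution(ExecutorService executor) {
        this.executor = executor;
    }

    <K, V, R> List<R> mapInParallel(Map<K, V> localEntries, Function<? super V, ? extends R> mapper) {
        List<Callable<R>> tasks = new ArrayList<>();
        for (V value : localEntries.values()) {
            tasks.add(() -> mapper.apply(value)); // one task per local value
        }
        try {
            List<R> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish; futures keep the task order
            for (Future<R> future : executor.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while mapping", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A map task failed", e.getCause());
        }
    }
}
```

Passing `Executors.newFixedThreadPool(n)` gives map/reduce its own bounded pool; passing the transport executor is the "for now" option. One plausible argument for the dedicated pool (an assumption, not something the thread settles) is that long-running map tasks then cannot occupy threads the transport needs for incoming commands.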
Re: [infinispan-dev] Separate ExecutorService for map/reduce tasks?
On 5 Dec 2012, at 08:36, Tristan Tarrant wrote:
> In 6.0 I would really like to go away from the current executor configuration (e.g. a specific element for every executor) and allow the creation of named executors (this is how the AS configuration works).

So that you can refer to the same executor from multiple places?

Cheers,
--
Mircea Markus
Infinispan lead (www.infinispan.org)
Re: [infinispan-dev] Put issues with newly joining node
On Dec 4, 2012, at 10:22 AM, Sanne Grinovero sa...@infinispan.org wrote:
> On 4 December 2012 09:14, Galder Zamarreño gal...@redhat.com wrote:
>> Hey Dan/Adrian,
>>
>> Re: https://issues.jboss.org/browse/ISPN-2541
>>
>> I'm looking at this intermittent failure, and it seems to be caused by the fact that the test does not wait for the cluster to form when the new node is started, which can lead to a replication timeout failure from the newly joining node.
>>
>> The test can easily be fixed by waiting for the cluster to form and then making the call.
>> [...]
>
> I don't think the cache should ever be in an illegal state to be used after being started. So Infinispan should not require tests to wait for a cluster to be formed; I'd rather guarantee that after a cache is started it's usable.

Precisely, which is why I raised the flag instead of going down the easy path.

> If this is not possible, then any application would also need to wait for that "cluster formed" event, and we should expose an API for that.

The problem is deciding when a cluster is formed. How many nodes should you wait for? There are already plans for something similar: https://issues.jboss.org/browse/ISPN-928

> I'd prefer getCache() to block for long enough.
>
> Sanne

--
Galder Zamarreño
gal...@redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org
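Sanne suggests exposing an API for the "cluster formed" event; Galder's objection is that someone has to say how many nodes count as "formed". A minimal sketch of such an API, assuming the caller supplies that number; `ClusterFormationWaiter`, `onViewChange` and `awaitMembers` are hypothetical names, not Infinispan or JGroups APIs. The `expected` argument is deliberately left to the caller, which is exactly the open question in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not an Infinispan or JGroups API: remembers the latest
// membership view seen by this node and lets callers block until it has
// at least `expected` members.
class ClusterFormationWaiter {
    private List<String> members = Collections.emptyList();

    // To be called from whatever delivers view changes (e.g. a membership listener).
    synchronized void onViewChange(List<String> newMembers) {
        members = Collections.unmodifiableList(new ArrayList<>(newMembers));
        notifyAll(); // wake up threads blocked in awaitMembers
    }

    // Returns true if the view reached `expected` members before the timeout.
    synchronized boolean awaitMembers(int expected, long timeout, TimeUnit unit) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            while (members.size() < expected) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```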
Re: [infinispan-dev] Put issues with newly joining node
On Dec 4, 2012, at 11:52 AM, Bela Ban b...@redhat.com wrote:
> On 12/4/12 11:30 AM, Dan Berindei wrote:
>> BTW, I also got an exception yesterday in MarshallExternalPojosTest and I investigated it, but in my case the error was much weirder: two nodes both opened a TCP connection to each other, yet neither of them received the forwarded command. I've asked Bela to investigate as well, but he didn't find anything suspicious in JGroups.
>
> If node A connects to B and B connects to A at exactly the same time (and there wasn't any existing connection between the two nodes), then one of the two will 'win' and the other will close its connection. The message to be sent is then lost.
>
> This is corrected by one of the upper layers, e.g. UNICAST retransmits the message until it gets an ack. Re-sending a message will then create a new connection if the existing one was closed/removed.
>
> However, with UNICAST2, if a given message was the last message and no further messages are sent, then only UNICAST2's stability messages will detect that the other node is missing the last message sent. Stability is triggered every 60 seconds by default, so unless that property was changed, or stability was triggered programmatically, that last (lost) message won't get retransmitted for 60 seconds.

^ Isn't that default too high?

Seems to me the scenario explained could happen relatively easily if two nodes are started simultaneously.

Do we no longer ask users to stagger their startups?

> --
> Bela Ban, JGroups lead (http://www.jgroups.org)

--
Galder Zamarreño
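Bela's explanation contrasts two recovery styles: with UNICAST the sender keeps retransmitting until it gets an ack, while with UNICAST2 a missing message first has to be noticed. A minimal sketch of the second case, assuming sequence numbers start at 1 and ignoring timers; `NakReceiver` and its methods are hypothetical names, not JGroups classes. A gap below a later message is visible immediately, but a lost *last* message only becomes visible after a stability round, which is why it can wait up to the stability interval (60 seconds by default, per Bela).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of negative-ack gap detection on the receiving side.
// Not JGroups code: it only shows why a lost *last* message goes unnoticed.
class NakReceiver {
    private final SortedSet<Long> received = new TreeSet<>();
    // Highest seqno this receiver knows the sender has sent.
    private long highestKnownSent = 0;

    // A data message reveals its own seqno, and therefore any gap below it.
    void onMessage(long seqno) {
        received.add(seqno);
        highestKnownSent = Math.max(highestKnownSent, seqno);
    }

    // In this model a stability round makes the sender's highest sent seqno known
    // here even if no further data messages follow. Which side sends what in
    // JGroups is not modeled.
    void onStabilityRound(long senderHighestSent) {
        highestKnownSent = Math.max(highestKnownSent, senderHighestSent);
    }

    // Seqnos that would be requested for retransmission.
    List<Long> missing() {
        List<Long> gaps = new ArrayList<>();
        for (long s = 1; s <= highestKnownSent; s++) {
            if (!received.contains(s)) {
                gaps.add(s);
            }
        }
        return gaps;
    }
}
```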
Re: [infinispan-dev] Put issues with newly joining node
On 12/5/12 12:06 PM, Galder Zamarreño wrote:
> On Dec 4, 2012, at 11:52 AM, Bela Ban b...@redhat.com wrote:
>> [...]
>> However, with UNICAST2, if a given message was the last message and no further messages are sent, then only UNICAST2's stability messages will detect that the other node is missing the last message sent. Stability is triggered every 60 seconds by default, so unless that property was changed, or stability was triggered programmatically, that last (lost) message won't get retransmitted for 60 seconds.
>
> ^ Isn't that default too high?

Assuming that size-based stable messages are the norm, time-based stability only kicks in as a second line of defense. With https://issues.jboss.org/browse/JGRP-1548 in place, this becomes even less important, so I want to leave it high, as it does generate some traffic when set too small.

> Seems to me the scenario explained could happen relatively easily if two nodes are started simultaneously.

No, startup won't trigger concurrent connections, as only the joiner connects to the coordinator, and the coordinator reuses the same connection to send the JOIN-RSP back. It is the rebalancing process that triggers this in certain cases: the forwarding of state transfer requests to different owners.

> We no longer ask users to stagger their startups?

No, because concurrent startup works, and not having to stagger startup is simpler.

--
Bela Ban
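Bela's answer distinguishes size-based stability (the norm) from the time-based fallback. A sketch of such a trigger, assuming hypothetical parameter names `maxBytes` and `intervalMs` (not actual JGroups property names) and caller-supplied timestamps so the logic stays deterministic:

```java
// Hypothetical stability trigger, not JGroups code: fire when enough bytes have
// been received since the last stable round, or, as a fallback, when the
// interval has elapsed without that happening.
class StabilityTrigger {
    private final long maxBytes;
    private final long intervalMs;
    private long bytesSinceStable = 0;
    private long lastStableAtMs;

    StabilityTrigger(long maxBytes, long intervalMs, long nowMs) {
        this.maxBytes = maxBytes;
        this.intervalMs = intervalMs;
        this.lastStableAtMs = nowMs;
    }

    // Size-based path: returns true if this message pushes the counter over the threshold.
    boolean onBytesReceived(long bytes, long nowMs) {
        bytesSinceStable += bytes;
        return maybeFire(nowMs);
    }

    // Time-based path: called periodically by a timer.
    boolean onTimerTick(long nowMs) {
        return maybeFire(nowMs);
    }

    private boolean maybeFire(long nowMs) {
        if (bytesSinceStable >= maxBytes || nowMs - lastStableAtMs >= intervalMs) {
            bytesSinceStable = 0;   // start a new stability round
            lastStableAtMs = nowMs;
            return true;
        }
        return false;
    }
}
```

With steady traffic the byte threshold fires long before the interval. The timer path only matters once traffic stops, which is precisely the lost-last-message case, so a high interval trades recovery latency in that case for less background traffic.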
Re: [infinispan-dev] Put issues with newly joining node
On Dec 4, 2012, at 11:30 AM, Dan Berindei dan.berin...@gmail.com wrote:
> On Tue, Dec 4, 2012 at 11:32 AM, Mircea Markus mmar...@redhat.com wrote:
>> On 4 Dec 2012, at 09:22, Sanne Grinovero wrote:
>>> [...]
>>> I don't think the cache should ever be in an illegal state to be used after being started. So Infinispan should not require tests to wait for a cluster to be formed; I'd rather guarantee that after a cache is started it's usable.
>>
>> +1. Unless the test relies on/verifies internal state, e.g. locks being acquired, data present in the data container etc.
>
> It's not just a question of what you want to check, it's also a question of what you don't want to check... I think in general a test should focus on a specific issue, and we know state transfer is always a potential source of (unrelated) failures. So I'd rather have tests that do test state transfer and command forwarding, and tests that avoid state transfer and command forwarding (by waiting for the cluster to form completely).
>
> I'm pretty sure this is another instance of ISPN-2473, and once we have a fix (and a unit test) for this particular failure, MarshallExternalPojosTest could very well wait for the cluster to form and ignore any state transfer-related issues.
>
> [...]

OK, wrt ISPN-2541, I suggest holding off until all other known issues have been solved, and then seeing whether the issue keeps appearing. It seems to be a good test for catching these issues (indirectly), so it could be useful to verify :)

--
Galder Zamarreño
Re: [infinispan-dev] Put issues with newly joining node
On 5 December 2012 11:02, Galder Zamarreño gal...@redhat.com wrote:
> On Dec 4, 2012, at 10:22 AM, Sanne Grinovero sa...@infinispan.org wrote:
>> [...]
>> If this is not possible, then any application would also need to wait for that "cluster formed" event, and we should expose an API for that.
>
> The problem is deciding when a cluster is formed. How many nodes should you wait for?

Why can't we rely on JGroups discovery to know that? As a user, I already specified the expected initial group size with num_initial_members. I don't want to repeat that configuration ;-)

Sanne
Re: [infinispan-dev] Put issues with newly joining node
On Dec 5, 2012, at 1:23 PM, Sanne Grinovero sa...@infinispan.org wrote:
> [...]
> Why can't we rely on JGroups discovery to know that? As a user, I already specified the expected initial group size with num_initial_members. I don't want to repeat that configuration ;-)

num_initial_members is simply used to decide who the coordinator is, and has no relationship with the number of nodes that are in the cluster. I don't think it's the same thing, but it could be reused...

--
Galder Zamarreño
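Galder's point is that num_initial_members bounds a discovery round rather than the eventual cluster size. A simplified model of that reading, not JGroups code: `Discovery`, `Response`, `doneWaiting` and `chooseCoordinator` are hypothetical names, and the tie-break used when nobody claims to be coordinator (smallest address wins) is an assumption made only so that concurrent joiners agree.

```java
import java.util.List;

// Hypothetical model of a discovery round, not JGroups code.
final class Discovery {
    private Discovery() {}

    static final class Response {
        final String address;
        final boolean isCoordinator;

        Response(String address, boolean isCoordinator) {
            this.address = address;
            this.isCoordinator = isCoordinator;
        }
    }

    // The joiner stops collecting discovery responses once it has
    // numInitialMembers of them or the timeout fires, whichever comes first.
    static boolean doneWaiting(int responsesSoFar, int numInitialMembers, boolean timedOut) {
        return timedOut || responsesSoFar >= numInitialMembers;
    }

    // The responses are only used to pick whom to join: a responder that claims to be
    // coordinator if there is one; otherwise (assumption) a deterministic tie-break so
    // that concurrent joiners agree, here the smallest address.
    static String chooseCoordinator(String self, List<Response> responses) {
        for (Response r : responses) {
            if (r.isCoordinator) {
                return r.address;
            }
        }
        String smallest = self;
        for (Response r : responses) {
            if (r.address.compareTo(smallest) < 0) {
                smallest = r.address;
            }
        }
        return smallest;
    }
}
```

Nothing in this round says how many members will eventually join; it only decides when to stop listening and whom to send the JOIN request to.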
Re: [infinispan-dev] Put issues with newly joining node
On 12/5/12 1:23 PM, Sanne Grinovero wrote:
> [...]
> Why can't we rely on JGroups discovery to know that? As a user, I already specified the expected initial group size with num_initial_members. I don't want to repeat that configuration ;-)

I don't understand this discussion: when a new node joins, it returns from JChannel.connect() once it has received a JOIN response from the coordinator, with the current view... Or are you guys talking about Infinispan's 'service views'?

--
Bela Ban
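Bela's description of the join, together with his earlier note that only the joiner connects to the coordinator, which replies over the same connection, can be modeled in a few lines. A simplified sketch, not JGroups code: `JoinHandshake`, `View`, `Coordinator`, `handleJoinRequest` and `connect` are hypothetical names. It shows that at the JGroups level the joiner already holds the current view when connect() returns, which is why Bela asks whether the discussion is really about Infinispan's own views.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the JOIN handshake, not JGroups code.
final class JoinHandshake {
    private JoinHandshake() {}

    static final class View {
        final long id;
        final List<String> members;

        View(long id, List<String> members) {
            this.id = id;
            this.members = Collections.unmodifiableList(new ArrayList<>(members));
        }
    }

    static final class Coordinator {
        private View view;

        Coordinator(String self) {
            view = new View(1, Collections.singletonList(self));
        }

        // JOIN-REQ handling: install a new view that includes the joiner and return it
        // as the JOIN-RSP (sent back over the connection the joiner opened).
        synchronized View handleJoinRequest(String joiner) {
            List<String> members = new ArrayList<>(view.members);
            if (!members.contains(joiner)) {
                members.add(joiner);
            }
            view = new View(view.id + 1, members);
            return view;
        }
    }

    // The joiner's connect() only returns once it holds the JOIN-RSP, so it starts
    // with the current view, but nothing tells it how many more members will join.
    static View connect(String joiner, Coordinator coordinator) {
        return coordinator.handleJoinRequest(joiner);
    }
}
```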
Re: [infinispan-dev] Put issues with newly joining node
On 5 December 2012 14:01, Bela Ban b...@redhat.com wrote:
> On 12/5/12 1:23 PM, Sanne Grinovero wrote:
>> [...]
> I don't understand this discussion: when a new node joins, it returns from JChannel.connect() once it has received a JOIN response from the coordinator, with the current view... Or are you guys talking about Infinispan's 'service views'?

+1. That's why I'm confused too: I don't understand how it is possible that a Cache is returned to the application (which has no clue about the number of expected nodes) in a state in which the cluster has not formed yet. That should never happen!?

I never understood why the test framework in Infinispan requires this wait in all tests. Even in the cases listed by Mircea, where the testsuite is looking for something very specific, I would expect the wait to be unnecessary (or, more precisely, getCache() to have already blocked for long enough).
Re: [infinispan-dev] Separate ExecutorService for map/reduce tasks?
On 12-12-05 5:07 AM, Mircea Markus wrote:
> On 5 Dec 2012, at 08:36, Tristan Tarrant wrote:
>> In 6.0 I would really like to go away from the current executor configuration (e.g. a specific element for every executor) and allow the creation of named executors (this is how the AS configuration works).
> So that you can refer to the same executor from multiple places?

Yeah, Tristan, can you elaborate a bit more? I am now curious too!
Re: [infinispan-dev] Put issues with newly joining node
On Wed, Dec 5, 2012 at 4:20 PM, Sanne Grinovero sa...@infinispan.org wrote:
> On 5 December 2012 14:01, Bela Ban b...@redhat.com wrote:
>> [...] when a new node joins, it returns from JChannel.connect() once it has received a JOIN response from the coordinator, with the current view... Or are you guys talking about Infinispan's 'service views'?
>
> +1. That's why I'm confused too: I don't understand how it is possible that a Cache is returned to the application (which has no clue about the number of expected nodes) in a state in which the cluster has not formed yet. That should never happen!?

It's simple: getCache() returns once the joiner has received ownership of some segments (in distributed mode) and once it has received all the data it owns (dist and repl). This does not guarantee that the other nodes see the joiner as a full member at the time getCache() returns.

This doesn't mean that the cache is not functional; on the contrary, we could return even before the joiner had received the data and the cache would still work. But because some nodes think state transfer is still in progress, the tests do run into state transfer corner cases that aren't handled properly (they're getting rarer, but we still have them).

> I never understood why the test framework in Infinispan requires this wait in all tests. Even in the cases listed by Mircea, where the testsuite is looking for something very specific, I would expect the wait to be unnecessary (or, more precisely, getCache() to have already blocked for long enough).

getCache() only waits long enough for the cache to work; it doesn't wait (and I don't think it should wait) for all the other nodes to acknowledge the joiner as a full member (i.e. in the read consistent hash). Because of this, assertions made on nodes other than the joiner can fail (in addition to the aforementioned corner cases in state transfer).

It's also possible (and it was quite likely with older JGroups versions) that a joiner would actually form a new cluster by itself instead of joining the existing nodes in a single cluster. When that happens, getCache() definitely returns without the cluster being formed, and we have to wait for the separate clusters to find each other and merge before running our test.

Cheers
Dan
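Dan's explanation implies that a test asserting on nodes other than the joiner has to wait until every node includes the joiner in its read consistent hash, and that the same wait covers the split-and-merge case. A minimal sketch of such a wait, assuming each node's read-CH membership can be observed through a `Supplier`; `TopologyWait` and its methods are hypothetical names, not Infinispan's test utilities.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical test helper, not Infinispan's test utilities: each Supplier reports
// one node's current idea of the members in its read consistent hash.
final class TopologyWait {
    private TopologyWait() {}

    // True once every node reports exactly the expected member set.
    static boolean allNodesAgree(List<Supplier<Set<String>>> nodes, Set<String> expected) {
        for (Supplier<Set<String>> node : nodes) {
            if (!node.get().equals(expected)) {
                return false;
            }
        }
        return true;
    }

    // Polls until all nodes agree or the timeout expires. Two sub-clusters that have
    // not merged yet can never satisfy this, so the helper also waits out a merge.
    static boolean waitForTopology(List<Supplier<Set<String>>> nodes, Set<String> expected,
                                   long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        try {
            while (!allNodesAgree(nodes, expected)) {
                if (System.currentTimeMillis() >= deadline) {
                    return false;
                }
                Thread.sleep(pollMs);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Only the test needs this wait; getCache() itself returning earlier is, per Dan, not a functional problem for the joiner.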
Re: [infinispan-dev] Put issues with newly joining node
So, to make sure I understood that: this has no visible impact on the functionality of API methods, correct? For example, would any get operation successfully retrieve a remote entry if one exists somewhere?

On 5 December 2012 15:42, Dan Berindei dan.berin...@gmail.com wrote:
> [...]
> It's simple: getCache() returns once the joiner has received ownership of some segments (in distributed mode) and once it has received all the data it owns (dist and repl). This does not guarantee that the other nodes see the joiner as a full member at the time getCache() returns.
>
> This doesn't mean that the cache is not functional; on the contrary, we could return even before the joiner had received the data and the cache would still work. [...]
Re: [infinispan-dev] Separate ExecutorService for map/reduce tasks?
Yes. Ideally I would like to have:

    GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
    global
        .addExecutor().name("blah")
        .addScheduledExecutor().name("sched");

    ConfigurationBuilder config = new ConfigurationBuilder();
    config
        .clustering().async().replQueueExecutor("blah")
        .eviction().executor("sched");

Don't take the above as a proposed API; it's just to make things clearer.

Tristan

On 12/05/2012 03:55 PM, Vladimir Blagojevic wrote:
> On 12-12-05 5:07 AM, Mircea Markus wrote:
>> [...]
>> So that you can refer to the same executor from multiple places?
> Yeah, Tristan, can you elaborate a bit more? I am now curious too!
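Tristan's sketch declares executors once in the global configuration and refers to them by name from the cache configuration. A minimal sketch of what could back such references, using only `java.util.concurrent`; `ExecutorRegistry` and its methods are hypothetical, not an Infinispan API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ScheduledExecutorService;

// Hypothetical global registry of named executors, not an Infinispan API.
final class ExecutorRegistry {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    // Each name may be defined only once in the global configuration.
    void register(String name, ExecutorService executor) {
        if (executors.putIfAbsent(name, executor) != null) {
            throw new IllegalArgumentException("Executor already defined: " + name);
        }
    }

    // Every reference to the same name resolves to the same instance.
    ExecutorService get(String name) {
        ExecutorService e = executors.get(name);
        if (e == null) {
            throw new IllegalArgumentException("Unknown executor: " + name);
        }
        return e;
    }

    // For components such as eviction that need scheduling.
    ScheduledExecutorService getScheduled(String name) {
        ExecutorService e = get(name);
        if (!(e instanceof ScheduledExecutorService)) {
            throw new IllegalArgumentException("Executor is not scheduled: " + name);
        }
        return (ScheduledExecutorService) e;
    }

    void shutdownAll() {
        executors.values().forEach(ExecutorService::shutdown);
    }
}
```

Under this model the replication-queue and eviction settings would resolve "blah" and "sched" through the registry, so several caches naming "blah" share one pool, which is the "refer to the same executor from multiple places" case Mircea asked about.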
Re: [infinispan-dev] Put issues with newly joining node
Yes, no visible impact.

On Wed, Dec 5, 2012 at 5:46 PM, Sanne Grinovero sa...@infinispan.org wrote:
> So to make sure I understood that, this has no visible impact on the functionality of API methods, correct? Like any get operation would successfully retrieve a remote entry if one exists somewhere?
> [...]
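Dan's "no visible impact" is consistent with a get being routed by key ownership rather than by whether every node has acknowledged the joiner. A simplified model of that routing, not Infinispan's implementation: `DistributedGet` and its methods are hypothetical, and `String.hashCode()` with a modulo stands in for the real hash function and consistent hash. A node that misses locally asks the owners of the key's segment in order.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of a distributed get, not Infinispan's implementation.
final class DistributedGet {
    private DistributedGet() {}

    // floorMod keeps the segment non-negative even for negative hash codes.
    static int segmentOf(Object key, int numSegments) {
        return Math.floorMod(key.hashCode(), numSegments);
    }

    // owners.get(segment) lists the nodes owning that segment; stores maps each
    // node name to that node's local data. The calling node checks itself first.
    static Optional<String> get(String key, String self, List<List<String>> owners,
                                Map<String, Map<String, String>> stores) {
        Map<String, String> local = stores.get(self);
        if (local.containsKey(key)) {
            return Optional.of(local.get(key));
        }
        for (String owner : owners.get(segmentOf(key, owners.size()))) {
            String value = stores.get(owner).get(key);
            if (value != null) {
                return Optional.of(value); // remote hit on an owner
            }
        }
        return Optional.empty();
    }
}
```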
[infinispan-dev] 5.2.0.Beta6 schedule
Hi,

Beta6 will be cut on 13 Dec. Here's the list [1] of bugs scheduled for it.

Also, just a heads-up: CR1 is scheduled for 21 Dec.

[1] http://goo.gl/ILjeM

Cheers,
Mircea
Re: [infinispan-dev] Separate ExecutorService for map/reduce tasks?
On 5 Dec 2012, at 15:53, Tristan Tarrant wrote:
>     GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
>     global
>         .addExecutor().name("blah")
>         .addScheduledExecutor().name("sched");
>
>     ConfigurationBuilder config = new ConfigurationBuilder();
>     config
>         .clustering().async().replQueueExecutor("blah")
>         .eviction().executor("sched");

+1

Cheers,
Mircea