Re: [infinispan-dev] default value for virtualNodes
Manik, I'm assigning ISPN-1801 to myself - I need to add my key distribution test and the results anyway.

Cheers
Dan

On Mon, Jan 30, 2012 at 1:17 PM, Manik Surtani ma...@jboss.org wrote:

On 29 Jan 2012, at 14:57, Galder Zamarreño wrote:

To reiterate what I said in another thread, the memory effect of virtual nodes on Hot Rod clients is none since version 1.1 of the protocol (included in 5.1). I enhanced the protocol so that clients would generate virtual node hashes and so avoid sending them over the wire.

Yes, I remember. Perfect. :)

So for https://issues.jboss.org/browse/ISPN-1801 I'll set the default to 48. Any objections?

Cheers
Manik

--
Manik Surtani ma...@jboss.org twitter.com/maniksurtani
Lead, Infinispan http://www.infinispan.org

___
infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] default value for virtualNodes
To reiterate what I said in another thread, the memory effect of virtual nodes on Hot Rod clients is none since version 1.1 of the protocol (included in 5.1). I enhanced the protocol so that clients would generate virtual node hashes and so avoid sending them over the wire.

Cheers,

On Jan 27, 2012, at 4:19 PM, Manik Surtani wrote:

Good stuff! Thanks for this. Yes, I'm ok with numVirtualNodes=48 as a default. Galder, your thoughts from a Hot Rod perspective?

On 27 Jan 2012, at 08:41, Dan Berindei wrote:

Hi guys

I've been working on a test to search for an optimal default value here: https://github.com/danberindei/infinispan/commit/983c0328dc40be9609fcabb767dd46f9b98af464

I'm measuring both the number of keys for which a node is primary owner and the number of keys for which it is one of the owners, compared to the ideal distribution (K/N keys on each node). The former tells us how much more work the node could be expected to do, the latter how much memory the node is likely to need.

I'm only running 1 loops, so the max figure is not the absolute maximum. But it's certainly bigger than the 0. percentile. The full results are here: http://fpaste.org/cI1r/

The uniformity of the distribution goes up with the number of virtual nodes but down with the number of physical nodes. I think we should go with a default of 48 virtual nodes (or 50 if you prefer decimal). With 32 nodes, there's only a 0.1% chance that a node will hold more than 1.35 * K/N keys, and a 0.1% chance that the node will be primary owner for more than 1.5 * K/N keys.

We could go higher, but we run the risk of node addresses colliding on the hash wheel. According to the formula on the Birthday Paradox page (http://en.wikipedia.org/wiki/Birthday_problem), we only need 2072 addresses on our 2^31 hash wheel to get a 0.1% chance of collision. That means 21 nodes * 96 virtual nodes, 32 nodes * 64 virtual nodes or 43 nodes * 48 virtual nodes.
Cheers
Dan

On Fri, Jan 27, 2012 at 12:37 AM, Sanne Grinovero sa...@infinispan.org wrote:

On 26 January 2012 22:29, Manik Surtani ma...@jboss.org wrote:

On 26 Jan 2012, at 20:16, Sanne Grinovero wrote:

+1 Which default? 100? A prime? We should also make sure the CH function is optimized for this being on.

Yes, we should profile a session with vnodes enabled.

Manik, we're using VNodes in our performance tests. The question is whether we can provide a good default value, as the feature is currently disabled by default.

Cheers,
Sanne

--
Galder Zamarreño
Sr. Software Engineer
Infinispan, JBoss Cache
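Editorial note: the collision figures in Dan's quoted message can be checked with a few lines of Python. This is a minimal sketch, assuming the simplest birthday-problem approximation p ≈ n² / (2d) for n addresses on a wheel of d positions; the function names are illustrative and not part of Infinispan.

```python
import math

WHEEL_SIZE = 2**31  # size of the hash wheel mentioned in the thread


def collision_chance(addresses, wheel_size=WHEEL_SIZE):
    """Approximate chance that at least two addresses share a wheel position.

    Uses the small-probability birthday approximation p ~= n^2 / (2d).
    """
    return addresses ** 2 / (2.0 * wheel_size)


def addresses_for_collision_chance(p, wheel_size=WHEEL_SIZE):
    """Largest number of addresses whose approximate collision chance stays at or below p."""
    # Invert p ~= n^2 / (2d) to get n ~= sqrt(2 * d * p), rounded down.
    return int(math.sqrt(2.0 * wheel_size * p))


if __name__ == "__main__":
    print(addresses_for_collision_chance(0.001))  # 2072
    for nodes, vnodes in ((21, 96), (32, 64), (43, 48)):
        print(nodes, vnodes, nodes * vnodes, collision_chance(nodes * vnodes))
```

Under this approximation each of the three combinations in the message (2016, 2048 and 2064 addresses) stays just below the 0.1% threshold.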
Re: [infinispan-dev] default value for virtualNodes
Good stuff! Thanks for this. Yes, I'm ok with numVirtualNodes=48 as a default. Galder, your thoughts from a Hot Rod perspective?

On 27 Jan 2012, at 08:41, Dan Berindei wrote:

Hi guys

I've been working on a test to search for an optimal default value here: https://github.com/danberindei/infinispan/commit/983c0328dc40be9609fcabb767dd46f9b98af464

I'm measuring both the number of keys for which a node is primary owner and the number of keys for which it is one of the owners, compared to the ideal distribution (K/N keys on each node). The former tells us how much more work the node could be expected to do, the latter how much memory the node is likely to need.

I'm only running 1 loops, so the max figure is not the absolute maximum. But it's certainly bigger than the 0. percentile. The full results are here: http://fpaste.org/cI1r/

The uniformity of the distribution goes up with the number of virtual nodes but down with the number of physical nodes. I think we should go with a default of 48 virtual nodes (or 50 if you prefer decimal). With 32 nodes, there's only a 0.1% chance that a node will hold more than 1.35 * K/N keys, and a 0.1% chance that the node will be primary owner for more than 1.5 * K/N keys.

We could go higher, but we run the risk of node addresses colliding on the hash wheel. According to the formula on the Birthday Paradox page (http://en.wikipedia.org/wiki/Birthday_problem), we only need 2072 addresses on our 2^31 hash wheel to get a 0.1% chance of collision. That means 21 nodes * 96 virtual nodes, 32 nodes * 64 virtual nodes or 43 nodes * 48 virtual nodes.

Cheers
Dan

On Fri, Jan 27, 2012 at 12:37 AM, Sanne Grinovero sa...@infinispan.org wrote:

On 26 January 2012 22:29, Manik Surtani ma...@jboss.org wrote:

On 26 Jan 2012, at 20:16, Sanne Grinovero wrote:

+1 Which default? 100? A prime? We should also make sure the CH function is optimized for this being on.

Yes, we should profile a session with vnodes enabled.
Manik, we're using VNodes in our performance tests. The question is whether we can provide a good default value, as the feature is currently disabled by default.

Cheers,
Sanne

--
Manik Surtani ma...@jboss.org twitter.com/maniksurtani
Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] default value for virtualNodes
I assume the number of vnodes cannot be changed at runtime, dynamically adapting to a changing environment? I understand everybody has to have the exact same number of vnodes for reads and writes to hit the correct node, right?

On 1/27/12 9:41 AM, Dan Berindei wrote:

Hi guys

I've been working on a test to search for an optimal default value here: https://github.com/danberindei/infinispan/commit/983c0328dc40be9609fcabb767dd46f9b98af464

I'm measuring both the number of keys for which a node is primary owner and the number of keys for which it is one of the owners, compared to the ideal distribution (K/N keys on each node). The former tells us how much more work the node could be expected to do, the latter how much memory the node is likely to need.

I'm only running 1 loops, so the max figure is not the absolute maximum. But it's certainly bigger than the 0. percentile. The full results are here: http://fpaste.org/cI1r/

The uniformity of the distribution goes up with the number of virtual nodes but down with the number of physical nodes. I think we should go with a default of 48 virtual nodes (or 50 if you prefer decimal). With 32 nodes, there's only a 0.1% chance that a node will hold more than 1.35 * K/N keys, and a 0.1% chance that the node will be primary owner for more than 1.5 * K/N keys.

We could go higher, but we run the risk of node addresses colliding on the hash wheel. According to the formula on the Birthday Paradox page (http://en.wikipedia.org/wiki/Birthday_problem), we only need 2072 addresses on our 2^31 hash wheel to get a 0.1% chance of collision. That means 21 nodes * 96 virtual nodes, 32 nodes * 64 virtual nodes or 43 nodes * 48 virtual nodes.

Cheers
Dan

On Fri, Jan 27, 2012 at 12:37 AM, Sanne Grinovero sa...@infinispan.org wrote:

On 26 January 2012 22:29, Manik Surtani ma...@jboss.org wrote:

On 26 Jan 2012, at 20:16, Sanne Grinovero wrote:

+1 Which default? 100? A prime? We should also make sure the CH function is optimized for this being on.
Yes, we should profile a session with vnodes enabled.

Manik, we're using VNodes in our performance tests. The question is whether we can provide a good default value, as the feature is currently disabled by default.

--
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
Re: [infinispan-dev] default value for virtualNodes
On 27 Jan 2012, at 10:52, Bela Ban wrote:

I assume the number of vnodes cannot be changed at runtime, dynamically adapting to a changing environment? I understand everybody has to have the exact same number of vnodes for reads and writes to hit the correct node, right?

Yes.

--
Manik Surtani ma...@jboss.org twitter.com/maniksurtani
Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] default value for virtualNodes
I've created a JIRA to track this: https://issues.jboss.org/browse/ISPN-1801

I understand everybody has to have the exact same number of vnodes for reads and writes to hit the correct node, right?

Yes.

That's true, but it is not a good thing: numVirtNodes should be proportional to the node's capacity, i.e. more powerful machines in the cluster should be assigned more virtual nodes. This way we can better control the load. A node would need to send its configured numVirtualNodes when joining in order to support this, but that's a thing we already do for TACH.
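Editorial note: the capacity-proportional idea above can be illustrated with a short Python sketch. It assumes each node advertises a relative capacity factor (1.0 for a baseline machine) and that the base count is the 48 proposed in this thread; the function names and the capacity factor are hypothetical and not part of Infinispan's configuration.

```python
def virtual_nodes_by_capacity(capacity_factors, base_virtual_nodes=48):
    """Give each node a virtual node count proportional to its capacity factor.

    `capacity_factors` maps a node name to a relative capacity
    (hypothetical input; 1.0 means a baseline machine).
    """
    return {
        node: max(1, round(base_virtual_nodes * factor))  # never drop below 1
        for node, factor in capacity_factors.items()
    }


def expected_wheel_share(virtual_nodes):
    """Expected fraction of the wheel per node, assuming uniformly spread addresses."""
    total = sum(virtual_nodes.values())
    return {node: count / total for node, count in virtual_nodes.items()}


if __name__ == "__main__":
    vnodes = virtual_nodes_by_capacity({"small": 1.0, "big": 2.0})
    print(vnodes)                        # {'small': 48, 'big': 96}
    print(expected_wheel_share(vnodes))  # big gets about two thirds
```

Every member would still need to know every other member's count to build the same wheel, which is why the message says the value would have to be sent when joining.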
Re: [infinispan-dev] default value for virtualNodes
On Fri, Jan 27, 2012 at 2:35 PM, Mircea Markus mircea.mar...@jboss.com wrote:

I've created a JIRA to track this: https://issues.jboss.org/browse/ISPN-1801

I understand everybody has to have the exact same number of vnodes for reads and writes to hit the correct node, right?

Yes.

That's true, but it is not a good thing: numVirtNodes should be proportional to the node's capacity, i.e. more powerful machines in the cluster should be assigned more virtual nodes. This way we can better control the load. A node would need to send its configured numVirtualNodes when joining in order to support this, but that's a thing we already do for TACH.

We should use a different mechanism than the TopologyAwareUUID we use for TACH, because the address is sent with every command. The capacity instead should be fairly static. We may want to make it changeable at runtime, but it will take a state transfer to propagate that info to all the members of the cluster (because the nodes' CHs need to stay in sync).

In fact, I can imagine users wanting to balance key ownership between machines/racks/sites with TACH, but without actually using RELAY - the TopologyAwareUUID is just an overhead for them.

Cheers
Dan
Re: [infinispan-dev] default value for virtualNodes
On Fri, Jan 27, 2012 at 2:53 PM, Mircea Markus mircea.mar...@jboss.com wrote:

That's true, but it is not a good thing: numVirtNodes should be proportional to the node's capacity, i.e. more powerful machines in the cluster should be assigned more virtual nodes. This way we can better control the load. A node would need to send its configured numVirtualNodes when joining in order to support this, but that's a thing we already do for TACH.

We should use a different mechanism than the TopologyAwareUUID we use for TACH, because the address is sent with every command.

So every command sends cluster, rack and machine info? That sounds a bit redundant. Can't we just send them once with the JOIN request?

When RELAY is enabled, it actually needs the topology info in order to relay messages between sites. I agree that the topology info should never change, but RELAY requires it for now so we can't avoid it (in the general case).

Cheers
Dan
Re: [infinispan-dev] default value for virtualNodes
+1 Which default? 100? A prime? We should also make sure the CH function is optimized for this being on.

On Jan 26, 2012 8:12 PM, Pete Muir pm...@redhat.com wrote:

I think if we are confident it will benefit all, we should turn it on.

On 26 Jan 2012, at 18:54, Mircea Markus wrote:

Hi,

ATM the default value for virtualNodes is 1. This means that the wheel-share each node has can be very uneven [1] for small (up to 15 nodes) clusters. Increasing this value even to a small number (10-30) would significantly improve each node's share of the wheel and the chance for a well balanced data distribution over the cluster.

So I think that increasing the default value would make sense. What are the drawbacks though? I'm thinking performance and HR wise...

[1] a random example of uneven distribution obtained with radargun
Cluster size: 4 - (15505 13698 5918 4482)
Cluster size: 6 - (8761 7820 17145 8188 12827 4183)
Cluster size: 8 - (8391 6302 10773 22068 3589 200 3050 25211)
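Editorial note: the effect described in the quoted message (uneven wheel shares with a single address per node, evening out as virtual nodes are added) can be reproduced with a small simulation. This is a sketch under stated assumptions, not Infinispan's consistent hash: MD5 stands in for the real hash function, node and key names are made up, and only primary ownership is counted.

```python
import bisect
import hashlib

WHEEL_SIZE = 2**31  # wheel size mentioned elsewhere in the thread


def wheel_position(label):
    """Map a string to a wheel position. MD5 is only a stand-in hash here."""
    digest = hashlib.md5(label.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % WHEEL_SIZE


def primary_owner_counts(nodes, virtual_nodes, num_keys):
    """Count the keys for which each node is the primary owner."""
    # Every physical node is placed on the wheel `virtual_nodes` times.
    wheel = sorted(
        (wheel_position(f"{node}#{v}"), node)
        for node in nodes
        for v in range(virtual_nodes)
    )
    positions = [position for position, _ in wheel]
    counts = {node: 0 for node in nodes}
    for k in range(num_keys):
        # The primary owner is the first address at or after the key's
        # position, wrapping around at the end of the wheel.
        index = bisect.bisect_left(positions, wheel_position(f"key-{k}"))
        counts[wheel[index % len(wheel)][1]] += 1
    return counts


def max_over_ideal(counts):
    """Largest per-node count divided by the ideal share K/N."""
    ideal = sum(counts.values()) / len(counts)
    return max(counts.values()) / ideal


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(8)]
    for vnodes in (1, 10, 48):
        counts = primary_owner_counts(nodes, vnodes, 50_000)
        print(vnodes, sorted(counts.values()), round(max_over_ideal(counts), 2))
```

Running the script prints the sorted per-node key counts and the max/ideal ratio for 1, 10 and 48 virtual nodes, so the spread can be compared directly against the radargun figures above.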
Re: [infinispan-dev] default value for virtualNodes
On 26 Jan 2012, at 20:16, Sanne Grinovero wrote:

+1 Which default? 100? A prime? We should also make sure the CH function is optimized for this being on.

Yes, we should profile a session with vnodes enabled.

--
Manik Surtani ma...@jboss.org twitter.com/maniksurtani
Lead, Infinispan http://www.infinispan.org