Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/08/2014 06:22 AM, Robert Collins wrote:
> On 8 September 2014 05:57, Nejc Saje wrote:
>>> That generator API is pretty bad IMO - because it means you're very
>>> heavily dependent on gc and refcount behaviour to keep things clean -
>>> and there isn't (IMO) a use case for walking the entire ring from the
>>> perspective of an item. What's the concern with having replicas as
>>> part of the API?
>>
>> Because they don't really make sense conceptually. The hash ring itself
>> doesn't actually 'make' any replicas. The replicas parameter in the
>> current Ironic implementation is used solely to limit the number of
>> buckets returned. Conceptually, that seems to me the same as
>> take(N, iterate_nodes()). I don't know Python internals well enough to
>> know what problems this would cause, though - can you please clarify?
>
> I could see replicas being a parameter to a function call, but take(N,
> generator) has the same poor behaviour - generators in general that
> won't be fully consumed rely on reference counting to be freed.
> Sometimes that's absolutely the right tradeoff.

Ok, I can agree with it being a function call.

>>> it's absolutely a partition of the hash space - each spot we hash a
>>> bucket onto is - that's how consistent hashing works at all :)
>>
>> Yes, but you don't assign the number of partitions beforehand; it
>> depends on the number of buckets. What you do assign is the number of
>> times you hash a single bucket onto the ring, which is currently named
>> 'replicas' in the Ceilometer code, but I suggested
>> 'distribution_quality' or something similarly descriptive in an
>> earlier e-mail.
>
> I think you misunderstand the code. We do assign the number of
> partitions beforehand - it's approximately fixed and independent of the
> number of buckets. More buckets == fewer times we hash each bucket.

Ah, your first sentence tipped me off that we're not actually speaking
about the same code :). I'm talking about the current Ceilometer code
and the way the algorithm is described in the original paper. There, the
number of times we hash a bucket doesn't depend on the number of buckets
at all. The implementation with an array that Ironic used to have
definitely needed a defined number of partitions, but I don't see the
need for it with the new approach. Why would you want to limit yourself
to a fixed number of partitions when you're limited only by the output
range of the hash function?

Cheers,
Nejc

> -Rob
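To make the two parameterizations being debated concrete, here is a
minimal sketch (illustrative only - the helper names and the '%s-%d'
point-derivation scheme are assumptions, not code from either project)
of fixing the per-bucket hash count up front (the Ceilometer approach)
versus fixing the total partition count and deriving the per-bucket
count from it (the Ironic approach):

    import hashlib

    def ring_positions(buckets, times_per_bucket):
        # Hash each bucket onto the ring 'times_per_bucket' times.
        positions = {}
        for bucket in buckets:
            for i in range(times_per_bucket):
                point = int(hashlib.md5(
                    ('%s-%d' % (bucket, i)).encode()).hexdigest(), 16)
                positions[point] = bucket
        return positions

    buckets = ['conductor-%d' % i for i in range(5)]

    # Ceilometer-style: per-bucket count ('replicas' /
    # 'distribution_quality') is fixed, independent of bucket count.
    ceilo_ring = ring_positions(buckets, 100)

    # Ironic-style: the *total* ring-point count (2**exponent) is
    # approximately fixed, so the per-bucket count shrinks as buckets
    # are added.
    exponent = 10
    ironic_ring = ring_positions(buckets, 2 ** exponent // len(buckets))

    print(len(ceilo_ring), len(ironic_ring))  # ~500 vs ~1024 points

Either way the lookup side is the same; the disagreement above is only
about which knob is held constant as buckets come and go.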
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 8 September 2014 05:57, Nejc Saje wrote:
>> That generator API is pretty bad IMO - because it means you're very
>> heavily dependent on gc and refcount behaviour to keep things clean -
>> and there isn't (IMO) a use case for walking the entire ring from the
>> perspective of an item. What's the concern with having replicas as
>> part of the API?
>
> Because they don't really make sense conceptually. The hash ring itself
> doesn't actually 'make' any replicas. The replicas parameter in the
> current Ironic implementation is used solely to limit the number of
> buckets returned. Conceptually, that seems to me the same as
> take(N, iterate_nodes()). I don't know Python internals well enough to
> know what problems this would cause, though - can you please clarify?

I could see replicas being a parameter to a function call, but take(N,
generator) has the same poor behaviour - generators in general that
won't be fully consumed rely on reference counting to be freed.
Sometimes that's absolutely the right tradeoff.

>> it's absolutely a partition of the hash space - each spot we hash a
>> bucket onto is - that's how consistent hashing works at all :)
>
> Yes, but you don't assign the number of partitions beforehand; it
> depends on the number of buckets. What you do assign is the number of
> times you hash a single bucket onto the ring, which is currently named
> 'replicas' in the Ceilometer code, but I suggested
> 'distribution_quality' or something similarly descriptive in an
> earlier e-mail.

I think you misunderstand the code. We do assign the number of
partitions beforehand - it's approximately fixed and independent of the
number of buckets. More buckets == fewer times we hash each bucket.

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/04/2014 11:24 PM, Robert Collins wrote:
> On 4 September 2014 23:42, Nejc Saje wrote:
>> On 09/04/2014 11:51 AM, Robert Collins wrote:
>> It doesn't contain that term precisely, but it does talk about
>> replicating the buckets. What about using a descriptive name for this
>> parameter, like 'distribution_quality', where the higher the value,
>> the higher the distribution evenness (and the higher the memory
>> usage)?
>>
>>> I've no objection talking about keys, but 'node' is an API object in
>>> Ironic, so I'd rather we talk about hosts - or make it something
>>> clearly not node-like, such as 'bucket' (which the 1997 paper talks
>>> about in describing consistent hash functions).
>>>
>>> So proposal:
>>> - key - a stringifyable thing to be mapped to buckets
>>
>> What about using the term 'item' from the original paper as well?
>
> Sure. Item it is.
>
>>> - bucket - a worker/store that wants keys mapped to it
>>> - replicas - number of buckets a single key wants to be mapped to
>>
>> Can we keep this as an Ironic-internal parameter? Because it doesn't
>> really affect the hash ring. If you want multiple buckets for your
>> item, you just continue your journey along the ring and keep returning
>> new buckets. Check out how the pypi lib does it:
>> https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119
>
> That generator API is pretty bad IMO - because it means you're very
> heavily dependent on gc and refcount behaviour to keep things clean -
> and there isn't (IMO) a use case for walking the entire ring from the
> perspective of an item. What's the concern with having replicas as
> part of the API?

Because they don't really make sense conceptually. The hash ring itself
doesn't actually 'make' any replicas. The replicas parameter in the
current Ironic implementation is used solely to limit the number of
buckets returned. Conceptually, that seems to me the same as
take(N, iterate_nodes()). I don't know Python internals well enough to
know what problems this would cause, though - can you please clarify?

>>> - partitions - number of total divisions of the hash space (power of
>>> 2 required)
>>
>> I don't think there are any divisions of the hash space in the correct
>> implementation, are there? I think that in the current Ironic
>> implementation this tweaks the distribution quality, just like the
>> 'replicas' parameter in the Ceilo implementation.
>
> it's absolutely a partition of the hash space - each spot we hash a
> bucket onto is - that's how consistent hashing works at all :)

Yes, but you don't assign the number of partitions beforehand; it
depends on the number of buckets. What you do assign is the number of
times you hash a single bucket onto the ring, which is currently named
'replicas' in the Ceilometer code, but I suggested
'distribution_quality' or something similarly descriptive in an earlier
e-mail.

Cheers,
Nejc

> -Rob
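As a sketch of the approach being discussed - returning multiple
distinct buckets for an item by continuing along the ring, but as a
plain function call rather than a generator - something along these
lines would work (the names here are assumed for illustration; this is
not the Ironic or pypi hash_ring API):

    import bisect
    import hashlib

    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_buckets(positions, ring, item, n):
        # positions: sorted list of ring points; ring: point -> bucket.
        # Walk clockwise from the item's hash, collecting n distinct
        # buckets and wrapping around the end of the ring.
        start = bisect.bisect(positions, _hash(item))
        found = []
        for i in range(len(positions)):
            bucket = ring[positions[(start + i) % len(positions)]]
            if bucket not in found:
                found.append(bucket)
                if len(found) == n:
                    break
        return found

Because the walk runs to completion inside one call, nothing is left
hanging on gc or refcount behaviour, which is the tradeoff settled on
above.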
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 4 September 2014 23:42, Nejc Saje wrote:
> On 09/04/2014 11:51 AM, Robert Collins wrote:
> It doesn't contain that term precisely, but it does talk about
> replicating the buckets. What about using a descriptive name for this
> parameter, like 'distribution_quality', where the higher the value,
> the higher the distribution evenness (and the higher the memory
> usage)?
>
>> I've no objection talking about keys, but 'node' is an API object in
>> Ironic, so I'd rather we talk about hosts - or make it something
>> clearly not node-like, such as 'bucket' (which the 1997 paper talks
>> about in describing consistent hash functions).
>>
>> So proposal:
>> - key - a stringifyable thing to be mapped to buckets
>
> What about using the term 'item' from the original paper as well?

Sure. Item it is.

>> - bucket - a worker/store that wants keys mapped to it
>> - replicas - number of buckets a single key wants to be mapped to
>
> Can we keep this as an Ironic-internal parameter? Because it doesn't
> really affect the hash ring. If you want multiple buckets for your
> item, you just continue your journey along the ring and keep returning
> new buckets. Check out how the pypi lib does it:
> https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119

That generator API is pretty bad IMO - because it means you're very
heavily dependent on gc and refcount behaviour to keep things clean -
and there isn't (IMO) a use case for walking the entire ring from the
perspective of an item. What's the concern with having replicas as part
of the API?

>> - partitions - number of total divisions of the hash space (power of
>> 2 required)
>
> I don't think there are any divisions of the hash space in the correct
> implementation, are there? I think that in the current Ironic
> implementation this tweaks the distribution quality, just like the
> 'replicas' parameter in the Ceilo implementation.

It's absolutely a partition of the hash space - each spot we hash a
bucket onto is - that's how consistent hashing works at all :)

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/04/2014 11:51 AM, Robert Collins wrote:
> On 4 September 2014 19:53, Nejc Saje wrote:
>> I used the terms that are used in the original caching use-case, as
>> described in [1], and that are used in the pypi lib as well [2]. With
>> the correct approach there aren't actually any partitions; 'replicas'
>> actually denotes the number of times you hash a node onto the ring.
>> As for nodes & keys, what's your suggestion?
>
> So - we should change the Ironic terms then, I suspect (but let's check
> with Deva, who wrote the original code, where he got them from).
>
> The parameters we need to create a ring are:
> - how many fallback positions we use for data (currently referred to
>   as replicas)
> - how many times we hash the servers hosting data into the ring
>   (currently inferred via the hash_partition_exponent / server count)
> - the servers
>
> and then we probe data items as we go.
>
> The original paper isn't
> http://www.martinbroadhurst.com/Consistent-Hash-Ring.html -
> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is
> referenced by it, and that paper doesn't include the term replica
> count at all. In other systems like Cassandra, replicas generally
> refers to how many servers end up holding a copy of the data. Martin
> Broadhurst's paper uses replica there in quite a different sense - I
> much prefer the Ironic use, which says how many servers will be
> operating on the data: it's externally relevant.

It doesn't contain that term precisely, but it does talk about
replicating the buckets. What about using a descriptive name for this
parameter, like 'distribution_quality', where the higher the value, the
higher the distribution evenness (and the higher the memory usage)?

> I've no objection talking about keys, but 'node' is an API object in
> Ironic, so I'd rather we talk about hosts - or make it something
> clearly not node-like, such as 'bucket' (which the 1997 paper talks
> about in describing consistent hash functions).
>
> So proposal:
> - key - a stringifyable thing to be mapped to buckets

What about using the term 'item' from the original paper as well?

> - bucket - a worker/store that wants keys mapped to it
> - replicas - number of buckets a single key wants to be mapped to

Can we keep this as an Ironic-internal parameter? Because it doesn't
really affect the hash ring. If you want multiple buckets for your item,
you just continue your journey along the ring and keep returning new
buckets. Check out how the pypi lib does it:
https://github.com/Doist/hash_ring/blob/master/hash_ring/ring.py#L119

> - partitions - number of total divisions of the hash space (power of
>   2 required)

I don't think there are any divisions of the hash space in the correct
implementation, are there? I think that in the current Ironic
implementation this tweaks the distribution quality, just like the
'replicas' parameter in the Ceilo implementation.

Cheers,
Nejc

>> I've opened a bug [3], so you can add a Closes-Bug to your patch.
>
> Thanks!
>
> -Rob
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
>>>> The implementation in ceilometer is very different to the Ironic
>>>> one - are you saying the test you linked fails with Ironic, or that
>>>> it fails with the ceilometer code today?
>>>
>>> Disclaimer: in Ironic terms, node = conductor, key = host
>>>
>>> The test I linked fails with Ironic hash ring code (specifically the
>>> part that tests consistency). With 1000 keys being mapped to 10
>>> nodes, when you add a node:
>>> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
>>> - Ironic code remaps > 90% of the keys
>>
>> So just to underscore what Nejc is saying here ...
>>
>> The key point is the proportion of such baremetal-nodes that would
>> end up being re-assigned when a new conductor is fired up.
>
> That was 100% clear, but thanks for making sure.
>
> The question was getting a proper understanding of why it was
> happening in Ironic.
>
> The ceilometer hashring implementation is good, but it uses the same
> terms very differently (e.g. replicas for partitions) - I'm adapting
> the key fix back into Ironic. I'd like to see us converge on a single
> implementation, and making sure the Ironic one is suitable for
> ceilometer seems applicable here (since ceilometer seems to need less
> from the API).

Absolutely +1 on converging on a single implementation. That was our
intent on the ceilometer side from the get-go: to promote a single
implementation to oslo that both projects could share.

This turned out not to be possible in the short term when the
non-consistent aspect of the Ironic implementation was discovered by
Nejc, with the juno-3 deadline looming. However, for kilo we would
definitely be interested in leveraging a best-of-breed implementation
from oslo.

> If reassigning was cheap Ironic wouldn't have bothered having a hash
> ring :)

Fair enough, I was just allowing for the possibility that avoidance of
needless re-mapping isn't as high a priority on the ironic side as it
was for ceilometer.

Cheers,
Eoghan
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 4 September 2014 19:53, Nejc Saje wrote:
> I used the terms that are used in the original caching use-case, as
> described in [1], and that are used in the pypi lib as well [2]. With
> the correct approach there aren't actually any partitions; 'replicas'
> actually denotes the number of times you hash a node onto the ring. As
> for nodes & keys, what's your suggestion?

So - we should change the Ironic terms then, I suspect (but let's check
with Deva, who wrote the original code, where he got them from).

The parameters we need to create a ring are:
- how many fallback positions we use for data (currently referred to as
  replicas)
- how many times we hash the servers hosting data into the ring
  (currently inferred via the hash_partition_exponent / server count)
- the servers

and then we probe data items as we go.

The original paper isn't
http://www.martinbroadhurst.com/Consistent-Hash-Ring.html -
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.1879 is
referenced by it, and that paper doesn't include the term replica count
at all. In other systems like Cassandra, replicas generally refers to
how many servers end up holding a copy of the data. Martin Broadhurst's
paper uses replica there in quite a different sense - I much prefer the
Ironic use, which says how many servers will be operating on the data:
it's externally relevant.

I've no objection talking about keys, but 'node' is an API object in
Ironic, so I'd rather we talk about hosts - or make it something clearly
not node-like, such as 'bucket' (which the 1997 paper talks about in
describing consistent hash functions).

So proposal:
- key - a stringifyable thing to be mapped to buckets
- bucket - a worker/store that wants keys mapped to it
- replicas - number of buckets a single key wants to be mapped to
- partitions - number of total divisions of the hash space (power of 2
  required)

> I've opened a bug [3], so you can add a Closes-Bug to your patch.

Thanks!

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/04/2014 01:37 AM, Robert Collins wrote:
> On 4 September 2014 00:13, Eoghan Glynn wrote:
>>> On 09/02/2014 11:33 PM, Robert Collins wrote:
>>>> The implementation in ceilometer is very different to the Ironic
>>>> one - are you saying the test you linked fails with Ironic, or that
>>>> it fails with the ceilometer code today?
>>>
>>> Disclaimer: in Ironic terms, node = conductor, key = host
>>>
>>> The test I linked fails with Ironic hash ring code (specifically the
>>> part that tests consistency). With 1000 keys being mapped to 10
>>> nodes, when you add a node:
>>> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
>>> - Ironic code remaps > 90% of the keys
>>
>> So just to underscore what Nejc is saying here ...
>>
>> The key point is the proportion of such baremetal-nodes that would
>> end up being re-assigned when a new conductor is fired up.
>
> That was 100% clear, but thanks for making sure.
>
> The question was getting a proper understanding of why it was
> happening in Ironic.
>
> The ceilometer hashring implementation is good, but it uses the same
> terms very differently (e.g. replicas for partitions) - I'm adapting
> the key fix back into Ironic. I'd like to see us converge on a single
> implementation, and making sure the Ironic one is suitable for
> ceilometer seems applicable here (since ceilometer seems to need less
> from the API).

I used the terms that are used in the original caching use-case, as
described in [1], and that are used in the pypi lib as well [2]. With
the correct approach there aren't actually any partitions; 'replicas'
actually denotes the number of times you hash a node onto the ring. As
for nodes & keys, what's your suggestion?

I've opened a bug [3], so you can add a Closes-Bug to your patch.

[1] http://www.martinbroadhurst.com/Consistent-Hash-Ring.html
[2] https://pypi.python.org/pypi/hash_ring
[3] https://bugs.launchpad.net/ironic/+bug/1365334

> If reassigning was cheap Ironic wouldn't have bothered having a hash
> ring :)
>
> -Rob
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 4 September 2014 08:13, Robert Collins wrote:
> On 3 September 2014 23:50, Nejc Saje wrote:
>
> Forgive my slowness :).
>
>> Disclaimer: in Ironic terms, node = conductor, key = host
>
> Sadly not inside the hash_ring code :/. host == conductor, key == data.
>
>> The test I linked fails with Ironic hash ring code (specifically the
>> part that tests consistency). With 1000 keys being mapped to 10 nodes,
>> when you add a node:
>> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
>> - Ironic code remaps > 90% of the keys
>
> Ok, that's pretty definitive and definitely not intended. However,
> remapping 7% when adding 10% capacity is also undesirable - we'd like
> to remap 1/11 -> +-9%.
>
>> The problem lies in the way you build your hash ring [1]. You
>> initialize a statically-sized array and divide its fields among nodes.
>> When you do a lookup, you check which field in the array the key
>> hashes to and then return the node that that field belongs to. This is
>> the wrong approach, because when you add a new node, you will resize
>> the array and assign the fields differently, but the same key will
>> still hash to the same field and will therefore hash to a different
>> node.
>
> You're referring to part2host, where we round-robin using mod to map a
> partition (hash value of key) to a host (conductor). Then when we add
> a conductor this entire map scales out - yup, I see the issue.
>
> Have you filed a bug for this?

https://review.openstack.org/118932 has an equivalent test, which failed
before the fixes were applied to the Ironic hash ring implementation.

Cheers,
Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 4 September 2014 00:13, Eoghan Glynn wrote:
>> On 09/02/2014 11:33 PM, Robert Collins wrote:
>>> The implementation in ceilometer is very different to the Ironic
>>> one - are you saying the test you linked fails with Ironic, or that
>>> it fails with the ceilometer code today?
>>
>> Disclaimer: in Ironic terms, node = conductor, key = host
>>
>> The test I linked fails with Ironic hash ring code (specifically the
>> part that tests consistency). With 1000 keys being mapped to 10 nodes,
>> when you add a node:
>> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
>> - Ironic code remaps > 90% of the keys
>
> So just to underscore what Nejc is saying here ...
>
> The key point is the proportion of such baremetal-nodes that would
> end up being re-assigned when a new conductor is fired up.

That was 100% clear, but thanks for making sure.

The question was getting a proper understanding of why it was
happening in Ironic.

The ceilometer hashring implementation is good, but it uses the same
terms very differently (e.g. replicas for partitions) - I'm adapting
the key fix back into Ironic. I'd like to see us converge on a single
implementation, and making sure the Ironic one is suitable for
ceilometer seems applicable here (since ceilometer seems to need less
from the API).

If reassigning was cheap Ironic wouldn't have bothered having a hash
ring :)

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 3 September 2014 23:50, Nejc Saje wrote:

Forgive my slowness :).

> Disclaimer: in Ironic terms, node = conductor, key = host

Sadly not inside the hash_ring code :/. host == conductor, key == data.

> The test I linked fails with Ironic hash ring code (specifically the
> part that tests consistency). With 1000 keys being mapped to 10 nodes,
> when you add a node:
> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
> - Ironic code remaps > 90% of the keys

Ok, that's pretty definitive and definitely not intended. However,
remapping 7% when adding 10% capacity is also undesirable - we'd like
to remap 1/11 -> +-9%.

> The problem lies in the way you build your hash ring [1]. You
> initialize a statically-sized array and divide its fields among nodes.
> When you do a lookup, you check which field in the array the key hashes
> to and then return the node that that field belongs to. This is the
> wrong approach, because when you add a new node, you will resize the
> array and assign the fields differently, but the same key will still
> hash to the same field and will therefore hash to a different node.

You're referring to part2host, where we round-robin using mod to map a
partition (hash value of key) to a host (conductor). Then when we add a
conductor this entire map scales out - yup, I see the issue.

Have you filed a bug for this?

> Nodes must be hashed onto the ring as well; statically chopping up the
> ring and dividing it among nodes isn't enough for consistency.
>
> Cheers,
> Nejc
>
>> The Ironic hash_ring implementation uses a hash:
>>
>>     def _get_partition(self, data):
>>         try:
>>             return (struct.unpack_from('>I',
>>                                        hashlib.md5(data).digest())[0]
>>                     >> self.partition_shift)
>>         except TypeError:
>>             raise exception.Invalid(
>>                 _("Invalid data supplied to HashRing.get_hosts."))
>>
>> so I don't see the fixed-size thing you're referring to. Could you
>> point a little more specifically? Thanks!
>>
>> -Rob
>>
>> On 1 September 2014 19:48, Nejc Saje wrote:
>>> Hey guys,
>>>
>>> in Ceilometer we're using consistent hash rings to do workload
>>> partitioning [1]. We've considered generalizing your hash ring
>>> implementation and moving it up to oslo, but unfortunately your
>>> implementation is not actually consistent, which is our requirement.
>>>
>>> Since you divide your ring into a number of equal-sized partitions,
>>> instead of hashing hosts onto the ring, when you add a new host,
>>> an unbounded number of keys get re-mapped to different hosts (instead
>>> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
>>> this with the test in the aforementioned patch [2].
>>>
>>> If this is good enough for your use-case, great; otherwise we can get
>>> a generalized hash ring implementation into oslo for use in both
>>> projects, or we can both use an external library [3].
>>>
>>> Cheers,
>>> Nejc
>>>
>>> [1] https://review.openstack.org/#/c/113549/
>>> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>>> [3] https://pypi.python.org/pypi/hash_ring

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
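A sketch of the failure mode identified above (simplified from the
description, not the actual Ironic part2host code): dealing partitions
out round-robin with mod means nearly the entire partition-to-conductor
map shifts when one conductor joins.

    def part2host(num_partitions, hosts):
        # Round-robin: partition p belongs to host index p % len(hosts).
        return dict((p, hosts[p % len(hosts)])
                    for p in range(num_partitions))

    before = part2host(1024, ['c1', 'c2', 'c3'])
    after = part2host(1024, ['c1', 'c2', 'c3', 'c4'])
    moved = sum(1 for p in before if before[p] != after[p])
    print('%.0f%% of partitions remapped' % (100.0 * moved / 1024))
    # Roughly 75% here (p % 3 == p % 4 only for 3 residues out of every
    # 12); going from 10 hosts to 11 remaps over 90%, matching the test
    # result quoted in the thread.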
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On Wed, Sep 3, 2014 at 12:50 PM, Nejc Saje wrote:
> On 09/02/2014 11:33 PM, Robert Collins wrote:
>> The implementation in ceilometer is very different to the Ironic one -
>> are you saying the test you linked fails with Ironic, or that it fails
>> with the ceilometer code today?
>
> Disclaimer: in Ironic terms, node = conductor, key = host
>
> The test I linked fails with Ironic hash ring code (specifically the
> part that tests consistency). With 1000 keys being mapped to 10 nodes,
> when you add a node:
> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
> - Ironic code remaps > 90% of the keys

Thanks Nejc for posting it here. I'm not super familiar with the
consistent hashing code, but I'll definitely take a look at it. IMO this
behavior is not desirable in Ironic at all; we don't want > 90% of the
keys to get remapped when a conductor joins or leaves the cluster.

Also, it would be good to open a bug about this problem in Ironic so we
can track the progress there (I can open it if you want, but I will
first do some investigation to understand the problem better).

Cheers,
Lucas
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
> On 09/02/2014 11:33 PM, Robert Collins wrote:
>> The implementation in ceilometer is very different to the Ironic one -
>> are you saying the test you linked fails with Ironic, or that it fails
>> with the ceilometer code today?
>
> Disclaimer: in Ironic terms, node = conductor, key = host
>
> The test I linked fails with Ironic hash ring code (specifically the
> part that tests consistency). With 1000 keys being mapped to 10 nodes,
> when you add a node:
> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
> - Ironic code remaps > 90% of the keys

So just to underscore what Nejc is saying here ...

The key point is the proportion of such baremetal-nodes that would
end up being re-assigned when a new conductor is fired up.

The defining property of a consistent hash-ring is that it significantly
reduces the number of re-mappings that occur when the number of buckets
changes, compared to a plain ol' hash. This desire for stickiness would
often be motivated by some form of statefulness or initialization cost.

In the ceilometer case, we want to minimize unnecessary re-assignment so
as to keep the cadence of meter collection and alarm evaluation as even
as possible (given that each agent will be running off non-synchronized
periodic tasks).

In the case of the Ironic baremetal-node to conductor mapping, perhaps
such stickiness isn't really of any benefit?

If so, that's fine, but Nejc's main point IIUC is that an approach that
doesn't minimize the number of re-mappings isn't really a form of
*consistent* hashing.

Cheers,
Eoghan

> The problem lies in the way you build your hash ring [1]. You
> initialize a statically-sized array and divide its fields among nodes.
> When you do a lookup, you check which field in the array the key hashes
> to and then return the node that that field belongs to. This is the
> wrong approach, because when you add a new node, you will resize the
> array and assign the fields differently, but the same key will still
> hash to the same field and will therefore hash to a different node.
>
> Nodes must be hashed onto the ring as well; statically chopping up the
> ring and dividing it among nodes isn't enough for consistency.
>
> Cheers,
> Nejc
>
>> The Ironic hash_ring implementation uses a hash:
>>
>>     def _get_partition(self, data):
>>         try:
>>             return (struct.unpack_from('>I',
>>                                        hashlib.md5(data).digest())[0]
>>                     >> self.partition_shift)
>>         except TypeError:
>>             raise exception.Invalid(
>>                 _("Invalid data supplied to HashRing.get_hosts."))
>>
>> so I don't see the fixed-size thing you're referring to. Could you
>> point a little more specifically? Thanks!
>>
>> -Rob
>>
>> On 1 September 2014 19:48, Nejc Saje wrote:
>>> Hey guys,
>>>
>>> in Ceilometer we're using consistent hash rings to do workload
>>> partitioning [1]. We've considered generalizing your hash ring
>>> implementation and moving it up to oslo, but unfortunately your
>>> implementation is not actually consistent, which is our requirement.
>>>
>>> Since you divide your ring into a number of equal-sized partitions,
>>> instead of hashing hosts onto the ring, when you add a new host,
>>> an unbounded number of keys get re-mapped to different hosts (instead
>>> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
>>> this with the test in the aforementioned patch [2].
>>>
>>> If this is good enough for your use-case, great; otherwise we can get
>>> a generalized hash ring implementation into oslo for use in both
>>> projects, or we can both use an external library [3].
>>>
>>> Cheers,
>>> Nejc
>>>
>>> [1] https://review.openstack.org/#/c/113549/
>>> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>>> [3] https://pypi.python.org/pypi/hash_ring
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
Sorry, forgot to link the reference:

[1] https://github.com/openstack/ironic/blob/b56db42aa39e855e558a52eb71e656ea14380f8a/ironic/common/hash_ring.py#L72

On 09/03/2014 01:50 PM, Nejc Saje wrote:
> On 09/02/2014 11:33 PM, Robert Collins wrote:
>> The implementation in ceilometer is very different to the Ironic one -
>> are you saying the test you linked fails with Ironic, or that it fails
>> with the ceilometer code today?
>
> Disclaimer: in Ironic terms, node = conductor, key = host
>
> The test I linked fails with Ironic hash ring code (specifically the
> part that tests consistency). With 1000 keys being mapped to 10 nodes,
> when you add a node:
> - current ceilometer code remaps around 7% of the keys (< 1/#nodes)
> - Ironic code remaps > 90% of the keys
>
> The problem lies in the way you build your hash ring [1]. You
> initialize a statically-sized array and divide its fields among nodes.
> When you do a lookup, you check which field in the array the key hashes
> to and then return the node that that field belongs to. This is the
> wrong approach, because when you add a new node, you will resize the
> array and assign the fields differently, but the same key will still
> hash to the same field and will therefore hash to a different node.
>
> Nodes must be hashed onto the ring as well; statically chopping up the
> ring and dividing it among nodes isn't enough for consistency.
>
> Cheers,
> Nejc
>
>> The Ironic hash_ring implementation uses a hash:
>>
>>     def _get_partition(self, data):
>>         try:
>>             return (struct.unpack_from('>I',
>>                                        hashlib.md5(data).digest())[0]
>>                     >> self.partition_shift)
>>         except TypeError:
>>             raise exception.Invalid(
>>                 _("Invalid data supplied to HashRing.get_hosts."))
>>
>> so I don't see the fixed-size thing you're referring to. Could you
>> point a little more specifically? Thanks!
>>
>> -Rob
>>
>> On 1 September 2014 19:48, Nejc Saje wrote:
>>> Hey guys,
>>>
>>> in Ceilometer we're using consistent hash rings to do workload
>>> partitioning [1]. We've considered generalizing your hash ring
>>> implementation and moving it up to oslo, but unfortunately your
>>> implementation is not actually consistent, which is our requirement.
>>>
>>> Since you divide your ring into a number of equal-sized partitions,
>>> instead of hashing hosts onto the ring, when you add a new host,
>>> an unbounded number of keys get re-mapped to different hosts (instead
>>> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
>>> this with the test in the aforementioned patch [2].
>>>
>>> If this is good enough for your use-case, great; otherwise we can get
>>> a generalized hash ring implementation into oslo for use in both
>>> projects, or we can both use an external library [3].
>>>
>>> Cheers,
>>> Nejc
>>>
>>> [1] https://review.openstack.org/#/c/113549/
>>> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>>> [3] https://pypi.python.org/pypi/hash_ring
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/02/2014 11:33 PM, Robert Collins wrote:
> The implementation in ceilometer is very different to the Ironic one -
> are you saying the test you linked fails with Ironic, or that it fails
> with the ceilometer code today?

Disclaimer: in Ironic terms, node = conductor, key = host

The test I linked fails with Ironic hash ring code (specifically the
part that tests consistency). With 1000 keys being mapped to 10 nodes,
when you add a node:
- current ceilometer code remaps around 7% of the keys (< 1/#nodes)
- Ironic code remaps > 90% of the keys

The problem lies in the way you build your hash ring [1]. You initialize
a statically-sized array and divide its fields among nodes. When you do
a lookup, you check which field in the array the key hashes to and then
return the node that that field belongs to. This is the wrong approach,
because when you add a new node, you will resize the array and assign
the fields differently, but the same key will still hash to the same
field and will therefore hash to a different node.

Nodes must be hashed onto the ring as well; statically chopping up the
ring and dividing it among nodes isn't enough for consistency.

Cheers,
Nejc

> The Ironic hash_ring implementation uses a hash:
>
>     def _get_partition(self, data):
>         try:
>             return (struct.unpack_from('>I',
>                                        hashlib.md5(data).digest())[0]
>                     >> self.partition_shift)
>         except TypeError:
>             raise exception.Invalid(
>                 _("Invalid data supplied to HashRing.get_hosts."))
>
> so I don't see the fixed-size thing you're referring to. Could you
> point a little more specifically? Thanks!
>
> -Rob
>
> On 1 September 2014 19:48, Nejc Saje wrote:
>> Hey guys,
>>
>> in Ceilometer we're using consistent hash rings to do workload
>> partitioning [1]. We've considered generalizing your hash ring
>> implementation and moving it up to oslo, but unfortunately your
>> implementation is not actually consistent, which is our requirement.
>>
>> Since you divide your ring into a number of equal-sized partitions,
>> instead of hashing hosts onto the ring, when you add a new host,
>> an unbounded number of keys get re-mapped to different hosts (instead
>> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
>> this with the test in the aforementioned patch [2].
>>
>> If this is good enough for your use-case, great; otherwise we can get
>> a generalized hash ring implementation into oslo for use in both
>> projects, or we can both use an external library [3].
>>
>> Cheers,
>> Nejc
>>
>> [1] https://review.openstack.org/#/c/113549/
>> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>> [3] https://pypi.python.org/pypi/hash_ring
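A minimal sketch of the fix Nejc describes (illustrative only - not the
Ceilometer or Ironic code; the Ring/get_node names and the '%s-%d'
point-derivation scheme are assumptions): hash the nodes themselves
onto the ring many times and look keys up with a bisect search. Adding
a node then only steals the keys that fall into the new node's slices,
on the order of 1/#nodes of them.

    import bisect
    import hashlib

    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class Ring(object):
        def __init__(self, nodes, replicas=100):
            # Hash every node onto the ring 'replicas' times.
            self._ring = {}
            for node in nodes:
                for i in range(replicas):
                    self._ring[_hash('%s-%d' % (node, i))] = node
            self._positions = sorted(self._ring)

        def get_node(self, key):
            # First node position at or after the key's hash, wrapping
            # past the end of the ring.
            i = bisect.bisect(self._positions, _hash(key))
            return self._ring[self._positions[i % len(self._positions)]]

    # Along the lines of the remapping test cited in the thread:
    keys = ['key-%d' % i for i in range(1000)]
    old = Ring(['node-%d' % i for i in range(10)])
    new = Ring(['node-%d' % i for i in range(11)])
    moved = sum(1 for k in keys if old.get_node(k) != new.get_node(k))
    print('remapped %d/1000 keys' % moved)  # on the order of 1/11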
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
On 09/02/2014 11:19 PM, Gregory Haynes wrote:
> Excerpts from Nejc Saje's message of 2014-09-01 07:48:46 +0000:
>> Hey guys,
>>
>> in Ceilometer we're using consistent hash rings to do workload
>> partitioning [1]. We've considered generalizing your hash ring
>> implementation and moving it up to oslo, but unfortunately your
>> implementation is not actually consistent, which is our requirement.
>>
>> Since you divide your ring into a number of equal-sized partitions,
>> instead of hashing hosts onto the ring, when you add a new host,
>> an unbounded number of keys get re-mapped to different hosts (instead
>> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
>> this with the test in the aforementioned patch [2].
>
> I am just getting started with the ironic hash ring code, but this
> seems surprising to me. AIUI we do require some rebalancing when a
> conductor is removed or added (which is normal use of a CHT) but not
> for every host added. This is supported by the fact that we currently
> don't have a rebalancing routine, so I would be surprised if ironic
> worked at all if we required it for each host that is added.

Sorry, I used terms that are used in the distributed-caching use-case
for hash rings (which is why this algorithm was developed), where you
have hosts and keys, and the hash ring tells you which host a key is on.
To translate the original e-mail into the Ironic use-case, where you
have hosts being mapped to conductors:

>> Since you divide your ring into a number of equal-sized partitions
>> instead of hashing *conductors* onto the ring, when you add a new
>> *conductor*, an unbounded number of *hosts* get re-mapped to different
>> *conductors* (instead of the 1/#*conductors* of *hosts* being
>> re-mapped, as guaranteed by a hash ring). I've confirmed this with the
>> test in the aforementioned patch [2].

Cheers,
Nejc

> Can anyone in Ironic with a bit more experience confirm/deny this?
>
>> If this is good enough for your use-case, great; otherwise we can get
>> a generalized hash ring implementation into oslo for use in both
>> projects, or we can both use an external library [3].
>>
>> Cheers,
>> Nejc
>>
>> [1] https://review.openstack.org/#/c/113549/
>> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
>> [3] https://pypi.python.org/pypi/hash_ring
>
> Thanks,
> Greg
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
The implementation in ceilometer is very different to the Ironic one -
are you saying the test you linked fails with Ironic, or that it fails
with the ceilometer code today?

The Ironic hash_ring implementation uses a hash:

    def _get_partition(self, data):
        try:
            return (struct.unpack_from('>I',
                                       hashlib.md5(data).digest())[0]
                    >> self.partition_shift)
        except TypeError:
            raise exception.Invalid(
                _("Invalid data supplied to HashRing.get_hosts."))

so I don't see the fixed-size thing you're referring to. Could you
point a little more specifically? Thanks!

-Rob

On 1 September 2014 19:48, Nejc Saje wrote:
> Hey guys,
>
> in Ceilometer we're using consistent hash rings to do workload
> partitioning [1]. We've considered generalizing your hash ring
> implementation and moving it up to oslo, but unfortunately your
> implementation is not actually consistent, which is our requirement.
>
> Since you divide your ring into a number of equal-sized partitions,
> instead of hashing hosts onto the ring, when you add a new host,
> an unbounded number of keys get re-mapped to different hosts (instead
> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
> this with the test in the aforementioned patch [2].
>
> If this is good enough for your use-case, great; otherwise we can get
> a generalized hash ring implementation into oslo for use in both
> projects, or we can both use an external library [3].
>
> Cheers,
> Nejc
>
> [1] https://review.openstack.org/#/c/113549/
> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
> [3] https://pypi.python.org/pypi/hash_ring

--
Robert Collins
Distinguished Technologist
HP Converged Cloud
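For context on the shift in _get_partition: the fixed size the thread
goes on to identify lives in partition_shift, which is derived elsewhere
from the hash_partition_exponent configuration. A sketch with assumed
values (the exponent of 5 and the 32-bit derivation are illustrative,
not quoted from Ironic) of the effect - the top bits of a 32-bit hash
select one of 2**exponent equal slices of the hash space:

    import hashlib
    import struct

    hash_partition_exponent = 5            # assumed for illustration
    partition_shift = 32 - hash_partition_exponent

    def get_partition(data):
        # Unpack the first 4 digest bytes as a big-endian uint32, then
        # keep only the top 'exponent' bits: a slice index in [0, 32).
        return (struct.unpack_from('>I', hashlib.md5(data).digest())[0]
                >> partition_shift)

    print(get_partition(b'some-key'))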
Re: [openstack-dev] [Ironic] (Non-)consistency of the Ironic hash ring implementation
Excerpts from Nejc Saje's message of 2014-09-01 07:48:46 +0000:
> Hey guys,
>
> in Ceilometer we're using consistent hash rings to do workload
> partitioning [1]. We've considered generalizing your hash ring
> implementation and moving it up to oslo, but unfortunately your
> implementation is not actually consistent, which is our requirement.
>
> Since you divide your ring into a number of equal-sized partitions,
> instead of hashing hosts onto the ring, when you add a new host,
> an unbounded number of keys get re-mapped to different hosts (instead
> of the 1/#nodes remapping guaranteed by a hash ring). I've confirmed
> this with the test in the aforementioned patch [2].

I am just getting started with the ironic hash ring code, but this
seems surprising to me. AIUI we do require some rebalancing when a
conductor is removed or added (which is normal use of a CHT) but not
for every host added. This is supported by the fact that we currently
don't have a rebalancing routine, so I would be surprised if ironic
worked at all if we required it for each host that is added.

Can anyone in Ironic with a bit more experience confirm/deny this?

> If this is good enough for your use-case, great; otherwise we can get
> a generalized hash ring implementation into oslo for use in both
> projects, or we can both use an external library [3].
>
> Cheers,
> Nejc
>
> [1] https://review.openstack.org/#/c/113549/
> [2] https://review.openstack.org/#/c/113549/21/ceilometer/tests/test_utils.py
> [3] https://pypi.python.org/pypi/hash_ring

Thanks,
Greg