[ https://issues.apache.org/jira/browse/IGNITE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620072#comment-16620072 ]
David Harvey edited comment on IGNITE-9365 at 9/19/18 11:54 AM:
----------------------------------------------------------------
[~vkulichenko], the use case I'm thinking about is this: we have a working cluster, and a new node is added to the baseline, but it is missing the attribute. If the affinityBackupFilter throws an exception, there is nothing to catch it all the way back to GridDhtPartitionsExchangeFuture.processFullMessage(), so every node that tries to calculate() affinity will throw that exception. For this use case, we would want to validate that the node coming online has the proper attribute set, rather than discover the problem on arbitrary nodes during a partition map exchange. I don't have a complete enough understanding to know where that validation should go.

A more promising approach is to not allow a node with a null attribute to serve as _either_ the primary or a backup. That is, if you configure a cache with an affinityBackupFilter, the set of nodes that can serve that cache would be limited to nodes for which the filter does not throw when given that node and a list containing only that node. I don't yet see how to handle the case where no nodes have the attribute, however.

Also, I failed to mention that there is precedent for the currently coded approach: if exclNeighbors==true but only neighbors are available, the backup is still created on a neighbor. Likewise, if the node attribute that distinguishes the groups is missing, the code as written simply places the node in its own group (see the sketch below). The semantics of this, or of your alternative of placing all such nodes in one group, are clear and easy to describe, even if the cause is likely a misconfiguration. If we go with your original proposal, then I can document a procedure for comparing the total number of cache entries across all caches with SELECT COUNT(*)... to determine whether the caches are properly backed up.
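To make the "own group" semantics concrete, here is a minimal sketch (not the attached patch) of an attribute-based affinityBackupFilter that never throws: a candidate whose zone attribute is missing is simply treated as its own group. The class name and the "AVAILABILITY_ZONE" attribute name are assumptions for illustration only.

{code:java}
import java.util.List;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteBiPredicate;

/** Sketch only: rejects a backup candidate that shares its availability-zone
 *  attribute with the primary or an already-selected backup. */
public class AvailabilityZoneBackupFilter implements IgniteBiPredicate<ClusterNode, List<ClusterNode>> {
    /** Hypothetical user attribute name, set per node via IgniteConfiguration.setUserAttributes(). */
    private static final String ZONE_ATTR = "AVAILABILITY_ZONE";

    @Override public boolean apply(ClusterNode candidate, List<ClusterNode> previouslySelected) {
        Object zone = candidate.attribute(ZONE_ATTR);

        // Missing attribute: treat the node as its own group instead of throwing,
        // so one misconfigured node cannot break affinity calculation cluster-wide.
        if (zone == null)
            return true;

        for (ClusterNode selected : previouslySelected) {
            if (zone.equals(selected.attribute(ZONE_ATTR)))
                return false; // same zone as the primary or an earlier backup
        }

        return true;
    }
}
{code}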
> Force backups to different AWS availability zones using only Spring XML
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-9365
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9365
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>        Environment: 
>           Reporter: David Harvey
>           Assignee: David Harvey
>           Priority: Minor
>            Fix For: 2.7
>
>        Attachments: master_947962f785_availability_zones_via_spring.patch
>
>  Original Estimate: 168h
>  Remaining Estimate: 168h
>
> As a developer, I want to be able to force each cache backup into a different "Availability Zone" when I'm running out-of-the-box Ignite, without installing additional Jars. "Availability zone" is an AWS feature; other cloud providers offer the same function under different names. A single availability zone has the characteristic that some or all of the EC2 instances in that zone can fail together due to a single fault. In AWS you have no control over the hosts on which the EC2 instance VMs run, except by choosing the availability zone.
>
> I could write a custom affinityBackupFilter in a few lines and configure it in a RendezvousAffinityFunction (see the configuration sketch below), but then I would have to get it deployed on all nodes in the cluster, and peer class loading will not work for this. The code to do this should just be part of Ignite.
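For reference, a hedged sketch of how such a filter would be wired into a cache. It is shown as Java configuration; the point of this ticket is that the equivalent Spring XML wiring should be possible without deploying a custom Jar. The cache name, class names, and backup count are assumptions, not part of the attached patch.

{code:java}
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class ZoneAwareCacheConfig {
    /** Builds a cache configuration whose single backup is pushed into another zone by the filter. */
    public static CacheConfiguration<Integer, String> zoneAwareCache() {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        aff.setAffinityBackupFilter(new AvailabilityZoneBackupFilter()); // filter sketched above

        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("zoneAwareCache");
        ccfg.setBackups(1);     // one backup per partition
        ccfg.setAffinity(aff);

        return ccfg;
    }
}
{code}

Each server node would also need the zone attribute defined in its user attributes (IgniteConfiguration.setUserAttributes()), for example populated from the EC2 instance metadata at startup.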