[ 
https://issues.apache.org/jira/browse/IGNITE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620072#comment-16620072
 ] 

David Harvey edited comment on IGNITE-9365 at 9/19/18 11:54 AM:
----------------------------------------------------------------

[~vkulichenko], the use case I'm thinking about is a working cluster where a 
new node is added to the baseline but is missing the attribute.  If the 
affinityBackupFilter throws an exception, there is nothing to catch it all the 
way back to GridDhtPartitionsExchangeFuture.processFullMessage(), and every 
node that tries to calculate() affinity will throw that exception.  For this 
use case, we would want to validate that the node coming online has the proper 
attribute set, rather than discover the problem on arbitrary nodes during a 
partition map exchange.  I don't have a complete enough understanding to know 
where that validation should go.
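
For illustration, a minimal sketch of that failure mode (the attribute name 
AVAILABILITY_ZONE and the class name are placeholders, not the attached patch): 
a filter written this way throws as soon as the candidate node lacks the 
attribute, and the exception surfaces during affinity calculation on every 
node rather than when the misconfigured node joins.

{code:java}
import java.util.List;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteBiPredicate;

/** Naive filter that assumes every node defines the attribute. */
public class NaiveZoneBackupFilter implements IgniteBiPredicate<ClusterNode, List<ClusterNode>> {
    @Override public boolean apply(ClusterNode candidate, List<ClusterNode> previouslySelected) {
        String candidateZone = candidate.attribute("AVAILABILITY_ZONE");

        for (ClusterNode selected : previouslySelected) {
            String selectedZone = selected.attribute("AVAILABILITY_ZONE");

            // NullPointerException here if the candidate is missing the attribute;
            // nothing between this filter and the partition map exchange catches it.
            if (candidateZone.equals(selectedZone))
                return false; // same zone as an already-selected node: reject
        }

        return true; // different zone from all previously selected nodes: accept
    }
}
{code}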

A more promising approach is to not allow a node with a null attribute to 
service _either_ the primary or a backup.  That is, if you configure a cache 
with an affinityBackupFilter, the set of nodes that can service that cache is 
limited to nodes for which the filter does not throw when given the node and a 
list containing only that node.  I don't yet see how to handle the case where 
no nodes have the attribute, however.
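
A hedged sketch of that eligibility probe (this helper is not existing Ignite 
code, just an illustration of the idea):

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteBiPredicate;

/** Keeps only the nodes for which the configured filter does not throw. */
public class EligibleNodes {
    public static List<ClusterNode> eligible(List<ClusterNode> allNodes,
        IgniteBiPredicate<ClusterNode, List<ClusterNode>> backupFilter) {
        List<ClusterNode> result = new ArrayList<>();

        for (ClusterNode node : allNodes) {
            try {
                // Probe the filter with the node compared against only itself.
                backupFilter.apply(node, Collections.singletonList(node));

                result.add(node); // no exception: the node carries the attribute
            }
            catch (RuntimeException ignored) {
                // A missing attribute made the filter throw: exclude the node.
            }
        }

        return result;
    }
}
{code}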

Also, I failed to mention that there is precedent for the currently coded 
approach: if exclNeighbors==true but only neighbors are available, the backup 
is still created on a neighbor.  Likewise, if the node characteristic that 
distinguishes between groups does not exist, the code as written simply places 
the node in its own group.  The semantics of this, or of your alternative of 
placing all such nodes in one group, are clear and easy to describe, even if 
the cause is likely a misconfiguration.
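
As a sketch of that "own group" behavior (again with placeholder names, not the 
attached patch): the candidate is only rejected when both attribute values are 
present and equal, so a node with a missing attribute never matches anything 
and effectively forms its own group.

{code:java}
import java.util.List;

import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.lang.IgniteBiPredicate;

/** Tolerant filter: a missing attribute value compares as "different". */
public class TolerantZoneBackupFilter implements IgniteBiPredicate<ClusterNode, List<ClusterNode>> {
    @Override public boolean apply(ClusterNode candidate, List<ClusterNode> previouslySelected) {
        Object candidateZone = candidate.attribute("AVAILABILITY_ZONE");

        for (ClusterNode selected : previouslySelected) {
            // Reject only when both zones are non-null and equal; a null value on
            // the candidate places it in its own group instead of throwing.
            if (candidateZone != null && candidateZone.equals(selected.attribute("AVAILABILITY_ZONE")))
                return false;
        }

        return true;
    }
}
{code}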

If we go with your original proposal, then I can document a procedure that 
compares the total number of cache entries across all caches with SELECT 
COUNT(*)... to determine whether the caches are properly backed up.
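
A hedged sketch of what such a check could look like (the cache and table names 
are placeholders; SELECT COUNT(*) reflects primary rows only, while the 
peek-mode sizes also include backup copies):

{code:java}
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class BackupCountCheck {
    public static boolean fullyBackedUp(Ignite ignite, int configuredBackups) {
        IgniteCache<?, ?> cache = ignite.cache("personCache");

        // Primary rows only.
        long primaryRows = (Long)cache.query(
            new SqlFieldsQuery("SELECT COUNT(*) FROM Person")).getAll().get(0).get(0);

        // Primary plus backup copies across the whole cluster.
        long primaryAndBackups = cache.sizeLong(CachePeekMode.PRIMARY, CachePeekMode.BACKUP);

        // Every primary entry should have 'configuredBackups' backup copies.
        return primaryAndBackups == primaryRows * (1L + configuredBackups);
    }
}
{code}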



> Force backups to different AWS availability zones using only Spring XML
> -----------------------------------------------------------------------
>
>                 Key: IGNITE-9365
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9365
>             Project: Ignite
>          Issue Type: Improvement
>          Components: cache
>         Environment:  
>            Reporter: David Harvey
>            Assignee: David Harvey
>            Priority: Minor
>             Fix For: 2.7
>
>         Attachments: master_947962f785_availability_zones_via_spring.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> As a developer, I want to be able to force each cache backup to a different 
> "Availability Zone" when I'm running out-of-the-box Ignite, without 
> additional jars installed.  "Availability zone" is an AWS feature; other 
> cloud providers use different names for the same function.  A single 
> availability zone has the characteristic that some or all of the EC2 
> instances in that zone can fail together due to a single fault.  You have no 
> control over the hosts on which the EC2 instance VMs run in AWS, except by 
> controlling the availability zone.
>  
> I could write a few lines of a custom affinityBackupFilter and configure it 
> in a RendezvousAffinityFunction, but then I would have to get it deployed on 
> all nodes in the cluster, and peer class loading will not work for this.  The 
> code to do this should just be part of Ignite.
>  
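
For illustration only, a hedged Java sketch of the wiring the description asks 
for (shown in Java rather than Spring XML; "personCache" and 
NaiveZoneBackupFilter refer to the placeholder names sketched earlier, not to 
the attached patch):

{code:java}
import org.apache.ignite.cache.affinity.rendezvous.RendezvousAffinityFunction;
import org.apache.ignite.configuration.CacheConfiguration;

public class ZoneAwareCacheConfig {
    public static CacheConfiguration<Integer, String> create() {
        RendezvousAffinityFunction aff = new RendezvousAffinityFunction();
        aff.setAffinityBackupFilter(new NaiveZoneBackupFilter()); // custom filter

        CacheConfiguration<Integer, String> ccfg = new CacheConfiguration<>("personCache");
        ccfg.setBackups(1);    // one backup copy per entry
        ccfg.setAffinity(aff); // rendezvous function with the backup filter

        return ccfg;
    }
}
{code}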



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
