Hope this message makes as much sense to me on Tuesday as it did at 3 AM in the
airport ;-) Inline...
----- Original Message -----
> From: "Jeff Darcy"
> To: "Ben England"
> Cc: "Gluster Devel" , "Manoj Pillai"
>
> Sent: Sunday, February 15, 2015 1:49:17 AM
> Subject: Re: [Gluster-devel] Multi-network support proposal
>
> > It's really important for glusterfs not to require that the clients mount
> > volumes using same subnet that is used by servers, and clearly your very
> > general-purpose proposal could address that. For example, in a site where
> > non-glusterfs protocols are used, there are already good reasons for using
> > multiple subnets, and we want glusterfs to be able to coexist with
> > non-glusterfs protocols at a site.
> >
> > However, is there a simpler way to allow glusterfs clients to connect to
> > servers through more than one subnet? For example, suppose your Gluster
> > volume subnet is 172.17.50.0/24 and your "public" network used by glusterfs
> > clients is 1.2.3.0/22, but one of the servers also has an interface on
> > subnet 4.5.6.0/24. So at the time that the volume is either created or
> > bricks are added/removed:
> >
> > - determine what servers are actually in the volume
> > - ask each server to return the subnet for each of its active network
> > interfaces
> > - determine set of subnets that are directly accessible to ALL the volume's
> > servers
> > - write a glusterfs volfile for each of these subnets and save it
> >
> > This process is O(N) where N is the number of servers, but it only happens
> > at volume creation or addition/removal of bricks, and these events do not
> > happen very often (do they?). In the example, 1.2.3.0/22 and 172.17.50.0/24
> > would have glusterfs volfiles, but 4.5.6.0/24 would not.
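Stepping out of the quote for a second: the subnet-intersection step above can
be sketched roughly like this, using plain Python and the stdlib ipaddress
module (this is not glusterd code, and the helper name is made up):

```python
# Rough sketch of the subnet-intersection step (hypothetical helper,
# not glusterd code). Each server reports the CIDR subnets of its
# active interfaces; we keep only subnets common to ALL servers.
import ipaddress

def common_subnets(per_server_subnets):
    """per_server_subnets: one list of CIDR strings per server."""
    sets = [
        # strict=False because an address like 1.2.3.0/22 has host bits
        # set; ipaddress normalizes it to the network 1.2.0.0/22.
        {ipaddress.ip_network(c, strict=False) for c in subnets}
        for subnets in per_server_subnets
    ]
    common = set.intersection(*sets)
    return sorted(str(net) for net in common)

# Server A has all three interfaces; server B lacks 4.5.6.0/24,
# so only the two shared subnets would get volfiles.
print(common_subnets([
    ["172.17.50.0/24", "1.2.3.0/22", "4.5.6.0/24"],
    ["172.17.50.0/24", "1.2.3.0/22"],
]))
```

A volfile would then be generated per surviving subnet; any subnet missing on
even one server is dropped from the result.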
> >
> > So now when a client connects, the server knows which subnet the request
> > came through (via getsockname()), so it can just return the volfile for
> > that subnet.
> > If there is no volfile for that subnet, the client mount request is
> > rejected. But what about existing Gluster volumes? When software is
> > upgraded, we should provide a mechanism for triggering this volfile
> > generation process to open up additional subnets for glusterfs clients.
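Commenting inline again: the per-subnet volfile selection at mount time could
look roughly like this - a sketch only, with made-up volfile names, not actual
glusterd logic:

```python
# Sketch: pick the volfile matching the subnet a client connected from,
# mirroring the "reject if no volfile for that subnet" rule above.
# volfiles maps a CIDR string to a volfile name (hypothetical names).
import ipaddress

def volfile_for_client(client_ip, volfiles):
    addr = ipaddress.ip_address(client_ip)
    for cidr, volfile in volfiles.items():
        # strict=False tolerates CIDRs written with host bits set
        if addr in ipaddress.ip_network(cidr, strict=False):
            return volfile
    return None  # no volfile for this subnet: reject the mount

vols = {"1.2.3.0/22": "myvol.public.vol",
        "172.17.50.0/24": "myvol.cluster.vol"}
print(volfile_for_client("1.2.3.77", vols))  # client on the public net
print(volfile_for_client("9.9.9.9", vols))   # unknown subnet -> rejected
```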
> >
> > This proposal requires additional work to be done where volfiles are
> > generated and where glusterfs mount processing is done, but does not
> > require
> > any additional configuration commands or extra user knowledge of Gluster.
> > glusterfs clients can then use *any* subnet that is accessible to all the
> > servers.
>
> That does have the advantage of not requiring any special configuration,
> and might work well enough for front-end traffic, but it has the
> drawback of not giving any control over back-end traffic. How do
> *servers* choose which interfaces to use for NSR normal traffic,
> reconciliation/self-heal, DHT rebalance, and so on? Which network
> should Ganesha/Samba servers use to communicate with bricks? Even on
> the front end, what happens when we do get around to adding per-subnet
> access control or options? For those kinds of use cases we need
> networks to be explicit parts of our model, not implicit or inferred.
> So maybe we need to reconcile the two approaches, and hope that the
> combined result isn't too complicated. I'm open to suggestions.
>
In defense of your proposal, you are right that it is difficult to manage each
node's network configuration independently or per-volfile, and it would be
useful for a system administrator to be able to configure Gluster network
behavior across the entire volume. For example, you can use pdsh to issue
commands to
any subset of Gluster servers, but what if some of them are down at the time
the command is issued? How do you make these configuration changes persistent?
What happens when you add or remove servers from the volume? That to me is
the real selling point of your proposal - if we have a 60-node or even a
1000-node Gluster volume, we could provide a way to control network behavior in
a persistent, highly-available, scalable way with as few sysadmin operations as
possible.
I have two concerns:
1) Do we have to specify the address rewriting for each host, as in your
example - why not something like this?
# gluster network add client-net 1.2.3.0/24
glusterd could then use a discovery process as I described earlier to determine
for each server what its IP address is on that subnet and rewrite volfiles
accordingly.
The advantage of this subnet-based specification, IMHO, is that it scales - as
you add and remove nodes, you do not have to change the "client-net" entity;
you just make sure that the Gluster servers provide the appropriate network
interface with the appropriate IP address and subnet mask.
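To illustrate the discovery step I have in mind (a sketch only; the helper
name is invented, and in practice glusterd would enumerate each server's
interfaces itself):

```python
# Sketch of the discovery step: given the subnet configured via a
# hypothetical "gluster network add", find the address each server
# would use on it, from that server's list of interface addresses.
import ipaddress

def addr_on_subnet(server_addrs, subnet):
    net = ipaddress.ip_network(subnet, strict=False)
    for a in server_addrs:
        if ipaddress.ip_address(a) in net:
            return a
    return None  # server has no interface on that subnet

print(addr_on_subnet(["172.17.50.4", "1.2.3.4"], "1.2.3.0/24"))  # -> 1.2.3.4
```

A server that returns None here simply cannot serve that client network, which
glusterd could flag at "gluster network add" time rather than at mount time.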
2) Could we keep the number of roles and the sysadmin interface in general from
getting too complicated? Here's an oversimplified model of Gluster networking
- there are at most 2 kinds of subnets on each server in use by