Re: [Gluster-devel] Custom Transport layers
2016-10-31 12:40 GMT+01:00 Lindsay Mathieson:
> But you can broadcast with UDP - one packet of data through one nic to all
> nodes, so in theory you could broadcast 1GB *per nic* or 3GB via three nics.
> Minus overhead for acks, nacks and ordering :)
>
> But I'm not sure it would work at all in practice now through a switch.

I don't like this idea. I still prefer properly configured bonding. There is a bonding mode that does exactly this. Probably balance-xor and active-tld could also do the trick.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Custom Transport layers
Another thought that just occurred to me: security. There's no broadcast/multicast equivalent of TLS, so you're not going to have that protection. Maybe it doesn't matter in some kinds of deployments, but in others it would matter very much.

Also, a similarly-secure broadcast/multicast protocol would be a really awesome research project.
Re: [Gluster-devel] Custom Transport layers
On 31/10/2016 5:56 PM, Gandalf Corvotempesta wrote:
> > I'd like to experiment with broadcast udp to see if it's feasible in local networks. It would be amazing if we could write at 1GB speeds simultaneously to all nodes.
>
> If you have replica 3 and set up a 3 nic bonded interface with balance-alb on the gluster client, you are able to use the 3 nics simultaneously, writing at 1Gb to each node.

But you can broadcast with UDP - one packet of data through one nic to all nodes, so in theory you could broadcast 1GB *per nic* or 3GB via three nics. Minus overhead for acks, nacks and ordering :)

But I'm not sure it would work at all in practice now through a switch.

--
Lindsay Mathieson
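For anyone who wants to try the experiment described above, here is a minimal sketch of a UDP broadcast sender in Python. It is not Gluster code. The port number and the limited-broadcast address are assumptions picked for illustration. It only shows the "one send, every listener on the segment" part; acks, nacks and ordering would all have to be built on top.

```python
import socket

BROADCAST_ADDR = "255.255.255.255"  # limited broadcast address; assumption for the demo
PORT = 50000                         # hypothetical port, not one Gluster uses


def make_broadcast_socket() -> socket.socket:
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST the kernel refuses sends to a broadcast address.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def broadcast(sock: socket.socket, payload: bytes) -> int:
    """Send one datagram to the local segment; returns the number of bytes sent."""
    return sock.sendto(payload, (BROADCAST_ADDR, PORT))
```

Usage would be something like `s = make_broadcast_socket(); broadcast(s, b"block 42"); s.close()`. Each datagram is fire-and-forget, so a lost packet is simply lost unless the receiver asks for it again.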
Re: [Gluster-devel] Custom Transport layers
On 31/10/2016 5:56 PM, Gandalf Corvotempesta wrote:
> If you have replica 3 and set up a 3 nic bonded interface with balance-alb on the gluster client, you are able to use the 3 nics simultaneously, writing at 1Gb to each node.

Actually all you need is two nics, so each node can use 2 nics for the other two nodes.

I actually have three nics per node, currently two bonded with balance-alb per node, and I do indeed max out a 1G connection with jumbo frames. A VM tops out at 120MB/s in sequential writes.

I did experiment with 3 nics bonded with balance-rr and managed to get 2.4Gb/s throughput; balance-rr doesn't do too well with bonds bigger than 2.

Unfortunately I need a private IP for gluster and a bridge for the VMs, and I could only get OpenVSwitch to bond three nics to a bridge and an extra IP, and OVS doesn't support balance-alb.

--
Lindsay Mathieson
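The numbers in this message can be sanity-checked with a bit of arithmetic. The sketch below assumes idealized 1 Gb/s links with no protocol overhead and a bond that spreads traffic perfectly. Both are simplifying assumptions, and the helper names are made up for illustration.

```python
GBIT = 1_000_000_000  # bits per second in 1 Gb/s (decimal units, as used for link speeds)


def link_mb_per_s(gbits: float) -> float:
    """Raw capacity of a link in megabytes per second (1 MB = 10**6 bytes)."""
    return gbits * GBIT / 8 / 1_000_000


def per_replica_mb_per_s(nics: int, replicas: int, nic_gbits: float = 1.0) -> float:
    """Write rate per remote replica when every write is unicast to each replica.

    Idealized: the client bond is shared evenly between replicas, and a single
    server link (also nic_gbits) caps what any one replica can receive.
    """
    total = link_mb_per_s(nics * nic_gbits)
    return min(total / replicas, link_mb_per_s(nic_gbits))


def bond_efficiency(measured_gbits: float, nics: int, nic_gbits: float = 1.0) -> float:
    """Fraction of the bond's nominal capacity that was actually achieved."""
    return measured_gbits / (nics * nic_gbits)
```

Under these assumptions `link_mb_per_s(1.0)` is 125 MB/s, so a VM topping out at 120MB/s is close to wire speed. `per_replica_mb_per_s(2, 2)` models the two-nic case above (two remote replicas, one nic each) and also comes out at 125 MB/s, while a single nic shared between two remote replicas would give half that. `bond_efficiency(2.4, 3)` is about 80%, which matches the remark that balance-rr scales poorly beyond two nics.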
Re: [Gluster-devel] Custom Transport layers
----- Original Message -----
> From: "Raghavendra G"
> To: "Lindsay Mathieson"
> Cc: "Gluster Devel"
> Sent: Monday, October 31, 2016 11:45:15 AM
> Subject: Re: [Gluster-devel] Custom Transport layers
>
> On Fri, Oct 28, 2016 at 6:20 PM, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:
>
> > Is it possible to write custom transport layers for gluster (data transfer,
> > not the management protocols)? Pointers to the existing code and/or docs :)
> > would be helpful
> >
> > I'd like to experiment with broadcast udp to see if it's feasible in local
> > networks.
>
> Another thing to consider here is ordering of messages (sent over transport).
> Broadcast udp may not support ordering of messages (I know udp doesn't, and
> I'm assuming broadcast udp doesn't either, but I may be wrong). If it doesn't,
> you have to build ordering logic on top of it. If the transport layer doesn't
> provide ordering, we cannot reason about consistency of data stored on the
> filesystem.

The last time I thought about UDP vs TCP, I seem to have stumbled upon use cases where maintaining ordering of messages was necessary. However, now that I think more about it, higher layers (like write-behind, open-behind etc.) maintain order wherever required. So I am not sure whether ordering of messages is a primary requirement when choosing a transport. I'll post more if I find anything worthwhile.

> > It would be amazing if we could write at 1GB speeds simultaneously to all
> > nodes.
> >
> > Alternatively let me know if this has been tried and discarded as a bad idea
> > ...
> >
> > thanks,
> >
> > --
> > Lindsay Mathieson
>
> --
> Raghavendra G
Re: [Gluster-devel] Custom Transport layers
On 28 Oct 2016 2:50 PM, "Lindsay Mathieson" wrote:
>
> I'd like to experiment with broadcast udp to see if it's feasible in local networks. It would be amazing if we could write at 1GB speeds simultaneously to all nodes.
>

If you have replica 3 and set up a 3 nic bonded interface with balance-alb on the gluster client, you are able to use the 3 nics simultaneously, writing at 1Gb to each node.
Re: [Gluster-devel] Custom Transport layers
On Fri, Oct 28, 2016 at 6:20 PM, Lindsay Mathieson <lindsay.mathie...@gmail.com> wrote:

> Is it possible to write custom transport layers for gluster (data
> transfer, not the management protocols)? Pointers to the existing code
> and/or docs :) would be helpful
>
> I'd like to experiment with broadcast udp to see if it's feasible in local
> networks.

Another thing to consider here is ordering of messages (sent over transport). Broadcast udp may not support ordering of messages (I know udp doesn't, and I'm assuming broadcast udp doesn't either, but I may be wrong). If it doesn't, you have to build ordering logic on top of it. If the transport layer doesn't provide ordering, we cannot reason about consistency of data stored on the filesystem.

> It would be amazing if we could write at 1GB speeds simultaneously to all
> nodes.
>
> Alternatively let me know if this has been tried and discarded as a bad
> idea ...
>
> thanks,
>
> --
> Lindsay Mathieson

--
Raghavendra G
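To make "ordering logic on top of it" concrete, here is a minimal sketch of a receive-side reorder buffer keyed on sequence numbers. This is a generic technique, not anything taken from the Gluster tree. The class and method names are hypothetical, and it assumes the sender stamps every datagram with a monotonically increasing sequence number.

```python
from typing import Dict, List


class ReorderBuffer:
    """Deliver messages in sequence-number order, buffering any that arrive early."""

    def __init__(self, first_seq: int = 0) -> None:
        self.next_seq = first_seq              # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}    # early arrivals, keyed by sequence number

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one datagram; return the payloads that are now deliverable in order."""
        if seq < self.next_seq or seq in self.pending:
            return []                          # duplicate or already delivered: drop it
        self.pending[seq] = payload
        ready: List[bytes] = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

    def missing(self) -> List[int]:
        """Gaps below the highest buffered sequence number (candidates for a NACK)."""
        if not self.pending:
            return []
        return [s for s in range(self.next_seq, max(self.pending))
                if s not in self.pending]
```

If datagram 1 arrives before datagram 0, `receive(1, ...)` returns nothing and `missing()` reports `[0]`; once 0 arrives, both are released in order. A real transport would also need timeouts and a bound on the buffer size.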
Re: [Gluster-devel] Custom Transport layers
> Hmmm, I never considered that side of things. I guess I had a somewhat
> naive vision of packets floating through the ethernet visible to all
> interfaces, but switch-based networks are basically a star topology.
> Are you saying the switch would likely be the choke point here?

Not necessarily. Switches can do this fan-out more efficiently than any general-purpose machine could, so they're not going to get choked up, but the benefit's not that great either. The fact that one send from the client can consume resources on N server-side channels can also exacerbate problems like TCP incast collapse (in much the same way that similar "amplification" is the basis for many DoS attacks). Broadcast/multicast can still be useful for many things, but IMO not those that are both throughput- and latency-sensitive like we are.
Re: [Gluster-devel] Custom Transport layers
On 29/10/2016 12:46 AM, Jeff Darcy wrote:
> In a modern switched network, the savings are only on the sender side; the switch has to copy the packet to N receiver ports anyway.

Hmmm, I never considered that side of things. I guess I had a somewhat naive vision of packets floating through the ethernet visible to all interfaces, but switch-based networks are basically a star topology. Are you saying the switch would likely be the choke point here?

--
Lindsay Mathieson
Re: [Gluster-devel] Custom Transport layers
> Is it possible to write custom transport layers for gluster (data
> transfer, not the management protocols)? Pointers to the existing code
> and/or docs :) would be helpful

Is it *possible*? Yes. Is it easy or well documented? Definitely no. The two transports we have - TCP/UNIX-domain sockets and IB RDMA - are both in rpc/rpc-transport in the source tree. They need to interact with several other pieces:

 * generic RPC layer (rpc/rpc-lib)
 * event polling (event_register and friends)
 * server and client translators (xlators/protocol)
 * authentication pseudo-translators (xlators/protocol/auth)

Unfortunately neither of the examples we have is well documented, internally or externally, so a certain amount of reverse engineering will be necessary to understand these interfaces.

> I'd like to experiment with broadcast udp to see if it's feasible in
> local networks. It would be amazing if we could write at 1GB speeds
> simultaneously to all nodes.

That particular idea involves some extra complexity. Our current communications model is all point-to-point request/response. Any kind of broadcast or multicast would therefore involve changing how we "think" about addressing.

 * How does the user specify a multicast group?
 * How do we generate a client volfile with one multicast client instead of several unicast ones?
 * How do we track multiple acknowledgements to a single outbound message, so that we can enforce quorum and consistency? That's going to affect AFR as well as the other components mentioned (neither EC nor JBR could take advantage of this).
 * How do we track which file descriptors are still valid on the servers, and which need to be recovered?

> Alternatively let me know if this has been tried and discarded as a bad
> idea ...

I'm not saying it's a bad idea, but it's quite a departure from the communications model we have now. In a modern switched network, the savings are only on the sender side; the switch has to copy the packet to N receiver ports anyway. Server-side replication has that same advantage, plus it can use a separate (often faster) network for all but that first hop. If you want to help us improve traffic flows, that's where I'd suggest most effort should be spent.
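As an illustration of the "multiple acknowledgements to a single outbound message" question raised above, here is a rough sketch of the bookkeeping a multicast client would need. It is not how AFR handles quorum today. The simple-majority rule, the class name and the server names are all assumptions made for the example.

```python
from typing import Dict, Set


class AckTracker:
    """Track per-server acks for each outbound message id and test for quorum."""

    def __init__(self, servers: Set[str]) -> None:
        self.servers = set(servers)
        self.quorum = len(self.servers) // 2 + 1   # simple majority; an assumption
        self.acks: Dict[int, Set[str]] = {}

    def sent(self, msg_id: int) -> None:
        """Record that msg_id was multicast; no server has acknowledged it yet."""
        self.acks[msg_id] = set()

    def ack(self, msg_id: int, server: str) -> bool:
        """Record an ack; return True if the message has reached quorum."""
        if msg_id not in self.acks or server not in self.servers:
            return False                           # unknown message or stray sender
        self.acks[msg_id].add(server)
        return self.has_quorum(msg_id)

    def has_quorum(self, msg_id: int) -> bool:
        """True when at least a majority of servers have acked msg_id."""
        return len(self.acks.get(msg_id, set())) >= self.quorum

    def lagging(self, msg_id: int) -> Set[str]:
        """Servers that have not acked msg_id and may need a resend or heal."""
        return self.servers - self.acks.get(msg_id, set())
```

With three servers, the second distinct ack for a message reaches quorum and `lagging()` names the server that still has to catch up. With unicast connections this state lives implicitly in each connection's request/response pairing; with multicast it has to be tracked explicitly per message.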
[Gluster-devel] Custom Transport layers
Is it possible to write custom transport layers for gluster (data transfer, not the management protocols)? Pointers to the existing code and/or docs :) would be helpful

I'd like to experiment with broadcast udp to see if it's feasible in local networks. It would be amazing if we could write at 1GB speeds simultaneously to all nodes.

Alternatively let me know if this has been tried and discarded as a bad idea ...

thanks,

--
Lindsay Mathieson