Re: [Gluster-devel] Custom Transport layers

2016-10-31 Gandalf Corvotempesta
2016-10-31 12:40 GMT+01:00 Lindsay Mathieson :
> But you can broadcast with UDP - one packet of data through one nic to all
> nodes, so in theory you could broadcast 1Gb/s *per nic* or 3Gb/s via three
> nics. Minus overhead for acks, nacks and ordering :)
>
> But I'm not sure it would work at all in practice through a switch.

I don't like this idea.
I still prefer properly configured bonding. There is a bonding mode
that does exactly this.
balance-xor and balance-tlb could probably also do the trick.
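Whether balance-xor actually spreads replica traffic over all NICs depends on its transmit hash. The Linux kernel bonding documentation describes the default layer2 policy as the last byte of the source MAC XOR the last byte of the destination MAC XOR the packet type ID, modulo the number of slaves. The Python sketch below models only that documented formula (it is not the kernel code), and the MAC addresses are made-up examples.

```python
# Model of the bonding "layer2" xmit_hash_policy as described in the Linux
# kernel bonding documentation (the default policy for balance-xor):
#   hash  = source MAC[5] XOR destination MAC[5] XOR packet type ID
#   slave = hash modulo slave count
# This re-implements the documented formula only, not the kernel code.

ETH_P_IP = 0x0800  # packet type ID (EtherType) for IPv4


def mac_last_byte(mac: str) -> int:
    """Return the last octet of a colon-separated MAC address."""
    return int(mac.split(":")[-1], 16)


def layer2_slave(src_mac: str, dst_mac: str, n_slaves: int,
                 ethertype: int = ETH_P_IP) -> int:
    """Index of the bond slave a frame to dst_mac would be sent on."""
    h = mac_last_byte(src_mac) ^ mac_last_byte(dst_mac) ^ ethertype
    return h % n_slaves


client = "52:54:00:00:00:10"  # hypothetical MAC of the client's bond
for server in ("52:54:00:00:00:21", "52:54:00:00:00:22",
               "52:54:00:00:00:23", "52:54:00:00:00:24"):
    print(server, "-> slave", layer2_slave(client, server, 3))
```

With these addresses the first three servers land on slaves 0, 1 and 2, but a server ending in :24 would share slave 0 with the one ending in :21. Each destination is pinned to a single NIC, so for replica 3 balance-xor only uses all three links if the server MACs happen to hash apart. balance-tlb and balance-alb pick the transmit slave by current load instead (by default), so this model does not describe them.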
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Custom Transport layers

2016-10-31 Jeff Darcy
Another thought that just occurred to me: security.  There's no 
broadcast/multicast equivalent of TLS, so you're not going to have that 
protection.  Maybe it doesn't matter in some kinds of deployments, but in 
others it would matter very much.  Also, a similarly-secure broadcast/multicast 
protocol would be a really awesome research project.


Re: [Gluster-devel] Custom Transport layers

2016-10-31 Lindsay Mathieson

On 31/10/2016 5:56 PM, Gandalf Corvotempesta wrote:


>> I'd like to experiment with broadcast udp to see if it's feasible in
>> local networks. It would be amazing if we could write at gigabit speed
>> simultaneously to all nodes.
>
> If you have replica 3 and set a 3 nic bonded interface with
> balance-alb on the gluster client, you are able to use the 3 nics
> simultaneously writing at 1Gb/s to each node.

But you can broadcast with UDP - one packet of data through one nic to
all nodes, so in theory you could broadcast 1Gb/s *per nic* or 3Gb/s via
three nics. Minus overhead for acks, nacks and ordering :)

But I'm not sure it would work at all in practice through a switch.
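For anyone who wants to try the experiment, the sending side of a UDP broadcast transport is small. This Python sketch uses only the standard socket and struct modules; the 8-byte sequence-number header and the address and port in the usage comment are hypothetical choices, not anything GlusterFS defines.

```python
import socket
import struct
from typing import Tuple

# Hypothetical wire format: an 8-byte big-endian sequence number followed by
# the payload, so receivers can detect loss and reordering themselves.
HEADER = struct.Struct("!Q")


def frame(seq: int, payload: bytes) -> bytes:
    """Prefix a payload with its sequence number."""
    return HEADER.pack(seq) + payload


def unframe(datagram: bytes) -> Tuple[int, bytes]:
    """Split a received datagram back into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]


def broadcast_socket() -> socket.socket:
    """Create a UDP socket that is permitted to send to a broadcast address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return s


# Usage (not run here; address and port are placeholders):
#   sock = broadcast_socket()
#   sock.sendto(frame(0, b"block data"), ("192.168.1.255", 24099))
```

It sends one datagram per call and nothing more. Acknowledgements, retransmission and splitting payloads larger than one datagram (a UDP payload over IPv4 tops out just under 64KiB) are all left out, and those are exactly the overheads mentioned above.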

--
Lindsay Mathieson



Re: [Gluster-devel] Custom Transport layers

2016-10-31 Lindsay Mathieson

On 31/10/2016 5:56 PM, Gandalf Corvotempesta wrote:
> If you have replica 3 and set a 3 nic bonded interface with
> balance-alb on the gluster client, you are able to use the 3 nics
> simultaneously writing at 1Gb/s to each node.


Actually all you need is two nics, so each node can use 2 nics for the 
other two nodes.



I actually have three nics per node, currently two bonded with 
balance-alb per node and I do indeed max out a 1G connection with Jumbo 
frames. A VM tops out at 120MB/s in seq writes.
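As a sanity check on the 120MB/s figure: 1Gb/s is 125MB/s of raw line rate, and every frame pays fixed overhead. The calculation below assumes IPv4 and TCP headers without options (40 bytes) plus 38 bytes of Ethernet framing per frame (8 preamble/SFD, 14 header, 4 FCS, 12 inter-frame gap). It ignores TCP options such as timestamps, ACK traffic and everything above TCP, so real figures will be somewhat lower.

```python
# Rough TCP goodput over a 1 Gb/s Ethernet link for a given MTU.
# Assumptions: IPv4 (20 B) + TCP (20 B) headers with no options, and
# 38 B of Ethernet overhead per frame (preamble+SFD 8, header 14,
# FCS 4, inter-frame gap 12). ACKs and higher-level overhead are ignored.

LINE_RATE_BPS = 1_000_000_000   # 1 Gb/s
ETH_OVERHEAD = 8 + 14 + 4 + 12  # wire bytes per frame on top of the MTU
IP_TCP_HEADERS = 20 + 20        # bytes of each MTU not available to payload


def goodput_mb_per_s(mtu: int, line_rate_bps: int = LINE_RATE_BPS) -> float:
    """TCP payload throughput in MB/s (10^6 bytes/s) at full line rate."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    bytes_per_s = line_rate_bps / 8
    return bytes_per_s * payload / on_wire / 1e6


for mtu in (1500, 9000):
    print(mtu, round(goodput_mb_per_s(mtu), 1))
# 1500 118.7
# 9000 123.9
```

So a VM topping out around 120MB/s with jumbo frames is already close to what a single 1G link can carry.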


I did experiment with 3 nics bonded with balance-rr and managed to get
2.4Gb/s throughput, but balance-rr doesn't do too well with bonds bigger than 2.


Unfortunately I need a private IP for gluster and a bridge for the VMs.
The only way I could get three nics bonded to a bridge plus an extra IP
was with OpenVSwitch, and OVS doesn't support balance-alb.


--
Lindsay Mathieson



Re: [Gluster-devel] Custom Transport layers

2016-10-31 Raghavendra Gowdappa


- Original Message -
> From: "Raghavendra G" 
> To: "Lindsay Mathieson" 
> Cc: "Gluster Devel" 
> Sent: Monday, October 31, 2016 11:45:15 AM
> Subject: Re: [Gluster-devel] Custom Transport layers
> 
> 
> 
> On Fri, Oct 28, 2016 at 6:20 PM, Lindsay Mathieson <
> lindsay.mathie...@gmail.com > wrote:
> 
> 
> Is it possible to write custom transport layers for gluster (data transfer,
> not the management protocols)? Pointers to the existing code and/or docs :)
> would be helpful
> 
> 
> I'd like to experiment with broadcast udp to see if it's feasible in local
> networks.
> 
> Another thing to consider here is ordering of messages (sent over transport).
> Does broadcast udp support ordering of messages? I know udp doesn't, and I
> assume broadcast udp doesn't either, but I may be wrong. If it doesn't,
> you'll have to build ordering logic on top of it. If the transport layer
> doesn't provide ordering, we cannot reason about consistency of data stored
> on the filesystem.

Last time I thought about UDP vs TCP, I seem to have stumbled upon 
use-cases where maintaining ordering of messages was necessary. However, now 
that I think more about it, higher layers (like write-behind, open-behind etc.) 
maintain order wherever required. So, I am not sure whether ordering of 
messages is a primary requirement when choosing a transport. I'll post more if 
I find anything worthwhile.

> 
> 
> It would be amazing if we could write at gigabit speed simultaneously to all
> nodes.
> 
> 
> Alternatively let me know if this has been tried and discarded as a bad idea
> ...
> 
> thanks,
> 
> --
> Lindsay Mathieson
> 
> 
> 
> 
> --
> Raghavendra G
> 


Re: [Gluster-devel] Custom Transport layers

2016-10-31 Gandalf Corvotempesta
On 28 Oct 2016 2:50 PM, "Lindsay Mathieson" wrote:
>
> I'd like to experiment with broadcast udp to see if it's feasible in local
> networks. It would be amazing if we could write at gigabit speed
> simultaneously to all nodes.
>

If you have replica 3 and set a 3 nic bonded interface with balance-alb on
the gluster client, you are able to use the 3 nics simultaneously writing
at 1Gb/s to each node.

Re: [Gluster-devel] Custom Transport layers

2016-10-30 Raghavendra G
On Fri, Oct 28, 2016 at 6:20 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> Is it possible to write custom transport layers for gluster (data
> transfer, not the management protocols)? Pointers to the existing code
> and/or docs :) would be helpful
>
>
> I'd like to experiment with broadcast udp to see if it's feasible in local
> networks.


Another thing to consider here is ordering of messages (sent over
transport). Does broadcast udp support ordering of messages? I know udp
doesn't, and I assume broadcast udp doesn't either, but I may be wrong. If
it doesn't, you'll have to build ordering logic on top of it. If the
transport layer doesn't provide ordering, we cannot reason about consistency
of data stored on the filesystem.
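One common way to build that ordering logic is to tag each datagram with a sequence number and have the receiver hold back early arrivals until the gap before them is filled. Below is a generic receive-side reorder buffer in Python, offered as an illustration rather than anything from the GlusterFS code; it assumes sequence numbers start at 0 and increase by one per message, and it does not handle loss.

```python
from typing import Dict, List


class ReorderBuffer:
    """Deliver messages in sequence-number order, buffering early arrivals.

    Generic sketch: assumes sequence numbers start at 0 and increase by 1
    per message; duplicates and already-delivered numbers are dropped.
    """

    def __init__(self) -> None:
        self.next_seq = 0                  # next sequence number to deliver
        self.pending: Dict[int, bytes] = {}  # early arrivals, keyed by seq

    def receive(self, seq: int, payload: bytes) -> List[bytes]:
        """Accept one datagram; return payloads now deliverable, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate or stale
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = ReorderBuffer()
print(buf.receive(1, b"b"))  # []
print(buf.receive(0, b"a"))  # [b'a', b'b']
print(buf.receive(2, b"c"))  # [b'c']
```

If a datagram never arrives, everything after it stays buffered, so a real transport would have to pair this with acks/nacks and retransmission.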


> It would be amazing if we could write at gigabit speed simultaneously to all
> nodes.
>
>
> Alternatively let me know if this has been tried and discarded as a bad
> idea ...
>
> thanks,
>
> --
> Lindsay Mathieson
>
>



-- 
Raghavendra G

Re: [Gluster-devel] Custom Transport layers

2016-10-29 Jeff Darcy
> Hmmm, I never considered that side of things. I guess I had a somewhat
> naive vision of packets floating through the ethernet visible to all
> interfaces, but switch-based networks are basically a star topology.
> Are you saying the switch would likely be the choke point here?

Not necessarily.  Switches can do this fan-out more efficiently than
any general-purpose machine could, so they're not going to get choked
up, but the benefit's not that great either.  The fact that one send
from the client can consume resources on N server-side channels can
also exacerbate problems like TCP incast collapse (in much the same
way that similar "amplification" is the basis for many DoS attacks).
Broadcast/multicast can still be useful for many things, but IMO not
those that are both throughput- and latency-sensitive like we are.


Re: [Gluster-devel] Custom Transport layers

2016-10-28 Lindsay Mathieson

On 29/10/2016 12:46 AM, Jeff Darcy wrote:

> In a modern switched network, the
> savings are only on the sender side; the switch has to copy the
> packet to N receiver ports anyway.


Hmmm, I never considered that side of things. I guess I had a somewhat 
naive vision of packets floating through the ethernet visible to all 
interfaces, but switch-based networks are basically a star topology. 
Are you saying the switch would likely be the choke point here?


--
Lindsay Mathieson



Re: [Gluster-devel] Custom Transport layers

2016-10-28 Jeff Darcy
> Is it possible to write custom transport layers for gluster (data
> transfer, not the management protocols)? Pointers to the existing code
> and/or docs :) would be helpful

Is it *possible*?  Yes.  Is it easy or well documented?  Definitely no.
The two transports we have - TCP/UNIX-domain sockets and IB RDMA - are
both in rpc/rpc-transport in the source tree.  They need to interact
with several other pieces:

  generic RPC layer (rpc/rpc-lib)

  event polling (event_register and friends)

  server and client translators (xlators/protocol)

  authentication pseudo-translators (xlators/protocol/auth)

Unfortunately neither of the examples we have is well documented,
internally or externally, so a certain amount of reverse engineering
will be necessary to understand these interfaces.

> I'd like to experiment with broadcast udp to see if it's feasible in
> local networks. It would be amazing if we could write at gigabit speed
> simultaneously to all nodes.

That particular idea involves some extra complexity.  Our current
communications model is all point-to-point request/response.  Any
kind of broadcast or multicast would therefore involve changing
how we "think" about addressing.  How does the user specify a
multicast group?  How do we generate a client volfile with one
multicast client instead of several unicast ones?  How do we track
multiple acknowledgements to a single outbound message, so that we
can enforce quorum and consistency?  That's going to affect AFR as
well as the other components mentioned (neither EC nor JBR could
take advantage of this).  How do we track which file descriptors
are still valid on the servers, and which need to be recovered?
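To make the acknowledgement question concrete: if one multicast send replaces N unicast sends, the client has to collect per-server replies against a single message ID and decide when enough have arrived. The Python sketch below assumes a simple majority quorum over a known replica set; the class names and server names are hypothetical, and this is not how AFR actually tracks replies.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PendingWrite:
    """Replies collected so far for one multicast request (hypothetical)."""
    replicas: Set[str]
    acked: Set[str] = field(default_factory=set)
    failed: Set[str] = field(default_factory=set)

    def quorum(self) -> int:
        return len(self.replicas) // 2 + 1  # simple majority, an assumption


class AckTracker:
    """Map one outbound message ID to the replies of every replica."""

    def __init__(self) -> None:
        self.pending: Dict[int, PendingWrite] = {}

    def sent(self, msg_id: int, replicas: Set[str]) -> None:
        self.pending[msg_id] = PendingWrite(set(replicas))

    def status(self, msg_id: int) -> str:
        w = self.pending[msg_id]
        if len(w.acked) >= w.quorum():
            return "committed"
        if len(w.replicas) - len(w.failed) < w.quorum():
            return "failed"  # too many failures: quorum can no longer be met
        return "pending"

    def reply(self, msg_id: int, server: str, ok: bool) -> str:
        """Record a server's first reply (later ones are ignored); return status."""
        w = self.pending[msg_id]
        if server in w.replicas and server not in (w.acked | w.failed):
            (w.acked if ok else w.failed).add(server)
        return self.status(msg_id)


tracker = AckTracker()
tracker.sent(1, {"s1", "s2", "s3"})  # hypothetical server names
print(tracker.reply(1, "s1", True))   # pending
print(tracker.reply(1, "s2", False))  # pending
print(tracker.reply(1, "s3", True))   # committed
```

Even in this toy version the tracker only says when a quorum has answered; s2 in the example is now behind and would still need to be healed, which is the recovery question raised above.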

> Alternatively let me know if this has been tried and discarded as a bad
> idea ...

I'm not saying it's a bad idea, but it's quite a departure from the
communications model we have now.  In a modern switched network, the
savings are only on the sender side; the switch has to copy the
packet to N receiver ports anyway.  Server-side replication has that
same advantage, plus it can use a separate (often faster) network
for all but that first hop.  If you want to help us improve traffic
flows, that's where I'd suggest most effort should be spent.
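The per-link arithmetic behind that comparison can be sketched as follows for writing b bytes to n replicas. It is a deliberately simplified model (no headers, acks or retransmissions, one link per host, an ideal switch), and the three mode labels are chosen here for illustration: client-side replication over unicast, a hypothetical multicast transport, and server-side replication in which the first server forwards n-1 copies over a separate back-end network.

```python
from typing import Dict


def link_bytes(b: int, n: int, mode: str) -> Dict[str, int]:
    """Bytes carried on each kind of link when writing b bytes to n replicas.

    Simplified model: ignores headers, acks and retransmissions.
    """
    if mode == "unicast":      # client sends a separate copy to every replica
        return {"client_uplink": n * b, "frontend_to_servers": n * b, "backend": 0}
    if mode == "multicast":    # client sends once; the switch copies to n ports
        return {"client_uplink": b, "frontend_to_servers": n * b, "backend": 0}
    if mode == "server_side":  # client sends once; first server forwards n-1 copies
        return {"client_uplink": b, "frontend_to_servers": b, "backend": (n - 1) * b}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("unicast", "multicast", "server_side"):
    print(mode, link_bytes(1, 3, mode))
```

For n = 3, multicast cuts the client's uplink from 3b to b, but the server-facing ports of the front-end switch still carry 3b, just as with unicast. Server-side replication reduces that to b and moves the other 2b onto the back-end network.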




[Gluster-devel] Custom Transport layers

2016-10-28 Lindsay Mathieson
Is it possible to write custom transport layers for gluster (data 
transfer, not the management protocols)? Pointers to the existing code 
and/or docs :) would be helpful



I'd like to experiment with broadcast udp to see if it's feasible in 
local networks. It would be amazing if we could write at gigabit speed 
simultaneously to all nodes.



Alternatively let me know if this has been tried and discarded as a bad 
idea ...


thanks,

--
Lindsay Mathieson
