Hi Atin,

You’re right in saying that if it’s activated, then all nodes should have it activated.

What I find strange is that when glusterfsd has problems communicating with the 
other peers, the single node with issues isn’t considered “not connected” and 
thus somehow expelled from the cluster; in my case it caused a complete hang 
of the trusted storage pool.

And to emphasise this: pinging was no problem, as it uses small packets anyway, 
so jumbo frames were not used at all. Enabling jumbo frames on the interface 
and switches only tells the TCP/IP stack that it can send larger packets, but 
it doesn’t have to.
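
To make the point about ping concrete, here is a minimal sketch in Python of a probe that would actually exercise jumbo frames. It assumes Linux iputils `ping`, where `-M do` sets the don't-fragment bit and `-s` sets the ICMP payload size. The function names and the peer name `gluster-node2` are made up for illustration; the 20 and 8 byte figures are the IPv4 header without options and the ICMP echo header.

```python
import subprocess

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu <= IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small to carry an ICMP echo request")
    return mtu - IPV4_HEADER - ICMP_HEADER


def jumbo_ping_command(host, mtu=9000, count=3):
    """Build a ping command that sends full-MTU packets with DF set."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), host]


def path_accepts_mtu(host, mtu=9000):
    """Run the probe; True only if ping exits successfully.

    A default ping sends a small payload, so it can succeed even when a
    switch port on the path discards jumbo frames.
    """
    result = subprocess.run(
        jumbo_ping_command(host, mtu), capture_output=True, text=True
    )
    return result.returncode == 0


if __name__ == "__main__":
    # "gluster-node2" is a hypothetical peer name.
    print(" ".join(jumbo_ping_command("gluster-node2")))
    # prints: ping -M do -c 3 -s 8972 gluster-node2
```

Running `path_accepts_mtu` against every peer before enabling jumbo frames would be one way to catch a switch port that still has the default MTU.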

Or am I mistaken in thinking that the TCP/IP stack controls whether to send the 
bigger packets and that glusterfsd has no control over that?
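
A small sketch of what I mean, assuming Linux and Python's standard `socket` module: an application writes a byte stream to a TCP socket, and the maximum segment size is something it can query from the kernel through the `TCP_MAXSEG` socket option. The helper name `loopback_mss` is hypothetical, and loopback is used only so the example runs without a second host. Whether glusterfsd itself touches this option is not something the sketch shows.

```python
import socket


def loopback_mss():
    """Return the MSS the kernel reports for a TCP connection over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        client.connect(listener.getsockname())
        server, _ = listener.accept()
        try:
            # The application did not choose this value; the kernel reports
            # the segment size it is using for this connection.
            return client.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
        finally:
            server.close()
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print("kernel-reported MSS on loopback:", loopback_mss())
```

On a real interface the reported value follows the interface MTU and what the peer advertised during the handshake, which is why a host configured for jumbo frames can start sending segments that a switch port with a smaller MTU then discards.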

Kind regards,

Sander Zijlstra

| Linux Engineer | SURFsara | Science Park 140 | 1098XG Amsterdam | T +31 (0)6 
43 99 12 47 | sander.zijls...@surfsara.nl | www.surfsara.nl |

Regular day off on Friday

> On 15 Oct 2015, at 08:24, Atin Mukherjee <amukh...@redhat.com> wrote:
> 
> 
> 
> On 10/14/2015 05:09 PM, Sander Zijlstra wrote:
>> LS,
>> 
>> I recently reconfigured one of my gluster nodes and forgot to update the MTU 
>> size on the switch while I did configure the host with jumbo frames.
>> 
>> The result was that the complete cluster had communication issues.
>> 
>> All systems are part of a distributed striped volume with a replica size of 
>> 2 but still the cluster was completely unusable until I updated the switch 
>> port to accept jumbo frames rather than to discard them.
> This is expected. When enabling the network components to communicate
> with TCP jumbo frames in a Gluster Trusted Storage Pool, you need to
> ensure that all the network components, such as switches and nodes, are
> configured properly. I think with this setting you'd fail to ping the
> other nodes in the pool, so that could be a verification step before
> you set the cluster up.
>> 
>> The symptoms were:
>> 
>> - Gluster clients had a very hard time reading the volume information and 
>> thus couldn’t do any filesystem ops on them.
>> - The glusterfs servers could see each other (peer status) and a volume info 
>> command was ok, but a volume status command would not return or would return 
>> a “staging failed” error.
>> 
>> I know mixing MTU sizes and don’t-fragment bits can screw up a lot, but why 
>> wasn’t that gluster peer just discarded from the cluster, so that not all 
>> clients kept on communicating with it and causing all sorts of errors?
> To answer this question: peer status and volume info are local operations
> and don't incur network traffic, so in this very case you might see peer
> status showing all the nodes as connected although there is a breakage.
> OTOH, for the status command the originator node communicates with the
> other peers, and hence it fails there.
> 
> HTH,
> Atin
>> 
>> I use GlusterFS 3.6.2 at the moment.
>> 
>> Kind regards
>> Sander
>> 
>> 
>> 
>> 
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>> 
