Raj,
Thanks for the insight.
It looks like it was the buffer size. The RX buffer was increased on the
Lustre nodes and there have been no more dropped packets.
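In case it helps anyone else, here is a rough sketch of that kind of change (the interface name and ring size below are placeholders; ethtool -g shows the current/maximum ring parameters and ethtool -G sets them):

#!/usr/bin/env python3
"""Sketch: inspect and raise the NIC RX ring buffer with ethtool.

Assumptions: the interface name is a placeholder and the target ring
size must not exceed the maximum that ethtool -g reports.
"""
import subprocess

IFACE = "eth0"  # placeholder; use the 40GbE interface

# Show current and maximum ring parameters (ethtool -g).
print(subprocess.run(["ethtool", "-g", IFACE],
                     capture_output=True, text=True).stdout)

# Raise the RX ring toward the hardware maximum (ethtool -G);
# 8192 is only an example value.
subprocess.run(["ethtool", "-G", IFACE, "rx", "8192"], check=True)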
Brian Andrus
On 12/5/2017 11:12 AM, Raj wrote:
Brian,
I would check the following:
- MTU size must be the same across all nodes (servers + clients)
- peer_credits and credits must be the same across all nodes
- /proc/sys/lnet/peers can show if you are constantly seeing negative
credits (a rough check is sketched after this list)
- Buffer overflow counters on the switches, if they provide them. If the
switch buffers are too small to handle the I/O stream, you may want to
reduce credits.
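A rough sketch of those first checks (the interface name is a placeholder, and the column layout of /proc/sys/lnet/peers may differ between Lustre versions):

#!/usr/bin/env python3
"""Sketch: report the local MTU and flag negative LNet peer credits.

Assumptions: the interface name is a placeholder; the peers file
layout may vary, so this only does a rough scan for negative fields.
"""
from pathlib import Path

IFACE = "eth0"  # placeholder; the LNet-facing interface

# MTU must match on every server and client.
print(IFACE, "MTU:", Path(f"/sys/class/net/{IFACE}/mtu").read_text().strip())

# Any negative value in a peer line (e.g. min credits) suggests
# credit starvation.
for line in Path("/proc/sys/lnet/peers").read_text().splitlines()[1:]:
    if any(f.startswith("-") and f[1:].isdigit() for f in line.split()):
        print("negative credits:", line)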
-Raj
On Tue, Dec 5, 2017 at 11:56 AM Brian Andrus <toomuc...@gmail.com> wrote:
Shawn,
Flow control is configured and these connections are all on the
same 40g subnet and all directly connected to the same switch.
I'm a little new to lnet_selftest, but when I run it 1:1, I do see the
dropped-packet count go up significantly on the client node. The node I
set as the server does not drop any packets.
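A small sketch for putting a number on that, by sampling the standard sysfs drop counter around a run (the interface name is a placeholder):

#!/usr/bin/env python3
"""Sketch: count RX drops accumulated during a test window.

Start this, kick off lnet_selftest in another shell, then press
Enter when the run finishes. The interface name is a placeholder.
"""
from pathlib import Path

IFACE = "eth0"  # placeholder; the client's 40GbE interface

def rx_dropped() -> int:
    path = Path(f"/sys/class/net/{IFACE}/statistics/rx_dropped")
    return int(path.read_text())

before = rx_dropped()
input("Run lnet_selftest now, then press Enter when it is done... ")
print("RX drops during the run:", rx_dropped() - before)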
Brian Andrus
On 12/5/2017 9:20 AM, Shawn Hall wrote:
Hi Brian,
Do you have flow control configured on all ports that are on the
network path? Lustre has a tendency to cause packet losses in
ways that performance testing tools don’t because of the N to 1
packet flows, so flow control is often necessary. Lnet_selftest
should replicate this behavior.
Is there a point in the network path where the link bandwidth
changes (e.g. 40 GbE down to 10 GbE, or 2x40 GbE down to 1x40
GbE)? That will commonly be the biggest point of loss if flow
control isn’t doing its job.
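A quick way to confirm the negotiated pause settings on the host side (a sketch; the interface name is a placeholder, ethtool -a only reports the NIC end, and the switch ports still need to be checked separately):

#!/usr/bin/env python3
"""Sketch: check the negotiated Ethernet pause (flow control) settings.

Assumptions: the interface name is a placeholder and the string check
is rough, since ethtool -a output formatting varies by version.
"""
import subprocess

IFACE = "eth0"  # placeholder

out = subprocess.run(["ethtool", "-a", IFACE],
                     capture_output=True, text=True).stdout
print(out)
# Rough check: warn if the RX pause line reads "off".
if "RX:" in out and "off" in out.split("RX:")[1].splitlines()[0]:
    print("Warning: RX pause appears to be off on", IFACE)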
Shawn
On 12/5/17, 11:49 AM, "lustre-discuss on behalf of jongwoo...@naver.com"
<lustre-discuss-boun...@lists.lustre.org on behalf of jongwoo...@naver.com>
wrote:
Did you check your connection with iperf and iperf3 in TCP mode? In that
case those tools cannot reveal packet drops, since TCP simply retransmits
the lost segments (see the UDP sketch below).
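One way to make loss visible with the same tools is a UDP run, since iperf3 then reports lost datagrams directly (a sketch; the server name and offered rate are placeholders):

#!/usr/bin/env python3
"""Sketch: UDP iperf3 run that reports datagram loss directly.

Start `iperf3 -s` on the server first. The server name and offered
rate below are placeholders; push the rate toward line rate to
stress the receive path.
"""
import subprocess

SERVER = "oss01"  # placeholder hostname
RATE = "20G"      # placeholder UDP offered rate

subprocess.run(["iperf3", "-c", SERVER, "-u", "-b", RATE, "-t", "30"],
               check=True)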
Also try checking the reliability of your block device backend with
benchmark tools like vdbench or bonnie++. Sometimes a bad block device
causes incorrect data transfers.
-----Original Message-----
From: "Brian Andrus"<toomuc...@gmail.com>
<mailto:toomuc...@gmail.com>
To: "lustre-discuss@lists.lustre.org"
<mailto:lustre-discuss@lists.lustre.org><lustre-discuss@lists.lustre.org>
<mailto:lustre-discuss@lists.lustre.org>;
Cc:
Sent: 2017-12-06 (수) 01:38:04
Subject: [lustre-discuss] lustre causing dropped packets
All,
I have a small setup I am testing (1 MGS, 2 OSS) that is connected via
40G ethernet.
I notice that anything that writes to the Lustre filesystem causes
dropped packets. Reads do not seem to cause this. I have also tested the
network (iperf, iperf3, general traffic) with no dropped packets.
Is there something with writes that can cause dropped packets?
Brian Andrus
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org