Re: [lustre-discuss] lustre causing dropped packets

2017-12-05 Thread Brian Andrus

Raj,

Thanks for the insight.
It looks like it was the buffer size. The rx buffer was increased on the
Lustre nodes and there have been no more dropped packets.
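
(For the record, a minimal sketch of that kind of change, assuming a NIC
named eth0 -- the interface name and the maximum ring size are site-specific:)

    # Show current and hardware-maximum rx/tx ring sizes
    ethtool -g eth0

    # Raise the rx ring to the maximum reported above (e.g. 4096)
    ethtool -G eth0 rx 4096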


Brian Andrus






Re: [lustre-discuss] lustre causing dropped packets

2017-12-05 Thread Raj
Brian,
I would check the following (a few commands for each are sketched below):
- The MTU size must be the same across all the nodes (servers + clients).
- peer_credits and credits must be the same across all the nodes.
- /proc/sys/lnet/peers can show whether you are constantly seeing negative
credits.
- Buffer overflow counters on the switches, if the switch provides them. If
the buffer size is too low to handle the IO stream, you may want to reduce
credits.
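
(A minimal sketch of those checks, assuming the TCP LND (ksocklnd) and an
interface named eth0 -- names and parameter paths may differ by version:)

    # MTU must match on every node
    ip link show eth0 | grep mtu

    # credits / peer_credits are ksocklnd module parameters; compare across nodes
    cat /sys/module/ksocklnd/parameters/credits
    cat /sys/module/ksocklnd/parameters/peer_credits

    # Watch for persistently negative credit counts per peer
    cat /proc/sys/lnet/peers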

-Raj




Re: [lustre-discuss] lustre causing dropped packets

2017-12-05 Thread Brian Andrus

Shawn,

Flow control is configured and these connections are all on the same 40G
subnet, all directly connected to the same switch.


I'm a little new to lnet_selftest, but when I run it 1:1 I do see the
dropped-packet count go up significantly on the client node. The node I set
up as the server does not drop any packets.
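
(For anyone else new to it, a minimal 1:1 lnet_selftest run looks roughly
like this -- the NIDs are placeholders for the client and server, and the
lnet_selftest module must be loaded on both nodes plus the console node:)

    export LST_SESSION=$$
    lst new_session rw_test
    lst add_group clients 192.168.1.10@tcp
    lst add_group servers 192.168.1.20@tcp
    lst add_batch bulk_write
    lst add_test --batch bulk_write --from clients --to servers brw write size=1M
    lst run bulk_write
    lst stat servers & sleep 30; kill $!   # sample throughput for 30 seconds
    lst stop bulk_write
    lst end_session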


Brian Andrus




Re: [lustre-discuss] lustre causing dropped packets

2017-12-05 Thread Shawn Hall
Hi Brian,

Do you have flow control configured on all ports that are on the network path?  
Lustre has a tendency to cause packet losses in ways that performance testing 
tools don’t because of the N to 1 packet flows, so flow control is often 
necessary.  Lnet_selftest should replicate this behavior.

Is there a point in the network path where the link bandwidth changes (e.g. 40 
GbE down to 10 GbE, or 2x40 GbE down to 1x40 GbE)?  That will commonly be the 
biggest point of loss if flow control isn’t doing its job.
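
(A quick way to verify the host side, assuming standard Ethernet pause-frame
flow control and a NIC named eth0 -- the switch side must be checked too, with
vendor-specific syntax:)

    # Show whether rx/tx pause is negotiated on the host side
    ethtool -a eth0

    # Enable rx/tx flow control if the NIC and switch support it
    ethtool -A eth0 rx on tx on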

Shawn



Re: [lustre-discuss] quota: space accounting isn't functional

2017-12-05 Thread Thomas Roth

Hmm, stupid me - I only had files belonging to user 'root' on that
filesystem. Checking the quota of a user who owns no files does not return
anything useful.

So for the record: no quota, no space accounting, if there is nothing to
account for.


TLDR:
I have since reformatted, issued "lctl conf_param hebetest.quota.mdt=ug" and
"lctl conf_param hebetest.quota.ost=ug", just to see these error messages again.

For the fun of it, I did "lctl conf_param hebetest.quota.mdt=u" and
"lctl conf_param hebetest.quota.ost=u", which did not provoke these messages again.


I asked for my quota
--
lfs quota -v -u troth /lustre/testhebe
Disk quotas for usr troth :
 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/testhebe
[0]   0   0   -   0   0   0   -
hebetest-MDT_UUID
  0   -   0   -   0   -   0   -
quotactl ost0 failed.
quotactl ost1 failed.
Total allocated inode limit: 0, total allocated block limit: 0
Some errors happened when getting quota info. Some devices may be not working or deactivated. The data 
in "[]" is inaccurate.



"quotactl failed" never happens/is never explained?


Since measuring a zero is always tricky business, I created two directories
pinned to my two OSTs and wrote one file in each _as user troth_ (sketch below).
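
(Pinning a directory to a specific OST can be done with a stripe-index
setting; a sketch, assuming OST indices 0 and 1:)

    lfs setstripe -i 0 /lustre/testhebe/ost0   # new files here land on OST0000
    lfs setstripe -i 1 /lustre/testhebe/ost1   # new files here land on OST0001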


Now I get

-
lfs quota -v -u troth /lustre/testhebe
Disk quotas for usr troth (uid 4128):
 Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
/lustre/testhebe
1525409   0   0   -   4   0   0   -
hebetest-MDT_UUID
  8   -   0   -   4   -   0   -
hebetest-OST_UUID
 704780   -   0   -   -   -   -   -
hebetest-OST0001_UUID
 820622   -   0   -   -   -   -   -
Total allocated inode limit: 0, total allocated block limit: 0


(and of course the group quota also works).


Cheers,
Thomas

On 11/17/2017 03:51 PM, Thomas Roth wrote:

Hi all,

I have this test system where the OSSes run CentOS 7.4 with ZFS 0.7.1, and
the MDS uses ldiskfs. The Lustre version is 2.10.


When I check the quota of some user - "lfs quota -u troth /lustre/hebetest" -
I'm told by the client that the data may be inaccurate; the log entry is

> LustreError: 10006:0:(osc_quota.c:291:osc_quotactl()) ptlrpc_queue_wait failed, rc: -2

while the OSS says

> can't enable quota enforcement since space accounting isn't functional.
> Please run tunefs.lustre --quota ...
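
(That message refers to re-enabling the quota feature on the target itself; a
sketch of the step it asks for, assuming a ZFS OST dataset named ostpool/ost0
mounted at /mnt/ost0 -- the target must be unmounted first:)

    umount /mnt/ost0
    tunefs.lustre --quota ostpool/ost0
    mount -t lustre ostpool/ost0 /mnt/ost0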


osd-*.*.quota_slave.info reads

target name:    hebetest-OST000..
pool ID:    0
type:   dt
quota enabled:  none
conn to master: setup
space acct: ug
user uptodate:  glb[0],slv[0],reint[0]
group uptodate: glb[0],slv[0],reint[0]
project uptodate: glb[0],slv[0],reint[0]

everywhere.
I have switched on quota enforcement in the meantime, so it is
quota enabled:  ug
by now. As expected, no impact.

But, "Space accounting" -  should just be there, once I have a ZFS backend?

rc: -2  = /* No such file or directory */   - which file is missing?
Did I manage to delete a quota file on the OSTs? How did I do that?


The "isn't functional" error message shows up in LU-9790, but that is about 
project quota.


Regards,
Thomas


Re: [lustre-discuss] lustre causing dropped packets

2017-12-05 Thread jongwoohan
Did you check your connection with iperf and iperf3 using TCP? In that case
these tools cannot detect packet drops, because TCP retransmissions hide them.
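
(One way to make drops visible is a UDP run, where iperf3 reports lost
datagrams directly -- a sketch, with a placeholder server address:)

    # On the server node
    iperf3 -s

    # On the client: UDP at a high offered rate; the report shows lost/total datagrams
    iperf3 -c 192.168.1.20 -u -b 20G -t 30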

Also try checking your block device backend's reliability with benchmark tools
like vdbench or bonnie++. Sometimes a bad block device causes incorrect data
transfer.
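
(A minimal bonnie++ pass, assuming an OST backend mounted at /mnt/ost0 -- the
-u flag sets the user to run as when invoked as root, and that user needs
write access to the directory:)

    bonnie++ -d /mnt/ost0 -u nobody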



[lustre-discuss] lustre causing dropped packets

2017-12-05 Thread Brian Andrus

All,

I have a small setup I am testing (1 MGS, 2 OSS) that is connected via
40G Ethernet.


I notice that running anything that writes to the Lustre filesystem
causes dropped packets. Reads do not seem to cause this. I have also
tested the network (iperf, iperf3, general traffic) with no dropped packets.


Is there something with writes that can cause dropped packets?


Brian Andrus

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org