[lustre-discuss] confused about mdt space

2020-03-30 Thread 肖正刚
Hello, I have some questions about metadata space.

1) I have ten 960GB SAS SSDs for the MDT; after RAID10, we have 4.7TB of free space.

After formatting it as an MDT, we only have 2.6TB free. Where did the
other 2.1TB go?

2) What is the remaining 2.6TB used for?
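If the MDT is ldiskfs-backed, one plausible explanation is that mkfs preallocates the inode tables up front. A rough back-of-the-envelope sketch, assuming (not confirmed from the post) 1024-byte inodes and one inode per 2560 bytes as the MDT formatting defaults:

```python
# Hedged sketch of where MDT space may go after formatting. The inode size
# and bytes-per-inode ratio below are ASSUMED ldiskfs MDT defaults; check
# the real values with "dumpe2fs -h <mdt-device>" on your own system.

raw_tb = 4.7               # usable space after RAID10 (from the post)
inode_size = 1024          # assumed bytes per on-disk inode for an MDT
bytes_per_inode = 2560     # assumed mkfs inode ratio (one inode per 2.5 KiB)

# Inode tables are preallocated at mkfs time, so this fraction of the
# device is consumed before any files are created.
inode_fraction = inode_size / bytes_per_inode        # 0.4
free_after_format_tb = raw_tb * (1 - inode_fraction) # ~2.8 TB

print(f"~{inode_fraction:.0%} of the device preallocated to inode tables")
print(f"~{free_after_format_tb:.2f} TB left before journal/reserved blocks")
```

With those assumed defaults, roughly 40% of the device goes to inode tables, and the journal plus reserved blocks would account for most of the remaining gap down to 2.6TB. The free space that is left is what holds directories, extended attributes, and changelogs.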
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Shrinking grant with 2.12 clients

2020-03-30 Thread Simon Guilbault
Hi,

We seem to be hitting a performance issue with Lustre 2.12.2 and 2.12.3
clients. Over time, the OSC grant size shrinks below 1MB and does not grow
back. This drops the client's performance to a few MB/s, even down to kB/s
on some OSTs. This does not seem to happen with 2.10.8 clients, since they
don't have the "grant_shrink" flag. The servers are running 2.12.3 with
ZFS 0.7.9.

Here is the per-OST performance we see with a simple dd test; the worst
OST is #5 at 222 kB/s. A 2.10 client writing to the same OST reaches
> 800 MB/s.

for i in {0..37}; do lfs setstripe --ost $i --stripe-count 1 ost$i ; done

for i in {0..37}; do dd if=/dev/zero of=ost$i bs=1M count=100; done

104857600 bytes (105 MB) copied, 0.142473 s, 736 MB/s
104857600 bytes (105 MB) copied, 9.22021 s, 11.4 MB/s
104857600 bytes (105 MB) copied, 0.0905684 s, 1.2 GB/s
104857600 bytes (105 MB) copied, 6.36873 s, 16.5 MB/s
104857600 bytes (105 MB) copied, 0.0929602 s, 1.1 GB/s
104857600 bytes (105 MB) copied, 471.699 s, 222 kB/s
104857600 bytes (105 MB) copied, 0.177067 s, 592 MB/s
[...]


As an example, this slow client has a grant size of 0.8MB after being up
for a while:

lctl get_param osc.lustre04-OST0005*.cur_grant_bytes

osc.lustre04-OST0005-osc-98128d818000.cur_grant_bytes=883028

In the debug logs, I can see a request sent as sync I/O, since the grant
is now too small to cover the 1.7MB request:

0008:0020:10.0:1585145743.107840:0:116122:0:(osc_cache.c:1590:osc_enter_cache())
lustre04-OST0005-osc-98128d818000: grant { dirty: 0/512000 dirty_pages:
448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight:
0 } lru {in list: 146368, left: 64, waiters: 0 }need:1703936

0008:0020:10.0:1585145743.107842:0:116122:0:(osc_cache.c:1539:osc_enter_cache_try())
lustre04-OST0005-osc-98128d818000: grant { dirty: 0/512000 dirty_pages:
448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight:
0 } lru {in list: 146368, left: 64, waiters: 0 }need:1703936

0008:0020:10.0:1585145743.107843:0:116122:0:(osc_cache.c:1666:osc_enter_cache())
lustre04-OST0005-osc-98128d818000: grant { dirty: 0/512000 dirty_pages:
448/24562964 dropped: 0 avail: 883028, dirty_grant: 0, reserved: 0, flight:
0 } lru {in list: 146368, left: 64, waiters: 0 }no grant space, fall back
to sync i/o

There is currently about 30GB granted on an OST with about 22TB free.

[root@lustre04-oss1 ~]# lctl get_param obdfilter/lustre04-OST0005/tot_granted
obdfilter.lustre04-OST0005.tot_granted=30257446912

Somehow, the client never receives a larger grant, so it seems to stay
stuck under 1MB indefinitely.
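The dynamic being described, a grant that repeatedly shrinks on an idle OSC and never recovers because the server replies with no extra grant, can be sketched as follows. The halving step per idle interval is an assumption for illustration, not a transcription of the 2.12 code; the zero "extra grant" reply is taken from the log lines below.

```python
def shrink_step(grant: int) -> int:
    """One idle-period shrink; ASSUMED here to halve the grant (sketch only)."""
    return grant // 2

def update_grant(grant: int, extra: int) -> int:
    """Apply the server's reply; the log shows 'got 0 extra grant'."""
    return grant + extra

grant = 16 << 20              # start from a healthy 16 MB grant
for _ in range(5):            # five idle shrink intervals
    grant = shrink_step(grant)
    grant = update_grant(grant, 0)  # server grants nothing back

print(grant)  # 524288 bytes: stuck well under 1 MB, matching the symptom
```

Under this model the grant only ever moves downward, so once it drops below the request size, every write takes the sync I/O path shown earlier.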

0008:0020:4.0:1585145743.107950:0:22701:0:(osc_request.c:705:osc_announce_cached())
dirty: 0 undirty: 2080374783 dropped 0 grant: 883028

0008:0020:14.0:1585145743.236923:0:22702:0:(osc_request.c:727:osc_update_grant())
got 0 extra grant

Is this a known issue? I could not find a similar ticket in JIRA, but I do
see some references to disabling grant_shrink in LU-12651 and LU-12759.