Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-29 Thread Tung-Han Hsieh
Dear Simon,

Thank you very much for your useful information. Now we are arranging
the system maintenance date in order to upgrade to Lustre-2.12.5. Then
we will follow your suggestion to see whether this problem could be
fixed.

Here I report a test of how cur_grant_bytes changed over time under
continuous I/O. Again, the client ran the following script for
continuous reading in the background:

# The Lustre file system was mounted under /home
while [ 1 ]; do
    tar cf - /home/large/data | ssh remote_host "cat > /dev/null"
done

Every 20 minutes, on the same client, we copied a 600MB file from one
directory to another within Lustre and checked "cur_grant_bytes" with
the following command (also run on that client):

/opt/lustre/sbin/lctl get_param osc.*.cur_grant_bytes
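
For reference, a minimal shell sketch of how this 20-minute sampling could
be automated (the log path /tmp/grant_log.txt is just an example, not part
of the original test):

# Sample cur_grant_bytes with a timestamp every 20 minutes and append
# the output to a log file for later inspection.
while true; do
    date '+%F %T'
    /opt/lustre/sbin/lctl get_param osc.*.cur_grant_bytes
    sleep 1200   # 20 minutes
done >> /tmp/grant_log.txt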

The result is as follows (successive lines are about 20 minutes apart):

osc.chome-OST-osc-88a03915.cur_grant_bytes=1880752127
osc.chome-OST-osc-88a03915.cur_grant_bytes=1410564096
osc.chome-OST-osc-88a03915.cur_grant_bytes=1059201024
osc.chome-OST-osc-88a03915.cur_grant_bytes=794400768
osc.chome-OST-osc-88a03915.cur_grant_bytes=595800576
osc.chome-OST-osc-88a03915.cur_grant_bytes=446850432
osc.chome-OST-osc-88a03915.cur_grant_bytes=335137824
osc.chome-OST-osc-88a03915.cur_grant_bytes=251353368
osc.chome-OST-osc-88a03915.cur_grant_bytes=188515026
osc.chome-OST-osc-88a03915.cur_grant_bytes=141386270
osc.chome-OST-osc-88a03915.cur_grant_bytes=106039703
osc.chome-OST-osc-88a03915.cur_grant_bytes=79529778
osc.chome-OST-osc-88a03915.cur_grant_bytes=59647334
osc.chome-OST-osc-88a03915.cur_grant_bytes=44735501
osc.chome-OST-osc-88a03915.cur_grant_bytes=33551626
osc.chome-OST-osc-88a03915.cur_grant_bytes=25163720
osc.chome-OST-osc-88a03915.cur_grant_bytes=18872790
osc.chome-OST-osc-88a03915.cur_grant_bytes=14154593
osc.chome-OST-osc-88a03915.cur_grant_bytes=10615945
osc.chome-OST-osc-88a03915.cur_grant_bytes=7961959
osc.chome-OST-osc-88a03915.cur_grant_bytes=5971470
osc.chome-OST-osc-88a03915.cur_grant_bytes=4478603
osc.chome-OST-osc-88a03915.cur_grant_bytes=3358953
osc.chome-OST-osc-88a03915.cur_grant_bytes=2519215
osc.chome-OST-osc-88a03915.cur_grant_bytes=1889412
osc.chome-OST-osc-88a03915.cur_grant_bytes=1417059
osc.chome-OST-osc-88a03915.cur_grant_bytes=1062795
osc.chome-OST-osc-88a03915.cur_grant_bytes=797097
osc.chome-OST-osc-88a03915.cur_grant_bytes=797097
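
A quick sketch to see the pattern in the numbers above (it assumes the
samples were appended to a log such as the /tmp/grant_log.txt example
above, and that only one OST is involved, as here):

# Print the ratio of each cur_grant_bytes sample to the previous one;
# for the data above it hovers around 0.75 until the minimum is reached.
grep cur_grant_bytes /tmp/grant_log.txt | \
    awk -F= 'NR > 1 { printf "%.3f\n", $2 / prev } { prev = $2 }'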


The value 797097 seems to be the minimum; each sample is roughly 75% of
the previous one, so the grant shrinks geometrically under the sustained
I/O. When it dropped to 1062795, the time of the cp increased dramatically,
from around 1 second to 1 minute. In addition, the cluster was completely
idle during the test, and the test clearly does not saturate either the
network or the MDT/OST hardware (they carried almost no load).

I am wondering whether this could be a bug to report to the development
team.

Best Regards,

T.H.Hsieh

On Thu, Oct 29, 2020 at 09:49:42AM -0400, Simon Guilbault wrote:
> Our current workaround was to use the following command on the MGS with
> Lustre 2.12.5, which includes the patches from LU-12651 and LU-12759 (we were
> using a patched 2.12.4 a few months ago):
> lctl set_param -P osc.*.grant_shrink=0
> 
> We could not find the root cause of the underlying problem; dynamic grant
> shrinking seems to be useful when the OSTs are running out of free space.
> 
> On Wed, Oct 28, 2020 at 11:47 PM Tung-Han Hsieh <
> thhs...@twcp1.phys.ntu.edu.tw> wrote:
> 
> > Dear Simon,
> >
> > Thank you very much for your hint. Yes, you are right. We compared
> > the grant sizes of the two clients by running the following on each client:
> >
> > lctl get_param osc.*.cur_grant_bytes
> >
> > - Client A: It has run the following large data transfer for over 36 hrs.
> >
> > while [ 1 ]; do
> > tar cf - /home/large/data | ssh remote_host "cat > /dev/null"
> > done
> >
> >   The value of "cur_grant_bytes" is 796134.
> >
> > - Client B: It is almost idling during the action of Client A.
> >
> >   The value of "cur_grant_bytes" is 1715863552.
> >
> > If this is the reason the I/O performance of Client A was hit so badly,
> > is it possible to keep this value constant, at least on the head node
> > (since the head node is the most likely machine in the whole cluster to
> > perform large, long-running data I/O, especially in a data center)?
> >
> > I would also like to ask: why does this value have to be dynamically
> > adjusted?
> >
> > Thank you very much for your comment in advance.
> >
> > Best Regards,
> >
> > T.H.Hsieh
> >
> > On Wed, Oct 28, 2020 at 02:00:21PM -0400, Simon Guilbault wrote:
> > > Hi, we had a similar performance problem on our login/DTN nodes a few
> > > months ago; the problem was that the grant size was shrinking and getting
> > > stuck under 1MB.

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-29 Thread Simon Guilbault
Our current workaround was to use the following command on the MGS with
Lustre 2.12.5, which includes the patches from LU-12651 and LU-12759 (we were
using a patched 2.12.4 a few months ago):
lctl set_param -P osc.*.grant_shrink=0

We could not find the root cause of the underlying problem; dynamic grant
shrinking seems to be useful when the OSTs are running out of free space.
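
For what it's worth, a quick way to confirm the setting on a client is to
read the same parameters back; this is only a sketch and assumes the client
also carries the LU-12651/LU-12759 patches that expose the tunable:

# Verify that grant shrinking is disabled, and watch that the grant no
# longer decays under sustained I/O.
lctl get_param osc.*.grant_shrink
lctl get_param osc.*.cur_grant_bytes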

On Wed, Oct 28, 2020 at 11:47 PM Tung-Han Hsieh <
thhs...@twcp1.phys.ntu.edu.tw> wrote:

> Dear Simon,
>
> Thank you very much for your hint. Yes, you are right. We compared
> the grant sizes of the two clients by running the following on each client:
>
> lctl get_param osc.*.cur_grant_bytes
>
> - Client A: It has run the following large data transfer for over 36 hrs.
>
> while [ 1 ]; do
> tar cf - /home/large/data | ssh remote_host "cat > /dev/null"
> done
>
>   The value of "cur_grant_bytes" is 796134.
>
> - Client B: It is almost idling during the action of Client A.
>
>   The value of "cur_grant_bytes" is 1715863552.
>
> If this is the reason the I/O performance of Client A was hit so badly,
> is it possible to keep this value constant, at least on the head node
> (since the head node is the most likely machine in the whole cluster to
> perform large, long-running data I/O, especially in a data center)?
>
> I would also like to ask: why does this value have to be dynamically
> adjusted?
>
> Thank you very much for your comment in advance.
>
> Best Regards,
>
> T.H.Hsieh
>
> On Wed, Oct 28, 2020 at 02:00:21PM -0400, Simon Guilbault wrote:
> > Hi, we had a similar performance problem on our login/DTN nodes a few
> > months ago; the problem was that the grant size was shrinking and getting
> > stuck under 1MB. Once under 1MB, the client had to send every request to
> > the OST using sync IO.
> >
> > Check the output of the following command:
> > lctl get_param osc.*.cur_grant_bytes
> >
> > On Wed, Oct 28, 2020 at 12:08 AM Tung-Han Hsieh <
> > thhs...@twcp1.phys.ntu.edu.tw> wrote:
> >
> > > Dear All,
> > >
> > > Sorry, I am not sure whether this mail was successfully posted to
> > > the lustre-discuss mailing list or not, so I have resent it. Please
> > > ignore it if you have already read it.
> > >
> > >
> > > ===
> > >
> > > Dear Andreas,
> > >
> > > Thank you very much for your kind suggestions. These days I got a chance
> > > to follow your suggestions for the test. This email reports the
> > > results I have obtained so far. What I have done is:
> > >
> > > 1. Upgrade one client (with Infiniband) to Lustre 2.13.56_44_gf8a8d3f
> > >    (obtained from github). The build information is:
> > >
> > >    - Linux kernel 4.19.123.
> > >    - Infiniband MLNX_OFED_SRC-4.6-1.0.1.1.
> > >    - ./configure --prefix=/opt/lustre \
> > >      --with-o2ib=/path/of/mlnx-ofed-kernel-4.6 \
> > >      --disable-server --enable-mpitests=no
> > >    - make
> > >    - make install
> > >
> > > 2. We mounted the Lustre file system (Lustre MDT/OST servers: version
> > >    2.12.4 with Infiniband and a ZFS backend) with this command:
> > >
> > >    - mount -t lustre -o flock mdt@o2ib:/chome /home
> > >
> > > 3. The script to simulate large data transfer is the following
> > >    (the directory "/home/large/data" contains 758 files, each of size
> > >    600MB):
> > >
> > >    while [ 1 ]; do
> > >        tar cf - /home/large/data | ssh remote_host "cat > /dev/null"
> > >    done
> > >
> > >    ps. Note that this scenario is common in a large data center, where
> > >    some users transfer large data out of the data center through the
> > >    head node while other users copy files and do their normal work on
> > >    the same head node.
> > >
> > > 4. During the data transfer in the background, I occasionally ran this
> > >    command on the same client to test whether there was any abnormality
> > >    in I/O performance (where /home/dir1/file has size 600MB):
> > >
> > >    cp /home/dir1/file /home/dir2/
> > >
> > >    In the beginning this command completed in about 1 sec. But after
> > >    around 18 hours (not exactly, because the test ran overnight while
> > >    I was sleeping), the problem appeared. The time to complete the same
> > >    cp command was more than 1 minute.
> > >
> > >    During the test, I am sure that the whole cluster was idle. The MDT
> > >    and OST servers had no other load. The CPU usage of the testing
> > >    client was below 0.3.
> > >
> > >    Then I stopped the test and let the whole system idle completely.
> > >    But even after 3 hours, the I/O abnormality of the same "cp" command
> > >    was still there. Only after I unmounted and remounted /home did the
> > >    "cp" performance return to normal.
> > >
> > > Before and after remounting /home (which I call "reset"), I did the
> > > following tests:
> > >
> > > 1. Using "top" to check the memory usage

Re: [lustre-discuss] ZFS atime is it required?

2020-10-29 Thread Andreas Dilger
On Oct 23, 2020, at 14:03, Kumar, Amit <ahku...@mail.smu.edu> wrote:

Dear All,

Quick question: can I get away with setting “zfs set atime=off
on_all_my_volumes_mgt_mdt_and_osts”? I ask because it is noted as a
performance-boosting tip, with the assumption that the filesystem (Lustre)
handles all access times.

You don't really need atime enabled on the OSTs, but I also don't think 
"atime=off" will make any difference.  That is a VFS/ZPL level option, and 
Lustre osd-zfs doesn't use any of the ZPL code, but rather handles atime 
internally.
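
If you want to check what the backing datasets are currently set to, a
"zfs get" along these lines works (the pool/dataset names below are just
placeholders):

# Show the current atime property on example MDT/OST datasets.
zfs get atime mdtpool/mdt0 ostpool/ost0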

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org