Re: [lustre-discuss] how to set max_pages_per_rpc (I have done something wrong and need help)

2017-11-23 Thread Harald van Pee
Hi,

I have done 
lctl set_param -d 
lctl conf_param -d 

on the mgs/mdt.
After this all problems are gone and all clients are able to mount the Lustre
filesystem again.
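
For reference, a rough sketch of what those two commands probably looked like
(the exact parameter name was truncated in the archive, so <target> below is
only a placeholder for the stray hiskp3-OST...-osc entry from this thread):

# on the combined mgs/mdt: delete the permanent setting from the config log
lctl conf_param -d <target>.osc.max_pages_per_rpc
# and drop the temporary override as well
lctl set_param -d osc.<target>.max_pages_per_rpc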

Best
Harald

On Thursday 16 November 2017 15:10:43 Harald van Pee wrote:
> Thank you,
> 
> I will start next week and report what helps.
> Unfortunately in this cluster we have to wait with the upgrade to 2.10.1
> until next year.
> 
> Best wishes
> Harald
> 
> On Wednesday 15 November 2017 23:09:49 Dilger, Andreas wrote:
> > On Nov 15, 2017, at 12:56, Harald van Pee <p...@hiskp.uni-bonn.de> wrote:
> > > Hello Andreas,
> > > 
> > > thanks for your information, now I have the feeling I'm not completely
> > > lost. With erasing configuration parameters, do you mean the
> > > writeconf procedure? (chapter 14.4)
> > 
> > Yes.
> > 
> > > Or is it possible to erase the unknown parameter?
> > 
> > You could try "lctl conf_param -d " to delete the parameter.
> > 
> > Cheers, Andreas
> > 
> > > On Wednesday 15 November 2017 20:37:20 Dilger, Andreas wrote:
> > >> The problem that Lustre clients fail to mount when they get an unknown
> > >> parameter is fixed in newer Lustre releases (2.9+) via patch
> > >> https://review.whamcloud.com/21112 .
> > >> 
> > >> The current maintenance release is 2.10.1.
> > >> 
> > >> You could also work around this by erasing the configuration
> > >> parameters (see Lustre manual).
> > >> 
> > >> Cheers, Andreas
> > >> 
> > >> On Nov 15, 2017, at 09:26, Harald van Pee
> > >> <p...@hiskp.uni-bonn.de<mailto:p...@hiskp.uni-bonn.de>> wrote:
> > >> 
> > >> Here is more information:
> > >> 
> > >> If I try to mount the filesystem on the client I get similar messages
> > >> as from the failing conf_param command. It seems one has to remove
> > >> this failed configuration, but how?
> > >> Here the syslog output on the client:
> > >> 
> > >> kernel: [ 4203.506437] LustreError: 3698:0:
> > >> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-2)
> > >> kernel: [ 5028.547095] LustreError: 3830:0:
> > >> (obd_config.c:1202:class_process_config()) no device for:
> > >> hiskp3-OST-osc- 880416680800
> > >> kernel: [ 5028.547105] LustreError: 3830:0:
> > >> (obd_config.c:1606:class_config_llog_handler())
> > >> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> > >> kernel: [ 5028.547112] Lustre:cmd=cf00f 0:hiskp3-OST-osc
> > >> 1:osc.max_pages_per_rpc=256
> > >> kernel: [ 5028.547112]
> > >> kernel: [ 5028.547156] LustreError: 15b-f: MGC192.168.128.200@o2ib:
> > >> The configuration from log 'hiskp3-client'failed from the MGS (-22). 
> > >> Make sure this client and the MGS are running compatible versions of
> > >> Lustre. kernel: [ 5028.547407] LustreError:
> > >> 1680:0:(lov_obd.c:946:lov_cleanup()) hiskp3-clilov-880416680800:
> > >> lov tgt 1 not cleaned! deathrow=0, lovrc=1 kernel: [ 5028.547415]
> > >> LustreError: 1680:0:(lov_obd.c:946:lov_cleanup()) Skipped 3 previous
> > >> similar messages
> > >> kernel: [ 5028.550906] Lustre: Unmounted hiskp3-client
> > >> kernel: [ 5028.551407] LustreError: 3815:0:
> > >> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-22)
> > >> 
> > >> 
> > >> 
> > >> On Wednesday 15 November 2017 16:06:29 Harald van Pee wrote:
> > >> Dear all,
> > >> 
> > >> I want to set max_pages_per_rpc to 64 instead of 256
> > >> lustre mgs/mdt version 2.5.3
> > >> lustre oss version 2.5.3
> > >> lustre client 2.6
> > >> 
> > >> on client I have done:
> > >> lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
> > >> osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
> > >> osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
> > >> osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
> > >> osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
> > >> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> > >> 
> > >> this works, but after a remount I get 256 again, therefore I want to make
> > >> it permanent with
> > >> lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
> > >> 
> > 

Re: [lustre-discuss] how to set max_pages_per_rpc (I have done something wrong and need help)

2017-11-16 Thread Harald van Pee
Thank you,

I will start next week and report what helps.
Unfortunately in this cluster we have to wait with the upgrade to 2.10.1
until next year.

Best wishes
Harald


On Wednesday 15 November 2017 23:09:49 Dilger, Andreas wrote:
> On Nov 15, 2017, at 12:56, Harald van Pee <p...@hiskp.uni-bonn.de> wrote:
> > Hello Andreas,
> > 
> > thanks for your information, now I have the feeling I'm not completely
> > lost. With erasing configuration parameters, do you mean the
> > writeconf procedure? (chapter 14.4)
> 
> Yes.
> 
> > Or is it possible to erase the unknown parameter?
> 
> You could try "lctl conf_param -d " to delete the parameter.
> 
> Cheers, Andreas
> 
> > On Wednesday 15 November 2017 20:37:20 Dilger, Andreas wrote:
> >> The problem that Lustre clients fail to mount when they get an unknown
> >> parameter is fixed in newer Lustre releases (2.9+) via patch
> >> https://review.whamcloud.com/21112 .
> >> 
> >> The current maintenance release is 2.10.1.
> >> 
> >> You could also work around this by erasing the configuration parameters
> >> (see Lustre manual).
> >> 
> >> Cheers, Andreas
> >> 
> >> On Nov 15, 2017, at 09:26, Harald van Pee
> >> <p...@hiskp.uni-bonn.de<mailto:p...@hiskp.uni-bonn.de>> wrote:
> >> 
> >> Here is more information:
> >> 
> >> If I try to mount the filesystem on the client I get similar messages as
> >> from the failing conf_param command. It seems one has to remove this
> >> failed configuration, but how?
> >> Here the syslog output on the client:
> >> 
> >> kernel: [ 4203.506437] LustreError: 3698:0:
> >> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-2)
> >> kernel: [ 5028.547095] LustreError: 3830:0:
> >> (obd_config.c:1202:class_process_config()) no device for:
> >> hiskp3-OST-osc- 880416680800
> >> kernel: [ 5028.547105] LustreError: 3830:0:
> >> (obd_config.c:1606:class_config_llog_handler()) MGC192.168.128.200@o2ib:
> >> cfg command failed: rc = -22
> >> kernel: [ 5028.547112] Lustre:cmd=cf00f 0:hiskp3-OST-osc
> >> 1:osc.max_pages_per_rpc=256
> >> kernel: [ 5028.547112]
> >> kernel: [ 5028.547156] LustreError: 15b-f: MGC192.168.128.200@o2ib: The
> >> configuration from log 'hiskp3-client'failed from the MGS (-22).  Make
> >> sure this client and the MGS are running compatible versions of Lustre.
> >> kernel: [ 5028.547407] LustreError:
> >> 1680:0:(lov_obd.c:946:lov_cleanup()) hiskp3-clilov-ffff880416680800:
> >> lov tgt 1 not cleaned! deathrow=0, lovrc=1 kernel: [ 5028.547415]
> >> LustreError: 1680:0:(lov_obd.c:946:lov_cleanup()) Skipped 3 previous
> >> similar messages
> >> kernel: [ 5028.550906] Lustre: Unmounted hiskp3-client
> >> kernel: [ 5028.551407] LustreError: 3815:0:
> >> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-22)
> >> 
> >> 
> >> 
> >> On Wednesday 15 November 2017 16:06:29 Harald van Pee wrote:
> >> Dear all,
> >> 
> >> I want to set max_pages_per_rpc to 64 instead of 256
> >> lustre mgs/mdt version 2.5.3
> >> lustre oss version 2.5.3
> >> lustre client 2.6
> >> 
> >> on client I have done:
> >> lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
> >> osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
> >> osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
> >> osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
> >> osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
> >> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> >> 
> >> this works, but after a remount I get 256 again, therefore I want to make
> >> it permanent with
> >> lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
> >> 
> >> But I get the message that this command has to be given on the mdt.
> >> Unfortunately, when I go to our combined mgs/mdt I get
> >> 
> >> Lustre: Setting parameter hiskp3-OST-osc.osc.max_pages_per_rpc in
> >> log hiskp3-client
> >> LustreError: 956:0:(obd_config.c:1221:class_process_config()) no device
> >> for: hiskp3-OST-osc-MDT
> >> LustreError: 956:0:(obd_config.c:1591:class_config_llog_handler())
> >> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> >> Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT
> >> 1:osc.max_pages_per_rpc=64
> >> 
> >> than I can not mount cli

Re: [lustre-discuss] how to set max_pages_per_rpc (I have done something wrong and need help)

2017-11-15 Thread Harald van Pee
Hello Andreas,

thanks for your information, now I have the feeling I'm not completely lost.
With erasing configuration parameters, do you mean the
writeconf procedure? (chapter 14.4)

Or is it possible to erase the unknown parameter?
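
For context, the writeconf procedure referenced above (chapter 14.4 of the
manual) roughly amounts to regenerating the configuration logs. A sketch only,
assuming a combined mgs/mdt and with device paths as placeholders:

# unmount all clients, then all OSTs, then the MDT
umount /mnt/lustre_mdt
# regenerate the config logs, MDT first, then every OST
tunefs.lustre --writeconf /dev/mdt_device
tunefs.lustre --writeconf /dev/ost_device
# remount in the usual order: MGS/MDT first, then OSTs, then clients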

Thanks in advance
Harald



On Wednesday 15 November 2017 20:37:20 Dilger, Andreas wrote:
> The problem that Lustre clients fail to mount when they get an unknown
> parameter is fixed in newer Lustre releases (2.9+) via patch
> https://review.whamcloud.com/21112 .
> 
> The current maintenance release is 2.10.1.
> 
> You could also work around this by erasing the configuration parameters
> (see Lustre manual).
> 
> Cheers, Andreas
> 
> On Nov 15, 2017, at 09:26, Harald van Pee
> <p...@hiskp.uni-bonn.de<mailto:p...@hiskp.uni-bonn.de>> wrote:
> 
> Here is more information:
> 
> If I try to mount the filesystem on the client I get similar messages as
> from the failing conf_param command. It seems one has to remove this
> failed configuration, but how?
> Here the syslog output on the client:
> 
> kernel: [ 4203.506437] LustreError: 3698:0:
> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-2)
> kernel: [ 5028.547095] LustreError: 3830:0:
> (obd_config.c:1202:class_process_config()) no device for:
> hiskp3-OST-osc- 880416680800
> kernel: [ 5028.547105] LustreError: 3830:0:
> (obd_config.c:1606:class_config_llog_handler()) MGC192.168.128.200@o2ib:
> cfg command failed: rc = -22
> kernel: [ 5028.547112] Lustre:cmd=cf00f 0:hiskp3-OST-osc
> 1:osc.max_pages_per_rpc=256
> kernel: [ 5028.547112]
> kernel: [ 5028.547156] LustreError: 15b-f: MGC192.168.128.200@o2ib: The
> configuration from log 'hiskp3-client'failed from the MGS (-22).  Make sure
> this client and the MGS are running compatible versions of Lustre.
> kernel: [ 5028.547407] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup())
> hiskp3-clilov-880416680800: lov tgt 1 not cleaned! deathrow=0, lovrc=1
> kernel: [ 5028.547415] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup())
> Skipped 3 previous similar messages
> kernel: [ 5028.550906] Lustre: Unmounted hiskp3-client
> kernel: [ 5028.551407] LustreError: 3815:0:
> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-22)
> 
> 
> 
> On Wednesday 15 November 2017 16:06:29 Harald van Pee wrote:
> Dear all,
> 
> I want to set max_pages_per_rpc to 64 instead of 256
> lustre mgs/mdt version 2.5.3
> lustre oss version 2.5.3
> lustre client 2.6
> 
> on client I have done:
> lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
> osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> 
> this works, but after a remount I get 256 again, therefore I want to make it
> permanent with
> lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
> 
> But I get the message that this command has to be given on the mdt.
> Unfortunately, when I go to our combined mgs/mdt I get
> 
> Lustre: Setting parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log
> hiskp3-client
> LustreError: 956:0:(obd_config.c:1221:class_process_config()) no device
> for: hiskp3-OST-osc-MDT
> LustreError: 956:0:(obd_config.c:1591:class_config_llog_handler())
> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT
> 1:osc.max_pages_per_rpc=64
> 
> then I can not mount a client and want to go back
> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> 
> Lustre: Modifying parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log
> hiskp3-client
> Lustre: Skipped 1 previous similar message
> LustreError: 966:0:(obd_config.c:1221:class_process_config()) no device
> for: hiskp3-OST-osc-MDT
> LustreError: 966:0:(obd_config.c:1591:class_config_llog_handler())
> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT
> 1:osc.max_pages_per_rpc=256
> 
> Obviously what I have done was completely wrong and I can no longer mount a
> client; already-mounted clients are still working.
> How can I get it back working?
> hiskp3-MDT is the label of the mgs/mdt, but hiskp3-OST-osc-MDT
> seems to be incorrect.
> 
> What do I have to do to get the mgs/mdt working again?
> It's our production cluster.
> Any help is welcome.
> 
> Best
> Harald
> 
> 
> 
> 
> 
> 

Re: [lustre-discuss] Dependency errors with Lustre 2.10.1 packages

2017-11-15 Thread Harald van Pee
Hi,

have you installed the wc (Whamcloud) version of e2fsprogs?
I would expect this solves at least most of the dependencies.
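
A sketch of what installing the wc e2fsprogs could look like on CentOS 7; the
repository URL is an assumption, please check the current Whamcloud download
site for the right path:

cat > /etc/yum.repos.d/e2fsprogs-wc.repo <<'EOF'
[e2fsprogs-wc]
name=Whamcloud e2fsprogs (ldiskfs-enabled)
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
enabled=1
gpgcheck=0
EOF
yum install e2fsprogs e2fsprogs-libs libcom_err libss

The wc build provides the "ldiskfsprogs" capability that
kmod-lustre-osd-ldiskfs requires, so this should clear the dependency error.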

Harald


On Wednesday 15 November 2017 19:40:38 Michael Watters wrote:
> I am attempting to install lustre packages on a new OST node running
> CentOS 7.4.1708 and it appears that there is a broken dependency in the
> rpm packages.  Attempting to install the lustre package results in an
> error as shown below.
> 
> [root@lustre-ost03 ~]# yum install lustre
> Loaded plugins: fastestmirror, versionlock
> Loading mirror speeds from cached hostfile
> Resolving Dependencies
> --> Running transaction check
> ---> Package lustre.x86_64 0:2.10.1-1.el7 will be installed
> --> Processing Dependency: kmod-lustre = 2.10.1 for package:
> lustre-2.10.1-1.el7.x86_64 --> Processing Dependency: lustre-osd for
> package: lustre-2.10.1-1.el7.x86_64 --> Processing Dependency:
> lustre-osd-mount for package: lustre-2.10.1-1.el7.x86_64 --> Processing
> Dependency: libyaml-0.so.2()(64bit) for package:
> lustre-2.10.1-1.el7.x86_64 --> Running transaction check
> ---> Package kmod-lustre.x86_64 0:2.10.1-1.el7 will be installed
> ---> Package kmod-lustre-osd-ldiskfs.x86_64 0:2.10.1-1.el7 will be
> installed --> Processing Dependency: ldiskfsprogs >= 1.42.7.wc1 for
> package: kmod-lustre-osd-ldiskfs-2.10.1-1.el7.x86_64 ---> Package
> libyaml.x86_64 0:0.1.4-11.el7_0 will be installed
> ---> Package lustre-osd-ldiskfs-mount.x86_64 0:2.10.1-1.el7 will be
> installed --> Finished Dependency Resolution
> Error: Package: kmod-lustre-osd-ldiskfs-2.10.1-1.el7.x86_64 (lustre)
>Requires: ldiskfsprogs >= 1.42.7.wc1
>  You could try using --skip-broken to work around the problem
> 
> I've checked the repos and don't see a package for ldiskfsprogs at all. 
> Does anybody know how to resolve this?

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] mgs stops working after accidently setting a non existing paramter (no device for)

2017-11-15 Thread Harald van Pee
Dear all, 

I changed the subject, because it is most important for us to get the mgs
running again and to make it possible to mount a client again.

Somehow I have managed to set a parameter for which no device exists;
indeed there is no such ost.
But obviously the hiskp3-client log has logged a command to set a
parameter for device hiskp3-OST-osc,
which was not my intention.

The mgs/mdt is not used as a client, and probably therefore
lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
error: get_param: /proc/{fs,sys}/{lnet,lustre}/osc/hiskp3-OST*/max_pages_per_rpc: Found no match
and
lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=256
does not work there.
but indeed
ls /proc/fs/lustre/osc/
shows:
hiskp3-OST0001-osc-MDT  hiskp3-OST0002-osc-MDT  hiskp3-OST0003-osc-MDT  hiskp3-OST0004-osc-MDT  num_refs

(but no OST).
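
(A hedged aside: on the combined mgs/mdt the osc devices carry the MDT suffix
shown above, so querying the same tunable there would presumably need to match
those names; the wildcard form below is illustrative only:

lctl get_param osc.hiskp3-OST*-osc-MDT*.max_pages_per_rpc
)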
Can I get rid of this misconfiguration by setting
lctl conf_param hiskp3-OST.osc.max_pages_per_rpc=0
?

And if so, how do I have to proceed after that command?

Thanks in advance
Harald



On Wednesday 15 November 2017 17:26:25 Harald van Pee wrote:
> Here is more information:
> 
> If I try to mount the filesystem on the client I get similar messages as
> from the failing conf_param command. It seems one has to remove this
> failed configuration, but how?
> Here the syslog output on the client:
> 
> kernel: [ 4203.506437] LustreError: 3698:0:
> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-2)
>  kernel: [ 5028.547095] LustreError: 3830:0:
> (obd_config.c:1202:class_process_config()) no device for:
> hiskp3-OST-osc- 880416680800
>  kernel: [ 5028.547105] LustreError: 3830:0:
> (obd_config.c:1606:class_config_llog_handler()) MGC192.168.128.200@o2ib:
> cfg command failed: rc = -22
>  kernel: [ 5028.547112] Lustre:cmd=cf00f 0:hiskp3-OST-osc
> 1:osc.max_pages_per_rpc=256
>  kernel: [ 5028.547112]
>  kernel: [ 5028.547156] LustreError: 15b-f: MGC192.168.128.200@o2ib: The
> configuration from log 'hiskp3-client'failed from the MGS (-22).  Make sure
> this client and the MGS are running compatible versions of Lustre.
>  kernel: [ 5028.547407] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup())
> hiskp3-clilov-880416680800: lov tgt 1 not cleaned! deathrow=0, lovrc=1
>  kernel: [ 5028.547415] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup())
> Skipped 3 previous similar messages
>  kernel: [ 5028.550906] Lustre: Unmounted hiskp3-client
>  kernel: [ 5028.551407] LustreError: 3815:0:
> (obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-22)
> 
> On Wednesday 15 November 2017 16:06:29 Harald van Pee wrote:
> > Dear all,
> > 
> > I want to set max_pages_per_rpc to 64 instead of 256
> > lustre mgs/mdt version 2.5.3
> > lustre oss version 2.5.3
> > lustre client 2.6
> > 
> > on client I have done:
> > lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
> > osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
> > osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
> > osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
> > osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
> > lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> > 
> > this works, but after a remount I get 256 again, therefore I want to make it
> > permanent with
> > 
> >  lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
> > 
> > But I get the message that this command has to be given on the mdt.
> > Unfortunately, when I go to our combined mgs/mdt I get
> > 
> > Lustre: Setting parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log
> > hiskp3-client
> > LustreError: 956:0:(obd_config.c:1221:class_process_config()) no device
> > for: hiskp3-OST-osc-MDT
> > LustreError: 956:0:(obd_config.c:1591:class_config_llog_handler())
> > MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> > Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT
> > 1:osc.max_pages_per_rpc=64
> > 
> > then I can not mount a client and want to go back
> > lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> > 
> > Lustre: Modifying parameter hiskp3-OST-osc.osc.max_pages_per_rpc in
> > log hiskp3-client
> > Lustre: Skipped 1 previous similar message
> > LustreError: 966:0:(obd_config.c:1221:class_process_config()) no device
> > for: hiskp3-OST-osc-MDT
> > LustreError: 966:0:(obd_config.c:1591:class_config_llog_handler())
> > MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> > Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT
> > 1:osc.max_pages_per_rpc=256
> > 
> > Obviously what I have done was completely wrong and I can no longer mount
> > a client; already-mounted clients are still working.
> > 

Re: [lustre-discuss] how to set max_pages_per_rpc (I have done something wrong and need help)

2017-11-15 Thread Harald van Pee
Here is more information:

If I try to mount the filesystem on the client I get similar messages as from
the failing conf_param command. It seems one has to remove this failed
configuration, but how?
Here the syslog output on the client:

kernel: [ 4203.506437] LustreError: 3698:0:
(obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-2)
 kernel: [ 5028.547095] LustreError: 3830:0:
(obd_config.c:1202:class_process_config()) no device for: hiskp3-OST-osc-
880416680800
 kernel: [ 5028.547105] LustreError: 3830:0:
(obd_config.c:1606:class_config_llog_handler()) MGC192.168.128.200@o2ib: cfg 
command failed: rc = -22
 kernel: [ 5028.547112] Lustre:cmd=cf00f 0:hiskp3-OST-osc  
1:osc.max_pages_per_rpc=256  
 kernel: [ 5028.547112] 
 kernel: [ 5028.547156] LustreError: 15b-f: MGC192.168.128.200@o2ib: The 
configuration from log 'hiskp3-client'failed from the MGS (-22).  Make sure 
this client and the MGS are running compatible versions of Lustre.
 kernel: [ 5028.547407] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup()) 
hiskp3-clilov-880416680800: lov tgt 1 not cleaned! deathrow=0, lovrc=1
 kernel: [ 5028.547415] LustreError: 1680:0:(lov_obd.c:946:lov_cleanup()) 
Skipped 3 previous similar messages
 kernel: [ 5028.550906] Lustre: Unmounted hiskp3-client
 kernel: [ 5028.551407] LustreError: 3815:0:
(obd_mount.c:1340:lustre_fill_super()) Unable to mount  (-22)



On Wednesday 15 November 2017 16:06:29 Harald van Pee wrote:
> Dear all,
> 
> I want to set max_pages_per_rpc to 64 instead of 256
> lustre mgs/mdt version 2.5.3
> lustre oss version 2.5.3
> lustre client 2.6
> 
> on client I have done:
> lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
> osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
> osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> 
> this works, but after a remount I get 256 again, therefore I want to make it
> permanent with
>  lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
> 
> But I get the message that this command has to be given on the mdt.
> Unfortunately, when I go to our combined mgs/mdt I get
> 
> Lustre: Setting parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log
> hiskp3-client
> LustreError: 956:0:(obd_config.c:1221:class_process_config()) no device
> for: hiskp3-OST-osc-MDT
> LustreError: 956:0:(obd_config.c:1591:class_config_llog_handler())
> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT 
> 1:osc.max_pages_per_rpc=64
> 
> then I can not mount a client and want to go back
> lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64
> 
> Lustre: Modifying parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log
> hiskp3-client
> Lustre: Skipped 1 previous similar message
> LustreError: 966:0:(obd_config.c:1221:class_process_config()) no device
> for: hiskp3-OST-osc-MDT
> LustreError: 966:0:(obd_config.c:1591:class_config_llog_handler())
> MGC192.168.128.200@o2ib: cfg command failed: rc = -22
> Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT 
> 1:osc.max_pages_per_rpc=256
> 
> Obviously what I have done was completely wrong and I can no longer mount a
> client; already-mounted clients are still working.
> How can I get it back working?
> hiskp3-MDT is the label of the mgs/mdt, but hiskp3-OST-osc-MDT
> seems to be incorrect.
> 
> What do I have to do to get the mgs/mdt working again?
> It's our production cluster.
> Any help is welcome.
> 
> Best
> Harald
> 
> 
> 
> 
> 
> 

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
Nussallee 14-16 - 53115 Bonn - Tel +49-228-732213 - Fax +49-228-732505
mail: p...@hiskp.uni-bonn.de
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] how to set max_pages_per_rpc (I have done something wrong and need help)

2017-11-15 Thread Harald van Pee
Dear all,

I want to set max_pages_per_rpc to 64 instead of 256
lustre mgs/mdt version 2.5.3
lustre oss version 2.5.3
lustre client 2.6

on client I have done:
lctl get_param osc.hiskp3-OST*.max_pages_per_rpc
osc.hiskp3-OST0001-osc-88105dba4800.max_pages_per_rpc=256
osc.hiskp3-OST0002-osc-88105dba4800.max_pages_per_rpc=256
osc.hiskp3-OST0003-osc-88105dba4800.max_pages_per_rpc=256
osc.hiskp3-OST0004-osc-88105dba4800.max_pages_per_rpc=256
lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64

this works, but after a remount I get 256 again, therefore I want to make it
permanent with
 lctl conf_param hiskp3-OST*.osc.max_pages_per_rpc=64
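
(A hedged side note: lctl conf_param apparently does not expand the shell-style
wildcard, which may be where the bogus hiskp3-OST...-osc device name in the logs
below comes from. The manual's per-target form would be one command per OST, run
on the MGS, roughly:

lctl conf_param hiskp3-OST0001.osc.max_pages_per_rpc=64
lctl conf_param hiskp3-OST0002.osc.max_pages_per_rpc=64
# ...and likewise for OST0003 and OST0004

Whether the wildcard is really the culprit is not confirmed in this thread.)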

But I get the message that this command has to be given on the mdt.
Unfortunately, when I go to our combined mgs/mdt I get

Lustre: Setting parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log 
hiskp3-client
LustreError: 956:0:(obd_config.c:1221:class_process_config()) no device for: 
hiskp3-OST-osc-MDT
LustreError: 956:0:(obd_config.c:1591:class_config_llog_handler()) 
MGC192.168.128.200@o2ib: cfg command failed: rc = -22
Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT  1:osc.max_pages_per_rpc=64  

then I can not mount a client and want to go back
lctl set_param osc.hiskp3-OST*.max_pages_per_rpc=64

Lustre: Modifying parameter hiskp3-OST-osc.osc.max_pages_per_rpc in log 
hiskp3-client
Lustre: Skipped 1 previous similar message
LustreError: 966:0:(obd_config.c:1221:class_process_config()) no device for: 
hiskp3-OST-osc-MDT
LustreError: 966:0:(obd_config.c:1591:class_config_llog_handler()) 
MGC192.168.128.200@o2ib: cfg command failed: rc = -22
Lustre:cmd=cf00f 0:hiskp3-OST-osc-MDT  1:osc.max_pages_per_rpc=256  

Obviously what I have done was completely wrong and I can no longer mount a
client; already-mounted clients are still working.
How can I get it back working?
hiskp3-MDT is the label of the mgs/mdt, but hiskp3-OST-osc-MDT
seems to be incorrect.

What do I have to do to get the mgs/mdt working again?
It's our production cluster.
Any help is welcome.

Best 
Harald






___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing

2017-11-14 Thread Harald van Pee
Hi Jeff, 

thanks for your answer.
Can I be sure that there is no autoprobing which sets any configuration 
differently?
The options given to mkfs.lustre and in /etc/modprobe.d/lustre.conf
will be the same; is this enough?
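
For illustration, a minimal sketch of the kind of identical configuration meant
here, using only the option already mentioned in this thread (binding to ib0 is
an assumption):

# /etc/modprobe.d/lustre.conf, identical on every server and client
options lnet networks="o2ib0(ib0)"

As long as all nodes are on the same o2ib0 LNet network, mixing QDR and FDR HCAs
should not need extra LNet configuration; the IB fabric negotiates the link rate
per port.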

Best
Harald

On Tuesday 14 November 2017 18:39:49 Jeff Johnson wrote:
> Harald,
> 
> As long as your new servers and clients all have the same settings in their
> config files as your currently running configuration you should be fine.
> 
> --Jeff
> 
> 
> On Tue, Nov 14, 2017 at 9:24 AM, Harald van Pee <p...@hiskp.uni-bonn.de>
> 
> wrote:
> > Dear all,
> > 
> > I have installed lustre 2.10.1 from source with MOFED 4.1.
> > mdt/mgs and oss run on centos 7.4
> > clients on debian 9 (kernel 4.9)
> > 
> > our test cluster (1x mgs/mdt + 1x oss + 1x client), all with Mellanox IB
> > QDR NICs,
> > runs without problems on a Mellanox FDR switch.
> > Now we have additional clients and servers with fdr and qdr nics.
> > 
> > Do I need any special configuration (beside options lnet networks=o2ib0)
> > if I add additional fdr clients and/or servers?
> > 
> > Is the configuration auto-probed? And does it make a difference whether I start
> > with FDR servers and clients and add QDR servers and clients, or the other
> > way around?
> > 
> > Thanks in advance
> > Harald
> > 
> > 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre 2.10.1 MOFED 4.1 QDR/FDR mixing

2017-11-14 Thread Harald van Pee
Dear all,

I have installed lustre 2.10.1 from source with MOFED 4.1.
mdt/mgs and oss run on centos 7.4
clients on debian 9 (kernel 4.9)

our test cluster (1x mgs/mdt + 1x oss + 1x client), all with Mellanox IB QDR
NICs,
runs without problems on a Mellanox FDR switch.
Now we have additional clients and servers with fdr and qdr nics.

Do I need any special configuration (beside options lnet networks=o2ib0) 
if I add additional fdr clients and/or servers?

Is the configuration auto-probed? And does it make a difference whether I start
with FDR servers and clients and add QDR servers and clients, or the other way
around?

Thanks in advance
Harald


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre and OFED

2017-07-28 Thread Harald van Pee
Hello

On Friday 28 July 2017 15:48:12 Ben Evans wrote:
> Eli, just to clarify are you talking about using the in-kernel OFED vs. a
> vendor (Mellanox) OFED, or 

In our case we are using the OFED shipped with the Debian distribution.

> are you talking about using the ConnectX-3
> hardware in IPoIB mode and just using it as a faster Ethernet?

Is that possible? How does one do this?
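
(For what it's worth, a sketch of what IPoIB-only operation would look like:
LNet would use the socklnd/tcp transport on the IPoIB interface instead of o2ib,
e.g. in /etc/modprobe.d/lustre.conf; the interface name here is an assumption:

options lnet networks="tcp0(ib0)"

That uses the IB hardware only as a faster Ethernet-like transport and gives up
RDMA, so it is usually slower than o2ib.)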

Harald


> 
> -Ben Evans
> 
> From: lustre-discuss on behalf of "E.S. Rosenberg"
> Date: Thursday, July 27, 2017 at 4:55 PM
> To: "lustre-discuss@lists.lustre.org"
> Subject: [lustre-discuss] Lustre and OFED
> 
> Hi all,
> 
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
> 
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
> 
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need
> OFED because of that
> 
> Thanks,
> Eli
> 
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre and OFED

2017-07-27 Thread Harald van Pee
Hi Eli,

we are running Lustre without OFED on Debian clients and servers.
With lustre 2.4.0 on clients and servers: no problems at all for years.
With lustre 2.5.3 on servers and 2.6.92 on clients: no problems, at least for months.
With lustre 2.5.3 on servers and 2.7 on clients: always IB connection loss.
Here I'm wondering if a more recent OFED version could help?

We are mostly interested in a rock-solid Lustre version. Lustre 2.6 is fast
enough for us, but has a memory leak caused by cache usage; Lustre 2.7 was
perfect for us in tests with a small number of machines, but fails completely
for the full cluster and/or certain tasks.

Best
Harald


On Donnerstag, 27. Juli 2017 22:55:33 CEST E.S. Rosenberg wrote:
> Hi all,
> 
> How 'needed' is OFED for Lustre? In the LUG talks it is mentioned every
> once in a while and that got me thinking a bit.
> 
> What things are gained by installing OFED? Performance? Accurate traffic
> reports?
> 
> Currently I am using a lustre system without OFED but our IB hardware is
> from the FDR generation so not bleeding edge and probably doesn't need OFED
> because of that
> 
> Thanks,
> Eli
> 
> Tech specs:
> Servers: CentOS 6.8 + Lustre 2.8 (kernel from Lustre RPMs)
> Clients: Debian + kernel 4.2 + Lustre 2.8
> IB: ConnectX-3 FDR


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Clients looses IB connection to OSS.

2017-07-19 Thread Harald van Pee
Hi, 

I am just wondering if there is also a problem with Lustre 2.7 and Mellanox IB
even if there is no InfiniBand router.
We are using lustre with infiniband as the only lnet connection.
Lustre 2.5.3 on the server side and 2.6 on the clients runs stable for months,
but we see a memory leak (SUnreclaim grows, and freeing the cache or unmounting
does not help),
and I just tried 2.7 on the client side this week.
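
(For reference, the leak was observed roughly like this; these are standard
kernel interfaces, not Lustre-specific commands:

# watch the unreclaimable slab counter grow
grep SUnreclaim /proc/meminfo
# dropping the page/dentry/inode caches did not give the memory back
echo 3 > /proc/sys/vm/drop_caches
)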
Now we very often get
"Connection to OST000X (at ...@o2ib) was lost"
messages, but no messages about RDMA problems. Only a reboot helps, on both the
client and the server side.

The workaround Thomas mentioned does not help on the client side.

Is there any workaround or solution?
I cannot do an update before the holidays, therefore I think it would be best
to go back to 2.6?

Any help would be welcome.

Harald


On Monday 01 May 2017 17:59:28 Thomas Stibor wrote:
> Hi,
> 
> see JIRA: https://jira.hpdd.intel.com/browse/LU-5718
> 
> What seems to work as a quick fix (for older versions) is to set the
> value of parameter max_pages_per_rpc=64
> 
> As written in https://jira.hpdd.intel.com/browse/LU-5718
> the issue is resolved, however for upcoming version 2.10.0
> 
> Cheers
>  Thomas
> 
> On Mon, May 01, 2017 at 04:47:32PM +0200, Hans Henrik Happe wrote:
> > Hi,
> > 
> > We have experienced problems with losing connection to OSS. It starts
> > with:
> > 
> > May  1 03:35:46 node872 kernel: LNetError:
> > 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many
> > fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst
> > idx/frags: 128/236
> > May  1 03:35:46 node872 kernel: LNetError:
> > 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from
> > 10.21.10.116@o2ib: -90
> > 
> > The rest of the log is attached.
> > 
> > After this Lustre access is very slow. I.e. a 'df' can take minutes.
> > Also 'lctl ping' to the OSS give I/O errors. Doing 'lnet net del/add'
> > makes ping work again until file I/O starts. Then I/O errors again.
> > 
> > We use both IB and TCP on servers, so no routers.
> > 
> > In the attached log astro-OST0001 has been moved to the other server in
> > the HA pair. This is because 'lctl dl -t' showed strange output when on
> > the right server:
> > 
> > # lctl dl -t
> > 
> >   0 UP mgc MGC10.21.10.102@o2ib 0b0bbbce-63b6-bf47-403c-28f0c53e8307 5
> >   1 UP lov astro-clilov-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
> > 
> >   2 UP lmv astro-clilmv-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
> > 
> >   3 UP mdc astro-MDT-mdc-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.102@o2ib
> > 
> >   4 UP osc astro-OST0002-osc-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.116@o2ib
> > 
> >   5 UP osc astro-OST0001-osc-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 172.20.10.115@tcp1
> > 
> >   6 UP osc astro-OST0003-osc-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.117@o2ib
> > 
> >   7 UP osc astro-OST-osc-88107412e800
> > 
> > 53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.114@o2ib
> > 
> > So astro-OST0001 seems to be connected through 172.20.10.115@tcp1, even
> > though it uses 10.21.10.115@o2ib (verified by performance test and
> > disabling tcp1 on IB nodes).
> > 
> > Please ask for more details if needed.
> > 
> > Cheers,
> > Hans Henrik
> > 
> > 
> > May  1 03:35:46 node872 kernel: LNetError:
> > 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many
> > fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst
> > idx/frags: 128/236 May  1 03:35:46 node872 kernel: LNetError:
> > 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from
> > 10.21.10.116@o2ib: -90 May  1 03:35:46 node872 kernel: LustreError:
> > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5,
> > desc 88103dd63000 May  1 03:35:46 node872 kernel: Lustre:
> > 5606:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has
> > failed due to network error: [sent 1493602541/real 1493602541] 
> > req@880e99cea080 x1565604440535580/t0(0)
> > o4->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:6/4 lens
> > 608/448 e 0 to 1 dl 1493602585 ref 2 fl Rpc:X/0/ rc 0/-1 May  1
> > 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00:
> > Connection to astro-OST0002 (at 10.21.10.116@o2ib) was lost; in progress
> > operations using this service will wait for recovery to complete May  1
> > 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00:
> > Connection restored to 10.21.10.116@o2ib (at 10.21.10.116@o2ib) May  1
> > 03:35:46 node872 kernel: LustreError:
> > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5,
> > desc 88103dd63000 May  1 03:35:46 node872 kernel: LustreError:
> > 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5,
> > desc 88103dd63000 May  1 03:35:46 node872 kernel: LustreError:
> > 

[Lustre-discuss] new ost not recognized

2008-04-27 Thread Harald van Pee
I have added and mounted two new OSTs to an existing Lustre fs of 10 OSTs in total.
It seems that everything was o.k. because the size increased on one client.
I want to use this new disk space, but I noticed that on the new OSTs the
disk space was not being used.
I looked at
recovery_status
and it said INACTIVE.
lctl dl
on the mds shows no new osts.
I unmounted the new OSTs, but then the fs on the client froze. Then I mounted them
again and got
status: COMPLETE
recovered_clients: 13

That's strange because I would expect 14 clients + the mdt,
and indeed
lctl dl
on one client gives only 10 OSTs instead of 12;
the same is true on the mds.

It seems that on one client and on the mds the new OSTs are not recognized.
Can anybody explain what happened, and give me a hint how I can use the new
disk space?

I use version 1.6.0.1 on servers and version 1.6.1 on clients
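
(For readers unfamiliar with the output being discussed: a correctly registered
OST appears in lctl dl on the mds and on the clients as an osc device line; the
line below is purely illustrative, with made-up device number and UUID:

  7 UP osc lustre-OST000a-osc lustre-mdtlov_UUID 5

If the new OSTs never show up as such lines, they were not registered with the
MGS/MDS.)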
Thanks in advance
Harald


-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Harald van Pee
Hi,

I have updated all clients to the patched version 1.6.1; the servers are still
1.6.0.1. No Lustre-related error message has occurred since (2 weeks).

I think it's reasonable (necessary?) to e2fsck all OSTs and the MDT?
The MDT resides on a DRBD device configured for failover.
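
For clarity, the checks discussed here are plain e2fsck runs on the unmounted
ldiskfs backend devices; a sketch with placeholder device paths (the
Lustre-patched e2fsprogs is recommended for this):

# read-only check first
e2fsck -fn /dev/mdt_device
# fixing pass once satisfied
e2fsck -fp /dev/ost_device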

I now have the following questions.
1. Is there a recommended order for the file system checks? The mdt first and
then the osts, or vice versa?

2. If I umount the mdt, should I use -f? I assume no file system access will be
possible until the mdt is back again. Would it be better to unmount
all servers and clients and then the mdt?

3. I think each ost can be checked while the others are working, but I am
unsure whether I should use -f to umount or not?

4. Should I unmount all clients? If this is recommended anyway, it is maybe
better to stop file system access for a couple of hours (2 TB, 70% used) and
do the filesystem checks in parallel.

Thanks in advance
Harald



On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
 On Jan 21, 2008  18:55 +0100, Harald van Pee wrote:
  The directory is just not there! Directory or file not found.
 
  in my opinion there is no error message on the clients which is directly
  related to the problem on our node0010 today I have seen this problem a
  several time. Mostly the directory is not seen! Probably all of the other
  directories can be accessed at the same time.
 
  and here all lustre related messages from the last days (others are
  mostly timestamps!)
 
 
 
  Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
  (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias

 A quick search in bugzilla for this error message shows bug 12123,
 which is fixed in the 1.6.1 release, and also has a patch.

 Cheers, Andreas
 --
 Andreas Dilger
 Sr. Staff Engineer, Lustre Group
 Sun Microsystems of Canada, Inc.

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-01-25 Thread Harald van Pee
On Thursday 24 January 2008 08:13 pm, you wrote:
 Hello Harald,

  Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0:
  (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 134120476 alias
  2 Jan 21 18:12:51 node0010 kernel: Lustre: 5717:0:
  (namei.c:235:ll_mdc_blocking_ast()) Skipped 6 previous similar messages

 this looks very much like a real bug (and I don't have time to look into
 it). I would also guess it is fixed by more recent lustre version. I think
 there have been many changes of the patchless client between 1.6.0.1 and
 1.6.1 or 1.6.2.
 You really can't update your client systems by now?

Hm, at the moment not; we have urgent jobs running around the clock.
All heavy writing tasks we now do to local disks, and none of the error
messages have occurred since.

But it's worth thinking about. Updating the clients alone should be
possible much earlier than updating all machines, and of course can be done
machine by machine.

But I would assume that, to be sure no serious file system corruption
has happened, I should also run a file system check on all the OSTs and maybe
also the MDT?

But you are right: 1.6.0.1 servers with 1.6.1 clients is a supported
configuration, right? And therefore updating the clients asap would be a good
idea!
Any objections to that?

Harald

-- 
Harald van Pee

Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss