Re: [lustre-discuss] Lnet configuration and debugging

2024-10-01 Thread Hans Henrik Happe

Hi,

We also have a tcp1 config, but our lnet.conf looks like this:

net:
    - net type: tcp1
      local NI(s):
        - nid: <IP>@tcp1
          status: up
          interfaces:
              0: eth0

Replace <IP> with the NID's IP address. I guess you need "- net type" instead of just 
"- net".


Cheers,
Hans Henrik

On 17/09/2024 11.50, Steve Brasier wrote:

Hi.

I've got a Rocky Linux 9.4 client running lustre 2.15.5-1.el9 with this 
/etc/lnet.conf:


[root@stg-login-0 rocky]# cat /etc/lnet.conf
net:
    - net: tcp1
        interfaces:
            0: eth0

Running systemctl start lnet just hangs forever, with the syslog just 
showing

Sep 13 15:31:35 stg-login-0 systemd[1]: Starting lnet management...

and it's actually the command below which hangs:
[root@stg-login-0 rocky]# /usr/sbin/lnetctl import /etc/lnet.conf
i.e. module load and lnet configure work OK.

However, it looks like it autoconfigured an interface on tcp (not tcp1):
[root@stg-login-0 rocky]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 10.179.2.45@tcp
          status: up

So:
1. How can I debug this hanging please?

2. Do the client and server NIDs need to be in the same IPv4 subnet? I 
have a client NID of 10.179.2.45@tcp1 and a server NID 
of 10.167.128.1@tcp1, with IP routing between them such that ICMP ping 
works. Is that OK?


many thanks for any help!


http://stackhpc.com/
Please note I work Tuesday to Friday.

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org




Re: [lustre-discuss] LNET issues

2024-09-11 Thread Hans Henrik Happe via lustre-discuss

Hi,

We started having the same issue after upgrading servers from 2.12.9 to 
2.15.5 and clients from 2.15.3 to 2.15.5. Only a couple of older OSSs had 
the issue. They use ConnectX-3 FDR cards and the mlx4 driver. After 
replacing them with newer ConnectX-4 cards, which use the mlx5 driver, we 
haven't had the issue so far. We still have FDR/mlx4 clients using the filesystem.


We are using the OS-provided drivers (Rocky 8 on servers and Rocky 9 on clients).

Are you using IB cards that use the mlx4 driver on the OSSs?

Cheers,
Hans Henrik

On 04/09/2024 19.50, Alastair Basden via lustre-discuss wrote:

Hi Makie,

Yes, sorry, that should be:

From the client (172.18.178.216):
lnetctl ping 172.18.185.8@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 172.18.185.8@o2ib: Input/output error


From the server (172.18.185.8):
lnetctl ping 172.18.178.216@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 172.18.178.216@o2ib: Input/output error



And yet a standard ping works.

Pinging to/from other clients and other OSSs works.  i.e. the file 
system is fully functional and in production, just this client and one 
or two others are having problems.


We have a link down on the core-to-edge uplink of the edge switch that 
this client is attached to.  Given that a standard ping works, 
connectivity is there.  But perhaps there is some RDMA issue?


Cheers,
Alastair.

On Wed, 4 Sep 2024, Makia Minich wrote:


The IP for the nid in your “net show” isn’t any of the nids you 
pinged. Is an address misconfigured somewhere?


On Sep 4, 2024, at 2:52 AM, Alastair Basden via lustre-discuss 
 wrote:


Hi,

We are having some Lnet issues, and wonder if anyone can advise.

Client is 2.15.5, server is 2.12.6.

Fabric is IB.

The file system mounts, but OSTs on a couple of OSSs are not 
contactable.


Client and servers can ping each other over the IB network.

However, an lnetctl ping fails in both directions between the bad OSSs and 
this client.  To other clients it's all fine.


i.e. for most of the clients it is working well, just one or two not 
so.


Server to client:
lnetctl ping 172.18.178.201@o2ib
manage:
    - ping:
          errno: -1
          descr: failed to ping 172.18.178.201@o2ib: Input/output error

Client to server:
manage:
    - ping:
          errno: -1
          descr: failed to ping 172.18.185.10@o2ib: Input/output error



And the o2ib network is noted as down:
lnetctl net show --net o2ib --verbose
net:
    - net type: o2ib
      local NI(s):
        - nid: 172.18.178.216@o2ib
          status: down
          interfaces:
              0: ibs1f0
          statistics:
              send_count: 45032
              recv_count: 45030
              drop_count: 0
          tunables:
              peer_timeout: 100
              peer_credits: 32
              peer_buffer_credits: 0
              credits: 256
          lnd tunables:
              peercredits_hiw: 16
              map_on_demand: 1
              concurrent_sends: 32
              fmr_pool_size: 512
              fmr_flush_trigger: 384
              fmr_cache: 1
              ntx: 512
              conns_per_peer: 1
          dev cpt: 0
          CPT: "[0,1]"



Could this be a hardware error, even though the IB is working?

Could it be related to https://jira.whamcloud.com/browse/LU-16378 ?

Are there any suggestions on how to bring up the lnet network or fix 
the problems?


Thanks,
Alastair.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





Re: [lustre-discuss] How to activate an OST on a client ?

2024-08-29 Thread Hans Henrik Happe via lustre-discuss

Hi,

We just had a similar issue on 2.15.5: InfiniBand clients not 
reconnecting after a target outage.


Deleting the LNet net and importing the config again solved it without a 
reboot or unmount:


# lnetctl net del --net o2ib
# lnetctl import < /etc/lnet.conf
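
A quick sanity check after the re-import (a sketch) is to confirm the net is back up and 
the peers are rediscovered:

# lnetctl net show --net o2ib
# lnetctl peer show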

Cheers,
Hans Henrik

On 28/08/2024 18.18, Lixin Liu via lustre-discuss wrote:


We had the same problem after we upgraded Lustre servers from 2.12.8 
to 2.15.3. Clients were running 2.15.3 on CentOS 7. Random OSTs dropped 
out frequently on busy login nodes (almost daily), but less so on compute 
nodes. The "lctl" command could not activate OSTs, and a reboot was the 
only way to clear the problem.

In June, we upgraded all client OSs to AlmaLinux 9.3 and the Lustre 
version to 2.15.4 on both servers and clients (missed the 2.15.5 release 
by about 2 weeks). After the upgrade, we no longer have this problem.

In our case, I wonder if this was OmniPath related. Servers on AlmaLinux 
8 were using the in-kernel driver, but CentOS 7 clients were using the 
driver from the Intel/Cornelis release. Alma 9 clients are now also using 
the in-kernel driver.

Cheers,

Lixin.

*From: *lustre-discuss  on 
behalf of Cameron Harr via lustre-discuss 


*Reply-To: *Cameron Harr 
*Date: *Wednesday, August 28, 2024 at 8:19 AM
*To: *"lustre-discuss@lists.lustre.org" 
*Subject: *Re: [lustre-discuss] How to activate an OST on a client ?

There's also an "lctl --device <devno> activate" that I've used in the 
past, though I don't know what the conditions need to be for it to work.
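
For reference, a rough sketch of how these suggestions fit together (the index 12 is 
purely a placeholder read off "lctl dl", mentioned below):

lctl dl | grep osc            # find the device index (first column) of the inactive OST's osc
lctl --device 12 recover      # trigger a reconnect, as suggested below
lctl --device 12 activate     # or re-activate it, per the command above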


On 8/27/24 07:46, Andreas Dilger via lustre-discuss wrote:

Hi Jan,

There is "lctl --device  recover" that will trigger a
reconnect to the named OST device (per "lctl dl" output), but not
sure if that will help.

Cheers, Andreas



On Aug 22, 2024, at 06:36, Haarst, Jan van via lustre-discuss

 wrote:

Hi,

Probably the wording of the subject doesn't actually cover the
issue; what we see is this:

We have a client behind a router (linking tcp to Omnipath)
that shows an inactive OST (all on 2.15.5).

Other clients that go through the router do not have this issue.

One client had the same issue, although it showed a different
OST as inactive.

After a reboot, all was well again on that machine.

The clients can lctl ping the OSSs.

So although we have a workaround (reboot the client), it would
be nice to:

 1. Fix the issue without a reboot
 2. Fix the underlying issue.

It might be unrelated, but we also see another routing issue
every now and then:

The router stops routing requests toward a certain OSS, and
this can be fixed by deleting the peer_nid of the OSS from the
router.

I am probably missing informative logs, but I’m more than
happy to try to generate them, if somebody has a pointer to how.

We are a bit stumped right now.

With kind regards,

-- 


Jan van Haarst

HPC Administrator

For Anunna/HPC questions, please use https://support.wur.nl (with
HPC as service)

Aanwezig: maandag, dinsdag, donderdag & vrijdag

Facilitair Bedrijf, onderdeel van Wageningen University &
Research

Afdeling Informatie Technologie

Postbus 59, 6700 AB, Wageningen

Gebouw 116, Akkermaalsbos 12, 6700 WB, Wageningen

http://www.wur.nl/nl/Disclaimer.htm



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org






[lustre-discuss] Issue after 2.15.5 upgrade

2024-08-01 Thread Hans Henrik Happe via lustre-discuss

Hi,

Last week we upgraded to Lustre 2.15.5 from 2.12.9. It went almost 
without any issues. However, clients using TCP log these messages when 
mounting one of the two filesystems:


Issue #1:
-

Aug  1 09:39:41 fend08 kernel: Lustre: Lustre: Build Version: 2.15.5
Aug  1 09:39:41 fend08 kernel: LustreError: 
31623:0:(mgc_request.c:1566:mgc_apply_recover_logs()) mgc: cannot find 
UUID by nid '10.21.10.122@o2ib': rc = -2
Aug  1 09:39:41 fend08 kernel: Lustre: 
31623:0:(mgc_request.c:1784:mgc_process_recover_nodemap_log()) 
MGC172.20.10.101@tcp1: error processing recovery log hpc-cliir: rc = -2
Aug  1 09:39:41 fend08 kernel: Lustre: 
31623:0:(mgc_request.c:2150:mgc_process_log()) MGC172.20.10.101@tcp1: IR 
log hpc-cliir failed, not fatal: rc = -2
Aug  1 09:39:41 fend08 root[31712]: ksocklnd-config: skip setting up 
route for bond0: don't overwrite existing route

Aug  1 09:39:42 fend08 kernel: Lustre: Mounted hpc-client

This is not happening when using Infiniband.

How can we fix this?


Issue #2 (might or might not be related):
-

The status of target connections after mounting is:

# lfs check all
hpc-OST0003-osc-90532327f000 active.
hpc-OST0004-osc-90532327f000 active.
hpc-OST0005-osc-90532327f000 active.
hpc-OST0006-osc-90532327f000 active.
lfs check: error: check 'hpc-OST0007-osc-90532327f000': Resource 
temporarily unavailable (11)
lfs check: error: check 'hpc-OST0008-osc-90532327f000': Resource 
temporarily unavailable (11)

hpc-OST0009-osc-90532327f000 active.
hpc-OST000a-osc-90532327f000 active.
hpc-OST000b-osc-90532327f000 active.
hpc-OST000c-osc-90532327f000 active.
hpc-OST000d-osc-90532327f000 active.
hpc-OST000e-osc-90532327f000 active.
hpc-MDT-mdc-90532327f000 active.
MGC172.20.10.101@tcp1 active.

OST000[7-e] are on host 172.20.10.122@tcp1 (10.21.10.122@o2ib).

Because of this, access hangs when hitting OST000[7-8].

Unmounting and mounting again clears the error on OST000[7-8] and makes 
the filesystem usable (Issue #1 still showing). After a clean LNet restart 
the issue comes back.


Disabling 'discovery' in LNet makes this issue go away (Issue #1 still 
showing).
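
For anyone wanting to try the same workaround, this is roughly how discovery can be 
turned off (a sketch; the persistent form assumes your lnet.conf has, or can take, a 
"global" section):

lnetctl set discovery 0

or persistently in /etc/lnet.conf:

global:
    discovery: 0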


Reverting to Lustre 2.15.3 also makes it go away (Issue #1 still 
showing). Perhaps not all of the TCP issues in 2.15.4 were fixed by LU-17664.



A few notes about our system:
--

- It's ZFS based.
- It was created back in 2015. The MGS and MDTs have survived since then 
(zfs send/receive), while new OSTs have been added over time and old ones 
have been taken out.

- There are 2 filesystems on an MDS pair. One MDT on each MDS.
- Dual network stack with Infiniband and TCP. For historical reasons we 
are using tcp1 and not the default tcp0. No routers.


Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 2.15.4 hangs during mount using TCP

2024-04-05 Thread Hans Henrik Happe via lustre-discuss

Hi,

I'm happy to report that LU-17664 fixed this.

Cheers,
Hans Henrik

On 22/03/2024 16.23, Hans Henrik Happe via lustre-discuss wrote:

Hi,

After updating to lustre 2.15.4 I've had trouble mounting over TCP. 
Using Infiniband works fine, but over TCP it just hangs without errors 
on client or servers.


OS is Rocky 9.2 on client and CentOS 7.9 on servers running 2.12.9.

Rocky 9.2 + 2.15.3 works, but both Rocky 9.2 and 9.3 with 2.15.4 hang.

Anyone having the same issue?

A few notes about our system:

- It's ZFS based.
- It was created back in 2015. The MGS and MDTs have survived since then 
(zfs send/receive), while new OSTs have been added over time and old 
ones have been taken out.
- There are 2 filesystems on an MDS pair. One MDT on each MDS. Both 
have the hanging problem.
- Dual network stack with Infiniband and TCP. For historical reasons 
we are using tcp1 and not the default tcp0. No routers.


I'll dive into getting more debugging info out. Any pointers on how to 
do this efficiently would be much appreciated.


Cheers,
Hans Henrik



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] 2.15.4 hangs during mount using TCP

2024-03-22 Thread Hans Henrik Happe via lustre-discuss

Hi,

After updating to lustre 2.15.4 I've had trouble mounting over TCP. 
Using Infiniband works fine, but over TCP it just hangs without errors 
on client or servers.


OS is Rocky 9.2 on client and CentOS 7.9 on servers running 2.12.9.

Rocky 9.2 + 2.15.3 works, but both Rocky 9.2 and 9.3 with 2.15.4 hang.

Anyone having the same issue?

A few notes about our system:

- It's ZFS based.
- It was created back in 2015. The MGS and MDTs have survived since then 
(zfs send/receive), while new OSTs have been added over time and old ones 
have been taken out.
- There are 2 filesystems on an MDS pair. One MDT on each MDS. Both have 
the hanging problem.
- Dual network stack with Infiniband and TCP. For historical reasons we 
are using tcp1 and not the default tcp0. No routers.


I'll dive into getting more debugging info out. Any pointers on how to 
do this efficiently would be much appreciated.
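
For reference, a common starting point for this kind of LNet debugging looks roughly 
like this (a sketch; the debug masks are only an example):

lctl set_param debug=+net          # add LNet tracing to the kernel debug mask
lctl set_param printk=+neterror    # also log network errors to the console/syslog
lctl dk /tmp/lustre-debug.txt      # dump and clear the debug buffer after reproducing the hang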


Cheers,
Hans Henrik

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS Support For Lustre

2023-02-18 Thread Hans Henrik Happe via lustre-discuss

Hi,

The repos, in general, only work for the kernel they were built for. 
That will be the kernel supported by the release; look in the changelog. 
For 2.15.1:


https://wiki.lustre.org/Lustre_2.15.1_Changelog

To make newer kernels work you have to compile it yourself, and that might 
not even work without patching and hasn't gone through testing.


You are better off following the supported kernel on the servers. The 
client is usually more likely to compile and work on newer kernels, but 
patching might be needed.
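
As a sketch, installing the kmod and userland ZFS packages together from the same 
Whamcloud tree (rather than mixing in a newer upstream ZFS) normally resolves cleanly; 
the "lustre-server" repo id here is an assumption taken from the wiki install guide:

yum --nogpgcheck --enablerepo=lustre-server install \
    lustre lustre-osd-zfs-mount kmod-zfs zfs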


Cheers,
Hans Henrik

On 14.02.2023 13.16, Nick dan via lustre-discuss wrote:

Hi

I am using Lustre Version 2.15.1 on RedHat 8.8
As mentioned in the link, 
https://wiki.whamcloud.com/display/PUB/Lustre+Support+Matrix , the ZFS 
Version required is 2.1.2.
However, when I am trying to install ZFS from 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.1/el8.6/server/RPMS/x86_64/

I am getting the following error
[root@st01 user]# yum install 
https://downloads.whamcloud.com/public/lustre/lustre-2.15.1/el8.6/server/RPMS/x86_64/zfs-2.1.2-1.el8.x86_64.rpm

Updating Subscription Management repositories.
Last metadata expiration check: 2:01:30 ago on Tue 14 Feb 2023 
03:38:48 PM IST.

zfs-2.1.2-1.el8.x86_64.rpm                  248 kB/s | 649 kB     00:02
Error:
 Problem: conflicting requests
  - nothing provides zfs-kmod = 2.1.2 needed by zfs-2.1.2-1.el8.x86_64
(try to add '--skip-broken' to skip uninstallable packages or 
'--nobest' to use not only best candidate packages)


I have installed the other required packages like libzfs, libzpool, 
libnvpair, libutil.


I am not able to download kmod-zfs version 2.1.2, as the latest 
version getting downloaded is 2.1.9


Can you help with this or suggest another way to download all 
supported ZFS Packages?


Thanks,
Nick Dan

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] NVMe-over-fabric with Lustre

2023-01-09 Thread Hans Henrik Happe via lustre-discuss

On 06.01.2023 09.12, Nick dan via lustre-discuss wrote:

Hi

Can you give detailed documentation/information about using NVMe-oF with the 
Lustre File System?


NVMe-oF will provide you with a block device. Use that as a block device 
for Lustre. ldiskfs/ZFS use block devices to create the backing filesystems, 
on top of which Lustre creates the shared parallel filesystem.
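
In other words, once the NVMe-oF namespace shows up as a local block device, the usual 
target creation applies. A minimal sketch, assuming an ldiskfs OST and made-up 
device/fsname/MGS values:

mkfs.lustre --ost --fsname=testfs --index=0 --mgsnode=mgs@tcp /dev/nvme1n1
mkdir -p /mnt/ost0
mount -t lustre /dev/nvme1n1 /mnt/ost0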


Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre recycle bin

2022-10-16 Thread Hans Henrik Happe via lustre-discuss

Hi Francois,

Perhaps you could try LFSCK with the -o option to find orphan objects 
(man lctl-lfsck-start).
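
A sketch of what that could look like on the MDS (the device name testfs-MDT0000 is a 
placeholder, not from this system):

lctl lfsck_start -M testfs-MDT0000 -o               # scan for and handle orphan OST objects
lctl get_param -n mdd.testfs-MDT0000.lfsck_layout   # check progress and repaired counts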


Cheers,
Hans Henrik

On 17.10.2022 08.01, Cloete, F. (Francois) via lustre-discuss wrote:


Hi Andreas,

Our OSTs still display high file-system usage after removing folders.

Are there any commands that could be run to confirm that the allocated 
space which was used by those files has been released successfully?


Thanks

Francois

*From:*Andreas Dilger 
*Sent:* Saturday, 15 October 2022 00:20
*To:* Cloete, F. (Francois) 
*Cc:* lustre-discuss@lists.lustre.org
*Subject:* Re: [lustre-discuss] Lustre recycle bin






There isn't a recycle bin, but filenames are deleted from the 
filesystem quickly and the data objects are deleted in the background 
asynchronously (with transactions to prevent the space being leaked). 
If there are a lot of files this may take some time; rebooting will 
not speed it up.
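
One way to watch that background destruction drain on the MDS is, roughly (a sketch):

lctl get_param osp.*.destroys_in_flight   # per-OST count of objects still queued for destroy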




On Oct 14, 2022, at 10:00, Cloete, F. (Francois) via
lustre-discuss  wrote:

Hi Community,

Is anyone aware of a recycle bin parameter for Lustre?

Just deleted a whole lot of files but for some reason the space is
not getting cleared.

Server rebooted, file-system un-mounted etc.

Thanks




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas

--

Andreas Dilger

Lustre Principal Architect

Whamcloud


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] DOM Configuration

2022-10-05 Thread Hans Henrik Happe via lustre-discuss

Hi Francisco,

Just to make sure. You are aware that it will only work for new 
files/dirs under "/folder_name/"? Also, only small files or the first 
part of a large file (<32M) will get a performance benefit.
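
A quick way to confirm the layout is actually being applied to newly created files (a 
sketch; the path reuses the folder_name from the setstripe command quoted below):

lfs getstripe -v /folder_name/some_new_file   # the first component should show an mdt pattern up to 32M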


Cheers,
Hans Henrik


On 30.09.2022 14.16, Cloete, F. (Francois) via lustre-discuss wrote:


Good day Lustre Community.

Would appreciate any guidance regarding best practices on the below 
DOM Lustre config in our environment.


We have applied DOM on our Application file-systems thinking that this 
would give us a performance benefit in our environment.


We have not noticed any significant performance benefit after applying 
these changes.


Below is the setstripe command used for the folders mentioned above.

lfs setstripe -E 32M -L mdt -E -1 -S 4M folder_name/

Below are the versions we are using.

[root@server1 ~]# rpm -qa|grep -i lustre
kmod-lustre-osd-ldiskfs-2.12.8_6_g5457c37-1.el7.x86_64
kernel-3.10.0-1160.49.1.el7_lustre.x86_64
kernel-headers-3.10.0-1160.49.1.el7_lustre.x86_64
lustre-tests-2.12.8_6_g5457c37-1.el7.x86_64
lustre-osd-ldiskfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
lustre-iokit-2.12.8_6_g5457c37-1.el7.x86_64
lustre-resource-agents-2.12.8_6_g5457c37-1.el7.x86_64
kmod-lustre-2.12.8_6_g5457c37-1.el7.x86_64
kmod-lustre-tests-2.12.8_6_g5457c37-1.el7.x86_64
lustre-2.12.8_6_g5457c37-1.el7.x86_64

[root@server ~]# cat /etc/*release*
NAME="Red Hat Enterprise Linux Server"
VERSION="7.9 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.9"
PRETTY_NAME="Red Hat Enterprise Linux"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.9:GA:server"
HOME_URL=https://www.redhat.com/
BUG_REPORT_URL=https://bugzilla.redhat.com/
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.9
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.9"
Red Hat Enterprise Linux Server release 7.9 (Maipo)
Red Hat Enterprise Linux Server release 7.9 (Maipo)
cpe:/o:redhat:enterprise_linux:7.9:ga:server

We are using ldiskfs on RHEL 7.9 VMs in Azure.

Thanks

Francois







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] How to speed up Lustre

2022-07-06 Thread Hans Henrik Happe via lustre-discuss

I haven't tried it, but the man page for setstripe  --pool explains it.
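
For what it's worth, a sketch of what that looks like: each PFL component can name its 
own pool (the pool names "flash" and "hdd", the sizes, and the path are assumptions):

lfs setstripe -E 64K -c 1 --pool flash -E -1 -c 4 --pool hdd /path/to/dir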

Cheers,
Hans Henrik

On 06.07.2022 22.50, Thomas Roth via lustre-discuss wrote:

Yes, I got it.
But Marion states that they switched
> to a PFL arrangement, where the first 64k lives on flash OST's 
(mounted on our metadata servers), and the remainder of larger files 
lives on HDD OST's.


So, how do you specify a particular OSTs (or group of OSTs) in a PFL?
The OST-equivalent of the "-L mdt" part ?

With SSDs and HDDs making up the OSTs, I would have guessed OST pools, 
but I'm only aware of an "lfs setstripe" that puts all of my file into 
a pool. How do I put the first few kB of a file in pool A and the rest 
in pool B?



Cheers
Thomas


On 7/6/22 21:42, Andreas Dilger wrote:

Thomas,
where the file data is stored depends entirely on the PFL layout used 
for the filesystem or parent directory.


For DoM files, you need to specify a DoM component, like:

 lfs setstripe -E 64K -L mdt -E 1G -c 1 -E 16G -c 4 -E eof -c 32 



so the first 64KB will be put onto the MDT where the file is created, 
the remaining 1GB onto a single OST, the next 15GB striped across 4 
OSTs, and the rest of the file striped across (up to) 32 OSTs.


64KB is the minimum DoM component size, but if the files are smaller 
(e.g. 3KB) they will only allocate space on the MDT in multiples of 
4KB blocks.  However, the default ldiskfs MDT formatting only leaves 
about 1 KB of space per inode, which would quickly run out unless DoM 
is restricted to specific directories with small files, or if the MDT 
is formatted with enough free space to accommodate this usage.  This 
is less of an issue with ZFS MDTs, but DoM files will still consume 
space much more quickly and reduce the available inode count by a 
factor of 16-64 more quickly than without DoM.


It is strongly recommended to use Lustre 2.15 with DoM to benefit 
from the automatic MDT space balancing, otherwise the MDT usage may 
become imbalanced if the admin (or users) do not actively manage the 
MDT selection for new user/project/job directories with "lfs mkdir -i".


Cheers, Andreas

On Jul 6, 2022, at 10:48, Thomas Roth via lustre-discuss 
mailto:lustre-discuss@lists.lustre.org>> 
wrote:


Hi Marion,

I do not fully understand how to "mount flash OSTs on a metadata server"
- You have a couple of SSDs, you assemble these into on block device 
and format it with "mkfs.lustre --ost ..." ? And then mount it just 
as any other OST?
- PFL then puts the first 64k on these OSTs and the rest of all files 
on the HDD-based OSTs?

So, no magic on the MDS?

I'm asking because we are considering something similar, but we would 
not have these flash-OSTs in the MDS-hardware but on separate OSS 
servers.



Regards,
Thomas

On 23/02/2022 04.35, Marion Hakanson via lustre-discuss wrote:
Hi again,
kara...@aselsan.com.tr said:
I was thinking that DoM is built in feature and it can be 
enabled/disabled
online for a certain directories. What do you mean by reformat to 
converting
to DoM (or away from it). I think just Metadata target size is 
important.

When we first turned on DoM, it's likely that our Lustre system was old
enough to need to be reformatted in order to support it.  Our flash
storage RAID configuration also needed to be expanded, but the system
was not yet in production so a reformat was no big deal at the time.
So perhaps your system will not be subject to this requirement (other
than expanding your MDT flash somehow).
kara...@aselsan.com.tr said:
I also thought creating flash OST on metadata server. But I was not 
sure what
to install on metadata server for this purpose. Can Metadata server 
be an OSS

server at the same time? If it is possible I would prefer flash OST on
Metadata server instead of DoM. Because Our metadata target size is 
small, it

seems I have to do risky operations to expand size.
Yes, our metadata servers are also OSS's at the same time.  The flash
OST's are separate volumes (and drives) from the MDT's, so less scary 
(:-).

kara...@aselsan.com.tr said:
imho, because of the less RPC traffic DoM shows more performance than 
flash

OST. Am I right?
The documentation does say there that using DoM for small files will 
produce

less RPC traffic than using OST's for small files.
But as I said earlier, for us, the amount of flash needed to support DoM
was a lot higher than with the flash OST approach (we have a high 
percentage,

by number, of small files).
I'll also note that we had a wish to mostly "set and forget" the layout
for our Lustre filesystem.  We have not figured out a way to predict
or control where small files (or large ones) are going to end up, so
trying to craft optimal layouts in particular directories for particular
file sizes has turned out to not be feasible for us.  PFL has been a
win for us here, for that reason.
Our conclusion was that in order to take advantage of the perfo

[lustre-discuss] Target index choice

2022-04-08 Thread Hans Henrik Happe via lustre-discuss

Hi,

Is there, or will there be, a downside to choosing discontinuous index numbers? 
I.e. encoding OSS number YY and target number XX as 0xYYXX.
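
For instance (purely a sketch with made-up values), OSS 2 / target 3 under that scheme 
would be formatted with an explicit index of 0x0203:

mkfs.lustre --ost --fsname=fs0 --index=515 --mgsnode=mgs@o2ib /dev/sdX   # 515 == 0x0203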


I guess it could hurt if layouts are packed to save space.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Dependency issue with Lustre+ZFS support

2022-04-06 Thread Hans Henrik Happe via lustre-discuss
Could it be that you have installed a ZFS 2.0.7, which clashes with the 
needed 0.7.13 in the Lustre repo?
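
A quick way to check for that kind of mix is roughly (a sketch):

rpm -qa | grep -Ei 'zfs|zpool|nvpair|spl'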


I have some documentation that targets our own specific use. It explains 
how to compile ZFS and Lustre:


https://github.com/ucphhpc/storage

Note: It's still very unpolished.

Cheers,
Hans Henrik

On 05.04.2022 01.38, Finn Rawles Malliagh via lustre-discuss wrote:

Hi all,

I am currently trying to install Lustre with ZFS support using the 
steps set out by https://wiki.lustre.org/Installing_the_Lustre_Software


I reached up to step 4.2.6 where I was shown this error after running 
the command below.


Does anybody have any idea on how to solve this dependency issue?

I am running CentOS 7.9 (I have tried the latest kernel as well as the 
custom Lustre kernel with the same problems). Both times have been a 
fresh install of CentOS 7.9.


If anyone also has a start to finish guide for dummies on how to 
install Lustre with ZFS support that is up to date I would very much 
appreciate it. It seems like a lot of resources online are out of date 
or missing steps.


[root@mgs-mds x86_64]# yum --nogpgcheck --enablerepo=lustre-server 
install lustre-dkms lustre-osd-zfs-mount lustre lustre-resource-agents zfs

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirror.pulsant.com 
 * epel: mirror.hostnet.nl 
 * extras: mirror.mhd.uk.as44574.net 
 * updates: mirror.pulsant.com 
Resolving Dependencies
--> Running transaction check
---> Package lustre.x86_64 0:2.12.8_6_g5457c37-1.el7 will be installed
---> Package lustre-osd-zfs-mount.x86_64 0:2.12.8_6_g5457c37-1.el7 
will be installed
--> Processing Dependency: libzfs.so.2()(64bit) for package: 
lustre-osd-zfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
Package libzfs2 is obsoleted by libzfs4, but obsoleting package does 
not provide for requirements
--> Processing Dependency: libnvpair.so.1()(64bit) for package: 
lustre-osd-zfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
Package libnvpair1-0.8.6-1.el7.x86_64 is obsoleted by 
libnvpair3-2.0.7-1.el7.x86_64 which is already installed
---> Package lustre-resource-agents.x86_64 0:2.12.8_6_g5457c37-1.el7 
will be installed
--> Processing Dependency: resource-agents for package: 
lustre-resource-agents-2.12.8_6_g5457c37-1.el7.x86_64
---> Package lustre-zfs-dkms.noarch 0:2.12.8_6_g5457c37-1.el7 will be 
installed

---> Package zfs.x86_64 0:2.0.7-1.el7 will be installed
--> Processing Dependency: libzpool4 = 2.0.7 for package: 
zfs-2.0.7-1.el7.x86_64
--> Processing Dependency: libzfs4 = 2.0.7 for package: 
zfs-2.0.7-1.el7.x86_64
--> Processing Dependency: libzpool.so.4()(64bit) for package: 
zfs-2.0.7-1.el7.x86_64
--> Processing Dependency: libzfs_core.so.3()(64bit) for package: 
zfs-2.0.7-1.el7.x86_64
--> Processing Dependency: libzfs.so.4()(64bit) for package: 
zfs-2.0.7-1.el7.x86_64

--> Running transaction check
---> Package libzfs4.x86_64 0:2.0.7-1.el7 will be installed
---> Package libzpool4.x86_64 0:2.0.7-1.el7 will be installed
---> Package lustre-osd-zfs-mount.x86_64 0:2.12.8_6_g5457c37-1.el7 
will be installed
--> Processing Dependency: libzfs.so.2()(64bit) for package: 
lustre-osd-zfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
Package libzfs2 is obsoleted by libzfs4, but obsoleting package does 
not provide for requirements
--> Processing Dependency: libnvpair.so.1()(64bit) for package: 
lustre-osd-zfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
Package libnvpair1-0.8.6-1.el7.x86_64 is obsoleted by 
libnvpair3-2.0.7-1.el7.x86_64 which is already installed

---> Package resource-agents.x86_64 0:4.1.1-61.el7_9.15 will be installed
--> Processing Dependency: psmisc for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /usr/sbin/rpc.nfsd for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /usr/sbin/rpc.mountd for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /usr/sbin/mount.cifs for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /usr/sbin/fuser for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /sbin/rpc.statd for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /sbin/mount.nfs4 for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64
--> Processing Dependency: /sbin/mount.nfs for package: 
resource-agents-4.1.1-61.el7_9.15.x86_64

--> Running transaction check
---> Package cifs-utils.x86_64 0:6.2-10.el7 will be installed
--> Processing Dependency: keyutils for package: 
cifs-utils-6.2-10.el7.x86_64
---> Package lustre-osd-zfs-mount.x86_64 0:2.12.8_6_g5457c37-1.el7 
will be installed
--> Processing Dependency: libzfs.so.2()(64bit) for package: 
lustre-osd-zfs-mount-2.12.8_6_g5457c37-1.el7.x86_64
Package libzfs2 is obsoleted by libzfs4, but obsoleting package does 
not provide for requirements
--> Processing Depen

Re: [lustre-discuss] MDT will not mount

2022-03-14 Thread Hans Henrik Happe via lustre-discuss
I'm happy to report that the problem seems to be solved by deleting the 
CATALOGS file on the underlying MDT ZFS filesystem. As I gather from the 
manual [1], this should not be a problem, because it will be handled by LFSCK.


If I'm wrong about this, please let me know. Also, I'm happy to provide 
any information from this MDT to help assess whether there is a bug somewhere.


LFSCK is running as we speak.
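
A sketch of how such a full LFSCK is typically started and monitored (the full device 
name astro-MDT0000 is an assumption, since the logs abbreviate it):

lctl lfsck_start -M astro-MDT0000 -A -t all
lctl lfsck_query -M astro-MDT0000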

Cheers,
Hans Henrik

[1] https://doc.lustre.org/lustre_manual.xhtml#backup_fs_level.restore

On 11.03.2022 12.49, Hans Henrik Happe via lustre-discuss wrote:
I tried tunefs.lustre --erase-params --writeconf on the targets. I guess it 
is not great because the clients were not unmounted, but I made sure 
they are not trying to connect.


This makes it possible to mount the MDT, but when the first OST mount 
starts the MDT has a lot of errors. After starting the second OST the 
MDS crashes (syslog attached).


Cheers,
Hans Henrik

On 10.03.2022 15.48, Hans Henrik Happe via lustre-discuss wrote:
Sorry for all the mail load, but I hope this info can help figuring 
out what's wrong and determine if this was caused by a bug. I think


I read the CONFIGS on the MDT with llog_reader. See attachments.

Cheers,
Hans Henrik

On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
After upgrading to Lustre 2.12.8 I found that the first mount after 
a reboot behaves differently:


Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: No space left on device


And a different syslog output (attached syslog-0).

Doing the mount again has this error:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: File exists


And a syslog like the one first posted. Attached the new output in 
syslog-1.


Finally, stopping Lustre (Only MGS in this case) and the lnet 
service does free resources making lustre_rmmod fail:


# lustre_rmmod
rmmod: ERROR: Module osp is in use


Cheers,
Hans Henrik

On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
(3.10.0-1160.6.1.el7.x86_64).


On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:

Hi,

A reboot of the MDS stalled and got forced reset. After that the 
MDS would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but 
astro-OST0002 is not mounted at this time. I guess this comes from 
the MGS.


Cheers,
Hans Henrik








___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MDT will not mount

2022-03-11 Thread Hans Henrik Happe via lustre-discuss
I tried tunefs.lustre --erase-params --writeconf on the targets. I guess it 
is not great because the clients were not unmounted, but I made sure 
they are not trying to connect.
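
For context, the usual full writeconf sequence looks roughly like this (a sketch: 
mds02/astro0 is taken from the mount output above, the OST dataset names are 
placeholders, and everything should be unmounted with clients stopped first):

tunefs.lustre --writeconf mds02/astro0            # MGS/MDT first
tunefs.lustre --writeconf <ostpool>/<ostdataset>  # then each OST
# remount in order: MGS/MDT first, then OSTs, then clients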


This makes it possible to mount the MDT, but when the first OST mount 
starts the MDT has a lot of errors. After starting the second OST the 
MDS crashes (syslog attached).


Cheers,
Hans Henrik

On 10.03.2022 15.48, Hans Henrik Happe via lustre-discuss wrote:
Sorry for all the mail load, but I hope this info can help figuring 
out what's wrong and determine if this was caused by a bug. I think


I read the CONFIGS on the MDT with llog_reader. See attachments.

Cheers,
Hans Henrik

On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
After upgrading to Lustre 2.12.8 I found that the first mount after a 
reboot behaves differently:


Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: No space left on device


And a different syslog output (attached syslog-0).

Doing the mount again has this error:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: File exists


And a syslog like the one first posted. Attached the new output in 
syslog-1.


Finally, stopping Lustre (Only MGS in this case) and the lnet service 
does free resources making lustre_rmmod fail:


# lustre_rmmod
rmmod: ERROR: Module osp is in use


Cheers,
Hans Henrik

On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
(3.10.0-1160.6.1.el7.x86_64).


On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:

Hi,

A reboot of the MDS stalled and got forced reset. After that the 
MDS would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but astro-OST0002 
is not mounted at this time. I guess this comes from the MGS.


Cheers,
Hans Henrik





Mar 11 12:42:04 mds02 kernel: Lustre: MGS: Logs for fs astro were removed by 
user request.  All servers must be restarted in order to regenerate the logs: 
rc = 0
Mar 11 12:42:04 mds02 kernel: Lustre: astro-MDT: nosquash_nids set to 
172.20.1.10@tcp1
Mar 11 12:42:04 mds02 kernel: Lustre: astro-MDT: Imperative Recovery not 
enabled, recovery window 300-900
Mar 11 12:42:29 mds02 kernel: Lustre: astro-MDT: Connection restored to 
0d2c198e-514c-3ae5-fc31-48e0424f131d (at 0@lo)
Mar 11 12:42:46 mds02 systemd: Started Session c4 of user root.
Mar 11 12:42:51 mds02 kernel: Lustre: MGS: Connection restored to 
b11aa8af-1dd3-d728-0e81-6f595456b689 (at 10.21.10.114@o2ib)
Mar 11 12:42:51 mds02 kernel: Lustre: MGS: Regenerating astro-OST log by 
user request: rc = 0
Mar 11 12:42:58 mds02 kernel: Lustre: 
10971:0:(llog_cat.c:93:llog_cat_new_log()) astro-OST-osc-MDT: there are 
no more free slots in catalog [0x186:0x1:0x0]:0
Mar 11 12:42:58 mds02 kernel: LustreError: 
10971:0:(osp_sync.c:1524:osp_sync_init()) astro-OST-osc-MDT: can't 
initialize llog: rc = -28
Mar 11 12:42:58 mds02 kernel: LustreError: 
10971:0:(obd_config.c:559:class_setup()) setup astro-OST-osc-MDT failed 
(-28)
Mar 11 12:42:58 mds02 kernel: LustreError: 
10971:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: 
cfg command failed: rc = -28
Mar 11 12:42:58 mds02 kernel: Lustre:cmd=cf003 0:astro-OST-osc-MDT  
1:astro-OST_UUID  2:10.21.10.114@o2ib  
Mar 11 12:42:58 mds02 kernel: LustreError: 
9282:0:(mgc_request.c:599:do_requeue()) failed processing log: -28
Mar 11 12:44:16 mds02 kernel: Lustre: MGS: Connection restored to 
9842fe3a-0ff5-afc6-292f-cff60a4897ba (at 10.21.10.115@o2ib)
Mar 11 12:44:16 mds02 kernel: Lustre: Skipped 1 previous similar message
Mar 11 12:44:16 mds02 kernel: Lustre: MGS: Regenerating astro-OST0001 log by 
user request: rc = 0
Mar 11 12:44:25 mds02 kernel: LustreError: 
11466:0:(obd_config.c:764:class_add_conn()) try to add conn on immature client 
dev

Message from syslogd@mds02 at Mar 11 12:44:25 ...
 kernel:LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) ASSERTION( 
obd->obd_lu_dev->ld_site == lod->lod_dt_dev.dd_lu_dev.ld_site ) failed: 
Mar 11 12:44:25 mds02 kernel: LustreError: 
11466:0:(lod_lov.c:244:lod_add_device()) ASSERTION( obd->obd_lu_dev->ld_site == 
lod->lod_dt_dev.dd_lu_dev.ld_site ) failed: 

Message from syslogd@mds02 at Mar 11 12:44:25 ...
 kernel:LustreError: 11466:0:(lod_lov.c:244:lod_add_device()) LBUG
Mar 11 12:44:25 mds02 kernel: LustreError: 
11466:0:(lod_lov.c:244:lod_add_device()) LBUG
Mar 11 12:44:25 mds02 kernel: Pid: 11466, comm: llog_process_th 
3.10.0-1160.45.1.el7.x86_64 #1 SMP Wed Oct 13 17:20:51 UTC 2021
Mar 11 12:44:25 mds02 kernel: Call Trace:
Mar 11 12:44:25 mds02 kernel: [] libcfs_call_trace+0x8c/0xc0 
[libcfs]
Mar 11 12:44:25 mds02 kernel: [] lbug_with_loc+0x4c/

Re: [lustre-discuss] MDT will not mount

2022-03-10 Thread Hans Henrik Happe via lustre-discuss
Sorry for all the mail load, but I hope this info can help figuring out 
what's wrong and determine if this was caused by a bug. I think


I read the CONFIGS on the MDT with llog_reader. See attachments.

Cheers,
Hans Henrik

On 10.03.2022 12.23, Hans Henrik Happe via lustre-discuss wrote:
After upgrading to Lustre 2.12.8 I found that the first mount after a 
reboot behaves differently:


Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: No space left on device


And a different syslog output (attached syslog-0).

Doing the mount again has this error:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: File exists


And a syslog like the one first posted. Attached the new output in 
syslog-1.


Finally, stopping Lustre (Only MGS in this case) and the lnet service 
does free resources making lustre_rmmod fail:


# lustre_rmmod
rmmod: ERROR: Module osp is in use


Cheers,
Hans Henrik

On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
(3.10.0-1160.6.1.el7.x86_64).


On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:

Hi,

A reboot of the MDS stalled and got forced reset. After that the MDS 
would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but astro-OST0002 
is not mounted at this time. I guess this comes from the MGS.


Cheers,
Hans Henrik

rec #1 type=1062 len=224 offset 8192
rec #2 type=1062 len=136 offset 8416
rec #3 type=1062 len=176 offset 8552
rec #4 type=1062 len=224 offset 8728
rec #5 type=1062 len=224 offset 8952
rec #6 type=1062 len=120 offset 9176
rec #7 type=1062 len=112 offset 9296
rec #8 type=1062 len=160 offset 9408
rec #9 type=1062 len=224 offset 9568
rec #10 type=1062 len=224 offset 9792
rec #11 type=1062 len=88 offset 10016
rec #12 type=1062 len=88 offset 10104
rec #13 type=1062 len=88 offset 10192
rec #14 type=1062 len=144 offset 10280
rec #15 type=1062 len=152 offset 10424
rec #16 type=1062 len=88 offset 10576
rec #17 type=1062 len=88 offset 10664
rec #18 type=1062 len=88 offset 10752
rec #19 type=1062 len=120 offset 10840
rec #20 type=1062 len=88 offset 10960
rec #21 type=1062 len=88 offset 11048
rec #22 type=1062 len=88 offset 11136
rec #23 type=1062 len=120 offset 11224
rec #24 type=1062 len=136 offset 11344
rec #25 type=1062 len=224 offset 11480
rec #26 type=1062 len=224 offset 11704
rec #27 type=1062 len=88 offset 11928
rec #28 type=1062 len=88 offset 12016
rec #29 type=1062 len=88 offset 12104
rec #30 type=1062 len=144 offset 12192
rec #31 type=1062 len=152 offset 12336
rec #32 type=1062 len=88 offset 12488
rec #33 type=1062 len=88 offset 12576
rec #34 type=1062 len=88 offset 12664
rec #35 type=1062 len=120 offset 12752
rec #36 type=1062 len=88 offset 12872
rec #37 type=1062 len=88 offset 12960
rec #38 type=1062 len=88 offset 13048
rec #39 type=1062 len=120 offset 13136
rec #40 type=1062 len=136 offset 13256
rec #41 type=1062 len=224 offset 13392
rec #42 type=1062 len=224 offset 13616
rec #43 type=1062 len=104 offset 13840
rec #44 type=1062 len=224 offset 13944
rec #45 type=1062 len=224 offset 14168
rec #46 type=1062 len=120 offset 14392
rec #47 type=1062 len=224 offset 14512
rec #48 type=1062 len=224 offset 14736
rec #49 type=1062 len=104 offset 14960
rec #50 type=1062 len=224 offset 15064
rec #51 type=1062 len=224 offset 15288
rec #52 type=1062 len=104 offset 15512
rec #53 type=1062 len=224 offset 15616
rec #54 type=1062 len=224 offset 15840
rec #55 type=1062 len=136 offset 16064
rec #57 type=1062 len=224 offset 16384
rec #58 type=1062 len=224 offset 16608
rec #59 type=1062 len=120 offset 16832
rec #60 type=1062 len=224 offset 16952
rec #61 type=1062 len=224 offset 17176
rec #62 type=1062 len=136 offset 17400
rec #63 type=1062 len=224 offset 17536
rec #64 type=1062 len=224 offset 17760
rec #65 type=1062 len=136 offset 17984
rec #66 type=1062 len=224 offset 18120
rec #67 type=1062 len=224 offset 18344
rec #68 type=1062 len=136 offset 18568
rec #69 type=1062 len=224 offset 18704
rec #70 type=1062 len=224 offset 18928
rec #71 type=1062 len=88 offset 19152
rec #72 type=1062 len=120 offset 19240
rec #73 type=1062 len=88 offset 19360
rec #74 type=1062 len=120 offset 19448
rec #75 type=1062 len=88 offset 19568
rec #76 type=1062 len=120 offset 19656
rec #77 type=1062 len=88 offset 19776
rec #78 type=1062 len=120 offset 19864
rec #79 type=1062 len=224 offset 19984
rec #80 type=1062 len=224 offset 20208
rec #81 type=1062

Re: [lustre-discuss] MDT will not mount

2022-03-10 Thread Hans Henrik Happe via lustre-discuss
After upgrading to Lustre 2.12.8 I found that the first mount after a 
reboot behaves differently:


Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: No space left on device


And a different syslog output (attached syslog-0).

Doing the mount again has this error:

Mounting mds02/astro0 on /mnt/lustre/local/astro-MDT
mount.lustre: mount mds02/astro0 at /mnt/lustre/local/astro-MDT 
failed: File exists


And a syslog like the one first posted. Attached the new output in syslog-1.

Finally, stopping Lustre (only the MGS in this case) and the lnet service 
does not free resources, making lustre_rmmod fail:


# lustre_rmmod
rmmod: ERROR: Module osp is in use


Cheers,
Hans Henrik

On 10.03.2022 11.15, Hans Henrik Happe via lustre-discuss wrote:
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
(3.10.0-1160.6.1.el7.x86_64).


On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:

Hi,

A reboot of the MDS stalled and got forced reset. After that the MDS 
would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but astro-OST0002 
is not mounted at this time. I guess this comes from the MGS.


Cheers,
Hans Henrik
Mar 10 12:08:15 mds02 kernel: Lustre: MGS: Connection restored to 
3be12548-8d1b-39d8-1ec0-0381833f8bc2 (at 172.20.200.30@tcp1)
Mar 10 12:08:15 mds02 kernel: Lustre: Skipped 42 previous similar messages
Mar 10 12:08:33 mds02 kernel: Lustre: 5191:0:(llog_cat.c:93:llog_cat_new_log()) 
astro-OST0002-osc-MDT: there are no more free slots in catalog 
[0x2:0x1:0x0]:0
Mar 10 12:08:33 mds02 kernel: LustreError: 
5191:0:(osp_sync.c:1524:osp_sync_init()) astro-OST0002-osc-MDT: can't 
initialize llog: rc = -28
Mar 10 12:08:33 mds02 kernel: LustreError: 
5191:0:(obd_config.c:559:class_setup()) setup astro-OST0002-osc-MDT failed 
(-28)
Mar 10 12:08:33 mds02 kernel: LustreError: 
5191:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: 
cfg command failed: rc = -28
Mar 10 12:08:33 mds02 kernel: Lustre:cmd=cf003 0:astro-OST0002-osc-MDT  
1:astro-OST0002_UUID  2:172.21.10.116@tcp  
Mar 10 12:08:33 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The 
configuration from log 'astro-MDT' failed (-28). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Mar 10 12:08:33 mds02 kernel: LustreError: 
5131:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server 
astro-MDT: -28
Mar 10 12:08:33 mds02 kernel: LustreError: 
5131:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: 
-28
Mar 10 12:08:33 mds02 kernel: Lustre: Failing over astro-MDT
Mar 10 12:08:33 mds02 kernel: Lustre: server umount astro-MDT complete
Mar 10 12:08:33 mds02 kernel: LustreError: 
5131:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-28)

Mar 10 12:10:56 mds02 kernel: LustreError: 
5622:0:(genops.c:556:class_register_device()) astro-OST0002-osc-MDT: 
already exists, won't add
Mar 10 12:10:56 mds02 kernel: LustreError: 
5622:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: 
cfg command failed: rc = -17
Mar 10 12:10:56 mds02 kernel: Lustre:cmd=cf001 0:astro-OST0002-osc-MDT  
1:osp  2:astro-MDT-mdtlov_UUID  
Mar 10 12:10:56 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The 
configuration from log 'astro-MDT' failed (-17). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Mar 10 12:10:56 mds02 kernel: LustreError: 
5566:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server 
astro-MDT: -17
Mar 10 12:10:56 mds02 kernel: LustreError: 
5566:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: 
-17
Mar 10 12:10:56 mds02 kernel: Lustre: Failing over astro-MDT
Mar 10 12:10:56 mds02 kernel: Lustre: server umount astro-MDT complete
Mar 10 12:10:56 mds02 kernel: LustreError: 
5566:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] MDT will not mount

2022-03-10 Thread Hans Henrik Happe via lustre-discuss
Forgot to say this is Lustre 2.12.6 and CentOS 7.9 
(3.10.0-1160.6.1.el7.x86_64).


On 10.03.2022 10.27, Hans Henrik Happe via lustre-discuss wrote:

Hi,

A reboot of the MDS stalled and got forced reset. After that the MDS 
would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but astro-OST0002 is 
not mounted at this time. I guess this comes from the MGS.


Cheers,
Hans Henrik

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MDT will not mount

2022-03-10 Thread Hans Henrik Happe via lustre-discuss

Hi,

A reboot of the MDS stalled and got forced reset. After that the MDS 
would not start. The syslog is attached.


I'm not sure what the "class_register_device()) 
astro-OST0002-osc-MDT" part is supposed to do but astro-OST0002 is 
not mounted at this time. I guess this comes from the MGS.


Cheers,
Hans Henrik

Mar 10 10:03:49 mds02 kernel: Lustre: MGS: Connection restored to 
d8787407-db0d-ccfb-e5ab-adeb41b86c1d (at 0@lo)
Mar 10 10:03:49 mds02 kernel: Lustre: Skipped 197 previous similar messages
Mar 10 10:03:59 mds02 kernel: LustreError: 137-5: astro-MDT_UUID: not 
available for connect from 10.21.207.78@o2ib (no target). If you are running an 
HA pair check that the target is mounted on the other server.
Mar 10 10:03:59 mds02 kernel: LustreError: Skipped 155 previous similar messages
Mar 10 10:04:00 mds02 kernel: LustreError: 
8923:0:(genops.c:556:class_register_device()) astro-OST0002-osc-MDT: 
already exists, won't add
Mar 10 10:04:00 mds02 kernel: LustreError: 
8923:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: 
cfg command failed: rc = -17
Mar 10 10:04:00 mds02 kernel: Lustre:cmd=cf001 0:astro-OST0002-osc-MDT  
1:osp  2:astro-MDT-mdtlov_UUID  
Mar 10 10:04:00 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The 
configuration from log 'astro-MDT' failed (-17). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Mar 10 10:04:00 mds02 kernel: LustreError: 
7016:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server 
astro-MDT: -17
Mar 10 10:04:00 mds02 kernel: LustreError: 
7016:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: 
-17
Mar 10 10:04:00 mds02 kernel: Lustre: Failing over astro-MDT
Mar 10 10:04:01 mds02 kernel: Lustre: astro-MDT: Not available for connect 
from 10.21.208.26@o2ib (stopping)
Mar 10 10:04:01 mds02 kernel: Lustre: Skipped 129 previous similar messages
Mar 10 10:04:15 mds02 kernel: LustreError: 137-5: astro-MDT_UUID: not 
available for connect from 172.20.2.101@tcp1 (no target). If you are running an 
HA pair check that the target is mounted on the other server.
Mar 10 10:04:15 mds02 kernel: LustreError: 137-5: astro-MDT_UUID: not 
available for connect from 172.20.2.101@tcp1 (no target). If you are running an 
HA pair check that the target is mounted on the other server.
Mar 10 10:04:15 mds02 kernel: LustreError: Skipped 35 previous similar messages
Mar 10 10:04:15 mds02 kernel: LustreError: Skipped 1 previous similar message
Mar 10 10:04:20 mds02 kernel: Lustre: server umount astro-MDT complete
Mar 10 10:04:20 mds02 kernel: LustreError: 
7016:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-17)
Mar 10 10:04:37 mds02 kernel: Lustre: MGS: Connection restored to  (at 
10.21.207.58@o2ib)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] How to increase Lustre Metadata size

2022-03-07 Thread Hans Henrik Happe via lustre-discuss
It should be as easy as adding disks to the ZFS pool. However, if you are 
not used to such an operation, do some tests on a separate ZFS pool first (you 
can use files as disks for such tests). Mistakes made when adding 
disks to a pool can be hard to undo if something goes wrong.
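
A throwaway rehearsal of the growth step might look like this (a sketch; file-backed 
vdevs, with arbitrary sizes and layout):

truncate -s 1G /tmp/d1 /tmp/d2 /tmp/d3 /tmp/d4
zpool create testpool mirror /tmp/d1 /tmp/d2
zpool add testpool mirror /tmp/d3 /tmp/d4    # the step you want to practice
zpool list testpool
zpool destroy testpool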


Cheers,
Hans Henrik

On 03.03.2022 08.26, Taner KARAGÖL via lustre-discuss wrote:


*_UNCLASSIFIED_*

Hi folks;

I want to use Data on MDT (DoM) but our metadata size is not enough to 
store data. The underlying fs is ZFS.

What is the easiest way to increase the metadata target zpool size? I 
have googled and found that the easiest way is adding disks to the metadata 
target. Before and after adding disks to the zpool, is there anything to 
do on the Lustre side?


Lustre version: 2.12.5

ZFS version: 0.7

Best Regards;







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server

2022-03-07 Thread Hans Henrik Happe via lustre-discuss

I think it was this one:

https://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=878561880d2aba038db95e199f82b186f22daa45

On 07.03.2022 09.05, Hans Henrik Happe via lustre-discuss wrote:

Hi Thomas,

They should work together, but there are other requirements that need 
to be fulfilled:


https://wiki.lustre.org/Lustre_2.12.8_Changelog

I guess your servers are CentOS 7.9 as required for 2.12.8.

I had an issue with Rocky 8.5 and the latest kernel with 2.12.8. While 
RHEL 8.5 is supported there was something new after 
4.18.0-348.2.1.el8_5, which caused problems. I found an LU fixing it 
post 2.12.8 (can't remember the number), but downgrading to 
4.18.0-348.2.1.el8_5 was the quick fix.


Cheers,
Hans Henrik

On 03.03.2022 08.40, Thomas Roth via lustre-discuss wrote:

Dear all,

this might be just something I forgot or did not read thoroughly, but 
shouldn't a 2.12.7-client work with 2.12.8 - servers?


The 2.12.8-changelog has the standard disclaimer

Interoperability Support:
   Clients & Servers: Latest 2.10.X and Latest 2.11.X




I have this test cluster that I upgraded recently to 2.12.8 on the 
servers.


The first client I attached now is a fresh install of rhel 8.5 (Alma).
I installed 'kmod-lustre-client' and `lustre-client` from 
https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/

I copied a directory containing ~5000 files - no visible issues


The next client was also installed with rhel 8.5 (Alma), but now 
using 'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from
https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/ 



As on my first client, I copied a directory containing ~5000 files. 
The copy stalled, and the OSTs exploded in my face


kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) 
event type 2, status -103, 

service ost_io
kernel: LustreError: 
40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 
too small 

for magic/version check
kernel: LustreError: 
40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking 
request from 

12345-10.20.2.167@o2ib6 x1726208297906176
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) 
event type 2, status -103, 

service ost_io


The latter message is repeated ad infinitum.

The client log blames the network:

Request sent has failed due to network error
 Connection to was lost; in progress operations using this service 
will wait for recovery to complete


LustreError: 181316:0:(events.c:205:client_bulk_callback()) event 
type 1, status -103, desc86e248d6
LustreError: 181315:0:(events.c:205:client_bulk_callback()) event 
type 1, status -5, desc 

e569130f



There is also a client running Debian 9 and Lustre 2.12.6 (compiled 
from git) - no trouble at all.



Then I switched those two rhel8.5 clients: reinstalled the OS, gave 
the first one the 2.12.7 packages and the second one the 2.12.8 packages - and 
the error followed: again the client running with 
'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing 
the same error messages in the logs.

So this is not a network problem in the sense of broken hardware etc.


What did I miss?
Some important Jira I did not read?


Regards
Thomas





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Interoperability 2.12.7 client <-> 2.12.8 server

2022-03-07 Thread Hans Henrik Happe via lustre-discuss

Hi Thomas,

They should work together, but there are other requirements that need to 
be fulfilled:


https://wiki.lustre.org/Lustre_2.12.8_Changelog

I guess your servers are CentOS 7.9 as required for 2.12.8.

I had an issue with Rocky 8.5 and the latest kernel with 2.12.8. While 
RHEL 8.5 is supported there was something new after 
4.18.0-348.2.1.el8_5, which caused problems. I found an LU fixing it 
post 2.12.8 (can't remember the number), but downgrading to 
4.18.0-348.2.1.el8_5 was the quick fix.


Cheers,
Hans Henrik

On 03.03.2022 08.40, Thomas Roth via lustre-discuss wrote:

Dear all,

this might be just something I forgot or did not read thoroughly, but 
shouldn't a 2.12.7-client work with 2.12.8 - servers?


The 2.12.8-changelog has the standard disclaimer

Interoperability Support:
   Clients & Servers: Latest 2.10.X and Latest 2.11.X




I have this test cluster that I upgraded recently to 2.12.8 on the 
servers.


The first client I attached now is a fresh install of rhel 8.5 (Alma).
I installed 'kmod-lustre-client' and `lustre-client` from 
https://downloads.whamcloud.com/public/lustre/lustre-2.12.8/el8.5.2111/

I copied a directory containing ~5000 files - no visible issues


The next client was also installed with rhel 8.5 (Alma), but now using 
'lustre-client-2.12.7-1' and 'lustre-client-dkms-2.12.7-1' from
https://downloads.whamcloud.com/public/lustre/lustre-2.12.7/el8/client/RPMS/x86_64/ 



As on my first client, I copied a directory containing ~5000 files. 
The copy stalled, and the OSTs exploded in my face


kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) 
event type 2, status -103, 

service ost_io
kernel: LustreError: 
40265:0:(pack_generic.c:605:__lustre_unpack_msg()) message length 0 
too small 

for magic/version check
kernel: LustreError: 
40265:0:(sec.c:2217:sptlrpc_svc_unwrap_request()) error unpacking 
request from 

12345-10.20.2.167@o2ib6 x1726208297906176
kernel: LustreError: 23345:0:(events.c:310:request_in_callback()) 
event type 2, status -103, 

service ost_io


The latter message is repeated ad infinitum.

The client log blames the network:

Request sent has failed due to network error
 Connection to was lost; in progress operations using this service 
will wait for recovery to complete


LustreError: 181316:0:(events.c:205:client_bulk_callback()) event 
type 1, status -103, desc86e248d6
LustreError: 181315:0:(events.c:205:client_bulk_callback()) event 
type 1, status -5, desc 

e569130f



There is also a client running Debian 9 and Lustre 2.12.6 (compiled 
from git) - no trouble at all.



Then I switched those two rhel8.5 clients: reinstalled the OS, gave the 
first one the 2.12.7 packages and the second one the 2.12.8 packages - and the 
error followed: again the client running with 
'lustre-client-dkms-2.12.7-1' immediately ran into trouble, causing 
the same error messages in the logs.

So this is not a network problem in the sense of broken hardware etc.


What did I miss?
Some important Jira I did not read?


Regards
Thomas




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Crash due to transaction in readonly mode (snapshot)

2021-09-23 Thread Hans Henrik Happe via lustre-discuss
Hi,

We had a crash with this in MDS log:

Sep 22 13:45:07 sci-mds01 kernel: LustreError:
258240:0:(osd_handler.c:354:osd_trans_create()) 03781251-MDT:
someone try to start transaction under readonly mode, should be disabled.
Sep 22 13:45:07 sci-mds01 kernel: CPU: 31 PID: 94594 Comm:
mdt_rdpg05_005 Kdump: loaded Tainted: P   OE    
3.10.0-1160.6.1.el7.x86_64 #1
Sep 22 13:45:07 sci-mds01 kernel: Hardware name: Dell Inc. PowerEdge
R640/0HG0J8, BIOS 2.10.2 02/24/2021
Sep 22 13:45:07 sci-mds01 kernel: Call Trace:
Sep 22 13:45:07 sci-mds01 kernel: [] dump_stack+0x19/0x1b
Sep 22 13:45:07 sci-mds01 kernel: []
osd_trans_create+0x3ca/0x410 [osd_zfs]
Sep 22 13:45:07 sci-mds01 kernel: CPU: 10 PID: 258241 Comm:
mdt_rdpg05_001 Kdump: loaded Tainted: P   OE    
3.10.0-1160.6.1.el7.x86_64 #1
Sep 22 13:45:07 sci-mds01 kernel: []
top_trans_create+0x8a/0x200 [ptlrpc]
Sep 22 13:45:07 sci-mds01 kernel: Hardware name: Dell Inc. PowerEdge
R640/0HG0J8, BIOS 2.10.2 02/24/2021
Sep 22 13:45:07 sci-mds01 kernel: []
lod_trans_create+0x3c/0x50 [lod]


Looks similar to this:
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2018-August/015854.html

When restarting, the MGS starts fine, but the one MDT (science-MDT)
does not:

Sep 23 16:10:17 sci-mds00 kernel: Lustre: MGS: Connection restored to
0dd6cfa0-bdf7-c8ac-7bb9-182f7874e165 (at 0@lo)
Sep 23 16:10:17 sci-mds00 kernel: Lustre: Skipped 1 previous similar message
Sep 23 16:10:19 sci-mds00 kernel: Lustre:
52424:0:(llog_cat.c:93:llog_cat_new_log()) science-OST1100-osc-MDT:
there are no more free slots in catalog [0x2:0x1:0x0]:0
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(osp_sync.c:1524:osp_sync_init()) science-OST1100-osc-MDT:
can't initialize llog: rc = -28
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(obd_config.c:559:class_setup()) setup
science-OST1100-osc-MDT failed (-28)
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52424:0:(obd_config.c:1835:class_config_llog_handler())
MGC10.120.10.90@tcp: cfg command failed: rc = -28
Sep 23 16:10:19 sci-mds00 kernel: Lustre:    cmd=cf003
0:science-OST1100-osc-MDT  1:science-OST1100_UUID  2:10.120.10.110@tcp 
Sep 23 16:10:19 sci-mds00 kernel: LustreError: 15c-8:
MGC10.120.10.90@tcp: The configuration from log 'science-MDT' failed
(-28). This may be the result of communication errors between this node
and the MGS, a bad configuration, or other errors. Set.
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount_server.c:1397:server_start_targets()) failed to start
server science-MDT: -28
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start
targets: -28
Sep 23 16:10:19 sci-mds00 kernel: Lustre: Failing over science-MDT
Sep 23 16:10:19 sci-mds00 kernel: Lustre: server umount science-MDT
complete
Sep 23 16:10:19 sci-mds00 kernel: LustreError:
52172:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount  (-28)


We have tried to --writeconf it, but that only moves the problem to this
error when mounting an OST:

Sep 23 12:04:16 sci-mds00 kernel: Lustre: MGS: Logs for fs science were
removed by user request.  All servers must be restarted in order to
regenerate the logs: rc = 0
Sep 23 12:04:16 sci-mds00 kernel: Lustre: science-MDT: Imperative
Recovery not enabled, recovery window 300-900
Sep 23 12:04:38 sci-mds00 kernel: Lustre: MGS: Connection restored to
68b4cd3a-6c73-19c5-2925-935e42bdaf2b (at 10.120.10.111@tcp)
Sep 23 12:04:38 sci-mds00 kernel: Lustre: Skipped 2 previous similar
messages
Sep 23 12:04:38 sci-mds00 kernel: Lustre: MGS: Regenerating
science-OST1100 log by user request: rc = 0
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
5547:0:(genops.c:556:class_register_device())
science-OST1100-osc-MDT: already exists, won't add
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
5547:0:(obd_config.c:1835:class_config_llog_handler())
MGC10.120.10.90@tcp: cfg command failed: rc = -17
Sep 23 12:04:45 sci-mds00 kernel: Lustre:    cmd=cf001
0:science-OST1100-osc-MDT  1:osp  2:science-MDT-mdtlov_UUID 
Sep 23 12:04:45 sci-mds00 kernel: LustreError:
1345:0:(mgc_request.c:599:do_requeue()) failed processing log: -17

Any ideas how to solve this?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] How to reduce the size of an existing OST ?

2021-04-28 Thread Hans Henrik Happe via lustre-discuss
Hi,

One option is of course to completely remove the OST as described in the
manual and then do the new setup.

Another option would be to copy (zfs send/recv) the OST away from the
disks, then create the new multi-OST setup and copy it back. You might
need to migrate some data away from the OST first to make it fit the new
setup.

A third option is to just have multiple OSTs on the same ZFS pool, but I guess
this is not a supported/tested case and there might even be reasons why
it will not work at all. At least you would need to set up quotas to ensure
free space in Lustre isn't inflated (see the sketch below).
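If you go for that third option, the quota part would look roughly like
this (pool/dataset names and sizes are placeholders, and again this is
untested as a Lustre configuration):

# zfs set quota=20T ostpool/ost0
# zfs set quota=20T ostpool/ost1
# zfs get quota,used,available ostpool/ost0 ostpool/ost1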

Cheers,
Hans Henrik

On 26.04.2021 09.29, Tung-Han Hsieh wrote:
> Dear All,
>
> We have an existing OST with ZFS backend which occupies the full size
> (63 TB) of a large storage. Now this OST already has 56% full of data.
> The machine which holds this OST does not have other OSTs. But for a
> long time, we found that the loading of this machine is unreasonably
> high (usually > 50). But the other machines with several OSTs in the
> same Lustre file system have quite low loading (usually < 1.0). So we
> are thinking to create multiple OSTs in that high loading machine and
> see whether the loading could be reduced.
>
> Since it is quite difficult to find another storage to fully backup
> the target OST, so now the question is: Whether there is a safe way
> to reduce the size of the existing OST, such that we can create new
> OSTs on the remaining space of the same device ?
>
> I guess the procedures might be:
>
> 1. Shutdown the whole lustre file system (unmount all clients, OSTs,
>and MDTs)
>
> 2. For the target OST, try to do defragment to move all data together,
>in order to make a large continous space free.
>
> 3. Reduce the total size of that OST using tools something like the
>resizefs (I don't know whether there is such a tool)
>
> 4. Reduce the total size of the backend ZFS dataset of that OST.
>
> All the above procedures were anology (and also from my imagination)
> to the idea of Linux LVM, which has a standard way to reduce a logical
> volume containing an ext4 file system. Just curious and want to discuss
> whether there is any possibility for a Lustre OST.
>
> Thanks for your attention.
>
> Cheers,
>
> T.H.Hsieh
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] How to remove an OST completely

2021-03-04 Thread Hans Henrik Happe via lustre-discuss
Hi,

The manual describe this:

https://doc.lustre.org/lustre_manual.xhtml#lustremaint.remove_ost

There is a note telling you that it will still be there, but can be
replaced.

Hope you migrated your data away from the OST also. Otherwise you would
have lost it.

Cheers,
Hans Henrik

On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> Dear All,
>
> Here is a question about how to remove an OST completely without
> restarting the Lustre file system. Our Lustre version is 2.12.6.
>
> We did the following steps to remove the OST:
>
> 1. Lock the OST (e.g., chome-OST0008) such that it will not create
>new files (run in the MDT server):
>
>lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=0
>
> 2. Locate the list of files in the target OST: (e.g., chome-OST0008):
>(run in the client):
>
>lfs find --obd chome-OST0008_UUID /home
>
> 3. Remove OST (run in the MDT server):
>lctl conf_param osc.chome-OST0008-osc-MDT.active=0
>
> 4. Unmount the OST partition (run in the OST server)
>
> After that, the total size of the Lustre file system decreased, and
> everything looks fine. However, without restarting (i.e., rebooting
> Lustre MDT / OST servers), we still feel that the removed OST is
> still exists. For example, in MDT:
>
> # lctl get_param osc.*.active
> osc.chome-OST-osc-MDT.active=1
> osc.chome-OST0001-osc-MDT.active=1
> osc.chome-OST0002-osc-MDT.active=1
> osc.chome-OST0003-osc-MDT.active=1
> osc.chome-OST0008-osc-MDT.active=0
> osc.chome-OST0010-osc-MDT.active=1
> osc.chome-OST0011-osc-MDT.active=1
> osc.chome-OST0012-osc-MDT.active=1
> osc.chome-OST0013-osc-MDT.active=1
> osc.chome-OST0014-osc-MDT.active=1
>
> We still see chome-OST0008. And in dmesg of MDT, we see a lot of:
>
> LustreError: 4313:0:(osp_object.c:594:osp_attr_get()) 
> chome-OST0008-osc-MDT:osp_attr_get update error 
> [0x10008:0x10a54c:0x0]: rc = -108
>
> In addition, when running LFSCK in the MDT server:
>
>   lctl lfsck_start -A
>
> even after all the works of MDT and OST are completed, we still see that 
> (run in MDT server):
>
>   lctl get_param mdd.*.lfsck_layout
>
> the status is not completed:
>
> mdd.chome-MDT.lfsck_layout=
> name: lfsck_layout
> magic: 0xb1732fed
> version: 2
> status: partial
> flags: incomplete
> param: all_targets
> last_completed_time: 1614762495
> time_since_last_completed: 4325 seconds
> 
>
> We suspect that the "incomplete" part might be due to the already removed
> chome-OST0008.
>
> Is there any way to completely remove the chome-OST0008 from the Lustre
> file system ? since that OST device has already been reformatted for
> other usage.
>
> Thanks very much.
>
>
> T.H.Hsieh
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Asynchronous mirror to other filesystem

2021-02-16 Thread Hans Henrik Happe
Hi,

I've been wondering if it is possible to make an asynchronous mirror of
a Lustre filesystem like on ZFS with send/receive. Using Lustre ZFS
snapshots and transferring the MDTs and OSTs with ZFS send/receive would
work, but we would have to have the same setup at the mirror side. Even
down to the network IPs unless we change the target configs before mounting.

Therefore, I have wondered if we could use changelogs and Lustre ZFS
snapshot to transfer the changes incrementally. In theory it should be
possible, but the details always catch up on you.

I would like to copy all changes from snapshot to snapshot, but I think
we would lose data that is not committed to storage at snapshot time.
The procedure would roughly be (sketched below):

1. Make snapshot
2. Use changelog to mirror all changes between snapshots.
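
A rough sketch of those two steps (filesystem name, MDT index and record
numbers are placeholders; 'cl1' stands for whatever changelog_register
returns):

mds# lctl --device lustre-MDT0000 changelog_register      (once, returns e.g. cl1)
mgs# lctl snapshot_create -F lustre -n mirror_snap1       (step 1)
cli# lfs changelog lustre-MDT0000 LAST_DONE_REC           (step 2: replay these on the mirror)
cli# lfs changelog_clear lustre-MDT0000 cl1 NEW_LAST_REC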

Now, the problem is that an event (e.g. a CLOSE) might have made it into the
changelog while the cached data has not yet been committed to storage, so
the data will not show up in the snapshot. That would be okay if we got the
data in a later snapshot, but we will not get another event about the new
data in the changelog.

We could use the main mount of the fs for mirroring data, but then we
might get updates that happened after the snapshot.

Is there something I missed or perhaps ways to get around this issue? If
there was a way to know if a file is completely committed to storage, we
would know when to forget about it.

Perhaps, you know of other ways to make an asynchronous mirror?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-09 Thread Hans Henrik Happe
It sounds like this issue, but I'm not sure what your dnodesize is:

https://github.com/openzfs/zfs/issues/8458

ZFS 0.8.1+ on the receiving side should fix it. Then again, ZFS 0.8 is
not supported in Lustre 2.12, so it's a bit hard to restore without
copying the underlying devices.
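
For what it's worth, the dnodesize in question can be checked on both ends
with something like this (dataset names taken from the quoted commands):

# zfs get dnodesize fs0pool/mdt0
# zfs get dnodesize backups/fs0pool/mdt0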

Cheers,
Hans Henrik

On 06.11.2020 21.23, Steve Thompson wrote:
> This may be a question for the ZFS list...
>
> I have Lustre 2.12.5 on Centos 7.8 with ZFS 0.7.13, 10GB network. I
> make snapshots of the Lustre filesystem with 'lctl snapshot_create'
> and at a later time transfer these snapshots to a backup system with
> zfs send/recv. This works well for everything but the MDT. For the
> MDT, I find that the zfs recv always fails when a little less than 1GB
> has been transferred (this being an incremental send/recv of snapshots
> taken a day apart):
>
> # zfs send -v -c -i fs0pool/mdt0@03-nov-2020 fs0pool/mdt0@04-nov-2020 | \
> zfs recv -F backups/fs0pool/mdt0
> 
> 12:11:18    946M   fs0pool/mdt0@04-nov-2020-01:00
> 12:11:19    946M   fs0pool/mdt0@04-nov-2020-01:00
> 12:11:20    946M   fs0pool/mdt0@04-nov-2020-01:00
> cannot receive incremental stream: dataset does not exist
>
> while if the data transfer is much smaller, the send/recv works. Since
> once I get a failure it is not possible to complete a send/recv for
> any subsequent day, I am doing a full snapshot send to a file; this
> always works and takes about 5/6 minutes for my MDT. When using zfs
> send/recv, the recv is always very very slow (several hours to get to
> the above failure point, even when using mbuffer). I am using custom
> zfs replication scripts, but it fails also using the zrep package.
>
> Does anyone know of a possible explanation? Is there any version of
> ZFS 0.8 that works with Lustre 2.12.5?
>
> Thanks,
> Steve

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Issue of mounting lustre through specified interface

2020-07-04 Thread Hans Henrik Happe
Hi,

We also stumbled into this. It is described here:

https://jira.whamcloud.com/browse/LU-11840

The best workaround we found was to disable discovery on 2.12 clients:

# lnetctl set discovery 0
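
To make that survive reboots, it should also be possible to put it into
/etc/lnet.conf so that lnetctl import applies it, roughly like this (I have
not verified this on every version):

global:
    discovery: 0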

Cheers,
Hans Henrik

On 04.07.2020 09.26, Tung-Han Hsieh wrote:
> Dear All,
>
> We have Lustre servers (MDS, OSS) with Lustre-2.10.7 installed, with
> both tcp and o2ib interfaces:
>
> [  193.016516] Lustre: Lustre: Build Version: 2.10.7
> [  193.486408] LNet: Added LNI 192.168.62.151@o2ib [8/256/0/180]
> [  193.538200] LNet: Added LNI 192.168.60.151@tcp [8/256/0/180]
> [  193.538372] LNet: Accept secure, port 988
>
> We have several clients, all with Lustre-2.12.4. Some have both tcp
> and o2ib interfaces. These clients can mount Lustre server with o2ib
> interface without any problem, i.e.,
>
> mount -t lustre -o flock 192.168.62.151@o2ib:/chome /home
> (this is OK)
>
> However, we have another client with Lustre-2.12.4, too, which only
> has tcp interface. It cannot mount server through tcp interface:
>
> mount -t lustre -o flock 192.168.60.151@tcp:/chome /home
> (this is failed with "Input/output error, Is the MGS running ?")
>
> Checking the dmesg message of this client, it reads:
>
> =
> [3106477.006512] LNetError: 
> 15970:0:(lib-move.c:1999:lnet_handle_find_routed_path()) no route to 
> 192.168.62.151@o2ib from 
> [3106483.142436] LustreError: 
> 122230:0:(mgc_request.c:249:do_config_log_add()) MGC192.168.60.151@tcp: 
> failed processing log, type 1: rc = -5
> [3106492.293968] LustreError: 122238:0:(mgc_request.c:599:do_requeue()) 
> failed processing log: -5
> [3106513.861586] LustreError: 15c-8: MGC192.168.60.151@tcp: The configuration 
> from log 'chome-client' failed (-5). This may be the result of communication 
> errors between this node and the MGS, a bad configuration, or other errors. 
> See the syslog for more information.
> [3106513.862052] Lustre: Unmounted chome-client
> [3106513.862281] LustreError: 122230:0:(obd_mount.c:1608:lustre_fill_super()) 
> Unable to mount  (-5)
> =
>
> Surprisingly that, although I have specified the tcp interface to
> mount, but Lustre itself still tries to mount with o2ib interface.
>
> I also tested whether LNet works or not.
> (Server NID: 192.168.60.151@tcp, Client NID: 192.168.60.30@tcp)
>
> From the server side:
> # /opt/lustre/sbin/lctl ping 192.168.60.30
> 12345-0@lo
> 12345-192.168.60.30@tcp
>
> From the client side:
> # /opt/lustre/sbin/lctl ping 192.168.60.151
> 12345-0@lo
> 12345-192.168.62.151@o2ib
> 12345-192.168.60.151@tcp
>
> Hence it looks fine.
>
> The module options (/etc/modprobe.d/lustre.conf) for server and client are:
> - Server:
>   options lnet networks="o2ib0(ib0),tcp0(eth0)"
> - Client:
>   options lnet networks="tcp0(eth0)"
>
> The building options for server and client are:
> - Server (Lustre-2.10.7):
>   ./configure --prefix=/opt/lustre \
>   --with-linux= \
>   --with-o2ib=
>
> - Client (Lustre-2.12.4):
>   ./configure --prefix=/opt/lustre \
>   --with-linux= \
>   --with-o2ib=no \
>   --disable-server
>
> Could anyone suggest how to solve this problem ?
>
>
> Thanks very much.
>
>
> T.H.Hsieh
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] SSK key refresh issue

2020-06-03 Thread Hans Henrik Happe
We only have one MGS with two filesystems. It evolved into two due to
some reconfiguration, and the end game is to remove one. Anyway, one of
the modifications to the new one was to have daily key refresh. Either
that made the refresh issue more likely, or having two filesystems is not
a good idea. It seemed to work with two, so we went on and started to
rsync some data over. Then we hit the refresh issue. For now I'm just
asking if multiple filesystems could cause issues.

Anyway, I'm setting up a debug system to test if I can reproduce it with
a single fs. Then I'll get back with more info about the actual error.

I've attached some client output from the failing system. This was on a
nodemap with a 120s expire key, for fast testing. It seems that there
needs to be I/O during the refresh to hit this.
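
For reference, the short-expiry key in this test was generated roughly like
the following (file name and nodemap name are placeholders; -e should be the
key expiry in seconds, if I read lgss_sk correctly):

# lgss_sk -t server -f science -n testmap -e 120 -w /etc/lustre/science.testmap.server.key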

A bit more info about the setup: all connections except the MGS are configured
for ski messaging, and mdt2ost is using the default nodemap.

Cheers,
Hans Henrik


On 03.06.2020 18.01, Sebastien Buisson wrote:
> Hi,
>
> Do you use one shared MGS for all your file systems, or does each file system 
> have its own MGS? In the latter case, are the MGSes running on the same node?
>
> You are mentioning a key refresh issue, so I am wondering if you see this 
> issue with multiple file systems only, or if it occurs when you have just one 
> file system setup?
>
> Cheers,
> Sebastien.
>
>> Le 3 juin 2020 à 15:07, Hans Henrik Happe  a écrit :
>>
>> Hi,
>>
>> I'm trying to hunt down an issue where SSK is failing key refresh on
>> 2.12.4. Mounting the filesystem works, but active sessions dies at refresh.
>>
>> First I would like to get a few things cleared.
>>
>> Is multiple Lustre filesystems on the same servers supported with SSK?
>>
>> If so, is it supported to use the same nodemap on each filesystem?
>> Obviously, with different keys for each fs.
>>
>> A mount from an ssh to the root account will create this keyring on
>> CentOS 7:
>>
>> # keyctl show
>> Session Keyring
>> 669565440 --alswrv  0 0  keyring: _ses
>> 458158660 --alswrv  0 65534   \_ keyring: _uid.0
>> 129939379 --alswrv  0 0   \_ user: lustre:erda
>>
>> 65534 usually is nfsnobody but is does not exist on the system. Would
>> this be an issue? Even if nfsnobody existed?
>>
>> A mount through sudo will create this keyring on CentOS 7:
>>
>> # keyctl show
>> Session Keyring
>> 381836048 --alswrv  0 65534  keyring: _uid_ses.0
>> 423400032 --alswrv  0 65534   \_ keyring: _uid.0
>> 934942793 --alswrv  0 0   \_ user: lustre:erda
>>
>> Again is this a problem?
>>
>>
>> Cheers,
>> Hans Henrik
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Jun  2 08:49:55 sci-mds01 kernel: Lustre: Lustre: Build Version: 2.12.4
Jun  2 08:49:55 sci-mds01 kernel: LNet: Added LNI 172.25.10.51@tcp [8/256/0/180]
Jun  2 08:49:55 sci-mds01 kernel: LNet: Accept secure, port 988
Jun  2 08:49:56 sci-mds01 kernel: LNetError: 
47936:0:(peer.c:3451:lnet_peer_ni_add_to_recoveryq_locked()) lpni 
172.25.10.110@tcp added to recovery queue. Health = 900
Jun  2 08:49:59 sci-mds01 kernel: Lustre: 
48016:0:(gss_svc_upcall.c:1216:gss_init_svc_upcall()) Init channel is not 
opened by lsvcgssd, following request might be dropped until lsvcgssd is active
Jun  2 08:49:59 sci-mds01 kernel: Key type lgssc registered
Jun  2 08:49:59 sci-mds01 kernel: Lustre: 
48022:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx 
9db846289b00 idx 0xe4574184e6adcf6d (0->science-MDT_UUID), expiry 
1591080709(+110s)
Jun  2 08:49:59 sci-mds01 kernel: Lustre: 
48022:0:(gss_svc_upcall.c:882:gss_svc_upcall_install_rvs_ctx()) create reverse 
svc ctx 9de627265a40 to science-MDT_UUID: idx 0x9bf0d0270b0486e
Jun  2 08:50:00 sci-mds01 kernel: Lustre: Mounted science-client
Jun  2 08:50:00 sci-mds01 kernel: Lustre: 
48029:0:(sec_gss.c:377:gss_cli_ctx_uptodate()) client refreshed ctx 
9db84628b600 idx 0x441bc9ec5b94197f (0->science-OST1101_UUID), expiry 
1591080710(+110s)
Jun  2 08:50:00 sci-mds01 kernel: Lustre: 
48029:0:(gss_svc_upcall.c:882:gss_svc_upcall_install_rvs_ctx()) create reverse 
svc ctx 9dc0f77f3440 to science-OST1101_UUID: idx 0x9bf0d0270b04870
Jun  2 08:50:00 sci-mds01 kernel: Lustre: 
48029:0:(gss_svc_upcall.c:882:gss_svc_upcall_install_rvs_ctx()) Skipped 1 
previous similar message
Jun  2 08:50:01 sci-mds01 systemd: Started Session 5540 of user root.
Jun  2 08:51:50 sci-mds01 kernel: Lustre: 
48111:0:(sec_gss.c:315:cli_ctx_expire()) ctx 
9db846288300(0->science-OST1103_UUID) get expired: 15910807

[lustre-discuss] SSK key refresh issue

2020-06-03 Thread Hans Henrik Happe
Hi,

I'm trying to hunt down an issue where SSK key refresh is failing on
2.12.4. Mounting the filesystem works, but active sessions die at the refresh.

First I would like to get a few things cleared.

Are multiple Lustre filesystems on the same servers supported with SSK?

If so, is it supported to use the same nodemap on each filesystem?
Obviously, with different keys for each fs.
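
To make the second question concrete, what I have in mind is one key per
filesystem for the same nodemap, generated roughly like this (names and
paths are placeholders):

# lgss_sk -t server -f fs1 -n tenant -w /etc/lustre/fs1.tenant.server.key
# lgss_sk -t server -f fs2 -n tenant -w /etc/lustre/fs2.tenant.server.key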

A mount from an ssh to the root account will create this keyring on
CentOS 7:

# keyctl show
Session Keyring
 669565440 --alswrv  0 0  keyring: _ses
 458158660 --alswrv  0 65534   \_ keyring: _uid.0
 129939379 --alswrv  0 0   \_ user: lustre:erda

65534 is usually nfsnobody, but it does not exist on this system. Would
this be an issue? Even if nfsnobody existed?

A mount through sudo will create this keyring on CentOS 7:

# keyctl show
Session Keyring
 381836048 --alswrv  0 65534  keyring: _uid_ses.0
 423400032 --alswrv  0 65534   \_ keyring: _uid.0
 934942793 --alswrv  0 0   \_ user: lustre:erda

Again is this a problem?


Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-05-01 Thread Hans Henrik Happe
Hi,

I forgot to send the bug report to the list, so her it is:

https://jira.whamcloud.com/browse/LU-13361

Cheers,
Hans Henrik

On 10.03.2020 09.38, Hans Henrik Happe wrote:
> Hi,
>
> That explains it. I will file a bug report.
>
> Cheers,
> Hans Henrik
>
> On 03.03.2020 16.30, Sebastien Buisson wrote:
>> Hi,
>>
>> I was focused on nodemaps, so I did not try with SSK.
>>
>> Cheers,
>> Sebastien.
>>
>>> Le 3 mars 2020 à 16:12, Hans Henrik Happe  a écrit :
>>>
>>> Hi,
>>>
>>> Did the test 2.12.4 with the same result. Also, I narrowed it down to
>>> SSK only. It also happens without nodemaps being activated.
>>>
>>> @Sebastian: I wonder if you did test this with SSK? I was very focused
>>> on nodemaps being the cause to start with.
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>> On 29.02.2020 23.44, Hans Henrik Happe wrote:
>>>> Hi,
>>>>
>>>> Sorry for the delay. I had to spend some time nursing the glusterfs that
>>>> this lustre fs will replace :-)
>>>>
>>>> Anyway, I've created a procedure to reproduce the issue. It's attached
>>>> together with the testing program.
>>>>
>>>> Basically, its a simple single mgs,mdt,oss setup, with a nodemap, that
>>>> maps a client to a fileset. This works fine. However, when turning on
>>>> SSK for cli2mdt the issue appears.
>>>>
>>>> This was for 2.12.3, I will move on to 2.12.4 just to check.
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>>
>>>> On 06.02.2020 23.08, Hans Henrik Happe wrote:
>>>>> Hi Sebastien,
>>>>>
>>>>> Thanks for looking into this.
>>>>>
>>>>> You are right that nodemap deactivation didn't affect the outcome. I
>>>>> must have made a mistake and cannot reproduce.
>>>>>
>>>>> The uid/gid are on the mds. I can do a sudo to the user and run the test
>>>>> program successfully.
>>>>>
>>>>> I forgot to mention that I use SSK in ski mode.
>>>>>
>>>>> I think I will start from scratch and see if I can reproduce and find
>>>>> out at what point it stops working.
>>>>>
>>>>> Cheers,
>>>>> Hans Henrik
>>>>>
>>>>> On 06.02.2020 18.19, Sebastien Buisson wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am not able to reproduce your issue. I compiled your C program, in all 
>>>>>> cases I am not getting Permission Denied.
>>>>>>
>>>>>> You say that it works when you deactivate the nodemap. But given that 
>>>>>> you have a fileset on your nodemap entry « sif », when you deactivate it 
>>>>>> you might end up doing IOs in a different directory. So you might 
>>>>>> compare different things.
>>>>>> Also, does the uid/gid 20501 exist on server side?
>>>>>>
>>>>>> Cheers,
>>>>>> Sebastien.
>>>>>>
>>>>>>> Le 6 févr. 2020 à 14:29, Hans Henrik Happe  a écrit :
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for a very quick reply :-) Here are the map:
>>>>>>>
>>>>>>> # lctl get_param nodemap.sif.*
>>>>>>> nodemap.sif.admin_nodemap=1
>>>>>>> nodemap.sif.audit_mode=1
>>>>>>> nodemap.sif.deny_unknown=0
>>>>>>> nodemap.sif.exports=
>>>>>>> [
>>>>>>> { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
>>>>>>> ]
>>>>>>> nodemap.sif.fileset=/sif
>>>>>>> nodemap.sif.id=2
>>>>>>> nodemap.sif.idmap=
>>>>>>> [
>>>>>>> { idtype: uid, client_id: 501, fs_id: 20501 },
>>>>>>> { idtype: gid, client_id: 501, fs_id: 20501 }
>>>>>>> ]
>>>>>>> nodemap.sif.map_mode=both
>>>>>>> nodemap.sif.ranges=
>>>>>>> [
>>>>>>> { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
>>>>>>> { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
>>>>>>> { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 17

Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-03-10 Thread Hans Henrik Happe
Hi,

That explains it. I will file a bug report.

Cheers,
Hans Henrik

On 03.03.2020 16.30, Sebastien Buisson wrote:
> Hi,
>
> I was focused on nodemaps, so I did not try with SSK.
>
> Cheers,
> Sebastien.
>
>> Le 3 mars 2020 à 16:12, Hans Henrik Happe  a écrit :
>>
>> Hi,
>>
>> Did the test 2.12.4 with the same result. Also, I narrowed it down to
>> SSK only. It also happens without nodemaps being activated.
>>
>> @Sebastian: I wonder if you did test this with SSK? I was very focused
>> on nodemaps being the cause to start with.
>>
>> Cheers,
>> Hans Henrik
>>
>> On 29.02.2020 23.44, Hans Henrik Happe wrote:
>>> Hi,
>>>
>>> Sorry for the delay. I had to spend some time nursing the glusterfs that
>>> this lustre fs will replace :-)
>>>
>>> Anyway, I've created a procedure to reproduce the issue. It's attached
>>> together with the testing program.
>>>
>>> Basically, its a simple single mgs,mdt,oss setup, with a nodemap, that
>>> maps a client to a fileset. This works fine. However, when turning on
>>> SSK for cli2mdt the issue appears.
>>>
>>> This was for 2.12.3, I will move on to 2.12.4 just to check.
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>> On 06.02.2020 23.08, Hans Henrik Happe wrote:
>>>> Hi Sebastien,
>>>>
>>>> Thanks for looking into this.
>>>>
>>>> You are right that nodemap deactivation didn't affect the outcome. I
>>>> must have made a mistake and cannot reproduce.
>>>>
>>>> The uid/gid are on the mds. I can do a sudo to the user and run the test
>>>> program successfully.
>>>>
>>>> I forgot to mention that I use SSK in ski mode.
>>>>
>>>> I think I will start from scratch and see if I can reproduce and find
>>>> out at what point it stops working.
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>>
>>>> On 06.02.2020 18.19, Sebastien Buisson wrote:
>>>>> Hi,
>>>>>
>>>>> I am not able to reproduce your issue. I compiled your C program, in all 
>>>>> cases I am not getting Permission Denied.
>>>>>
>>>>> You say that it works when you deactivate the nodemap. But given that you 
>>>>> have a fileset on your nodemap entry « sif », when you deactivate it you 
>>>>> might end up doing IOs in a different directory. So you might compare 
>>>>> different things.
>>>>> Also, does the uid/gid 20501 exist on server side?
>>>>>
>>>>> Cheers,
>>>>> Sebastien.
>>>>>
>>>>>> Le 6 févr. 2020 à 14:29, Hans Henrik Happe  a écrit :
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for a very quick reply :-) Here are the map:
>>>>>>
>>>>>> # lctl get_param nodemap.sif.*
>>>>>> nodemap.sif.admin_nodemap=1
>>>>>> nodemap.sif.audit_mode=1
>>>>>> nodemap.sif.deny_unknown=0
>>>>>> nodemap.sif.exports=
>>>>>> [
>>>>>> { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
>>>>>> ]
>>>>>> nodemap.sif.fileset=/sif
>>>>>> nodemap.sif.id=2
>>>>>> nodemap.sif.idmap=
>>>>>> [
>>>>>> { idtype: uid, client_id: 501, fs_id: 20501 },
>>>>>> { idtype: gid, client_id: 501, fs_id: 20501 }
>>>>>> ]
>>>>>> nodemap.sif.map_mode=both
>>>>>> nodemap.sif.ranges=
>>>>>> [
>>>>>> { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
>>>>>> { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
>>>>>> { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 172.25.10.51@tcp }
>>>>>> ]
>>>>>> nodemap.sif.sepol=
>>>>>>
>>>>>> nodemap.sif.squash_gid=2
>>>>>> nodemap.sif.squash_uid=2
>>>>>> nodemap.sif.trusted_nodemap=0
>>>>>>
>>>>>> Cheers,
>>>>>> Hans Henrik
>>>>>>
>>>>>> On 06.02.2020 14.17, Sebastien Buisson wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> It might be due to a property on the nodemap yo

Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-03-03 Thread Hans Henrik Happe
Hi,

Did the test 2.12.4 with the same result. Also, I narrowed it down to
SSK only. It also happens without nodemaps being activated.

@Sebastian: I wonder if you did test this with SSK? I was very focused
on nodemaps being the cause to start with.

Cheers,
Hans Henrik

On 29.02.2020 23.44, Hans Henrik Happe wrote:
> Hi,
> 
> Sorry for the delay. I had to spend some time nursing the glusterfs that
> this lustre fs will replace :-)
> 
> Anyway, I've created a procedure to reproduce the issue. It's attached
> together with the testing program.
> 
> Basically, its a simple single mgs,mdt,oss setup, with a nodemap, that
> maps a client to a fileset. This works fine. However, when turning on
> SSK for cli2mdt the issue appears.
> 
> This was for 2.12.3, I will move on to 2.12.4 just to check.
> 
> Cheers,
> Hans Henrik
> 
> On 06.02.2020 23.08, Hans Henrik Happe wrote:
>> Hi Sebastien,
>>
>> Thanks for looking into this.
>>
>> You are right that nodemap deactivation didn't affect the outcome. I
>> must have made a mistake and cannot reproduce.
>>
>> The uid/gid are on the mds. I can do a sudo to the user and run the test
>> program successfully.
>>
>> I forgot to mention that I use SSK in ski mode.
>>
>> I think I will start from scratch and see if I can reproduce and find
>> out at what point it stops working.
>>
>> Cheers,
>> Hans Henrik
>>
>> On 06.02.2020 18.19, Sebastien Buisson wrote:
>>> Hi,
>>>
>>> I am not able to reproduce your issue. I compiled your C program, in all 
>>> cases I am not getting Permission Denied.
>>>
>>> You say that it works when you deactivate the nodemap. But given that you 
>>> have a fileset on your nodemap entry « sif », when you deactivate it you 
>>> might end up doing IOs in a different directory. So you might compare 
>>> different things.
>>> Also, does the uid/gid 20501 exist on server side?
>>>
>>> Cheers,
>>> Sebastien.
>>>
>>>> Le 6 févr. 2020 à 14:29, Hans Henrik Happe  a écrit :
>>>>
>>>> Hi,
>>>>
>>>> Thanks for a very quick reply :-) Here are the map:
>>>>
>>>> # lctl get_param nodemap.sif.*
>>>> nodemap.sif.admin_nodemap=1
>>>> nodemap.sif.audit_mode=1
>>>> nodemap.sif.deny_unknown=0
>>>> nodemap.sif.exports=
>>>> [
>>>>  { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
>>>> ]
>>>> nodemap.sif.fileset=/sif
>>>> nodemap.sif.id=2
>>>> nodemap.sif.idmap=
>>>> [
>>>>  { idtype: uid, client_id: 501, fs_id: 20501 },
>>>>  { idtype: gid, client_id: 501, fs_id: 20501 }
>>>> ]
>>>> nodemap.sif.map_mode=both
>>>> nodemap.sif.ranges=
>>>> [
>>>>  { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
>>>>  { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
>>>>  { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 172.25.10.51@tcp }
>>>> ]
>>>> nodemap.sif.sepol=
>>>>
>>>> nodemap.sif.squash_gid=2
>>>> nodemap.sif.squash_uid=2
>>>> nodemap.sif.trusted_nodemap=0
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>>
>>>> On 06.02.2020 14.17, Sebastien Buisson wrote:
>>>>> Hi,
>>>>>
>>>>> It might be due to a property on the nodemap you defined.
>>>>> Could you please dump your nodemap definition?
>>>>>
>>>>> Thanks,
>>>>> Sebastien.
>>>>>
>>>>>
>>>>>> Le 6 févr. 2020 à 14:14, Hans Henrik Happe 
>>>>>>  a écrit :
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Has anyone had success with gocryptfs 1.7.x on top of a Lustre nodemap?
>>>>>>
>>>>>> I've tested with Lustre 2.12.3.
>>>>>>
>>>>>> I found that gocryptfs 1.6 worked. However, with 1.7.x I got a lot of
>>>>>> "Permission denied". I tried all permutations of trusted and admin on
>>>>>> the nodemap.
>>>>>>
>>>>>> By stracing a bit, I've created a small peace of code provoking the 
>>>>>> issue:
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> #include 
>>>>>> #i

Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-02-29 Thread Hans Henrik Happe
Hi,

Sorry for the delay. I had to spend some time nursing the glusterfs that
this lustre fs will replace :-)

Anyway, I've created a procedure to reproduce the issue. It's attached
together with the testing program.

Basically, its a simple single mgs,mdt,oss setup, with a nodemap, that
maps a client to a fileset. This works fine. However, when turning on
SSK for cli2mdt the issue appears.

This was for 2.12.3, I will move on to 2.12.4 just to check.

Cheers,
Hans Henrik

On 06.02.2020 23.08, Hans Henrik Happe wrote:
> Hi Sebastien,
>
> Thanks for looking into this.
>
> You are right that nodemap deactivation didn't affect the outcome. I
> must have made a mistake and cannot reproduce.
>
> The uid/gid are on the mds. I can do a sudo to the user and run the test
> program successfully.
>
> I forgot to mention that I use SSK in ski mode.
>
> I think I will start from scratch and see if I can reproduce and find
> out at what point it stops working.
>
> Cheers,
> Hans Henrik
>
> On 06.02.2020 18.19, Sebastien Buisson wrote:
>> Hi,
>>
>> I am not able to reproduce your issue. I compiled your C program, in all 
>> cases I am not getting Permission Denied.
>>
>> You say that it works when you deactivate the nodemap. But given that you 
>> have a fileset on your nodemap entry « sif », when you deactivate it you 
>> might end up doing IOs in a different directory. So you might compare 
>> different things.
>> Also, does the uid/gid 20501 exist on server side?
>>
>> Cheers,
>> Sebastien.
>>
>>> Le 6 févr. 2020 à 14:29, Hans Henrik Happe  a écrit :
>>>
>>> Hi,
>>>
>>> Thanks for a very quick reply :-) Here are the map:
>>>
>>> # lctl get_param nodemap.sif.*
>>> nodemap.sif.admin_nodemap=1
>>> nodemap.sif.audit_mode=1
>>> nodemap.sif.deny_unknown=0
>>> nodemap.sif.exports=
>>> [
>>>  { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
>>> ]
>>> nodemap.sif.fileset=/sif
>>> nodemap.sif.id=2
>>> nodemap.sif.idmap=
>>> [
>>>  { idtype: uid, client_id: 501, fs_id: 20501 },
>>>  { idtype: gid, client_id: 501, fs_id: 20501 }
>>> ]
>>> nodemap.sif.map_mode=both
>>> nodemap.sif.ranges=
>>> [
>>>  { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
>>>  { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
>>>  { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 172.25.10.51@tcp }
>>> ]
>>> nodemap.sif.sepol=
>>>
>>> nodemap.sif.squash_gid=2
>>> nodemap.sif.squash_uid=2
>>> nodemap.sif.trusted_nodemap=0
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>> On 06.02.2020 14.17, Sebastien Buisson wrote:
>>>> Hi,
>>>>
>>>> It might be due to a property on the nodemap you defined.
>>>> Could you please dump your nodemap definition?
>>>>
>>>> Thanks,
>>>> Sebastien.
>>>>
>>>>
>>>>> Le 6 févr. 2020 à 14:14, Hans Henrik Happe 
>>>>>  a écrit :
>>>>>
>>>>> Hi,
>>>>>
>>>>> Has anyone had success with gocryptfs 1.7.x on top of a Lustre nodemap?
>>>>>
>>>>> I've tested with Lustre 2.12.3.
>>>>>
>>>>> I found that gocryptfs 1.6 worked. However, with 1.7.x I got a lot of
>>>>> "Permission denied". I tried all permutations of trusted and admin on
>>>>> the nodemap.
>>>>>
>>>>> By stracing a bit, I've created a small peace of code provoking the issue:
>>>>>
>>>>> ---
>>>>>
>>>>> #include 
>>>>> #include 
>>>>> #include 
>>>>> #include 
>>>>>
>>>>> int main() {
>>>>>  int r;
>>>>>
>>>>>  setregid(-1, 501);
>>>>>  setreuid(-1, 501);
>>>>>
>>>>>  r = open("foo", O_CREAT, S_IRWXU);
>>>>>  if (r < 0) {
>>>>>perror("open");
>>>>>  }
>>>>>  return 0;
>>>>> }
>>>>>
>>>>> ---
>>>>>
>>>>>
>>>>>
>>>>> When run as root in a directory owned by uid=501 and gid=501 in a
>>>>> nodemap based Lustre fs it returns:
>>>>>
>>>>> open: Permission denie

Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-02-06 Thread Hans Henrik Happe
Hi Sebastien,

Thanks for looking into this.

You are right that nodemap deactivation didn't affect the outcome. I
must have made a mistake and cannot reproduce.

The uid/gid are on the mds. I can do a sudo to the user and run the test
program successfully.

I forgot to mention that I use SSK in ski mode.

I think I will start from scratch and see if I can reproduce and find
out at what point it stops working.

Cheers,
Hans Henrik

On 06.02.2020 18.19, Sebastien Buisson wrote:
> Hi,
> 
> I am not able to reproduce your issue. I compiled your C program, in all 
> cases I am not getting Permission Denied.
> 
> You say that it works when you deactivate the nodemap. But given that you 
> have a fileset on your nodemap entry « sif », when you deactivate it you 
> might end up doing IOs in a different directory. So you might compare 
> different things.
> Also, does the uid/gid 20501 exist on server side?
> 
> Cheers,
> Sebastien.
> 
>> Le 6 févr. 2020 à 14:29, Hans Henrik Happe  a écrit :
>>
>> Hi,
>>
>> Thanks for a very quick reply :-) Here are the map:
>>
>> # lctl get_param nodemap.sif.*
>> nodemap.sif.admin_nodemap=1
>> nodemap.sif.audit_mode=1
>> nodemap.sif.deny_unknown=0
>> nodemap.sif.exports=
>> [
>>  { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
>> ]
>> nodemap.sif.fileset=/sif
>> nodemap.sif.id=2
>> nodemap.sif.idmap=
>> [
>>  { idtype: uid, client_id: 501, fs_id: 20501 },
>>  { idtype: gid, client_id: 501, fs_id: 20501 }
>> ]
>> nodemap.sif.map_mode=both
>> nodemap.sif.ranges=
>> [
>>  { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
>>  { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
>>  { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 172.25.10.51@tcp }
>> ]
>> nodemap.sif.sepol=
>>
>> nodemap.sif.squash_gid=2
>> nodemap.sif.squash_uid=2
>> nodemap.sif.trusted_nodemap=0
>>
>> Cheers,
>> Hans Henrik
>>
>> On 06.02.2020 14.17, Sebastien Buisson wrote:
>>> Hi,
>>>
>>> It might be due to a property on the nodemap you defined.
>>> Could you please dump your nodemap definition?
>>>
>>> Thanks,
>>> Sebastien.
>>>
>>>
>>>> Le 6 févr. 2020 à 14:14, Hans Henrik Happe 
>>>>  a écrit :
>>>>
>>>> Hi,
>>>>
>>>> Has anyone had success with gocryptfs 1.7.x on top of a Lustre nodemap?
>>>>
>>>> I've tested with Lustre 2.12.3.
>>>>
>>>> I found that gocryptfs 1.6 worked. However, with 1.7.x I got a lot of
>>>> "Permission denied". I tried all permutations of trusted and admin on
>>>> the nodemap.
>>>>
>>>> By stracing a bit, I've created a small peace of code provoking the issue:
>>>>
>>>> ---
>>>>
>>>> #include 
>>>> #include 
>>>> #include 
>>>> #include 
>>>>
>>>> int main() {
>>>>  int r;
>>>>
>>>>  setregid(-1, 501);
>>>>  setreuid(-1, 501);
>>>>
>>>>  r = open("foo", O_CREAT, S_IRWXU);
>>>>  if (r < 0) {
>>>>perror("open");
>>>>  }
>>>>  return 0;
>>>> }
>>>>
>>>> ---
>>>>
>>>>
>>>>
>>>> When run as root in a directory owned by uid=501 and gid=501 in a
>>>> nodemap based Lustre fs it returns:
>>>>
>>>> open: Permission denied
>>>>
>>>> Works when I deactivate nodemap (lctl nodemap_activate 0) or just use a
>>>> plain local fs.
>>>>
>>>> I don't think this is intended behavior for nodemaps, but I might be wrong.
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>> ___
>>>> lustre-discuss mailing list
>>>>
>>>> lustre-discuss@lists.lustre.org
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Nodemap and setreuid/setregid

2020-02-06 Thread Hans Henrik Happe
Hi,

Thanks for a very quick reply :-) Here are the map:

# lctl get_param nodemap.sif.*
nodemap.sif.admin_nodemap=1
nodemap.sif.audit_mode=1
nodemap.sif.deny_unknown=0
nodemap.sif.exports=
[
 { nid: 172.25.10.51@tcp, uuid: 56bb9b04-9bb5-d7b5-3f50-d62804690db1 },
]
nodemap.sif.fileset=/sif
nodemap.sif.id=2
nodemap.sif.idmap=
[
 { idtype: uid, client_id: 501, fs_id: 20501 },
 { idtype: gid, client_id: 501, fs_id: 20501 }
]
nodemap.sif.map_mode=both
nodemap.sif.ranges=
[
 { id: 11, start_nid: 172.25.1.28@tcp, end_nid: 172.25.1.28@tcp },
 { id: 10, start_nid: 172.25.1.27@tcp, end_nid: 172.25.1.27@tcp },
 { id: 9, start_nid: 172.25.10.51@tcp, end_nid: 172.25.10.51@tcp }
]
nodemap.sif.sepol=

nodemap.sif.squash_gid=2
nodemap.sif.squash_uid=2
nodemap.sif.trusted_nodemap=0

Cheers,
Hans Henrik

On 06.02.2020 14.17, Sebastien Buisson wrote:
> Hi,
>
> It might be due to a property on the nodemap you defined.
> Could you please dump your nodemap definition?
>
> Thanks,
> Sebastien.
>
>> Le 6 févr. 2020 à 14:14, Hans Henrik Happe  a écrit :
>>
>> Hi,
>>
>> Has anyone had success with gocryptfs 1.7.x on top of a Lustre nodemap?
>>
>> I've tested with Lustre 2.12.3.
>>
>> I found that gocryptfs 1.6 worked. However, with 1.7.x I got a lot of
>> "Permission denied". I tried all permutations of trusted and admin on
>> the nodemap.
>>
>> By stracing a bit, I've created a small peace of code provoking the issue:
>>
>> ---
>>
>> #include 
>> #include 
>> #include 
>> #include 
>>
>> int main() {
>>  int r;
>>
>>  setregid(-1, 501);
>>  setreuid(-1, 501);
>>
>>  r = open("foo", O_CREAT, S_IRWXU);
>>  if (r < 0) {
>>perror("open");
>>  }
>>  return 0;
>> }
>>
>> ---
>>
>>
>>
>> When run as root in a directory owned by uid=501 and gid=501 in a
>> nodemap based Lustre fs it returns:
>>
>> open: Permission denied
>>
>> Works when I deactivate nodemap (lctl nodemap_activate 0) or just use a
>> plain local fs.
>>
>> I don't think this is intended behavior for nodemaps, but I might be wrong.
>>
>> Cheers,
>> Hans Henrik
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Nodemap and setreuid/setregid

2020-02-06 Thread Hans Henrik Happe
Hi,

Has anyone had success with gocryptfs 1.7.x on top of a Lustre nodemap?

I've tested with Lustre 2.12.3.

I found that gocryptfs 1.6 worked. However, with 1.7.x I got a lot of
"Permission denied". I tried all permutations of trusted and admin on
the nodemap.

By stracing a bit, I've created a small piece of code provoking the issue:

---

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main() {
  int r;

  setregid(-1, 501);
  setreuid(-1, 501);

  r = open("foo", O_CREAT, S_IRWXU);
  if (r < 0) {
perror("open");
  }
  return 0;
}

---



When run as root in a directory owned by uid=501 and gid=501 in a
nodemap based Lustre fs it returns:

open: Permission denied

Works when I deactivate nodemap (lctl nodemap_activate 0) or just use a
plain local fs.

I don't think this is intended behavior for nodemaps, but I might be wrong.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS mdt

2020-01-27 Thread Hans Henrik Happe
Hi,

Obviously it depends on the number of inodes (files, dirs, etc.) you have.
However, if the ZFS pool uses ashift=12, which will be the default for
most SSDs and large HDDs, it can have quite a space overhead for MDTs. It
also costs a lot of bandwidth on MDTs if you do a lot of metadata operations.

You can check it by running 'zdb |grep ashift'.

We are using ashift=9 on SSDs for this reason, even though the SSDs
themselves would prefer it differently.
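
For reference, the sector size has to be forced at pool creation time; a
sketch (device names are placeholders):

# zpool create -o ashift=9 mdt0pool mirror /dev/disk/by-id/ssd0 /dev/disk/by-id/ssd1
# zdb | grep ashift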

Cheers,
Hans Henrik

On 27.01.2020 19.07, Nehring, Shane R [LAS] wrote:
> Hey all,
> 
> We've been running a lustre volume for a few years now and it's been
> working quite well. We've been using ZFS as the backend storage and
> while that's been working well I've noticed that the space usage is
> a little weird on the mdt:
> 
> NAMEPROPERTY   VALUE SOURCE
> store/work-mdt  used   4.91T -
> store/work-mdt  logicalused960G  -
> store/work-mdt  referenced 4.91T -
> store/work-mdt  logicalreferenced  960G  -
> store/work-mdt  compressratio  1.00x -
> 
> Just wondering if anyone else has noticed this kind of overhead.
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Nodemap and multi-tenancy

2020-01-26 Thread Hans Henrik Happe
Hi,

When looking into the documentation (28.2.1) and also while testing, it
seems that it is not possible to give a tenant access to a fileset as if
it were a regular Lustre fs.

I would like to map IDs to a separate range, including root (0). This
works when admin=0 for the nodemap, but then root will not be able to
modify other users' files. In admin=1 mode, root is not mapped and will
become id 0 on the underlying fs.
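
For context, the kind of setup I'm after would look roughly like this
(nodemap name, NID range, fileset and the 100000 offset are all
placeholders). With admin=0 the root mapping below takes effect, but root
then cannot touch other users' files, which is exactly the limitation above:

mgs# lctl nodemap_add tenant1
mgs# lctl nodemap_add_range --name tenant1 --range 10.0.1.[1-100]@tcp
mgs# lctl nodemap_set_fileset --name tenant1 --fileset /tenant1
mgs# lctl nodemap_modify --name tenant1 --property admin --value 0
mgs# lctl nodemap_modify --name tenant1 --property trusted --value 0
mgs# lctl nodemap_add_idmap --name tenant1 --idtype uid --idmap 0:100000
mgs# lctl nodemap_add_idmap --name tenant1 --idtype gid --idmap 0:100000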

Have I missed a way to accomplish this? If not it would be on my
wishlist. Mapping ranges is also on that list.

I could also see a lot of quota control scenarios for this kind of
setup, e.g. allowing a tenant to control quotas for its mapped UIDs and
GIDs, but not others.

Cheers,
Hans Henrik




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Nodemap, ssk and mutiple fileset from one client

2020-01-26 Thread Hans Henrik Happe
Thanks for the input. WRT one LNet per fileset, is there some
technical reason for this design?

Cheers,
Hans Henrik

On 06.01.2020 09.41, Moreno Diego (ID SIS) wrote:
>
> I’m not sure about the SSK limitations but I know for sure that you
> can have multiple filesets belonging to the same filesystem on a
> client. As you already said, you’ll basically need to have one LNET
> per fileset (o2ib0, o2ib1, o2ib2), then mount each fileset with the
> option ‘-o network=’.
>
>  
>
> I gave a talk on our setup during last LAD (https://bit.ly/35oaPl7),
> slide 24 contains a few details on this. It’s for a routed
> configuration but we also had it working without LNET routers.
>
>  
>
> Diego
>
>  
>
>  
>
> *From: *lustre-discuss  on
> behalf of Jeremy Filizetti 
> *Date: *Tuesday, 31 December 2019 at 04:22
> *To: *Hans Henrik Happe 
> *Cc: *"lustre-discuss@lists.lustre.org" 
> *Subject: *Re: [lustre-discuss] Nodemap, ssk and mutiple fileset from
> one client
>
>  
>
> It doesn't look like this would be possible due to nodemap or SSK
> limitations.  As you pointed out, nodemap must associate a NID with a
> single nodemap.  SSK was intentionally tied to nodemap by design.  It
> does a lookup on the nodemap of a NID to verify it matches what is
> found in the server key.  I think even if you used multiple NIDs for a
> client like o2ib(ib0),o2ib1(ib0) you would still run into issues due
> to LNet, but I'm not certain on that.
>
>  
>
> Jeremy
>
>  
>
> On Mon, Dec 30, 2019 at 9:30 PM Hans Henrik Happe  <mailto:ha...@nbi.dk>> wrote:
>
> Hi,
>
> Is it possible to have one client mount multiple fileset's with
> different ssk keys.
>
> Basically, we would just like to hand out a key to clients that should
> be allowed to mount a specific fileset (subdir). First, it looks like
> the nodemap must contain the client NID for it to be able to
> mount. The
> key is not enough. Secondly, nodemaps are not allowed to hold the same
> NIDs, so it seems impossible to have multiple SSK-protected filesets
> mounted from one client, unless multiple NIDs are used?
>
> Example: For nodes A and B and filesets f0 (key0) and f1 (key1).
>
> A: Should be allowed to mount f0 (key0).
> B: Should be allowed to mount f0 (key0) and f1 (key1).
>
> Cheers,
> Hans Henrik
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> <mailto:lustre-discuss@lists.lustre.org>
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Nodemap, ssk and mutiple fileset from one client

2019-12-30 Thread Hans Henrik Happe
Hi,

Is it possible to have one client mount multiple filesets with
different SSK keys?

Basically, we would just like to hand out a key to clients that should
be allowed to mount a specific fileset (subdir). First, it looks like
the nodemap must contain the client NID for it to be able to mount. The
key is not enough. Secondly, nodemaps are not allowed to hold the same
NIDs, so it seems impossible to have multiple SSK-protected filesets
mounted from one client, unless multiple NIDs are used?

Example: For nodes A and B and filesets f0 (key0) and f1 (key1).

A: Should be allowed to mount f0 (key0).
B: Should be allowed to mount f0 (key0) and f1 (key1).

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Set skpath for MDS/OSS

2019-12-30 Thread Hans Henrik Happe
Thanks. That was my initial thinking, but it did not play well. I got
into a state that required a hard reset. I suspect it was due to bad
flavor configuration. Anyway, I got things working by loading the key
manually, so now I can retry.

Cheers,
Hans Henrik

On 30.12.2019 16.46, Jeremy Filizetti wrote:
> You should be able to just add "--skpath" to the MOUNT_OPTIONS in
> /etc/sysconfig/lustre.
> 
> Jeremy
> 
> On Sat, Dec 28, 2019 at 11:50 AM Hans Henrik Happe  <mailto:ha...@nbi.dk>> wrote:
> 
> Hi,
> 
> What is the correct way to set skpath for an automated startup using
> ldev.conf? I'm using ZFS.
> 
> Cheers,
> Hans Henrik
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Set skpath for MDS/OSS

2019-12-28 Thread Hans Henrik Happe
Hi,

What is the correct way to set skpath for an automated startup using
ldev.conf? I'm using ZFS.

Cheers,
Hans Henrik

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MDT ashift and DoM on ZFS

2019-12-04 Thread Hans Henrik Happe
Hi,

It seems that it still saves a lot of space to use ashift=9 on ZFS MDTs.
However, for DoM one would want the performance of ashift=12 for SSDs.

What are your thoughts?
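
For reference, ashift is fixed per vdev when the pool is created, so the
trade-off has to be made up front. A sketch (pool and device names are
examples):

  # space-efficient MDT, 512B allocation units:
  zpool create -o ashift=9 mdt0pool mirror /dev/sda /dev/sdb
  # or aligned to 4K flash pages, which is what one would want for DoM on SSDs:
  #zpool create -o ashift=12 mdt0pool mirror /dev/sda /dev/sdb
  # one way to check what a pool actually uses (assuming zdb -C shows the cached config):
  zdb -C mdt0pool | grep ashift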

Thanks,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre snapshots

2019-05-30 Thread Hans Henrik Happe
It's only the mapping from the snapshot name to the hex value used for
mounting that I'm asking about. It's explained in the doc (30.3.3. Mounting a
Snapshot).

The client cannot call 'lctl snapshot_list'. I guess the main fs and its
snapshot filesystems are so separated that putting the mappings in the
client /proc structures of the main fs would become ugly.

We will just communicate client mount name through another channel.
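
For completeness, the flow then looks roughly like this, assuming the
lctl snapshot syntax from the manual chapter referenced above
(filesystem and snapshot names are examples; <snapshot_fsname> is the
value reported by snapshot_list):

  # on the MGS:
  lctl snapshot_create -F hpc -n snap-20190530
  lctl snapshot_list -F hpc -n snap-20190530    # shows the generated snapshot fsname
  lctl snapshot_mount -F hpc -n snap-20190530   # mounts the snapshot targets on the servers
  # on a client:
  mount -t lustre -o ro <mgsnid>@o2ib:/<snapshot_fsname> /mnt/snap-20190530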

Cheers,
Hans Henrik

On 30/05/2019 10.05, Andreas Dilger wrote:
> On May 30, 2019, at 01:50, Hans Henrik Happe  wrote:
>>
>> Hi,
>>
>> I've tested snapshots and they work as expected.
>>
>> However, I'm wondering how the clients should mount without knowing the
>> mount names of snapshots. As I see it there are two possibilities:
>>
>> 1. Clients get ssh (limited) access to MGS (Don't want that).
>> 2. The names are communicated through another channel. Perhaps, written
>> to a file on the head Lustre filesystem or just directly to all clients
>> that need snapshot mounting through ssh.
>>
>> If there isn't a better way, I think number two is the way to go.
> 
> You could use automount to mount the snapshots on the clients, when they are 
> needed.  The automount map could be created automatically from the snapshot 
> list.
> 
> Probably it makes the most sense to limit snapshot access to a subset of 
> nodes, such as user login nodes, so that users do not try to compute from the 
> snapshot filesystems directly.
> 
>> Guess the limited length of Lustre fs names is preventing the use of the
>> snapshots names directly?
> 
> If you rotate the snapshots like Apple Time Machine, you could use generic 
> snapshot names like "last_month", "last_week", "yesterday", "6h_ago" and such 
> and not have to update the automount map.  The filesystem names could be 
> mostly irrelevant if the snapshot mountpoints are chosen properly, like 
> "$MOUNT/.snapshot/last_month/" or similar.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre snapshots

2019-05-30 Thread Hans Henrik Happe
Hi,

I've tested snapshots and they work as expected.

However, I'm wondering how the clients should mount without knowing the
mount names of snapshots. As I see it there are two possibilities:

1. Clients get ssh (limited) access to MGS (Don't want that).
2. The names are communicated through another channel. Perhaps, written
to a file on the head Lustre filesystem or just directly to all clients
that need snapshot mounting through ssh.

If there isn't a better way, I think number two is the way to go.

Guess the limited length of Lustre fs names is preventing the use of the
snapshots names directly?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 2.10 <-> 2.12 interoperability?

2019-05-03 Thread Hans Henrik Happe
On 03/05/2019 22.41, Andreas Dilger wrote:
> On May 3, 2019, at 14:33, Patrick Farrell  wrote:
>>
>> Thomas,
>>
>> As a general rule, Lustre only supports mixing versions on servers for 
>> rolling upgrades.
>>
>> - Patrick
> 
> And only then between maintenance versions of the same release (e.g. 2.10.6
> and 2.10.7).  If you are upgrading, say, 2.7.21 to 2.10.6 then you would need
> to fail over half of the targets, upgrade half of the servers, fail back (at
> which point all targets would be running on the same new version), upgrade the
> other half of the servers, and then restore normal operation.
> 
> There is also a backport of the LU-11507 patch for 2.10 that could be used
> instead of upgrading just one server to 2.12.
> 
> Cheers, Andreas

I think the documentation is quite clear:

http://doc.lustre.org/lustre_manual.xhtml#upgradinglustre

An upgrade path for major releases on the servers would be nice, though.
I wonder if this could be done with a mode where clients flush everything
they have and are put into a blocking mode. I guess the hard part would be to
re-negotiate all the state after the upgrade, which is hard enough for
regular replays.

Cheers,
Hans Henrik

>> On Wednesday, April 24, 2019 3:54:09 AM, Thomas Roth  wrote:
>>>  
>>> Hi all,
>>>
>>> OS=CentOS 7.5
>>> Lustre 2.10.6
>>>
>>> One of the OSS (one OST only) was upgraded to zfs 0.7.13, and LU-11507 
>>> forced an upgrade of Lustre to 2.12
>>>
>>> Mounts, reconnects, recovers, but then is unusable, and the MDS reports:
>>>
>>> Lustre: 13650:0:(mdt_handler.c:5350:mdt_connect_internal()) test-MDT: 
>>> client
>>> test-MDT-lwp-OST0002_UUID does not support ibits lock, either very old 
>>> or an invalid client: flags
>>> 0x204140104320
>>>
>>>
>>> So far I have not found any hints that these versions would not cooperate, 
>>> or that I should have set a
>>> certain parameter.
>>> LU-10175 indicates that the ibits have some connection to data-on-mdt which 
>>> we don't use.
>>>
>>> Any suggestions?
>>>
>>>
>>> Regards,
>>> Thomas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS tuning for MDT/MGS

2019-04-02 Thread Hans Henrik Happe
AFAIK, that is what sync=disabled does. It pretends syncs are committed.
It will flush after 5 seconds but there might be other output that will
stall it longer.

On 02/04/2019 14.28, Degremont, Aurelien wrote:
> This is very unlikely.
> The only reason that could happen is if this hardware is acknowledging I/O to 
> Lustre that it did not really commit to disk, like a writeback cache, or a 
> Lustre bug. 
> 
> On 02/04/2019 14:11, "lustre-discuss on behalf of Hans Henrik Happe"
>  wrote:
> 
> Isn't there a possibility that the MDS falsely tells the client that a
> transaction has been committed to disk? After that, the client might not
> be able to replay if the MDS dies.
> 
> Cheers,
> Hans Henrik
> 
> On 19/03/2019 21.32, Andreas Dilger wrote:
> > You would need to lose the MDS within a few seconds after the client to
> > lose filesystem operations, since the clients will replay their
> > operations if the MDS crashes, and ZFS commits the current transaction
> > every 1s, so this setting only really affects "sync" from the client. 
> > 
> > Cheers, Andreas
> > 
> > On Mar 19, 2019, at 12:43, George Melikov  > <mailto:m...@gmelikov.ru>> wrote:
> > 
> >> Can you explain the reason about 'zfs set sync=disabled mdt0'? Are you
> >> ready to lose last transaction on that mdt during power failure? What
> >> did I miss?
> >>
> >> 14.03.2019, 01:00, "Riccardo Veraldi"  >> <mailto:riccardo.vera...@cnaf.infn.it>>:
> >>> these are the zfs settings I use on my MDSes
> >>>
> >>>  zfs set mountpoint=none mdt0
> >>>  zfs set sync=disabled mdt0
> >>>
> >>>  zfs set atime=off amdt0
> >>>  zfs set redundant_metadata=most mdt0
> >>>  zfs set xattr=sa mdt0
> >>>
> >>> if your MDT partition is on a 4KB sector disk then you can use
> >>> ashift=12 when you create the filesystem but zfs is pretty smart and
> >>> in my case it recognized it automatically and used ashift=12
> >>> automatically.
> >>>
> >>> also here are the zfs kernel module parameters I use to have better
> >>> performance. I use it on both MDS and OSSes
> >>>
> >>> options zfs zfs_prefetch_disable=1
> >>> options zfs zfs_txg_history=120
> >>> options zfs metaslab_debug_unload=1
> >>> #
> >>> options zfs zfs_vdev_scheduler=deadline
> >>> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
> >>> #
> >>> options zfs zfs_vdev_scrub_min_active=48
> >>> options zfs zfs_vdev_scrub_max_active=128
> >>> #options zfs zfs_vdev_sync_write_min_active=64
> >>> #options zfs zfs_vdev_sync_write_max_active=128
> >>> #
> >>> options zfs zfs_vdev_sync_write_min_active=8
> >>> options zfs zfs_vdev_sync_write_max_active=32
> >>> options zfs zfs_vdev_sync_read_min_active=8
> >>> options zfs zfs_vdev_sync_read_max_active=32
> >>> options zfs zfs_vdev_async_read_min_active=8
> >>> options zfs zfs_vdev_async_read_max_active=32
> >>> options zfs zfs_top_maxinflight=320
> >>> options zfs zfs_txg_timeout=30
> >>> options zfs zfs_dirty_data_max_percent=40
> >>> options zfs zfs_vdev_async_write_min_active=8
> >>> options zfs zfs_vdev_async_write_max_active=32
> >>>
> >>> some people may disagree with me anyway after years of trying
> >>> different options I reached this stable configuration.
> >>>
> >>> then there are a bunch of other important Lustre level optimizations
> >>> that you can do if you are looking for performance increase.
> >>>
> >>> Cheers
> >>>
> >>> Rick
> >>>
> >>> On 3/13/19 11:44 AM, Kurt Strosahl wrote:
> >>>>
> >>>> Good Afternoon,
> >>>>
> >>>>
> >>>> I'm reviewing the zfs parameters for a new metadata system and I
> >>>> was looking to see if anyone had examples (good or bad) of zfs
> >>>> parameters?  I'm assuming that the MDT won't benefit from a recordsize of 1MB, and I've already set the ashift to 12.

Re: [lustre-discuss] ZFS tuning for MDT/MGS

2019-04-02 Thread Hans Henrik Happe
Isn't there a possibility that the MDS falsely tells the client that a
transaction has been committed to disk? After that, the client might not
be able to replay if the MDS dies.

Cheers,
Hans Henrik

On 19/03/2019 21.32, Andreas Dilger wrote:
> You would need to lose the MDS within a few seconds after the client to
> lose filesystem operations, since the clients will replay their
> operations if the MDS crashes, and ZFS commits the current transaction
> every 1s, so this setting only really affects "sync" from the client. 
> 
> Cheers, Andreas
> 
> On Mar 19, 2019, at 12:43, George Melikov  > wrote:
> 
>> Can you explain the reason about 'zfs set sync=disabled mdt0'? Are you
>> ready to lose last transaction on that mdt during power failure? What
>> did I miss?
>>
>> 14.03.2019, 01:00, "Riccardo Veraldi" > >:
>>> these are the zfs settings I use on my MDSes
>>>
>>>  zfs set mountpoint=none mdt0
>>>  zfs set sync=disabled mdt0
>>>
>>>  zfs set atime=off amdt0
>>>  zfs set redundant_metadata=most mdt0
>>>  zfs set xattr=sa mdt0
>>>
>>> if your MDT partition is on a 4KB sector disk then you can use
>>> ashift=12 when you create the filesystem but zfs is pretty smart and
>>> in my case it recognized it automatically and used ashift=12
>>> automatically.
>>>
>>> also here are the zfs kernel module parameters I use to have better
>>> performance. I use it on both MDS and OSSes
>>>
>>> options zfs zfs_prefetch_disable=1
>>> options zfs zfs_txg_history=120
>>> options zfs metaslab_debug_unload=1
>>> #
>>> options zfs zfs_vdev_scheduler=deadline
>>> options zfs zfs_vdev_async_write_active_min_dirty_percent=20
>>> #
>>> options zfs zfs_vdev_scrub_min_active=48
>>> options zfs zfs_vdev_scrub_max_active=128
>>> #options zfs zfs_vdev_sync_write_min_active=64
>>> #options zfs zfs_vdev_sync_write_max_active=128
>>> #
>>> options zfs zfs_vdev_sync_write_min_active=8
>>> options zfs zfs_vdev_sync_write_max_active=32
>>> options zfs zfs_vdev_sync_read_min_active=8
>>> options zfs zfs_vdev_sync_read_max_active=32
>>> options zfs zfs_vdev_async_read_min_active=8
>>> options zfs zfs_vdev_async_read_max_active=32
>>> options zfs zfs_top_maxinflight=320
>>> options zfs zfs_txg_timeout=30
>>> options zfs zfs_dirty_data_max_percent=40
>>> options zfs zfs_vdev_async_write_min_active=8
>>> options zfs zfs_vdev_async_write_max_active=32
>>>
>>> some people may disagree with me anyway after years of trying
>>> different options I reached this stable configuration.
>>>
>>> then there are a bunch of other important Lustre level optimizations
>>> that you can do if you are looking for performance increase.
>>>
>>> Cheers
>>>
>>> Rick
>>>
>>> On 3/13/19 11:44 AM, Kurt Strosahl wrote:

 Good Afternoon,


     I'm reviewing the zfs parameters for a new metadata system and I
 was looking to see if anyone had examples (good or bad) of zfs
 parameters?  I'm assuming that the MDT won't benefit from a
 recordsize of 1MB, and I've already set the ashift to 12.  I'm using
 an MDT/MGS made up of a stripe across mirrored ssds.


 w/r,

 Kurt


 ___
 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>
>>
>>
>> 
>> Sincerely,
>> George Melikov
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org 
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Tools for backing up a ZFS MDT

2019-03-29 Thread Hans Henrik Happe
Hi Kurt,

Haven't got much experience with the complete send/receive to a remote
ZFS fs. However, I've created my own scripts for just sending to files.
Also, I've moved an MDT from ashift=12 to ashift=9 with send/recv. It
worked without any problems.
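
For anyone wanting the same, the send-to-file variant can be as simple
as this (pool, dataset and paths are examples):

  # full backup of the MDT dataset to a file
  zfs snapshot mdtpool/mdt0@backup-2019-03-29
  zfs send mdtpool/mdt0@backup-2019-03-29 | gzip > /backup/mdt0-2019-03-29.zfs.gz
  # later: incremental stream against the previous snapshot
  zfs snapshot mdtpool/mdt0@backup-2019-04-05
  zfs send -i @backup-2019-03-29 mdtpool/mdt0@backup-2019-04-05 | gzip > /backup/mdt0-2019-04-05.zfs.gz
  # restore into a fresh dataset:
  # gunzip -c /backup/mdt0-2019-03-29.zfs.gz | zfs receive mdtpool/mdt0-restore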

ZFS versions above 0.7.9 have issues with dnodesize != legacy for receives:

https://github.com/zfsonlinux/zfs/issues/8458

I'll look into ZnapZend :-)

Cheers,
Hans Henrik

On 25/03/2019 20.37, Kurt Strosahl wrote:
> Good Afternoon,
> 
> 
>      I've been working on a new lustre file system, now with ZFS on the
> MDT (my current lustre file system uses ldiskfs).  One of the reasons
> for this was the ability to use ZFS snapshots to backup the MDT, and I'm
> wondering if anyone has experience with zfs backup tools like ZnapZend?
> 
> 
> w/r,
> 
> Kurt J. Strosahl
> System Administrator: Lustre, HPC
> Scientific Computing Group, Thomas Jefferson National Accelerator Facility
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Quota doc

2019-03-16 Thread Hans Henrik Happe
Hi,

I think the quota documentation is a bit unclear about verification. In
25.2.2.1 it states:

"The per-target enforcement status can still be verified by running the
following command on the MDS(s)"

I think it should be OSS for block and MDS for inode quota?
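
For reference, the checks then become something like this (assuming the
parameter path given in that manual section):

  # on the MDS(s): inode quota enforcement status
  lctl get_param osd-*.*MDT*.quota_slave.info
  # on each OSS: block quota enforcement status
  lctl get_param osd-*.*OST*.quota_slave.info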

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Disaster recover files from ZFS OSTs

2019-02-16 Thread Hans Henrik Happe
Hi,

Moving a system from Gluster to Lustre, there is one feature we
miss. With Gluster the name space and data can easily be found on the
underlying filesystems. While we never needed it with Lustre, it would
be nice to have it as a last resort. Lustre has been rock solid, while
we have needed it plenty on Gluster.

Looking at the output of 'getstripe' I figured out how to find files
using the objids mod 128 (that was how many dX dirs I found). Easy
with stripe-count=1, probably also with higher counts.

So given a backup of MDTs we should be able to do some poking around. We
could also do a database of getstripe info. Perhaps robinhood could help.
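
As a rough sketch of such a dump (run on a client with the filesystem
mounted at /lustre; paths are examples):

  # record the layout (OST index and object id) of every file
  lfs find /lustre -type f | while read -r f; do
      echo "== $f"
      lfs getstripe "$f"
  done > /root/lustre-layout-dump.txt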

Is there some formal description of Lustre object layout? It seems
simple but I'm wondering if there are pitfalls. PFL seems to be pretty
well described.

Perhaps there are already tools for this, that we have missed?

Side note:

Looking at how Lustre puts object files in large buckets, I was wondering
if this ZFS issue could become a problem:

https://github.com/zfsonlinux/zfs/issues/3967

Guess these buckets are rarely listed?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] help ldev.conf usage clarification

2018-09-13 Thread Hans Henrik Happe
Hi,

The third column should be the Lustre target names like:

MGS
-MDT0000
-OST0000
-OST0001

Check 'man ldev.conf'.
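
For example, the file quoted below would then look something like this
(reusing the host and dataset names from it, and assuming the fsname is
'lustre01'):

  # local-host   foreign  label              device
  mds01          -  MGS                zfs:lustre01-mgs/mgs
  mds01          -  lustre01-MDT0000   zfs:lustre01-mdt0/mdt0
  drp-tst-ffb01  -  lustre01-OST0000   zfs:lustre01-ost01/ost01
  drp-tst-ffb02  -  lustre01-OST0001   zfs:lustre01-ost02/ost02
  drp-tst-ffb03  -  lustre01-OST0002   zfs:lustre01-ost03/ost03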

Cheers,
Hans Henrik

On 13-09-2018 00:11, Riccardo Veraldi wrote:
> Hello,
> 
> I wanted to ask some clarification on ldev.conf usage and features.
> 
> I am using ldev.conf only on my ZFS lustre OSSes and MDS.
> 
> Anyway I have a doubt about what should go in that file.
> 
> I have seen people having only the metadata configuration in it like for
> example:
> 
> mds01 - mgs zfs:lustre01-mgs/mgs
> mds01 - mdt0 zfs:lustre01-mdt0/mdt0
> 
> and people filling the file with both mgs settings and also listing all
> the OSSes/OSTs then spreading the same ldev.conf file over all the OSSes
> like in this example with
> 
> 3 OSSes where each one has one OST:
> 
> 
> mds01 - mgs zfs:lustre01-mgs/mgs
> mds01 - mdt0 zfs:lustre01-mdt0/mdt0
> #
> drp-tst-ffb01 - OST01 zfs:lustre01-ost01/ost01
> drp-tst-ffb02 - OST02 zfs:lustre01-ost02/ost02
> drp-tst-ffb03 - OST03 zfs:lustre01-ost03/ost03
> 
> is this correct, or should only the metadata information stay in ldev.conf?
> 
> Also, can ldev.conf be used with an ldiskfs-based cluster? On ldiskfs-based
> clusters I usually mount the metadata partition and OSS partitions in
> fstab and my ldev.conf is empty.
> 
> thanks
> 
> Rick
> 
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MOFED 4.4-1.0.0.0

2018-09-12 Thread Hans Henrik Happe
Hi again,

Took the chance and tested MOFED 4.3 on CentOS 6.10 even though it is
not supported. It works great. I have a case at Mellanox. Hope they will
find the issue.

Cheers,
Hans Henrik

On 05-09-2018 15:50, Hans Henrik Happe wrote:
> Hi,
> 
> Here's an update. Still no luck with CentOS 6.10 and MOFED 4.4-1.0.0.0.
> The same with the new 4.4-2.0.7.0 and Lustre 2.10.5.
> 
> However, CentOS 7.5 works with 4.4-2.0.7.0 in my setup. We are testing
> CentOS 7, but not ready to roll it out yet. It will be CentOS 6 until then.
> 
> Mellanox haven't been able to track it down. We might give them access
> to debug this.
> 
> Anyone, using CentOS 6.10 and MOFED 4.4 successfully with Lustre?
> 
> Cheers,
> Hans Henrik
> 
> 
> On 04-08-2018 00:04, Riccardo Veraldi wrote:
>> Hello,
>> Lustre 2.10.4 works perfectly with CentOS 7.5; I tested it a lot on
>> different cluster types.
>> If you have a chance to move to CentOS 7.5, that would be the perfect
>> choice. It's always troublesome to mess with Mellanox OFED.
>> Cheers
>>
>> Rick
>>
>> On 8/3/18 4:53 AM, Hans Henrik Happe wrote:
>>> Hi,
>>>
>>> Did anyone try Mellanox OFED 4.4-1.0.0.0?
>>>
>>> With Lustre 2.10.4 and CentOS 6.10 and 6.9 we have issues. Using CentOS
>>> 6.9 and the previous supported version there are no problems (CentOS
>>> 6.10 is not supported on the previous).
>>>
>>> We are using ConnectX-3 cards on kernel 2.6.32-696.18.7.el6.x86_64.
>>>
>>> First mount after start of openibd fails. Attached 'first.txt' shows the
>>> log.
>>>
>>> A second mount succeeds ('second.txt'). The OSTs are slowly added after
>>> some timeouts. Everything seems to work after this.
>>>
>>> After this we can unmount and mount again and everything is normal.
>>> However, reloading the driver (restart openibd) the mount fails again.
>>>
>>> I'll have a go at CentOS 7.5 and contact Mellanox next.
>>>
>>> Cheers,
>>> Hans Henrik
>>>
>>>
>>> ___
>>> lustre-discuss mailing list
>>> lustre-discuss@lists.lustre.org
>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MOFED 4.4-1.0.0.0

2018-09-05 Thread Hans Henrik Happe
Hi,

Here's an update. Still no luck with CentOS 6.10 and MOFED 4.4-1.0.0.0.
The same with the new 4.4-2.0.7.0 and Lustre 2.10.5.

However, CentOS 7.5 works with 4.4-2.0.7.0 in my setup. We are testing
CentOS 7, but not ready to roll it out yet. It will be CentOS 6 until then.

Mellanox haven't been able to track it down. We might give them access
to debug this.

Anyone, using CentOS 6.10 and MOFED 4.4 successfully with Lustre?

Cheers,
Hans Henrik


On 04-08-2018 00:04, Riccardo Veraldi wrote:
> Hello,
> Lustre 2.10.4 works perfectly with CentOS 7.5; I tested it a lot on
> different cluster types.
> If you have a chance to move to CentOS 7.5, that would be the perfect
> choice. It's always troublesome to mess with Mellanox OFED.
> Cheers
> 
> Rick
> 
> On 8/3/18 4:53 AM, Hans Henrik Happe wrote:
>> Hi,
>>
>> Did anyone try Mellanox OFED 4.4-1.0.0.0?
>>
>> With Lustre 2.10.4 and CentOS 6.10 and 6.9 we have issues. Using CentOS
>> 6.9 and the previous supported version there are no problems (CentOS
>> 6.10 is not supported on the previous).
>>
>> We are using ConnectX-3 cards on kernel 2.6.32-696.18.7.el6.x86_64.
>>
>> First mount after start of openibd fails. Attached 'first.txt' shows the
>> log.
>>
>> A second mount succeeds ('second.txt'). The OSTs are slowly added after
>> some timeouts. Everything seems to work after this.
>>
>> After this we can unmount and mount again and everything is normal.
>> However, reloading the driver (restart openibd) the mount fails again.
>>
>> I'll have a go at CentOS 7.5 and contact Mellanox next.
>>
>> Cheers,
>> Hans Henrik
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] MOFED 4.4-1.0.0.0

2018-08-03 Thread Hans Henrik Happe
Hi,

Did anyone try Mellanox OFED 4.4-1.0.0.0?

With Lustre 2.10.4 and CentOS 6.10 and 6.9 we have issues. Using CentOS
6.9 and the previous supported version there are no problems (CentOS
6.10 is not supported on the previous).

We are using ConnectX-3 cards on kernel 2.6.32-696.18.7.el6.x86_64.

First mount after start of openibd fails. Attached 'first.txt' shows the
log.

A second mount succeeds ('second.txt'). The OSTs are slowly added after
some timeouts. Everything seems to work after this.

After this we can unmount and mount again and everything is normal.
However, reloading the driver (restart openibd) the mount fails again.

I'll have a go at CentOS 7.5 and contact Mellanox next.

Cheers,
Hans Henrik
Aug  3 13:26:49 node578 kernel: LNet: HW NUMA nodes: 2, HW CPU cores: 64, 
npartitions: 2
Aug  3 13:26:49 node578 kernel: alg: No test for adler32 (adler32-zlib)
Aug  3 13:26:49 node578 kernel: alg: No test for crc32 (crc32-table)
Aug  3 13:26:49 node578 kernel: alg: No test for crc32 (crc32-pclmul)
Aug  3 13:26:50 node578 kernel: Lustre: Lustre: Build Version: 2.10.4
Aug  3 13:26:50 node578 kernel: LNet: Added LNI 10.21.205.78@o2ib [8/256/0/180]
Aug  3 13:26:53 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 
10.21.10.111@o2ib: 4478161 seconds
Aug  3 13:26:53 node578 kernel: Lustre: 
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed 
due to network error: [sent 1533295610/real 1533295613]  req@885f76c0bc80 
x1607776977551376/t0(0) o250->MGC10.21.10.111@o2ib@10.21.10.111@o2ib:26/25 lens 
520/544 e 0 to 1 dl 1533295615 ref 1 fl Rpc:eXN/0/ rc 0/-1
Aug  3 13:26:56 node578 kernel: LustreError: 
73562:0:(mgc_request.c:251:do_config_log_add()) MGC10.21.10.111@o2ib: failed 
processing log, type 1: rc = -5
Aug  3 13:27:05 node578 kernel: LustreError: 
73703:0:(mgc_request.c:603:do_requeue()) failed processing log: -5
Aug  3 13:27:18 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 
10.21.10.111@o2ib: 4478186 seconds
Aug  3 13:27:18 node578 kernel: Lustre: 
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed 
due to network error: [sent 1533295635/real 1533295638]  req@88bfa9bc3cc0 
x1607776977551440/t0(0) o250->MGC10.21.10.111@o2ib@10.21.10.111@o2ib:26/25 lens 
520/544 e 0 to 1 dl 1533295645 ref 1 fl Rpc:eXN/0/ rc 0/-1
Aug  3 13:27:27 node578 kernel: LustreError: 15c-8: MGC10.21.10.111@o2ib: The 
configuration from log 'hpc-client' failed (-5). This may be the result of 
communication errors between this node and the MGS, a bad configuration, or 
other errors. See the syslog for more information.
Aug  3 13:27:27 node578 kernel: Lustre: Unmounted hpc-client
Aug  3 13:27:27 node578 kernel: LustreError: 
73562:0:(obd_mount.c:1582:lustre_fill_super()) Unable to mount  (-5)
Aug  3 13:41:33 node578 kernel: Lustre: hpc: root_squash is set to 99:99
Aug  3 13:41:33 node578 kernel: Lustre: hpc: nosquash_nids set to 
172.20.1.10@tcp1 172.20.1.221@tcp1 172.20.1.71@tcp1 10.121.16.11@tcp1
Aug  3 13:41:39 node578 kernel: Lustre: 
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has timed 
out for sent delay: [sent 1533296493/real 0]  req@885f98ec79c0 
x1607776977551760/t0(0) 
o38->hpc-MDT-mdc-885f887bf800@10.21.10.101@o2ib:12/10 lens 520/544 e 0 
to 1 dl 1533296498 ref 2 fl Rpc:XN/0/ rc 0/-1
Aug  3 13:42:51 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 
10.21.10.102@o2ib: 4479119 seconds
Aug  3 13:42:51 node578 kernel: Lustre: 
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed 
due to network error: [sent 1533296568/real 1533296571]  req@88bf97e36cc0 
x1607776977551840/t0(0) 
o38->hpc-MDT-mdc-885f887bf800@10.21.10.102@o2ib:12/10 lens 520/544 e 0 
to 1 dl 1533296573 ref 1 fl Rpc:eXN/0/ rc 0/-1
Aug  3 13:43:14 node578 kernel: Lustre: Mounted hpc-client
Aug  3 13:43:16 node578 kernel: LustreError: 
73774:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug  3 13:43:17 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 
10.21.10.112@o2ib: 4479145 seconds
Aug  3 13:43:17 node578 kernel: Lustre: 
73626:0:(client.c:2114:ptlrpc_expire_one_request()) @@@ Request sent has failed 
due to network error: [sent 1533296594/real 1533296597]  req@885f98ec7cc0 
x1607776977551904/t0(0) 
o8->hpc-OST0001-osc-885f887bf800@10.21.10.112@o2ib:28/4 lens 520/544 e 0 to 
1 dl 1533296599 ref 1 fl Rpc:eXN/0/ rc 0/-1
Aug  3 13:43:18 node578 kernel: LustreError: 
73775:0:(llite_lib.c:1772:ll_statfs_internal()) obd_statfs fails: rc = -5
Aug  3 13:43:19 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Timed out tx for 
10.21.10.121@o2ib: 5 seconds
Aug  3 13:43:19 node578 kernel: LNet: 
73616:0:(o2iblnd_cb.c:3192:kiblnd_check_conns()) Skipped 3 previous similar 
messages
A

Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2018-05-30 Thread Hans Henrik Happe
Happy to report that 2.10.4 fixed this issue.

Cheers,
Hans Henrik

On 06-04-2018 14:48, Hans Henrik Happe wrote:
> Just for the record. 2.11.0 has fixed this. Not sure which LU though.
> 
> Cheers,
> Hans Henrik
> 
> On 30-09-2017 23:43, Hans Henrik Happe wrote:
>> On 27-09-2017 18:50, Dilger, Andreas wrote:
>>> On Sep 26, 2017, at 01:10, Hans Henrik Happe  wrote:
>>>> Hi,
>>>>
>>>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>>>> lustre'? On an otherwise idle node I see:
>>>>
>>>>   PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
>>>>     9 root  20   0 0    0    0 S 28.5  0.0  2:05.58 ksoftirqd/1
>>>>
>>>>
>>>>    57 root  20   0 0    0    0 R 23.9  0.0  2:22.91 ksoftirqd/13
>>>>
>>>> The sum of those two is about 50% CPU.
>>>>
>>>> I have narrowed it down to the ptlrpc module. When I remove that, it
>>>> stops.
>>>>
>>>> I also tested the 2.10.1-RC1, which is the same.
>>> If you can run "echo l > /proc/sysrq-trigger" it will report the
>>> processes
>>> that are currently running on the CPUs of your system to the console
>>> (and
>>> also /var/log/messages, if it can write everything in time).
>>>
>>> You might need to do this several times to get a representative
>>> sample of
>>> the ksoftirqd process stacks to see what they are doing that is
>>> consuming
>>> so much CPU.
>>>
>>> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of
>>> all
>>> processes to the console (and /v/l/m), but there will be a lot of them,
>>> and no better chance that it catches what ksoftirqd is doing 25% of
>>> the time.
>> I've attached the stacks. Some wakeups, which I guess are initiated by
>> something in the ptlrpc code.
>>
>> Cheers,
>> Hans Henrik
>>
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] zfs has native dnode accounting supported... no

2018-05-17 Thread Hans Henrik Happe
Thanks Andreas,

The patch works for me.

Cheers,
Hans Henrik

On 16-05-2018 10:55, Dilger, Andreas wrote:
> On May 16, 2018, at 00:22, Hans Henrik Happe  wrote:
>>
>> When building 2.10.4-RC1 on CentOS 7.5 I noticed this during configure:
>>
>> zfs has native dnode accounting supported... no
>>
>> I'm using the kmod version of ZFS 0.7.9 from the official repos.
>> Shouldn't native dnode accounting work with these versions?
>>
>> Is there a way to detect if a Lustre filesystem is using native dnode
>> accounting?
> 
> This looks like a bug.  The Lustre code was changed to detect ZFS project
> quota (which has a different function signature in ZFS 0.8) but isn't
> included in the ZFS 0.7.x releases, but lost the ability to detect the old
> dnode accounting function signature.
> 
> I've pushed patch https://review.whamcloud.com/32418 that should fix this.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Principal Architect
> Intel Corporation
> 
> 
> 
> 
> 
> 
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] zfs has native dnode accounting supported... no

2018-05-15 Thread Hans Henrik Happe
Hi,

When building 2.10.4-RC1 on CentOS 7.5 I noticed this during configure:

zfs has native dnode accounting supported... no

I'm using the kmod version of ZFS 0.7.9 from the official repos.
Shouldn't native dnode accounting work with these versions?

Is there a way to detect if a Lustre filesystem is using native dnode
accounting?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Synchronous writes on a loaded ZFS OST

2018-05-15 Thread Hans Henrik Happe


On 15-05-2018 18:05, Steve Thompson wrote:
> On Tue, 8 May 2018, Vicker, Darby (JSC-EG311) wrote:
> 
>> The fix suggested by Andreas in that thread worked fairly well and we
>> continue to use it.
> 
> I'd like to inquire whether the fixes of LU-4009 will be available in a
> future Lustre version, and if so, when it is the likely release date. My
> installation (2.10.3) is severely affected by the fsync() issue, and I'd
> like to patch it as soon as possible. TIA,

Should be in LU-10460, which is targeted for 2.10.4.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Synchronous writes on a loaded ZFS OST

2018-05-08 Thread Hans Henrik Happe
In general Lustre is very stable. Metadata performance feels okay and we
only have one MDT on 6 SSDs (3-way mirror).

We had another issue that is also ZIL related:

http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2016-May/013500.html

Cheers,
Hans Henrik



On 08-05-2018 21:21, Riccardo Veraldi wrote:
> I was considering Lustre for my home dirs but I am quite frightened to
> see you have problems.
> How is the overall performance? Are you happy?
> thanks
> 
> Rick
> 
> 
> On 5/8/18 5:56 AM, Hans Henrik Happe wrote:
>> Hi,
>>
>> We had some users experiencing slow vim (the editor) updates on our
>> Lustre homedirs. Turns out vim is doing some fsyncs that do not play
>> well with a loaded ZFS OST.
>>
>> We tried testing with ioping, which does synced writes (like dd with
>> conv=fdatasync). When an OST is loaded (i.e. scrubbing) the ioping time
>> is multiple seconds (5-10). Without load we get 100-300ms, which still
>> is far from what a ZFS fs can deliver.
>>
>> To test if a ZFS fs also would be affected we created a test fs on the
>> OST pool and ran ioping*. With or without a scrub running, the ping
>> times averaged at around 40ms.
>>
>> Has anyone else experienced this? Can it be helped?
>>
>> Cheers,
>> Hans Henrik
>>
>> * Used -WWW because ioping -W runs on an unlinked file and ZFS will not
>> sync those to disk.
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> 


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Synchronous writes on a loaded ZFS OST

2018-05-08 Thread Hans Henrik Happe
Hi,

We had some users experiencing slow vim (the editor) updates on our
Lustre homedirs. Turns out vim is doing some fsyncs that do not play
well with a loaded ZFS OST.

We tried testing with ioping, which does synced writes (like dd with
conv=fdatasync). When an OST is loaded (i.e. scrubbing) the ioping time
is multiple seconds (5-10). Without load we get 100-300ms, which still
is far from what a ZFS fs can deliver.
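
For anyone wanting to reproduce the numbers, something like the
following should do (the directory is an example; -WWW is explained in
the footnote at the end of this mail):

  # synchronous write latency as seen by a Lustre client
  ioping -c 20 -WWW /lustre/home/testdir
  # roughly equivalent dd-based check
  dd if=/dev/zero of=/lustre/home/testdir/syncfile bs=4k count=100 conv=fdatasync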

To test if a ZFS fs also would be affected we created a test fs on the
OST pool and ran ioping*. With or without a scrub running, the ping
times averaged at around 40ms.

Has anyone else experienced this? Can it be helped?

Cheers,
Hans Henrik

* Used -WWW because ioping -W runs on an unlinked file and ZFS will not
sync those to disk.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LNET Multi-rail

2018-04-10 Thread Hans Henrik Happe

Thanks for the info. A few observations I found so far:

- I think LU-10297 has solved my stability issues.
- lustre.conf does work with comma separation of interfaces. I.e. 
o2ib(ib0,ib1). However, peers need to be configured with lnet.conf or 
lnetctl.
- Defining peering ('lnetctl peer add' and ARP settings) only on the 
client seems to make multi-rail work both ways.


I'm a bit puzzled by the last observation. I expected that both ends 
would need to define peers. The client NID does not show as multi-rail 
(lnetctl peer show) on the server.
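
For reference, a minimal client-side configuration matching the second
and third observations might look like this (NIDs and interface names
are examples):

  lnetctl lnet configure
  lnetctl net add --net o2ib --if ib0
  lnetctl net add --net o2ib --if ib1
  # declare the server as a multi-rail peer: primary NID plus its second NID
  lnetctl peer add --prim_nid 10.0.0.10@o2ib --nid 10.0.0.11@o2ib
  # persist the configuration so it is loaded at boot
  lnetctl export > /etc/lnet.conf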


Cheers,
Hans Henrik

On 14-03-2018 03:00, Riccardo Veraldi wrote:

it works for me but you have to set up correctly lnet.conf either
manually or using  lnetctl to add peers. Then you export your
configuration in lnet.conf
and it will be loaded at reboot. I had to add my peers manually, I think
peer auto discovery is not yet operational on 2.10.3.
I suppose you are not using anymore lustre.conf to configure interfaces
(ib,tcp) and that you are using the new Lustre DLC style:

http://wiki.lustre.org/Dynamic_LNET_Configuration

Also I do not know if you did this yet but you should configure ARP
settings and also rt_tables for your ib interfaces if you use multi-rail.
Here is an example. I had to do that to have things working properly:

https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup

You may also want to check that your IB interfaces (if you have a dual
port infiniband like I have) can really double the performance when you
enable both of them.
The infiniband PCIe card bandwidth has to be capable of feeding enough
traffic to both dual ports or it will just be useful as a fail over device,
without improving the speed as you may want to.

In my configuration fail over is working. If I disconnect one port, the
other will still work. Of course if you disconnect it when traffic is
going through
you may have a problem with that stream of data. But new traffic will be
handled correctly. I do not know if there is a way to avoid this, I am
just talking about my experience and as I said I Am more interested in
performance than fail over.


Riccardo


On 3/13/18 8:05 AM, Hans Henrik Happe wrote:

Hi,

I'm testing LNET multi-rail with 2.10.3 and I ran into some questions
that I couldn't find answered in the documentation or elsewhere.

As I understand the design document "Dynamic peer discovery" will make
it possible to discover multi-rail peers without adding them manually?
Is that functionality in 2.10.3?

Will failover work without doing anything special? I've tested with
two IB ports and unplugging resulted in no I/O from client and
replugging didn't resolve it.

How do I make an active/passive setup? One example I would really
like to see in the documentation, is the obvious o2ib-tcp combination,
where tcp is used if o2ib is down and fails back if it comes op again.

Anyone using MR in production? Done a bit of testing with dual ib on
both server and client and had a few crashes.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2018-04-06 Thread Hans Henrik Happe

Just for the record. 2.11.0 has fixed this. Not sure which LU though.

Cheers,
Hans Henrik

On 30-09-2017 23:43, Hans Henrik Happe wrote:

On 27-09-2017 18:50, Dilger, Andreas wrote:

On Sep 26, 2017, at 01:10, Hans Henrik Happe  wrote:

Hi,

Did anyone else experience CPU load from ksoftirqd after 'modprobe
lustre'? On an otherwise idle node I see:

  PID USER  PR   NI VIRT  RES  SHR S %CPU  %MEM TIME+   COMMAND
    9 root  20   0    0    0    0 S 28.5  0.0  2:05.58 ksoftirqd/1
   57 root  20   0    0    0    0 R 23.9  0.0  2:22.91 ksoftirqd/13

The sum of those two is about 50% CPU.

I have narrowed it down to the ptlrpc module. When I remove that, it stops.

I also tested the 2.10.1-RC1, which is the same.

If you can run "echo l > /proc/sysrq-trigger" it will report the processes
that are currently running on the CPUs of your system to the console (and
also /var/log/messages, if it can write everything in time).

You might need to do this several times to get a representative sample of
the ksoftirqd process stacks to see what they are doing that is consuming
so much CPU.

Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
processes to the console (and /v/l/m), but there will be a lot of them,
and no better chance that it catches what ksoftirqd is doing 25% of the time.
I've attached the stacks. Some wakeups, which I guess are initiated by 
something in the ptlrpc code.


Cheers,
Hans Henrik




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] LNET Multi-rail

2018-03-13 Thread Hans Henrik Happe

Hi,

I'm testing LNET multi-rail with 2.10.3 and I ran into some questions 
that I couldn't find answered in the documentation or elsewhere.


As I understand the design document "Dynamic peer discovery" will make 
it possible to discover multi-rail peers without adding them manually? Is 
that functionality in 2.10.3?


Will failover work without doing anything special? I've tested with two 
IB ports and unplugging resulted in no I/O from client and replugging 
didn't resolve it.


How do I make an active/passive setup? One example I would really like 
to see in the documentation, is the obvious o2ib-tcp combination, where 
tcp is used if o2ib is down and fails back if it comes op again.


Anyone using MR in production? Done a bit of testing with dual ib on 
both server and client and had a few crashes.


Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] mgsnode notation in mkfs and tunefs

2018-03-05 Thread Hans Henrik Happe

Hi,

I've recently experienced the same. To be honest, I can't remember how I 
used to do it. This way worked (2.10.3):


tunefs.lustre --erase-param --writeconf --param=mgsnode=::...

Cheers,
Hans Henrik

On 02-03-2018 15:19, Thomas Roth wrote:

Hi all,

(we are now on Lustre 2.10.2.)
It seems there is still a difference in how to declare --mgsnode between 
mkfs.lustre and tunefs.lustre.

For an OST, I did:

 > mkfs.lustre --ost --backfstype=zfs  
--mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5 --... osspool0/ost0


This OST mounts, is usable, all fine.


Then I had to writeconf, and out of tradition, added --erase-params to 
the command - so I had to add the mgsnodes as well:


 > tunefs.lustre --erase-param --writeconf 
--mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5 osspool0/ost0


This did not mount, so I repeated this and other writeconfs, and ended 
up with


 > tunefs.lustre --dryrun  osspool0/ost0
checking for existing Lustre data: found

    Read previous values:
Target: hebe-OST
Index:  0
Lustre FS:  hebe
Mount type: zfs
Flags:  0x142
   (OST update writeconf )
Persistent mount opts:
Parameters: 
mgsnode=10.20.3.0@o2ib5:10.20.3.1@o2ib5:10.20.3.0@o2ib5:10.20.3.1@o2ib5:10.20.3.0@o2ib5:10.20.3.1@o2ib5 




Seems I could have added more and more mgsnodes, and never gotten a good 
OST ;-)


I repaired this by:

 > tunefs.lustre --erase-param mgsnode   osspool0/ost0
 > tunefs.lustre --writeconf  --mgsnode=10.20.3.0@o2ib5 
--mgsnode=10.20.3.1@o2ib5 osspool0/ost0




> This cluster is not yet in use - but if this happens with a production 
> system, where --writeconf is anyhow your last resort, it would be an 
> uncomfortable situation.



Cheers
Thomas
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] BAD CHECKSUM

2017-12-11 Thread Hans Henrik Happe
On 10-12-2017 06:07, Dilger, Andreas wrote:
> Based on the messages on the client, this isn’t related to mmap() or
> writes done by the client, since the data has the same checksum from
> before it was sent and after it got the checksum error returned from
> the server. That means the pages did not change on the client.
>
> Possible causes include the client network card, server network card,
> memory, or possibly the OFED driver?  It could of course be something
> in Lustre/LNet, though we haven’t had any reports of anything similar. 
>
> When the checksum code was first written, it was motivated by a faulty
> Ethernet NIC that had TCP checksum offload, but bad onboard cache, and
> the data was corrupted when copied onto the NIC but the TCP checksum
> was computed on the bad data and the checksum was “correct” when
> received by the server, so it didn’t cause TCP resends. 
>
> Are you seeing this on multiple servers?  The client log only shows
> one server, while the server log shows multiple clients.  If it is
> only happening on one server it might point to hardware. 
Yes, we are seeing it on all servers.
> Did you also upgrade the kernel and OFED at the same time as Lustre?
> You could try building Lustre 2.10.1 on the old 2.9.0 kernel and OFED
> to see if that works properly.
We upgraded to CentOS 7.4 and are using the included OFED on the
servers. Also, we upgraded the firmware on the server IB cards. We will
check further if this combination has compatibility issues.

Cheers,
Hans Henrik
>
> Cheers, Andreas
>
> On Dec 9, 2017, at 11:09, Hans Henrik Happe  <mailto:ha...@nbi.dk>> wrote:
>
>>
>>
>> On 09-12-2017 18:57, Hans Henrik Happe wrote:
>>> On 07-12-2017 21:36, Dilger, Andreas wrote:
>>>> On Dec 7, 2017, at 10:37, Hans Henrik Happe >>> <mailto:ha...@nbi.dk>> wrote:
>>>>> Hi,
>>>>>
>>>>> Can an application cause BAD CHECKSUM errors in Lustre logs by somehow
>>>>> overwriting memory while being DMA'ed to network?
>>>>>
>>>>> After upgrading to 2.10.1 on the server side we started seeing
>>>>> this from
>>>>> a user's application (MPI I/O). Both 2.9.0 and 2.10.1 clients emit
>>>>> these
>>>>> errors. We have not yet established whether the application is doing
>>>>> things correctly.
>>>> If applications are using mmap IO it is possible for the page to
>>>> become inconsistent after the checksum has been computed.  However,
>>>> mmap IO is
>>>> normally detected by the client and no message should be printed.
>>>>
>>>> There isn't anything that the application needs to do, since the
>>>> client will resend the data if there is a checksum error, but the
>>>> resends do slow down the IO.  If the inconsistency is on the
>>>> client, there is no cause for concern (though it would be good to
>>>> figure out the root cause).
>>>>
>>>> It would be interesting to see what the exact error message is,
>>>> since that will say whether the data became inconsistent on the
>>>> client, or over the network.  If the inconsistency is over the
>>>> network or on the server, then that may point to hardware issues.
>>> I've attached logs from a server and a client.
>>
>> There was a cut n' paste error in the first set of files. This should be
>> better.
>>
>> Looks like something goes wrong over the network.
>>
>> Cheers,
>> Hans Henrik
>>
>> 
>> 
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] BAD CHECKSUM

2017-12-09 Thread Hans Henrik Happe


On 09-12-2017 18:57, Hans Henrik Happe wrote:
> On 07-12-2017 21:36, Dilger, Andreas wrote:
>> On Dec 7, 2017, at 10:37, Hans Henrik Happe  wrote:
>>> Hi,
>>>
>>> Can an application cause BAD CHECKSUM errors in Lustre logs by somehow
>>> overwriting memory while being DMA'ed to network?
>>>
>>> After upgrading to 2.10.1 on the server side we started seeing this from
>>> a user's application (MPI I/O). Both 2.9.0 and 2.10.1 clients emit these
>>> errors. We have not yet established whether the application is doing
>>> things correctly.
>> If applications are using mmap IO it is possible for the page to become 
>> inconsistent after the checksum has been computed.  However, mmap IO is
>> normally detected by the client and no message should be printed.
>>
>> There isn't anything that the application needs to do, since the client will 
>> resend the data if there is a checksum error, but the resends do slow down 
>> the IO.  If the inconsistency is on the client, there is no cause for 
>> concern (though it would be good to figure out the root cause).
>>
>> It would be interesting to see what the exact error message is, since that 
>> will say whether the data became inconsistent on the client, or over the 
>> network.  If the inconsistency is over the network or on the server, then 
>> that may point to hardware issues.
> I've attached logs from a server and a client.

There was a cut n' paste error in the first set of files. This should be
better.

Looks like something goes wrong over the network.

Cheers,
Hans Henrik

Dec  7 13:53:02 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [3157348640-3158396927], original client csum 7505b09c (type 4), server csum 602d88a8 (type 4), client csum now 7505b09c
Dec  7 13:53:02 node830 kernel: LustreError: Skipped 1 previous similar message
Dec  7 13:53:02 node830 kernel: LustreError: 14576:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@880b57ce7080 x1585957881290896/t30079750448(30079750448) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2ib:6/4 lens 608/416 e 0 to 0 dl 1512651188 ref 2 fl Interpret:RM/0/0 rc 0/0
Dec  7 13:53:19 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [316818-3169226751], original client csum d0df79c4 (type 4), server csum 1f1a7bf (type 4), client csum now d0df79c4
Dec  7 13:53:19 node830 kernel: LustreError: Skipped 25 previous similar messages
Dec  7 13:53:19 node830 kernel: LustreError: 14565:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@88106fc51c80 x1585957881291856/t30079752084(30079752084) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2ib:6/4 lens 608/416 e 0 to 0 dl 1512651243 ref 2 fl Interpret:RM/0/0 rc 0/0
Dec  7 13:53:19 node830 kernel: LustreError: 14565:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 25 previous similar messages
Dec  7 13:53:59 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [3157348640-3158396927], original client csum 7505b09c (type 4), server csum 120df09a (type 4), client csum now 7505b09c
Dec  7 13:53:59 node830 kernel: LustreError: Skipped 23 previous similar messages
Dec  7 13:53:59 node830 kernel: LustreError: 14569:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:54:00 node830 kernel: LustreError: 14561:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@880b57ce7080 x1585957881292880/t30079752266(30079752266) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2ib:6/4 lens 608/416 e 0 to 0 dl 1512651284 ref 2 fl Interpret:RM/0/0 rc 0/0
Dec  7 13:54:00 node830 kernel: LustreError: 14561:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 23 previous similar messages
Dec  7 13:54:01 node830 kernel: LustreError: 14573:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:54:01 node830 kernel: LustreError: 14573:0:(osc_request.c:1735:brw_interpret()) Skipped 3 previous similar messages
Dec  7 13:54:58 node830 kernel: LustreError: 14561:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:55:03 n

Re: [lustre-discuss] BAD CHECKSUM

2017-12-09 Thread Hans Henrik Happe
On 07-12-2017 21:36, Dilger, Andreas wrote:
> On Dec 7, 2017, at 10:37, Hans Henrik Happe  wrote:
>> Hi,
>>
>> Can an application cause BAD CHECKSUM errors in Lustre logs by somehow
>> overwriting memory while being DMA'ed to network?
>>
>> After upgrading to 2.10.1 on the server side we started seeing this from
>> a user's application (MPI I/O). Both 2.9.0 and 2.10.1 clients emit these
>> errors. We have not yet established whether the application is doing
>> things correctly.
> If applications are using mmap IO it is possible for the page to become 
> inconsistent after the checksum has been computed.  However, mmap IO is
> normally detected by the client and no message should be printed.
>
> There isn't anything that the application needs to do, since the client will 
> resend the data if there is a checksum error, but the resends do slow down 
> the IO.  If the inconsistency is on the client, there is no cause for concern 
> (though it would be good to figure out the root cause).
>
> It would be interesting to see what the exact error message is, since that 
> will say whether the data became inconsistent on the client, or over the 
> network.  If the inconsistency is over the network or on the server, then 
> that may point to hardware issues.
I've attached logs from a server and a client.

Cheers,
Hans Henrik
Dec  7 13:53:02 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [3157348640-3158396927 c
Dec  7 13:53:02 node830 kernel: LustreError: Skipped 1 previous similar message
Dec  7 13:53:02 node830 kernel: LustreError: 14576:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@880b57ce7080 x1585957881290896/t30079750448(30079750448) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2:0
Dec  7 13:53:19 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [316818-3169226751 4
Dec  7 13:53:19 node830 kernel: LustreError: Skipped 25 previous similar messages
Dec  7 13:53:19 node830 kernel: LustreError: 14565:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@88106fc51c80 x1585957881291856/t30079752084(30079752084) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2:0
Dec  7 13:53:19 node830 kernel: LustreError: 14565:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 25 previous similar messages
Dec  7 13:53:59 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [3157348640-3158396927 c
Dec  7 13:53:59 node830 kernel: LustreError: Skipped 23 previous similar messages
Dec  7 13:53:59 node830 kernel: LustreError: 14569:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:54:00 node830 kernel: LustreError: 14561:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@880b57ce7080 x1585957881292880/t30079752266(30079752266) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2:0
Dec  7 13:54:00 node830 kernel: LustreError: 14561:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 23 previous similar messages
Dec  7 13:54:01 node830 kernel: LustreError: 14573:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:54:01 node830 kernel: LustreError: 14573:0:(osc_request.c:1735:brw_interpret()) Skipped 3 previous similar messages
Dec  7 13:54:58 node830 kernel: LustreError: 14561:0:(osc_request.c:1735:brw_interpret()) astro-OST-osc-881072dbf400: too many resent retries for object: 0:55153652, rc = -11.
Dec  7 13:55:03 node830 kernel: LustreError: 132-0: astro-OST-osc-881072dbf400: BAD WRITE CHECKSUM: changed in transit before arrival at OST: from 10.21.10.114@o2ib inode [0x2000135a4:0x169:0x0] object 0x0:55153652 extent [3146517280-3147563007 a
Dec  7 13:55:03 node830 kernel: LustreError: Skipped 46 previous similar messages
Dec  7 13:55:05 node830 kernel: LustreError: 14559:0:(osc_request.c:1611:osc_brw_redo_request()) @@@ redo for recoverable error -11  req@8808b43210c0 x1585957881295616/t30079754077(30079754077) o4->astro-OST-osc-881072dbf400@10.21.10.114@o2:0
Dec  7 13:55:05 node830 kernel: LustreError: 14559:0:(osc_request.c:1611:osc_brw_redo_request()) Skipped 41 previous similar messages
Dec  7 13:55:56 node830 kernel: LustreError: 14560:0:(osc_request.c:1735:brw_interpret

[lustre-discuss] BAD CHECKSUM

2017-12-07 Thread Hans Henrik Happe
Hi,

Can an application cause BAD CHECKSUM errors in Lustre logs by somehow
overwriting memory while being DMA'ed to network?

After upgrading to 2.10.1 on the server side we started seeing this from
a user's application (MPI I/O). Both 2.9.0 and 2.10.1 clients emit these
errors. We have not yet established whether the application is doing
things correctly.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2017-09-30 Thread Hans Henrik Happe
On 27-09-2017 18:50, Dilger, Andreas wrote:
> On Sep 26, 2017, at 01:10, Hans Henrik Happe  wrote:
>> Hi,
>>
>> Did anyone else experience CPU load from ksoftirqd after 'modprobe
>> lustre'? On an otherwise idle node I see:
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>     9 root      20   0     0    0    0 S 28.5  0.0  2:05.58 ksoftirqd/1
>>
>>
>>    57 root      20   0     0    0    0 R 23.9  0.0  2:22.91 ksoftirqd/13
>>
>> The sum of those two is about 50% CPU.
>>
>> I have narrowed it down to the ptlrpc module. When I remove that, it stops.
>>
>> I also tested the 2.10.1-RC1, which is the same.
> If you can run "echo l > /proc/sysrq-trigger" it will report the processes
> that are currently running on the CPUs of your system to the console (and
> also /var/log/messages, if it can write everything in time).
>
> You might need to do this several times to get a representative sample of
> the ksoftirqd process stacks to see what they are doing that is consuming
> so much CPU.
>
> Alternately, "echo t > /proc/sysrq-trigger" will report the stacks of all
> processes to the console (and /v/l/m), but there will be a lot of them,
> and no better chance that it catches what ksoftirqd is doing 25% of the time.
I've attached the stacks. Some wakeups, which I guess are initiated by
something in the ptlrpc code.
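For anyone wanting to reproduce this, the samples were collected with a small
loop along these lines (assumes root and that the 'l' sysrq is enabled; adjust
the count and interval to taste):

for i in $(seq 1 10); do
    echo l > /proc/sysrq-trigger   # dump stacks of tasks currently on a CPU
    sleep 2
done
dmesg > ksoftirqd-stacks.txt       # gather the resulting backtraces (file name arbitrary)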

Cheers,
Hans Henrik


Sep 30 23:26:01 node033 kernel: NMI backtrace for cpu 11
Sep 30 23:26:01 node033 kernel: CPU 11 Modules linked in: lustre(U) lov(U) mdc(U) fid(U) lmv(U) fld(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) libcfs(U) nfs lockd fscache auth_rpcgss nfs_acl mmfs(U) mmfslinux(U) tracedev(U) autofs4 sha512_generic crc32c_intel sunrpc acpi_cpufreq freq_table mperf bonding ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 xfs exportfs ext2 ipmi_devintf ipmi_si ipmi_msghandler microcode iTCO_wdt iTCO_vendor_support raid0 igb dca i2c_algo_bit i2c_core raid1 ptp pps_core sg serio_raw lpc_ich mfd_core i7core_edac edac_core shpchp ext4 jbd2 mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx sd_mod crc_t10dif ahci hpsa dm_mirror dm_region_hash dm_log dm_mod [last unloaded: libcfs]
Sep 30 23:26:01 node033 kernel: 
Sep 30 23:26:01 node033 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6  
Sep 30 23:26:01 node033 kernel: RIP: 0010:[]  [] intel_idle+0xd1/0x1b0
Sep 30 23:26:01 node033 kernel: RSP: 0018:880c2d067e48  EFLAGS: 0046
Sep 30 23:26:01 node033 kernel: RAX: 0020 RBX: 0008 RCX: 0001
Sep 30 23:26:01 node033 kernel: RDX:  RSI: 880c2d067fd8 RDI: 81a98580
Sep 30 23:26:01 node033 kernel: RBP: 880c2d067ed8 R08: 0005 R09: 00c8
Sep 30 23:26:01 node033 kernel: R10: 0011f630c009c747 R11: 880c2d067e68 R12: 0004
Sep 30 23:26:01 node033 kernel: R13: 880c6a6beb40 R14: 0020 R15: 14e940c25ce4ad4a
Sep 30 23:26:01 node033 kernel: FS:  () GS:880c6a6a() knlGS:
Sep 30 23:26:01 node033 kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
Sep 30 23:26:01 node033 kernel: CR2: 7fe396313000 CR3: 01a8d000 CR4: 07e0
Sep 30 23:26:01 node033 kernel: DR0:  DR1:  DR2: 
Sep 30 23:26:01 node033 kernel: DR3:  DR6: 0ff0 DR7: 0400
Sep 30 23:26:01 node033 kernel: Process swapper (pid: 0, threadinfo 880c2d064000, task 880c2d052040)
Sep 30 23:26:01 node033 kernel: Stack:
Sep 30 23:26:01 node033 kernel: 88182d471020 880c6a6b1b80 000b 0011f630b5b0c60b
Sep 30 23:26:01 node033 kernel: 880c0002 0282  804f
Sep 30 23:26:01 node033 kernel: 880c2d067ed8 81444bb4  000b0a59013c
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: [] ? menu_select+0x174/0x390
Sep 30 23:26:01 node033 kernel: [] cpuidle_idle_call+0x7a/0xe0
Sep 30 23:26:01 node033 kernel: [] cpu_idle+0xb6/0x110
Sep 30 23:26:01 node033 kernel: [] start_secondary+0x2c0/0x316
Sep 30 23:26:01 node033 kernel: Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 c0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 c0 ff ff a8 08 75 08 b1 01 4c 89 f0 0f 01 c9  5a 3f db ff 4c 29 f8 48 89 c7 e8 7f 61 d8 ff 49 89 c6 49 89 
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel: <#DB[1]>  <> Pid: 0, comm: swapper Not tainted 2.6.32-696.6.3.el6.x86_64 #1
Sep 30 23:26:01 node033 kernel: Call Trace:
Sep 30 23:26:01 node033 kernel:   [] ? show_regs+0x27/0x30
Sep 30 23:26:01 node033 kernel: [] ? arch_trigger_all_cpu_backtra

[lustre-discuss] 2.10.0 CentOS6.9 ksoftirqd CPU load

2017-09-26 Thread Hans Henrik Happe
Hi,

Did anyone else experience CPU load from ksoftirqd after 'modprobe
lustre'? On an otherwise idle node I see:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    9 root      20   0     0    0    0 S 28.5  0.0  2:05.58 ksoftirqd/1


   57 root      20   0     0    0    0 R 23.9  0.0  2:22.91 ksoftirqd/13

The sum of those two is about 50% CPU.

I have narrowed it down to the ptlrpc module. When I remove that, it stops.

I also tested the 2.10.1-RC1, which is the same.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre/ZFS space accounting

2017-06-14 Thread Hans Henrik Happe
On 09-06-2017 18:06, Dilger, Andreas wrote:
> The error 28 on close may also be out of space (28 == ENOSPC). 
> 
> How many clients on your system?

~240 clients.

> I would recommend to use find/lfs find to locate some of the larger files on 
> OST0002 and lfs_migrate them to other OSTs. 

Off list, it was suggested to look at LU-2049 (thanks). Could that be it?
I guess making more free space would help.
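For the migration suggestion, I assume it would be something along these lines
(a sketch; the size threshold and OST name are just examples):

# find large files with objects on the full OST and restripe them elsewhere
lfs find /lustre/astro --obd astro-OST0002_UUID --size +10G -type f | lfs_migrate -y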

Cheers,
Hans Henrik

>> On Jun 9, 2017, at 01:27, Hans Henrik Happe  wrote:
>>
>> Hi,
>>
>> We have ruled that out by monitoring use. It is happening during
>> checkpointing. So a continuing process where old checkpoints get deleted
>> after new ones are made. There are many checkpoints before
>>
>> I messed things up in my first mail, so it wasn't clear why I talked
>> about space. Sometimes they just get this (first number is MPI rank):
>>
>> 222: forrtl: Input/output error
>> 222: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>>
>> Sometimes they get:
>>
>> 33: forrtl: No space left on device
>> 14: forrtl: No space left on device
>> 08: forrtl: Input/output error
>> 08: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"
>>
>> Info: We have a ZFS snapshot of the osts and mdt. It's ZFS 0.6.5.7.
>>
>> Cheers,
>> Hans Henrik
>>
>>> On 09-06-2017 08:41, Thomas Roth wrote:
>>> Hi,
>>>
>>> I don't know about the error messages. But are you sure that the
>>> imbalance of the OST filling isn't due to some extremely large files
>>> written overnight or so (- with default striping, one file -> one OST).
>>> Our users are able to do that, without realizing.
>>>
>>> Regards,
>>> Thomas
>>>
>>>> On 08.06.2017 10:11, Hans Henrik Happe wrote:
>>>> Hi,
>>>>
>>>> We are on Lustre 2.8 with ZFS.
>>>>
>>>> Our users have seen some unexplainable errors:
>>>>
>>>> 062: forrtl: Input/output error
>>>>
>>>> Or
>>>>
>>>> 062: forrtl: severe (28): CLOSE error, unit 10, file “Unknown"
>>>>
>>>>
>>>> From the attached 'lfs df -h' you can see that the OSTs are unbalanced, and
>>>> OST0001 is the fullest but still far from being full. We are using the default
>>>> allocation settings, so we should be in weighted mode.
>>>>
>>>> I've tried to find an LU matching this but no luck. Also, log on
>>>> affected nodes and on servers are empty.
>>>>
>>>> Any suggestions about how to debug this?
>>>>
>>>> Cheers,
>>>> Hans Henrik
>>>>
>>>>
>>>>
>>>> ___
>>>> lustre-discuss mailing list
>>>> lustre-discuss@lists.lustre.org
>>>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>>>
>>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre/ZFS space accounting

2017-06-09 Thread Hans Henrik Happe
Hi,

We have ruled that out by monitoring use. It is happening during
checkpointing: a continuing process where old checkpoints get deleted
after new ones are made. There are many checkpoints before

I messed things up in my first mail, so it wasn't clear why I talked
about space. Sometimes they just get this (first number is MPI rank):

222: forrtl: Input/output error
222: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"

Sometimes they get:

33: forrtl: No space left on device
14: forrtl: No space left on device
08: forrtl: Input/output error
08: forrtl: severe (28): CLOSE error, unit 10, file "Unknown"

Info: We have a ZFS snapshot of the osts and mdt. It's ZFS 0.6.5.7.
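Since snapshots pin space on the OST datasets, it is probably also worth checking
how much they hold (a sketch; the pool/dataset names below are made up):

zfs list -t snapshot -o name,used,referenced
zfs get -r usedbysnapshots astro-ost1/ost0001   # hypothetical dataset name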

Cheers,
Hans Henrik

On 09-06-2017 08:41, Thomas Roth wrote:
> Hi,
> 
> I don't know about the error messages. But are you sure that the
> imbalance of the OST filling isn't due to some extremely large files
> written overnight or so (- with default striping, one file -> one OST).
> Our users are able to do that, without realizing.
> 
> Regards,
> Thomas
> 
> On 08.06.2017 10:11, Hans Henrik Happe wrote:
>> Hi,
>>
>> We are on Lustre 2.8 with ZFS.
>>
>> Our users have seen some unexplainable errors:
>>
>> 062: forrtl: Input/output error
>>
>> Or
>>
>> 062: forrtl: severe (28): CLOSE error, unit 10, file “Unknown"
>>
>>
>> From the attached 'lfs df -h' you can see that the OSTs are unbalanced, and
>> OST0001 is the fullest but still far from being full. We are using the default
>> allocation settings, so we should be in weighted mode.
>>
>> I've tried to find an LU matching this but no luck. Also, log on
>> affected nodes and on servers are empty.
>>
>> Any suggestions about how to debug this?
>>
>> Cheers,
>> Hans Henrik
>>
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
> 
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre/ZFS space accounting

2017-06-08 Thread Hans Henrik Happe
Hi,

We are on Lustre 2.8 with ZFS.

Our users have seen some unexplainable errors:

062: forrtl: Input/output error

Or

062: forrtl: severe (28): CLOSE error, unit 10, file “Unknown"


From the attached 'lfs df -h' you can see that the OSTs are unbalanced, and
OST0001 is the fullest but still far from being full. We are using the default
allocation settings, so we should be in weighted mode.

I've tried to find an LU matching this but no luck. Also, log on
affected nodes and on servers are empty.

Any suggestions about how to debug this?

Cheers,
Hans Henrik
UUID                  bytes     Used      Available  Use%  Mounted on
astro-MDT_UUID        691.6G    163.0G    528.6G     24%   /lustre/astro[MDT:0]
astro-OST_UUID        157.2T    105.0T    52.3T      67%   /lustre/astro[OST:0]
astro-OST0001_UUID    158.5T    149.2T    9.3T       94%   /lustre/astro[OST:1]
astro-OST0002_UUID    158.6T    144.8T    13.9T      91%   /lustre/astro[OST:2]
astro-OST0003_UUID    157.3T    105.4T    51.9T      67%   /lustre/astro[OST:3]

filesystem summary:   631.6T    504.3T    127.3T     80%   /lustre/astro

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Clients looses IB connection to OSS.

2017-05-01 Thread Hans Henrik Happe
Hi,

We have experienced problems with losing the connection to an OSS. It starts with:

May  1 03:35:46 node872 kernel: LNetError:
5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many
fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst
idx/frags: 128/236
May  1 03:35:46 node872 kernel: LNetError:
5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from
10.21.10.116@o2ib: -90

The rest of the log is attached.

After this, Lustre access is very slow, i.e. a 'df' can take minutes.
Also, 'lctl ping' to the OSS gives I/O errors. Doing 'lnet net del/add'
makes ping work again until file I/O starts; then the I/O errors return.

We use both IB and TCP on servers, so no routers.
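Since the first error complains about fragment counts, these are the o2iblnd
module parameters I would compare between the affected clients and the OSSes
(a guess on my part; the parameter set differs between Lustre versions):

cat /sys/module/ko2iblnd/parameters/map_on_demand
cat /sys/module/ko2iblnd/parameters/concurrent_sends
cat /sys/module/ko2iblnd/parameters/peer_credits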

In the attached log astro-OST0001 has been moved to the other server in
the HA pair. This is because 'lctl dl -t' showed strange output when on
the right server:

# lctl dl -t
  0 UP mgc MGC10.21.10.102@o2ib 0b0bbbce-63b6-bf47-403c-28f0c53e8307 5
  1 UP lov astro-clilov-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
  2 UP lmv astro-clilmv-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 4
  3 UP mdc astro-MDT-mdc-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.102@o2ib
  4 UP osc astro-OST0002-osc-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.116@o2ib
  5 UP osc astro-OST0001-osc-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 172.20.10.115@tcp1
  6 UP osc astro-OST0003-osc-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.117@o2ib
  7 UP osc astro-OST-osc-88107412e800
53add9a3-e719-26d9-afb4-3fe9b0fa03bd 5 10.21.10.114@o2ib

So astro-OST0001 seems to be connected through 172.20.10.115@tcp1, even
though it uses 10.21.10.115@o2ib (verified by performance test and
disabling tcp1 on IB nodes).

Please ask for more details if needed.

Cheers,
Hans Henrik

May  1 03:35:46 node872 kernel: LNetError: 5545:0:(o2iblnd_cb.c:1094:kiblnd_init_rdma()) RDMA has too many fragments for peer 10.21.10.116@o2ib (256), src idx/frags: 128/236 dst idx/frags: 128/236
May  1 03:35:46 node872 kernel: LNetError: 5545:0:(o2iblnd_cb.c:1689:kiblnd_reply()) Can't setup rdma for GET from 10.21.10.116@o2ib: -90
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: Lustre: 5606:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1493602541/real 1493602541]  req@880e99cea080 x1565604440535580/t0(0) o4->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:6/4 lens 608/448 e 0 to 1 dl 1493602585 ref 2 fl Rpc:X/0/ rc 0/-1
May  1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: Connection to astro-OST0002 (at 10.21.10.116@o2ib) was lost; in progress operations using this service will wait for recovery to complete
May  1 03:35:46 node872 kernel: Lustre: astro-OST0002-osc-881070c95c00: Connection restored to 10.21.10.116@o2ib (at 10.21.10.116@o2ib)
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:46 node872 kernel: LustreError: 5545:0:(events.c:201:client_bulk_callback()) event type 1, status -5, desc 88103dd63000
May  1 03:35:52 node872 kernel: Lustre: 5579:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1493602546/real 1493602546]  req@88103e0f10c0 x1565604440535684/t0(0) o8->astro-OST0002-osc-881070c95c00@10.21.10.116@o2ib:28/4 lens 520/544 e 0 to 1 dl 1493602552 ref 1 fl Rpc:XN/0/ rc 0/-1
May  1 03:35:52 node872 kernel: Lustre: 5579:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
May  1 03:36:17 node872 kernel: Lustre: 5579:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1493602571/real 1493602571]  req@881056dd39c0 x1565604440535728/t0(0) o8->astro-OST0002-osc-881070c95c00@10.21.10.115@o2ib:28/4 lens 520/544 e 0 to 1 dl 1493602577 ref 1 fl Rpc:XN/0/ rc 0/-1
May  1 03:36:18 node872 kernel: Lustre: astro-OST0001-osc-881070c95c00: Connection to astro-OST0001 (at 10.21.10.116@o2ib) was lost; in progress operations using t

Re: [lustre-discuss] Installing zfs and lustre

2016-10-22 Thread Hans Henrik Happe

Here are my notes about building lustre:

---

1. ZFS build

Use ZFS repo or follow this to roll your own:

http://zfsonlinux.org/generic-rpm.html

2. Lustre server and client build

2.1 Packages

yum -y groupinstall "Development Tools"
yum -y install libselinux-devel
yum -y install net-snmp-devel
yum -y install libyaml-devel
yum -y install python-docutils

2.2 Setup

useradd -m build
su - build

git clone git://git.hpdd.intel.com/fs/lustre-release.git
cd lustre-release

zfs_release=0.6.5.7
release=2.8.0

git checkout $release

(might need a patch for newer ZFS versions)

sh ./autogen.sh

2.3 Server

./configure --disable-ldiskfs --with-zfs --enable-quota --enable-utils \
    --enable-gss --enable-snmp \
    --with-spl-obj=/var/lib/dkms/spl/${zfs_release}/$(uname -r)/x86_64 \
    --with-zfs-obj=/var/lib/dkms/zfs/${zfs_release}/$(uname -r)/x86_64


make rpms

---
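After 'make rpms' the packages end up in the top of the build tree; the install
step on the servers is then something like this (a sketch; exact package names
depend on the kernel and configure options):

yum -y localinstall lustre-*.rpm    # includes the osd-zfs and mount packages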

Hope it helps.

Cheers,
Hans Henrik


On 07-10-2016 23:32, Kyriazis, George wrote:

I managed to avoid this problem by hacking the .spec file of the SRPM,
but I am now hitting another problem.  When packaging the RPM files,
there are some config files that are missing.  They exist in the
original tarball (which is inside the SRPM), but not in the BUILDROOT.



Any ideas?



Thank you!



George



# rpmbuild --rebuild --with zfs --without ldiskfs

…

…



dwz: "./lib/modules/3.10.0-327.36.1.el7.x86_64/extra/kernel/fs/lustre/lov.ko.debug" is not a shared library

dwz: "./lib/modules/3.10.0-327.36.1.el7.x86_64/extra/kernel/fs/lustre/osc.ko.debug" is not a shared library

/usr/lib/rpm/sepdebugcrcfix: Updated 68 CRC32s, 18 CRC32s did match.

18347 blocks

+ /usr/lib/rpm/check-buildroot

+ /usr/lib/rpm/redhat/brp-compress

+ /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip

+ /usr/lib/rpm/brp-python-bytecompile /usr/bin/python 1

+ /usr/lib/rpm/redhat/brp-python-hardlink

+ /usr/lib/rpm/redhat/brp-java-repack-jars

Processing files: lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64

error: File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/libexec/lustre/lc_common

error: File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/libexec/lustre/haconfig

error: File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/bin/lustre_req_history

error: File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/etc/sysconfig/lustre

error: File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/etc/init.d/lustre


RPM build errors:

    File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/libexec/lustre/lc_common

    File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/libexec/lustre/haconfig

    File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/usr/bin/lustre_req_history

    File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/etc/sysconfig/lustre

    File not found: /root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.36.1.el7.x86_64.x86_64/etc/init.d/lustre





*From:* Kyriazis, George
*Sent:* Wednesday, October 5, 2016 5:24 PM
*To:* lustre-discuss@lists.lustre.org
*Subject:* Installing zfs and lustre



Hello lustre-discuss,



I am trying to set up lustre + zfs on a set of virtual machines, for
testing purposes.



I have managed to get plain lustre + ldiskfs working, which is great!



The main problem that I’m having is that the osd-zfs kernel module has
symbol version mismatches.  Initially I went down the path of trying to
download the right version of zfs, to match symbol versions, but had
trouble with it.




I then came across a post, here, in lustre-discuss, by Christopher
Morrone, saying not to do that, but rather, compile lustre from an
SRPM using stock kernel.  I am having a problem doing that, though,
last few lines of compilation below.


Any help is appreciated!



Thank you!

George



[root@l-2 lustre]# rpmbuild --rebuild --with zfs --without ldiskfs
lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.src.rpm



…



make[3]: Leaving directory `/root/rpmbuild/BUILD/lustre-2.8.0/lustre/lov'

make[2]: Leaving directory `/root/rpmbuild/BUILD/lustre-2.8.0/lustre/lov'

Making install in osc

make[2]: Entering directory `/root/rpmbuild/BUILD/lustre-2.8.0/lustre/osc'

make[3]: Entering directory `/root/rpmbuild/BUILD/lustre-2.8.0/lustre/osc'

make[3]: Nothing to be done for `install-exec-am'.

/usr/bin/mkdir -p
'/root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/lib/modules/3.10.0-327.36.1.el7.x86_64/extra/kernel/fs/lustre'

/usr/bin/install -c -m 644 osc.ko
'/root/rpmbuild/BUILDROOT/lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64/lib/modules/3.10.0-327.36.1.el7.x86_64/extra/kernel/fs/lustre'

make[3]: Leaving directory `/root/rpmbuild/BUILD/lustre-2.8.0/lustre/osc'

mak

Re: [lustre-discuss] Quick ZFS pool question?

2016-10-07 Thread Hans Henrik Happe
Just curious: if you set a reservation on a ZFS OST filesystem, will the algorithm
still work? Also, will it go totally crazy, or just not be able to make good
decisions, because something external is grabbing the space?
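To be concrete, I mean something like this on a shared MDT/MGS pool (hypothetical
pool and dataset names):

zfs set reservation=50G mdtpool/mgt    # guarantee space for the MGT dataset
zfs set reservation=400G mdtpool/mdt0  # and for the MDT dataset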

Cheers,
Hans Henrik

On October 7, 2016 8:16:54 AM GMT+02:00, "Xiong, Jinshan" 
 wrote:
>
>> On Oct 6, 2016, at 2:04 AM, Phill Harvey-Smith
> wrote:
>> 
>> Hi all,
>> 
>> Having tested a simple setup for lustre / zfs, I'd like to trey and
>replicate on the test system what we currently have on the production
>system, which uses a much older version of lustre (2.0 IIRC).
>> 
>> Currently we have a combined mgs / mds node and a single oss node. we
>have 3 filesystems : home, storage and scratch.
>> 
>> The MGS/MDS node currently has the mgs on a seperate block device and
>the 3 mds on a combined lvm volume.
>> 
>> The OSS has an ost each (on a separate disks) for scratch and home
>and two ost for storage.
>> 
>> If we migrate this setup to a ZFS based one, will I need to create a
>separate zpool for each mdt / mgt / oss  or will I be able to create a
>single zpool and split it up between the individual mdt / oss blocks,
>if so how do I tell each filesystem how big it should be?
>
>We strongly recommend to create separate ZFS pools for OSTs, otherwise
>grant, which is a Lustre internal space reserve algorithm, won’t work
>properly.
>
>It’s possible to create a single zpool for MDTs and MGS, and you can
>use ‘zfs set reservation= ’ to reserve spaces for
>different targets.
>
>Jinshan
>
>> 
>> Cheers.
>> 
>> Phill.
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>___
>lustre-discuss mailing list
>lustre-discuss@lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Small sequential reads on cached file are slow

2016-10-04 Thread Hans Henrik Happe
Thanks a lot. I'm looking forward to testing this. I wonder why I haven't stumbled
across this issue before. While HPC workloads try to avoid syscall overhead, there is
lots of software that relies on the OS to cope with small I/Os. I think the average
user will become a lot happier.

Cheers,
Hans Henrik

On October 2, 2016 11:21:55 PM GMT+02:00, "Dilger, Andreas" 
 wrote:
>Please test with Lustre master (pre-2.9.0). There were optimizations
>landed specifically to improve small file read performance.  While
>Lustre 2.9.0 isn't released yet, it is getting very close. 
>
>Cheers, Andreas
>
>> On Oct 2, 2016, at 06:54, Hans Henrik Happe  wrote:
>> 
>> Hi,
>> 
>> While testing small sequential reads I noticed that Lustre is more
>than 10 times slower than local fs when reading cached data. So
>basically no network I/O to OSSes. I wanted to check if readahead was
>working for this small I/O case, but it seems that cached case isn't.
>> 
>> The client is running Lustre 2.8 and CentOS 6.8.
>> 
>> Lustre:
>> 
>> $ dd if=file of=/dev/null bs=512
>> 2097152+0 records in
>> 2097152+0 records out
>> 1073741824 bytes (1.1 GB) copied, 20.5081 s, 52.4 MB/s
>> 
>> $ dd if=file of=/dev/null bs=4k
>> 262144+0 records in
>> 262144+0 records out
>> 1073741824 bytes (1.1 GB) copied, 2.91177 s, 369 MB/s
>> 
>> $ dd if=file of=/dev/null bs=1M
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1.1 GB) copied, 0.160732 s, 6.7 GB/s
>> 
>> Local fs:
>> 
>> $ dd if=/tmp/file of=/dev/null bs=512
>> 2097152+0 records in
>> 2097152+0 records out
>> 1073741824 bytes (1.1 GB) copied, 1.56432 s, 686 MB/s
>> 
>> $ dd if=/tmp/file of=/dev/null bs=4k
>> 262144+0 records in
>> 262144+0 records out
>> 1073741824 bytes (1.1 GB) copied, 0.275451 s, 3.9 GB/s
>> 
>> $ dd if=/tmp/file of=/dev/null bs=1M
>> 1024+0 records in
>> 1024+0 records out
>> 1073741824 bytes (1.1 GB) copied, 0.148798 s, 7.2 GB/s
>> 
>> 
>> Is this a known issue?
>> 
>> Cheers,
>> Hans Henrik Happe
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Small sequential reads on cached file are slow

2016-10-02 Thread Hans Henrik Happe

Hi,

While testing small sequential reads I noticed that Lustre is more than
10 times slower than a local fs when reading cached data, so basically no
network I/O to the OSSes. I wanted to check if readahead was working for
this small I/O case, but it seems that even the cached case isn't performing well.


The client is running Lustre 2.8 and CentOS 6.8.
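In case it matters, this is a quick way to check the client readahead settings
(parameter names as of 2.8):

lctl get_param llite.*.max_read_ahead_mb
lctl get_param llite.*.max_read_ahead_per_file_mb
lctl get_param llite.*.read_ahead_stats    # per-mount readahead statistics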

Lustre:

$ dd if=file of=/dev/null bs=512
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 20.5081 s, 52.4 MB/s

$ dd if=file of=/dev/null bs=4k
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 2.91177 s, 369 MB/s

$ dd if=file of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.160732 s, 6.7 GB/s

Local fs:

$ dd if=/tmp/file of=/dev/null bs=512
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied, 1.56432 s, 686 MB/s

$ dd if=/tmp/file of=/dev/null bs=4k
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 0.275451 s, 3.9 GB/s

$ dd if=/tmp/file of=/dev/null bs=1M
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.148798 s, 7.2 GB/s


Is this a known issue?

Cheers,
Hans Henrik Happe
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Relaxing read consistency, from other node write

2016-05-09 Thread Hans Henrik Happe



On 09-05-2016 16:42, Patrick Farrell wrote:

Hans,

It sounds like you want the data to be available to the other nodes
before it's safely 'down' on the server.  Is that correct?


Yes, once the data is in the server cache the other clients should be allowed
to read it. I guess you could treat it as a replica of the writing client's cache,
so it could be replayed in case of server failure.


I was just wondering if there was a quick workaround for my current problem.

If the system could squeeze in these small writes without seconds of 
latency while large writes are hitting the drives hard I would be happy.




If so, then I believe there's no way to do that currently in Lustre.

If you were willing to accept the possibility of incorrect reads, then
you could check out group locking - It lets clients get a special type
of shared lock, which allows every client to think it has a lock on the
whole file.  That lets them do reads while someone else is writing, with
the caveat that they can get out of date data, and that one clients
cache is not invalidated when another client updates part of the file.

It's a pretty significant relaxation of the POSIX consistency semantics,
and is tricky to use safely without very well defined behavior from
clients.  And it might not be what you need...  But I think it's what's
available.

Cheers,
- Patrick

On 05/09/2016 02:15 AM, Hans Henrik Happe wrote:

Hi,

Some users experienced that, when reading a log file written on another node,
the read of the last bytes was sometimes delayed by tens of seconds.
This happens when other processes are writing heavily.

It seems that the data needs to be committed to persistent storage,
before the reading node can have it. That makes sense since the
writing node and the server could die, taking with them all knowledge
about the write. Is this a correct description?

I'm wondering if there is a way to relax this. I.e. ignore this
failure scenario or treat the cache entries in writing node and server
as enough redundancy?

WRT why we see these long delays I think I tracked it down to an ZFS
issue (https://github.com/zfsonlinux/zfs/issues/4603), but I'm only a
layman when it comes to the internals of ZFS and Lustre.

We are at 2.7.64, so we have to update to 2.8 soon. Going through the
commits I couldn't find anything that relates, but that might just be
my ignorance.

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Relaxing read consistency, from other node write

2016-05-09 Thread Hans Henrik Happe

Hi,

Some users experienced that, when reading a log file written on another node,
the read of the last bytes was sometimes delayed by tens of seconds. This
happens when other processes are writing heavily.


It seems that the data needs to be committed to persistent storage, 
before the reading node can have it. That makes sense since the writing 
node and the server could die, taking with them all knowledge about the 
write. Is this a correct description?


I'm wondering if there is a way to relax this. I.e. ignore this failure 
scenario or treat the cache entries in writing node and server as enough 
redundancy?


WRT why we see these long delays I think I tracked it down to an ZFS 
issue (https://github.com/zfsonlinux/zfs/issues/4603), but I'm only a 
layman when it comes to the internals of ZFS and Lustre.


We are at 2.7.64, so we have to update to 2.8 soon. Going through the 
commits I couldn't find anything that relates, but that might just be my 
ignorance.


Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] issue in lnet con

2016-02-08 Thread Hans Henrik Happe

Hi,

The lnet module cannot resolve a symbol from the running kernel. Please
provide the related output of dmesg.
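A couple of quick checks usually narrow this down (assuming a standard RPM install):

dmesg | grep -iE 'lnet|unknown symbol'     # shows which symbol is unresolved
uname -r                                   # the running kernel
modinfo lnet | grep -E 'vermagic|depends'  # the kernel the module was built for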

Cheers,
Hans Henrik

On 07-02-2016 04:48, Parag Khuraswar wrote:

Hi,

I am facing an issue while running “modprobe lnet”

FATAL: Error inserting lnet
(/lib/modules/2.6.32.504.el6_lustre/extra/kernel/net/lustre/lnet.ko):
Unknown symbol in module, or unknown parameter (see dmesg)

Regards,

Parag

+91 8308806004



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] ZFS inode accounting

2016-01-29 Thread Hans Henrik Happe

Hi,

I've been testing 2.7.64 on ZFS and discovered that inode accounting 
showed very high counts:


lfs quota -g others /lustre/hpc
Disk quotas for group others (gid 8000):
     Filesystem  kbytes       quota       limit  grace                 files  quota  limit  grace
    /lustre/hpc      41  1073741824  1073741824      -  18446744073709536075      0      0      -


That looks like a 64-bit unsigned counter that has wrapped below zero. It
keeps counting down when I run mdtest with 2 or more processes. With one
process it's the same before and after. I guess it must be a race.

The syslog on MDT shows this message:

LustreError: 8563:0:(osd_object.c:1485:osd_object_create()) hpc-MDT: 
failed to add [0x21b73:0x1cae8:0x0] to accounting ZAP for grp 8000 (-2)



After reading LU-2435 and LU-5638 I guess there are still some issues 
that need to be sorted out?


Is there a way to make Lustre recheck the quota?
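For completeness, the per-target breakdown can show whether only the MDT
accounting is off (a sketch):

lfs quota -v -g others /lustre/hpc    # -v lists usage per MDT/OST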

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Network failover between IB and Ethernet

2015-12-29 Thread Hans Henrik Happe

Hi,

I have a setup where servers and clients are all on both an IB and an
Ethernet network. Now, if one of the networks is lost (dead switch), or a
single path to a node is lost, it would be nice if Lustre would fail over
to the working network.
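For reference, clients and servers have both networks configured roughly like
this (the interface names are just examples):

# /etc/modprobe.d/lustre.conf
options lnet networks="o2ib0(ib0),tcp1(eth0)"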


Looking through old posts I only found some that are 7+ years old and 
they don't agree:


http://lists.lustre.org/htdig.cgi/lustre-discuss-lustre.org/2008-April/001487.html
http://lists.lustre.org/htdig.cgi/lustre-discuss-lustre.org/2008-July/002292.html

Playing with 2.7, my conclusion is that it will not fail over like this.
Is that correct? Are there plans to address it in the future?


Bonding will, of course, work for Ethernet devices.

It seems that with dual OFED devices it is supported with the "ko2iblnd
dev_failover=1" option. Perhaps this might work with Soft-RoCE?


Could routing be used to solve this problem?

Cheers,
Hans Henrik
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org