[gpfsug-discuss] Moving data between dependent filesets

2019-08-05 Thread Sundermann, Jan Erik (SCC)
Dear all,

I am trying to understand how to move data efficiently between filesets that share 
the same inode space. I have an independent fileset fs1 which contains data 
that I would like to move to a newly created dependent fileset fs2; fs1 and fs2 
share the same inode space. Apparently, calling mv copies the data instead of 
just moving it. Running mv under strace prints lines like 

renameat2(AT_FDCWD, "subdir1/file257", AT_FDCWD, 
"../filesettest/subdir1/file257", 0) = -1 EXDEV (Invalid cross-device link)


Is there an efficient way to move the data between the filesets fs1 and fs2?
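
For reference, the fileset layout can be checked like this (gpfs0 is just a 
placeholder for the file system device; the -L output lists the inode space 
each fileset belongs to):

# Show all filesets with their detailed attributes, including the inode space;
# fs2 should appear as a dependent fileset in fs1's inode space
mmlsfileset gpfs0 -L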


Best regards
Jan Erik





[gpfsug-discuss] Problems with remote mount via routed IB

2018-02-26 Thread Sundermann, Jan Erik (SCC)

Dear all,

we are currently trying to remotely mount a file system in a routed InfiniBand 
test setup and are facing problems with dropped RDMA connections. The setup is 
the following: 

- Spectrum Scale Cluster 1 is set up on four servers that are connected to the 
same InfiniBand network. Additionally, they are connected to a fast Ethernet 
network providing IP communication in the network 192.168.11.0/24.

- Spectrum Scale Cluster 2 is set up on four additional servers that are 
connected to a second InfiniBand network. These servers have IPs on their IB 
interfaces in the network 192.168.12.0/24.

- IP is routed between 192.168.11.0/24 and 192.168.12.0/24 on a dedicated 
machine.

- We have a dedicated IB hardware router connected to both IB subnets.


We tested that the routing, both IP and IB, works between the two clusters 
without problems and that RDMA works fine for internal communication within 
cluster 1 and within cluster 2.
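
In case it is useful, a basic cross-subnet RDMA check with plain libibverbs 
tooling would look roughly like this (a sketch; the GID index and the target 
host are placeholders and depend on the fabric, and crossing an IB subnet 
boundary needs a GRH, hence the -g option):

# On a node in cluster 1 (server side, waits for one connection):
ibv_rc_pingpong -d mlx4_0 -i 1 -g 0
# On a node in cluster 2 (client side; replace the placeholder host name and
# pick the GID index that matches the routed fabric):
ibv_rc_pingpong -d mlx4_0 -i 1 -g 0 <node-in-cluster-1>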

When trying to remote mount a file system from cluster 1 in cluster 2, RDMA 
communication does not work as expected. Instead, we see error messages on the 
remote side (cluster 2):


2018-02-23_13:48:47.037+0100: [I] VERBS RDMA connecting to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2
2018-02-23_13:48:49.890+0100: [I] VERBS RDMA connected to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2
2018-02-23_13:48:53.138+0100: [E] VERBS RDMA closed connection to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 3
2018-02-23_13:48:53.854+0100: [I] VERBS RDMA connecting to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3
2018-02-23_13:48:54.954+0100: [E] VERBS RDMA closed connection to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 1
2018-02-23_13:48:55.601+0100: [I] VERBS RDMA connected to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3
2018-02-23_13:48:57.775+0100: [I] VERBS RDMA connecting to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 1
2018-02-23_13:48:59.557+0100: [I] VERBS RDMA connected to 192.168.11.3 
(iccn003-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 1
2018-02-23_13:48:59.876+0100: [E] VERBS RDMA closed connection to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 0
2018-02-23_13:49:02.020+0100: [I] VERBS RDMA connecting to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 0
2018-02-23_13:49:03.477+0100: [I] VERBS RDMA connected to 192.168.11.2 
(iccn002-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 0
2018-02-23_13:49:05.119+0100: [E] VERBS RDMA closed connection to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 2
2018-02-23_13:49:06.191+0100: [I] VERBS RDMA connecting to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 2
2018-02-23_13:49:06.548+0100: [I] VERBS RDMA connected to 192.168.11.4 
(iccn004-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 2
2018-02-23_13:49:11.578+0100: [E] VERBS RDMA closed connection to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 error 733 
index 3
2018-02-23_13:49:11.937+0100: [I] VERBS RDMA connecting to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 index 3
2018-02-23_13:49:11.939+0100: [I] VERBS RDMA connected to 192.168.11.1 
(iccn001-gpfs in gpfsstorage.localdomain) on mlx4_0 port 1 fabnum 0 sl 0 index 3


and the following in the cluster that owns the file system (cluster 1):

2018-02-23_13:47:36.112+0100: [E] VERBS RDMA rdma read error 
IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in 
gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 
2018-02-23_13:47:36.112+0100: [E] VERBS RDMA closed connection to 192.168.12.5 
(iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to 
RDMA read error IBV_WC_RETRY_EXC_ERR index 3
2018-02-23_13:47:47.161+0100: [I] VERBS RDMA accepted and connected to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 
fabnum 0 sl 0 index 3
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA rdma read error 
IBV_WC_RETRY_EXC_ERR to 192.168.12.5 (iccn005-ib in 
gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 vendor_err 129 
2018-02-23_13:48:04.317+0100: [E] VERBS RDMA closed connection to 192.168.12.5 
(iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 fabnum 0 due to 
RDMA read error IBV_WC_RETRY_EXC_ERR index 3
2018-02-23_13:48:11.560+0100: [I] VERBS RDMA accepted and connected to 
192.168.12.5 (iccn005-ib in gpfsremoteclients.localdomain) on mlx4_0 port 1 
fabnum 0 sl 0 index 3
2018-02-23_13:48:32.523+0100: [E] VERBS RD
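
In case it helps with debugging, this is roughly how the RDMA-related 
configuration can be checked on both clusters (a sketch; mmdiag shows the 
values in effect on the local node):

# Cluster-wide configuration (verbs-related parameters, if explicitly set)
mmlsconfig | grep -i verbs
# Configuration actually in effect on the local node, including defaults
mmdiag --config | grep -i verbs
# The subnets/daemon network settings matter for the routed IP path between
# 192.168.11.0/24 and 192.168.12.0/24
mmlsconfig | grep -i subnets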

[gpfsug-discuss] Upgrade with architecture change

2017-06-07 Thread Sundermann, Jan Erik (SCC)
Hi,

we are operating a small Spectrum Scale cluster with about 100 clients and 6 
NSD servers. The cluster is FPO-enabled. For historical reasons, the NSD servers 
are running on ppc64 while the clients are a mixture of ppc64le and x86_64 
machines. Most machines are running Red Hat Enterprise Linux 7, but we also have 
a few machines running AIX.

At the moment we have Spectrum Scale version 4.1.1 installed but would like to 
upgrade to 4.2.3. In the course of the upgrade we would also like to change 
the architecture of all NSD servers and reinstall them as ppc64le instead of 
ppc64. 

From what I’ve learned so far it should be possible to upgrade directly from 
4.1.1 to 4.2.3. Before doing the upgrade we would like to ask for some advice 
on the best strategy. 

For the NSD servers, one server at a time, we are thinking about doing the 
following (a rough command-level sketch follows the list):

1) Disable auto recovery
2) Unmount the GPFS file system
3) Suspend the disks
4) Shut down GPFS
5) Reboot and reinstall with the changed architecture ppc64le
6) Install GPFS 4.2.3
7) Recover the cluster config using mmsdrrestore
8) Resume and start the disks
9) Reenable auto recovery
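
To make the steps concrete, this is roughly what we have in mind at the command 
level (a sketch only; file system, disk, and node names are placeholders, and I 
assume auto recovery is the FPO restripeOnDiskFailure setting):

# 1) Disable auto recovery -- assuming it is controlled by restripeOnDiskFailure
mmchconfig restripeOnDiskFailure=no -i
# 2) Unmount the file system on the server being rebuilt
mmumount gpfs0 -N nsdserver1
# 3) Suspend the disks served by that server
mmchdisk gpfs0 suspend -d "nsd1;nsd2"
# 4) Shut down GPFS on the server
mmshutdown -N nsdserver1
# 5) + 6) Reinstall the node as ppc64le and install the 4.2.3 packages
# 7) Recover the cluster configuration on the reinstalled node
mmsdrrestore -p primaryconfigserver -R /usr/bin/scp
# Start GPFS again before touching the disks
mmstartup -N nsdserver1
# 8) Resume and start the disks
mmchdisk gpfs0 resume -d "nsd1;nsd2"
mmchdisk gpfs0 start -d "nsd1;nsd2"
# 9) Reenable auto recovery
mmchconfig restripeOnDiskFailure=yes -i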

Can GPFS handle the change of the NSD servers' architecture, and would it be 
fine to operate a mixture of different architectures for the NSD servers?


Thanks,
Jan Erik
