Hi all,

I have a 4-server system running a distributed-replicate setup, 4 x (2 + 1) = 
12. Bricks are staggered across the servers. Sharding is enabled. (Volume info 
is shown below.)

Now, the storage on these servers is slow and not really up to the job, so 
we have 4 new servers with SSDs. I have to move everything over to the new 
servers without taking down the storage.

The four old servers are running Gluster 6.4 and the new ones, 6.5.

So having read tons of docs and mailing lists, etc, I think I ought to be able 
to use add-brick, remove-brick to get everything moved safely like so:

# gluster volume add-brick iscsi replica 3 arbiter 1 srv{13..15}:/brick1

# gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 start

Then once complete, do:

# gluster volume remove-brick iscsi replica 3 srv{1..3}:/brick1 commit
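
For clarity, here is the same sequence for one replica set written out as a dry-run sketch. It only prints the commands and runs nothing against gluster. The helper name plan_set_migration is made up for illustration, and the brick names are the placeholders used above with the brace ranges expanded by hand.

```shell
#!/bin/sh
# Dry-run sketch: print the add-brick / remove-brick sequence for ONE replica
# set. Nothing is executed against gluster; the helper name and the brick
# arguments are illustrative placeholders, not real hosts.
plan_set_migration() {
    vol=$1   # volume name
    new=$2   # space-separated new bricks (data, data, arbiter)
    old=$3   # space-separated old bricks of the set being retired

    echo "gluster volume add-brick $vol replica 3 arbiter 1 $new"
    echo "gluster volume remove-brick $vol replica 3 $old start"
    # The status command is repeated by hand until migration has finished.
    echo "gluster volume remove-brick $vol replica 3 $old status"
    echo "gluster volume remove-brick $vol replica 3 $old commit"
}

plan_set_migration iscsi \
    "srv13:/brick1 srv14:/brick1 srv15:/brick1" \
    "srv1:/brick1 srv2:/brick1 srv3:/brick1"
```

The commit line is only meant to be run once the status step reports the migration as completed without failures.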


So I created a test volume to try this out. On the third add/remove of 4, I get 
a 'failed' on the remove-brick status. The rebalance log shows:

[2020-02-28 22:25:28.133902] I [dht-rebalance.c:1589:dht_migrate_file] 0-testmigrate-dht: /linux-5.4.22/arch/arm/boot/dts/exynos4412-itop-scp-core.dtsi: attempting to move from testmigrate-replicate-0 to testmigrate-replicate-2
[2020-02-28 22:25:28.144258] W [MSGID: 108015] [afr-self-heal-name.c:138:__afr_selfheal_name_expunge] 0-testmigrate-replicate-0: expunging file a75a83b7-2c34-4077-b4fc-3126a9d6058a/exynos4210-smdkv310.dts (11a47b1f-2c24-4d4b-9402-9130125cf953) on testmigrate-client-6
[2020-02-28 22:25:28.146321] E [MSGID: 109023] [dht-rebalance.c:1707:dht_migrate_file] 0-testmigrate-dht: Migrate file failed:/linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts: lookup failed on testmigrate-replicate-0 [No such file or directory]
[2020-02-28 22:25:28.149104] E [MSGID: 109023] [dht-rebalance.c:2874:gf_defrag_migrate_single_file] 0-testmigrate-dht: migrate-data failed for /linux-5.4.22/arch/arm/boot/dts/exynos4210-smdkv310.dts [No such file or directory]


The same errors are shown for 4 files.
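
The affected paths can be pulled out of the rebalance log with something like the sketch below. It matches the "migrate-data failed for" message text shown in the log above. The log location is an assumption (a typical default for a volume named testmigrate) and the function name is made up.

```shell
#!/bin/sh
# Print each distinct path that the rebalance log reports as
# "migrate-data failed for <path> [<reason>]".
failed_migrations() {
    # $1: path to the rebalance log
    sed -n 's/.*migrate-data failed for \(.*\) \[[^]]*\]$/\1/p' "$1" | sort -u
}

# Assumed log location; adjust to wherever the rebalance log actually lives.
LOG=${LOG:-/var/log/glusterfs/testmigrate-rebalance.log}
if [ -r "$LOG" ]; then
    failed_migrations "$LOG"
fi
```

This relies on each log entry being on a single line, as it is in the log file itself.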

When I look at the FUSE-mounted volume, the file is there and correct, but the 
permissions of this file and lots of others are wrong: lots of dirs with 
d--------- permissions and lots of files owned by root:root.
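
A rough way to count the affected entries on the mount is sketched below. The mount point /mnt/testmigrate is a placeholder and the function names are made up; the find tests themselves are standard.

```shell
#!/bin/sh
# Directories whose permission bits are all zero (listed as d---------).
zero_mode_dirs() {
    find "$1" -type d -perm 0000 -print
}

# Regular files owned by uid 0 and gid 0 (root:root).
root_owned_files() {
    find "$1" -type f -user 0 -group 0 -print
}

MNT=${MNT:-/mnt/testmigrate}   # hypothetical FUSE mount point
if [ -d "$MNT" ]; then
    echo "zero-mode directories: $(zero_mode_dirs "$MNT" | wc -l)"
    echo "root:root files: $(root_owned_files "$MNT" | wc -l)"
fi
```

Run as a non-root user, find cannot descend into the zero-mode directories, so the counts would only be complete when run as root.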


So, any advice on how to proceed from here?


I forced the remove-brick as the data seemed to be in place, which is fine, 
but now I can't do an add-brick as gluster seems to think a rebalance is 
taking place:

---
volume add-brick: failed: Pre Validation failed on terek-stor.amazing-internet.net. Volume name testmigrate rebalance is in progress. Please retry after completion
---

$ sudo gluster volume rebalance testmigrate status
volume rebalance: testmigrate: failed: Rebalance not started for volume testmigrate.

Thanks for any insight anyone can offer.

Ronny





$ sudo gluster volume info iscsi

Volume Name: iscsi
Type: Distributed-Replicate
Volume ID: 40ff42a7-5dee-4a98-991b-c4ba5bc50438
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
Brick2: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1/brick
Brick3: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick1a/brick (arbiter)
Brick4: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
Brick5: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2/brick
Brick6: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick2a/brick (arbiter)
Brick7: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
Brick8: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3/brick
Brick9: ahren-stor.amazing-internet.net:/data/glusterfs/iscsi/brick3a/brick (arbiter)
Brick10: mareth-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
Brick11: terek-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4/brick
Brick12: walker-stor.amazing-internet.net:/data/glusterfs/iscsi/brick4a/brick (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.open-behind: off
performance.readdir-ahead: off
performance.strict-o-direct: on
network.remote-dio: disable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
features.shard-block-size: 64MB
user.cifs: off
server.allow-insecure: on
cluster.choose-local: off
auth.allow: 127.0.0.1,172.16.36.*,172.16.40.*
ssl.cipher-list: HIGH:!SSLv2
server.ssl: on
client.ssl: on
ssl.certificate-depth: 1
performance.cache-size: 1GB
client.event-threads: 4
server.event-threads: 4


-- 
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8977 8943
w: www.amazinginternet.com

Registered office: 85 Waldegrave Park, Twickenham, TW1 4TJ
Registered in England. Company No. 4042957

