Installed the qa42 version on servers and clients, and under load it
worked as advertised (though of course more slowly than I would have
liked :)) - it removed ~1TB in just under 24 hr on a DDR-IB-connected
4-node set, ~40MB/s overall, though there were a huge number of tiny
files. The remove-brick cleared the brick (~1TB), though with an
initial set of 120 failures (what does this mean?).
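For reference, this followed the standard 3.3 remove-brick workflow - start
the migration, poll its status, then commit - i.e. something along the lines
of

   gluster volume remove-brick gli pbs2ib:/bducgl1 start

(brick path as per the df output further down). The status output over the
course of the run: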
Node      Rebalanced-files    size            scanned    failures    status         timestamp
------    ----------------    ------------    -------    --------    -----------    ---------------
pbs2ib    15676               69728541188     365886     120         in progress    May 22 17:33:14
pbs2ib    24844               134323243354    449667     120         in progress    May 22 18:08:56
pbs2ib    37937               166673066147    714175     120         in progress    May 22 19:08:21
pbs2ib    42014               173145657374    806556     120         in progress    May 22 19:33:21
pbs2ib    418842              222883965887    5729324    120         in progress    May 23 07:15:19
pbs2ib    419148              222907742889    5730903    120         in progress    May 23 07:16:26
pbs2ib    507375              266212060954    6192573    120         in progress    May 23 09:48:05
pbs2ib    540201              312712114570    6325234    120         in progress    May 23 11:15:51
pbs2ib    630332              416533679754    6633562    120         in progress    May 23 14:24:16
pbs2ib    644156              416745820627    6681746    120         in progress    May 23 14:45:44
pbs2ib    732989              432162450646    7024331    120         completed      May 23 17:26:20
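The rows above came from re-running the status sub-command at intervals -
something like

   gluster volume remove-brick gli pbs2ib:/bducgl1 status

with the timestamp column tacked on by hand, since the stock output doesn't
appear to include one.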
And it did finally delete the files:
root@pbs2:~
404 $ df -h
Filesystem    Size    Used     Avail    Use%    Mounted on
/dev/md0      8.2T    1010G    7.2T     13%     /bducgl     - retained brick
/dev/sda      1.9T    384M     1.9T     1%      /bducgl1    - removed brick
although it left the directory skeleton behind (is this a bug or a feature?):
root@pbs2:/bducgl1
406 $ ls
aajames aelsadek amentes avuong1 btatevos chiaoyic dbecerra
aamelire aganesan anasr awaring bvillac clarkap dbkeator
aaskariz agkentanhml balakire calvinjs cmarcum dcs
abanaiya agold argardne bgajare casem cmarkega dcuccia
aboessen ahnsh arup biggsjcbatmall courtnem detwiler
abondar aihlerasidhwa binz cesar crex dgorur
abraatz aisenber asuncion bjanakal cestark cschendhealion
abriscoe akathaatenner blind cfalvoculverr dkyu
abuschalai2 atfrank blutescgalasso daliz dmsmith
acohanalamngathinabmmiller cgarner danieldmvuong
acstern allisons athsu bmobashe chadwicr dariusa dphillip
ademirta almquist aveidlab brentmchangd1 dasherdshanthi
etc
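If the leftover skeleton turns out to be intentional, clearing it by hand
should be safe once it's confirmed that no real files remain - a rough,
untested sketch (GNU find assumed), run only on the removed brick:

   find /bducgl1 -type f | head        # should print nothing
   find /bducgl1 -mindepth 1 -depth -type d -empty -delete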
And once the removal was finalized with the 'commit' command, 'gluster
volume info' no longer reports the brick as part of the volume:
$ gluster volume info gli
Volume Name: gli
Type: Distribute
Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
Status: Started
Number of Bricks: 4
Transport-type: tcp,rdma
Bricks:
Brick1: pbs1ib:/bducgl
Brick2: pbs2ib:/bducgl
-- no more pbs2ib:/bducgl1
Brick3: pbs3ib:/bducgl
Brick4: pbs4ib:/bducgl
Options Reconfigured:
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
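(The 'commit' step mentioned above would presumably have been just

   gluster volume remove-brick gli pbs2ib:/bducgl1 commit

run once the status reported 'completed'.)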
Likewise, 'gluster volume status' no longer shows the removed brick as
part of the volume:
$ gluster volume status
Status of volume: gli
Gluster process             Port     Online    Pid
---------------------------------------------------
Brick pbs1ib:/bducgl        24016    Y         10770
Brick pbs2ib:/bducgl        24025    Y         1788
Brick pbs3ib:/bducgl        24018    Y         20953
Brick pbs4ib:/bducgl        24009    Y         20948
So this was a big improvement over the previous trial. The only
glitches were the 120 failures (which mean...?) and the directory
skeleton left on the removed brick, which may be a feature...?
So the problem seems to have been fixed in qa42.
thanks!
hjm
On Tuesday 22 May 2012 00:02:02 Amar Tumballi wrote:
> > pbs2ib 8780091379699182236 2994733 in progress
>
> Hi Harry,
> Can you please test once again with 'glusterfs-3.3.0qa42' and
> confirm the behavior? This seems like a bug (suspect it to be some
> overflow type of bug, not sure yet). Please help us with opening a
> bug report, meantime, we will investigate on this issue.
> Regards,
> Amar
--
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users