Harry - did you get Krishnan's note? /me still wading through toxic mail server madness.
-JM

----- Original Message -----

> And a few more data points: it appears the reason for the flaky gluster
> fs is that not all the servers are running glusterfsd's (see below). Is
> there a way to force the servers to all start the glusterfsd's as they're
> supposed to? (A guess at an incantation is sketched at the bottom of this
> message, below the quoted history.)
>
> The mystery rebalance did complete, and seems to have fixed some but not
> all of the problem files - ie:
>
> > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
>
> And the started/not-started status has gotten weirder, if possible. The
> gluster volume is still being exported to clients, despite gluster
> insisting that the volume is not started (servers are pbs[1234]); result
> of:
>
> $ gluster volume status
> pbs1: Volume gli is not started
> pbs2: Volume gli is not started
> pbs3: Volume gli is not started
> pbs4: Volume gli is not started
>
> $ gluster volume info
> pbs1: Status: Stopped
> pbs2: Status: Started   <- aha!
> pbs3: Status: Started   <- aha!
> pbs4: Status: Started
>
> This correlates with the glusterfsd status, in which only pbs[23] are
> running glusterfsd:
>
> pbs2: root 1799 0.1 0.0 184296 16464 ? Ssl 13:07 0:06
>   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs2ib.bducgl
>   -p /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid
>   -S /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket
>   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
>   --xlator-option *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
>   --brick-port 24025 24026
>   --xlator-option gli-server.transport.rdma.listen-port=24026
>   --xlator-option gli-server.listen-port=24025
>
> pbs3: root 1751 0.1 0.0 184168 16468 ? Ssl 13:07 0:06
>   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs3ib.bducgl
>   -p /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid
>   -S /tmp/7096377992feb7f5a7805cafd82c3100.socket
>   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
>   --xlator-option *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
>   --brick-port 24018 24020
>   --xlator-option gli-server.transport.rdma.listen-port=24020
>   --xlator-option gli-server.listen-port=24018
>
> pbs[14] are only running the glusterd process, not any glusterfsd's.
>
> In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has not run
> one since the powerdown, AFAIK.
>
> hjm
>
> On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> > ...and should have added:
> >
> > The rebalance log (the volume claimed to be rebalancing before I shut
> > it down, but was idle or wedged at that time) is active as well, with
> > about one warning of "1 subvolumes down -- not fixing" for every three
> > informational messages:
> >
> > [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data]
> > 0-gli-dht: migrate data called on
> > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
> >
> > [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize]
> > 0-gli-dht: found anomalies in
> > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops.
> > holes=1 overlaps=0
> >
> > [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory]
> > 0-gli-dht: 1 subvolumes down -- not fixing
> >
> > Previously...
> >
> > gluster 3.3, running on ubuntu 10.04, was running OK; had to shut it
> > down for a power outage.
> >
> > When I tried to shut it down, it insisted that it was rebalancing, but
> > seemed wedged - no activity in the logs.
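> >
> > (This is where I'd expect the rebalance status query to show something
> > if it were actually moving - assuming I have the 3.3 syntax right, and
> > that the rebalance log follows the usual <volname>-rebalance.log naming:)
> >
> > $ gluster volume rebalance gli status
> > $ tail -f /var/log/glusterfs/gli-rebalance.log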
> >
> > Was able to shut it down tho.
> >
> > After power was restored, tried to restart the volume, but altho the 4
> > peers claimed to be visible and could ping each other, etc:
> >
> > ==============================================
> > Sat Oct 06 21:38:07 [0.81 0.71 0.58]  root@pbs2:/var/log/glusterfs/bricks
> > 567 $ gluster peer status
> > Number of Peers: 3
> >
> > Hostname: pbs3ib
> > Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > State: Peer in Cluster (Connected)
> >
> > Hostname: 10.255.77.2
> > Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> > State: Peer in Cluster (Connected)
> >
> > Hostname: pbs4ib
> > Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> > State: Peer in Cluster (Connected)
> > ==============================================
> >
> > and the volume info seemed to be OK:
> >
> > ==============================================
> > Sat Oct 06 21:36:11 [0.75 0.67 0.56]  root@pbs2:/var/log/glusterfs/bricks
> > 565 $ gluster volume info gli
> >
> > Volume Name: gli
> > Type: Distribute
> > Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> > Status: Started
> > Number of Bricks: 4
> > Transport-type: tcp,rdma
> > Bricks:
> > Brick1: pbs1ib:/bducgl
> > Brick2: pbs2ib:/bducgl
> > Brick3: pbs3ib:/bducgl
> > Brick4: pbs4ib:/bducgl
> > Options Reconfigured:
> > performance.write-behind-window-size: 1024MB
> > performance.flush-behind: on
> > performance.cache-size: 268435456
> > nfs.disable: on
> > performance.io-thread-count: 64
> > performance.quick-read: on
> > performance.io-cache: on
> > ==============================================
> >
> > some utilities claim that it was not started, even tho some clients
> > /are using the volume/ (tho there are some file oddities).
> > (from a client):
> >
> > -rw-r--r-- 1 hmangala hmangala 32935 Jun 23  2010 INSTALL.txt
> > ?--------- ? ?        ?            ?            ? R-2.15.0
> > drwxr-xr-x 2 hmangala hmangala    18 Sep 10 14:20 bonnie/
> > drwxr-xr-x 2 root     root        18 Sep 10 13:41 bonnie2/
> >
> > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> > drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> > drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> > drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method1
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method2
> > -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
> >
> > (These files appear to be intact on the individual bricks tho.)
> >
> > ==============================================
> > Sat Oct 06 21:38:18 [0.76 0.71 0.58]  root@pbs2:/var/log/glusterfs/bricks
> > 568 $ gluster volume status
> > Volume gli is not started
> > ==============================================
> >
> > and since that is the case, other utilities also claim this:
> >
> > ==============================================
> > Sat Oct 06 21:41:25 [1.04 0.84 0.65]  root@pbs2:/var/log/glusterfs/bricks
> > 571 $ gluster volume status gli detail
> > Volume gli is not started
> > ==============================================
> >
> > And since they think it's not started, I can't stop it.
> >
> > How is this resolvable?
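>
> The only fix I can think of to try (a guess, not gospel): restart
> glusterd on each of pbs[1234] so they re-read the volume state, then
> force-start the volume, which is supposed to respawn any brick
> glusterfsd's that aren't running. No idea whether that's safe with a
> rebalance possibly still in flight, tho:
>
> # on each server (the init script may be 'glusterfs-server' on ubuntu)
> $ sudo service glusterd restart
>
> # then, from any one peer
> $ gluster volume start gli force
> $ gluster volume status gli
>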
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> --
> Passive-Aggressive Supporter of The Canada Party:
> <http://www.americabutbetter.com/>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users