Harry - did you get Krishnan's note? /me still wading through toxic mail server madness.
-JM

----- Original Message -----

> And a few more data points: it appears the reason for the flaky gluster
> fs is that not all the servers are running glusterfsd's (see below). Is
> there a way to force the servers to all start the glusterfsd's as they're
> supposed to? (A guess at an incantation is sketched at the bottom of this
> message, below the quoted history.)
>
> The mystery rebalance did complete, and seems to have fixed some but not
> all of the problem files - ie:
>
> > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
>
> And the started/not-started status has gotten weirder, if possible. The
> gluster volume is still being exported to clients, despite gluster
> insisting that the volume is not started (servers are pbs[1234]); result
> of:
>
> $ gluster volume status
> pbs1: Volume gli is not started
> pbs2: Volume gli is not started
> pbs3: Volume gli is not started
> pbs4: Volume gli is not started
>
> $ gluster volume info
> pbs1: Status: Stopped
> pbs2: Status: Started   <- aha!
> pbs3: Status: Started   <- aha!
> pbs4: Status: Started
>
> This correlates with the glusterfsd status, in which only pbs[23] are
> running glusterfsd:
>
> pbs2: root 1799 0.1 0.0 184296 16464 ? Ssl 13:07 0:06
>   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs2ib.bducgl
>   -p /var/lib/glusterd/vols/gli/run/pbs2ib-bducgl.pid
>   -S /tmp/c70b2f910e2fe1bb485b1d76ef63e3db.socket
>   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
>   --xlator-option *-posix.glusterd-uuid=26de63bd-c5b7-48ba-b81d-5d77a533d077
>   --brick-port 24025 24026
>   --xlator-option gli-server.transport.rdma.listen-port=24026
>   --xlator-option gli-server.listen-port=24025
>
> pbs3: root 1751 0.1 0.0 184168 16468 ? Ssl 13:07 0:06
>   /usr/sbin/glusterfsd -s localhost --volfile-id gli.pbs3ib.bducgl
>   -p /var/lib/glusterd/vols/gli/run/pbs3ib-bducgl.pid
>   -S /tmp/7096377992feb7f5a7805cafd82c3100.socket
>   --brick-name /bducgl -l /var/log/glusterfs/bricks/bducgl.log
>   --xlator-option *-posix.glusterd-uuid=c79c4084-d6b9-4af9-b975-40dd6aa99b42
>   --brick-port 24018 24020
>   --xlator-option gli-server.transport.rdma.listen-port=24020
>   --xlator-option gli-server.listen-port=24018
>
> pbs[14] are only running the glusterd process, not any glusterfsd's.
>
> In previous startups, pbs4 WAS running a glusterfsd, but pbs1 has not run
> one since the powerdown, AFAIK.
>
> hjm
>
> On Saturday, October 06, 2012 10:19:14 PM harry mangalam wrote:
> > ...and should have added:
> >
> > The rebalance log (the volume claimed to be rebalancing before I shut
> > it down, but was idle or wedged at that time) is active as well, with
> > about one warning of "1 subvolumes down -- not fixing" for every three
> > informational messages:
> >
> > [2012-10-06 22:05:35.396650] I [dht-rebalance.c:1058:gf_defrag_migrate_data]
> > 0-gli-dht: migrate data called on
> > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/tmp/wcprops
> >
> > [2012-10-06 22:05:35.451925] I [dht-layout.c:593:dht_layout_normalize]
> > 0-gli-dht: found anomalies in
> > /nlduong/nduong2-t-illiac/workspace/m5_sim/trunk/src/arch/.svn/wcprops.
> > holes=1 overlaps=0
> >
> > [2012-10-06 22:05:35.451957] W [dht-selfheal.c:875:dht_selfheal_directory]
> > 0-gli-dht: 1 subvolumes down -- not fixing
> >
> > Previously...
> >
> > gluster 3.3, running on ubuntu 10.04, was running OK; had to shut it
> > down for a power outage.
> >
> > When I tried to shut it down, it insisted that it was rebalancing, but
> > seemed wedged - no activity in the logs.
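> >
> > (This is where I'd expect the rebalance status query to show something
> > if it were actually moving - assuming I have the 3.3 syntax right, and
> > that the rebalance log follows the usual <volname>-rebalance.log naming:)
> >
> > $ gluster volume rebalance gli status
> > $ tail -f /var/log/glusterfs/gli-rebalance.log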
> >
> > Was able to shut it down tho.
> >
> > After power was restored, tried to restart the volume, but altho the 4
> > peers claimed to be visible and could ping each other, etc:
> >
> > ==============================================
> > Sat Oct 06 21:38:07 [0.81 0.71 0.58]  root@pbs2:/var/log/glusterfs/bricks
> > 567 $ gluster peer status
> > Number of Peers: 3
> >
> > Hostname: pbs3ib
> > Uuid: c79c4084-d6b9-4af9-b975-40dd6aa99b42
> > State: Peer in Cluster (Connected)
> >
> > Hostname: 10.255.77.2
> > Uuid: 3fcd023c-9cc9-4d1c-84c4-babfb4492e38
> > State: Peer in Cluster (Connected)
> >
> > Hostname: pbs4ib
> > Uuid: 2a593581-bf45-446c-8f7c-212c53297803
> > State: Peer in Cluster (Connected)
> > ==============================================
> >
> > and the volume info seemed to be OK:
> >
> > ==============================================
> > Sat Oct 06 21:36:11 [0.75 0.67 0.56]  root@pbs2:/var/log/glusterfs/bricks
> > 565 $ gluster volume info gli
> >
> > Volume Name: gli
> > Type: Distribute
> > Volume ID: 76cc5e88-0ac4-42ac-a4a3-31bf2ba611d4
> > Status: Started
> > Number of Bricks: 4
> > Transport-type: tcp,rdma
> > Bricks:
> > Brick1: pbs1ib:/bducgl
> > Brick2: pbs2ib:/bducgl
> > Brick3: pbs3ib:/bducgl
> > Brick4: pbs4ib:/bducgl
> > Options Reconfigured:
> > performance.write-behind-window-size: 1024MB
> > performance.flush-behind: on
> > performance.cache-size: 268435456
> > nfs.disable: on
> > performance.io-thread-count: 64
> > performance.quick-read: on
> > performance.io-cache: on
> > ==============================================
> >
> > some utilities claim that it was not started, even tho some clients
> > /are using the volume/ (tho there are some file oddities).
> > (from a client):
> >
> > -rw-r--r-- 1 hmangala hmangala 32935 Jun 23  2010 INSTALL.txt
> > ?--------- ? ?        ?            ?            ? R-2.15.0
> > drwxr-xr-x 2 hmangala hmangala    18 Sep 10 14:20 bonnie/
> > drwxr-xr-x 2 root     root        18 Sep 10 13:41 bonnie2/
> >
> > drwx------ 2 spoorkas spoorkas  8211 Jun  2 00:22 QPSK_2Tx_2Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_2Rx_ML_Method1
> > drwx------ 2 spoorkas spoorkas  8237 Jun  3 11:22 QPSK_2Tx_2Rx_ML_Method2/
> > drwx------ 2 spoorkas spoorkas 12288 Jun  4 01:24 QPSK_2Tx_3Rx_BH/
> > drwx------ 2 spoorkas spoorkas  4232 Jun  2 00:26 QPSK_2Tx_3Rx_BH_Method1/
> > drwx------ 2 spoorkas spoorkas  8274 Jun  2 00:34 QPSK_2Tx_3Rx_BH_Method2/
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method1
> > ?--------- ? ?        ?            ?            ? QPSK_2Tx_3Rx_ML_Method2
> > -rw-r--r-- 1 spoorkas spoorkas     0 Apr 17 14:16 simple.sh.e1802207
> >
> > (These files appear to be intact on the individual bricks tho.)
> >
> > ==============================================
> > Sat Oct 06 21:38:18 [0.76 0.71 0.58]  root@pbs2:/var/log/glusterfs/bricks
> > 568 $ gluster volume status
> > Volume gli is not started
> > ==============================================
> >
> > and since that is the case, other utilities also claim this:
> >
> > ==============================================
> > Sat Oct 06 21:41:25 [1.04 0.84 0.65]  root@pbs2:/var/log/glusterfs/bricks
> > 571 $ gluster volume status gli detail
> > Volume gli is not started
> > ==============================================
> >
> > And since they think it's not started, I can't stop it.
> >
> > How is this resolvable?
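>
> The only fix I can think of to try (a guess, not gospel): restart
> glusterd on each of pbs[1234] so they re-read the volume state, then
> force-start the volume, which is supposed to respawn any brick
> glusterfsd's that aren't running. No idea whether that's safe with a
> rebalance possibly still in flight, tho:
>
> # on each server (the init script may be 'glusterfs-server' on ubuntu)
> $ sudo service glusterd restart
>
> # then, from any one peer
> $ gluster volume start gli force
> $ gluster volume status gli
>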
> --
> Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
> [m/c 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
> 415 South Circle View Dr, Irvine, CA, 92697 [shipping]
> MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
> --
> Passive-Aggressive Supporter of The Canada Party:
> <http://www.americabutbetter.com/>

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users