Re: [Gluster-users] Unable to mount gluster volume via mount -t nfs
Alexey,

Can you try with:

$ mount -vv -t nfs -o vers=3 :/

On Fri, Aug 16, 2013 at 9:17 PM, Alexey Shalin wrote:
>
> root@ispcp:~# mount -t nfs 192.168.15.165:/storage /storage
> mount.nfs: Unknown error 521
> root@ispcp:~#
>
> [2013-08-17 04:09:46.444600] E [nfs3.c:306:__nfs3_get_volume_id]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr+0x18c) [0x7f33126acf5c]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr_reply+0x20) [0x7f33126ac930]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51) [0x7f33126ac8d1])))
> 0-nfs-nfsv3: invalid argument: xl
> [2013-08-17 04:09:55.859975] E [nfs3.c:839:nfs3_getattr] 0-nfs-nfsv3: Bad Handle
> [2013-08-17 04:09:55.860027] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 9d86c988,
> GETATTR: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)
>
> How do I mount it?
>
> ---
> Senior System Administrator
> Alexey Shalin
> "Hoster kg" LLC - http://www.hoster.kg
> Akhunbaeva St 123 (BGTS building)
> h...@hoster.kg

--
*Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes*
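Spelled out, the suggestion amounts to forcing NFS version 3 over TCP, since the Gluster 3.4 built-in NFS server only serves NFSv3; "Unknown error 521" is the kernel's EBADHANDLE, the same "Illegal NFS file handle" seen in the server log. A minimal sketch, reusing the address and volume name from the original post:

# Hedged sketch: force NFSv3 over TCP against the volume's built-in NFS server.
# Address and volume name are taken from the original post; drop "nolock" if
# NLM locking is required and registered.
mount -vv -t nfs -o vers=3,proto=tcp,nolock 192.168.15.165:/storage /storage

# Quick sanity checks from the client before mounting:
showmount -e 192.168.15.165
rpcinfo -p 192.168.15.165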
[Gluster-users] Unable to mount gluster volume via mount -t nfs
root@ispcp:~# mount -t nfs 192.168.15.165:/storage /storage
mount.nfs: Unknown error 521
root@ispcp:~#

[2013-08-17 04:09:46.444600] E [nfs3.c:306:__nfs3_get_volume_id]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr+0x18c) [0x7f33126acf5c]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr_reply+0x20) [0x7f33126ac930]
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51) [0x7f33126ac8d1])))
0-nfs-nfsv3: invalid argument: xl
[2013-08-17 04:09:55.859975] E [nfs3.c:839:nfs3_getattr] 0-nfs-nfsv3: Bad Handle
[2013-08-17 04:09:55.860027] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 9d86c988,
GETATTR: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)

How do I mount it?

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
Akhunbaeva St 123 (BGTS building)
h...@hoster.kg
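Before retrying the mount, it is also worth confirming that the volume's built-in NFS server is actually running and enabled; a hedged sketch, using the volume name from the mount command above:

# Hedged sketch: check that Gluster's own NFS server is up for this volume.
gluster volume status storage                # look for the "NFS Server on ..." rows
gluster volume info storage                  # nfs.disable should not be set to on
gluster volume set storage nfs.disable off   # only needed if it was explicitly disabled
# Also make sure the kernel nfsd or another NFS server is not already bound to port 2049.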
Re: [Gluster-users] Slow on writing
For example, the same client wrote a file to an NFS share:

root@ispcp:/mnt# dd if=/dev/zero of=./bigfile${i} count=1024 bs=10k
1024+0 records in
1024+0 records out
10485760 bytes (10 MB) copied, 0.133489 s, 78.6 MB/s

Much faster :(

cat /etc/mtab
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
nas.storage:/storage /storage fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0

How do I mount /storage on the client with noatime, nodiratime?

On the peers:
1) RAID 10 (hardware)
/dev/sda1 /storage ext4 rw,noatime,nodiratime,errors=remount-ro,user_xattr,noacl,barrier=1,data=ordered 0 0
2) RAID 5 (software)
/dev/md5 /storage ext4 rw,noatime,nodiratime,noacl 0 0

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
Akhunbaeva St 123 (BGTS building)
h...@hoster.kg
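On the mount-option question: noatime and nodiratime are generic VFS options, so they can be handed to the glusterfs mount helper on the command line or via fstab; whether they change much for a FUSE client is debatable, since the bricks here are already mounted noatime. A hedged sketch, not verified against every 3.4 mount.glusterfs build:

# Hedged sketch: pass generic VFS options to the FUSE mount.
mount -t glusterfs -o noatime,nodiratime nas.storage:/storage /storage

# Assumed /etc/fstab form:
nas.storage:/storage  /storage  glusterfs  defaults,noatime,nodiratime,_netdev  0  0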
[Gluster-users] Slow on writing
Hello, guys

I wrote a small script:

#!/bin/bash
for i in {1..1000}; do
    size=$((RANDOM%5+1))
    dd if=/dev/zero of=/storage/test/bigfile${i} count=1024 bs=${size}k
done

This script creates files of different sizes on the volume. Here is the output:

2097152 bytes (2.1 MB) copied, 0.120632 s, 17.4 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.14548 s, 7.2 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.125532 s, 16.7 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.144503 s, 21.8 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0994717 s, 10.5 MB/s
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.142613 s, 29.4 MB/s
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.103823 s, 40.4 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.138864 s, 7.6 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.102374 s, 30.7 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.166409 s, 18.9 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.169923 s, 6.2 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.142017 s, 14.8 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.159753 s, 13.1 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.146142 s, 21.5 MB/s
^C180+0 records in
180+0 records out
737280 bytes (737 kB) copied, 0.0306554 s, 24.1 MB/s

As you can see, the write speed is very slow :(

The ethernet on the bricks is configured as a bond (but only on one peer).

Output of iperf:

iperf -c 192.168.15.165
Client connecting to 192.168.15.165, TCP port 5001
TCP window size: 640 KByte (default)
[  3] local 192.168.15.159 port 37095 connected with 192.168.15.165 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.03 GBytes  880 Mbits/sec

If I run the same dd command directly on a peer, the speed is very high:

root@nas:~# dd if=/dev/zero of=/storage/test/bigfile${i} count=1024 bs=10k
1024+0 records in
1024+0 records out
10485760 bytes (10 MB) copied, 0.0114412 s, 916 MB/s
root@nas:~#

My volume config:

root@nas:~# gluster volume info

Volume Name: storage
Type: Replicate
Volume ID: 8abee05f-9aa1-41d7-9f72-363c6fd8fc74
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nas.storage:/storage
Brick2: back.storage:/storage
Options Reconfigured:
performance.cache-size: 256MB

My installation uses the defaults. Does anyone have a good guide for getting better performance?

Thank you

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
Akhunbaeva St 123 (BGTS building)
h...@hoster.kg
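One note on the numbers above: the 916 MB/s figure on the peer is a buffered write straight into the local ext4 page cache, while the FUSE client writes to both replicas itself, so on an ~880 Mbit/s link a replica-2 volume tops out around 50 MB/s even before per-request latency is counted. A hedged sketch of a more comparable test, plus a couple of options that are often worth experimenting with (the option names exist in 3.4.x; the values are only guesses, not recommendations):

# Hedged sketch: flush data to disk in both tests so the numbers are comparable,
# and use a larger block size to separate per-request latency from bandwidth.
dd if=/dev/zero of=/storage/test/sync_test bs=1M count=256 conv=fdatasync   # on the client mount
dd if=/dev/zero of=/storage/test/sync_test bs=1M count=256 conv=fdatasync   # repeated directly on a brick, for comparison

# Tuning options to try (values are guesses):
gluster volume set storage performance.write-behind-window-size 4MB
gluster volume set storage performance.io-thread-count 32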
Re: [Gluster-users] How do I know with what peer (brick) is client working now ?
Thank you, :)

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
Akhunbaeva St 123 (BGTS building)
h...@hoster.kg
[Gluster-users] standby-server
I am looking at glusterfs for an HA application w/o local tech support (don't ask, third-world country, techs are hard to find).

My current plan is to do a replica-4 + hot spare server. Of the four in-use bricks, two will be on "servers" and the other two will be on a "client" machine and a "hot-backup" client machine. No striping, all content on each local machine, each machine using its own disk for all reading.

Part of my plan is to have a cold-spare server in the rack, not powered on. (This server will also be a cold spare for another server.)

I am wondering if this would be a viable way to set up this configuration:

Set up glusterfs as replica-5.
1. server1
2. server2
3. client
4. client-standby
5. server-spare

Initialize and set up glusterfs with all 5 bricks in the system (no file content). Install the system at the client site, and test with all 5 bricks in the system. Shut down the spare server.

Once a month, power up the spare server, run a full heal, shut down. Power up server-spare for any software updates.

If server1 or server2 dies (or needs maintenance), tell them to power up server-spare and let it heal.

It seems to me that this would be easier than setting up a replica-4 system and then jumping through all the hoops to replace a server from scratch.

Comments, reactions, pot-shots welcome.

Ted Miller

--
"He is no fool who gives what he cannot keep, to gain what he cannot lose." - Jim Elliot
For more information about Jim Elliot and his unusual life, see
http://www.christianliteratureandliving.com/march2003/carolyn.html.

Ted Miller
Design Engineer
HCJB Global Technology Center
2830 South 17th St
Elkhart, IN 46517
574-970-4272 my desk
574-970-4252 receptionist
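For the "once a month, power up, heal, shut down" step, the commands involved would look roughly like the following; a hedged sketch, with "gv0" standing in for whatever the volume ends up being called:

# Hedged sketch of the monthly catch-up routine, run once the spare has booted
# and rejoined the pool ("gv0" is a placeholder volume name).
gluster peer status                        # confirm the spare sees all peers
gluster volume heal gv0 full               # kick off a full self-heal
gluster volume heal gv0 info               # watch the entry counts drop to zero
gluster volume heal gv0 info heal-failed   # review anything that could not be healed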
Re: [Gluster-users] Replacing a failed brick
Ok, it appears that the following worked. Thanks for the nudge in the right direction:

volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/g2lv6 commit force

then

volume heal test-a full

and monitor the progress with

volume heal test-a info

However, that does not solve my problem of what to do when a brick is corrupted somehow, if I don't have enough space to first heal it and then replace it. That did get me thinking though: "what if I replace the brick, forgo the heal, replace it again and then do a heal?" That seems to work. So if I lose one brick, here is the process that I used to recover it:

1) create a directory whose only purpose is to temporarily trick gluster and allow us to maintain the correct replica count:

mkdir /localmnt/garbage

2) replace the dead brick with our garbage directory:

volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/garbage commit force

3) fix our dead brick using whatever process is required. In this case, for testing, we had to remove some gluster bits or it throws the "already part of a volume" error:

setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5
setfattr -x trusted.gfid /localmnt/g2lv5

4) now that our dead brick is fixed, swap it back in for the garbage/temporary brick:

volume replace-brick test-a 10.250.4.65:/localmnt/garbage 10.250.4.65:/localmnt/g2lv5 commit force

5) now all that we have to do is let gluster heal the volume:

volume heal test-a full

Is there anything wrong with this procedure?

Cheers,
Dave

On Fri, Aug 16, 2013 at 11:03 AM, David Gibbons wrote:

> Ravi,
>
> Thanks for the tips. When I run a volume status:
>
> gluster> volume status test-a
> Status of volume: test-a
> Gluster process                            Port    Online  Pid
> --
> Brick 10.250.4.63:/localmnt/g1lv2          49152   Y       8072
> Brick 10.250.4.65:/localmnt/g2lv2          49152   Y       3403
> Brick 10.250.4.63:/localmnt/g1lv3          49153   Y       8081
> Brick 10.250.4.65:/localmnt/g2lv3          49153   Y       3410
> Brick 10.250.4.63:/localmnt/g1lv4          49154   Y       8090
> Brick 10.250.4.65:/localmnt/g2lv4          49154   Y       3417
> Brick 10.250.4.63:/localmnt/g1lv5          49155   Y       8099
> Brick 10.250.4.65:/localmnt/g2lv5          N/A     N       N/A
> Brick 10.250.4.63:/localmnt/g1lv1          49156   Y       8576
> Brick 10.250.4.65:/localmnt/g2lv1          49156   Y       3431
> NFS Server on localhost                    2049    Y       3440
> Self-heal Daemon on localhost              N/A     Y       3445
> NFS Server on 10.250.4.63                  2049    Y       8586
> Self-heal Daemon on 10.250.4.63            N/A     Y       8593
>
> There are no active volume tasks
> --
>
> Attempting to start the volume results in:
>
> gluster> volume start test-a force
> volume start: test-a: failed: Failed to get extended attribute
> trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data
> available
> --
>
> It doesn't like when I try to fire off a heal either:
>
> gluster> volume heal test-a
> Launching Heal operation on volume test-a has been unsuccessful
> --
>
> Although that did lead me to this:
>
> gluster> volume heal test-a info
> Gathering Heal info on volume test-a has been successful
>
> Brick 10.250.4.63:/localmnt/g1lv2
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv2
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv3
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv3
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv4
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv4
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv5
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv5
> Status: Brick is Not connected
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv1
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv1
> Number of entries: 0
> --
>
> So perhaps I need to re-connect the brick?
>
> Cheers,
> Dave
>
>
> On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N wrote:
>
>> On 08/15/2013 10:05 PM, David Gibbons wrote:
>>
>> Hi There,
>>
>> I'm currently testing Gluster for possible production use. I haven't
>> been able to find the answer to this question in the forum arch or in the
>> public docs. It's possible that I don't know which keywords to search for.
>>
>> Here's the question (more details below): let's say that one of my
>> bricks "fails" -- *not* a whole node failure but a single brick failure
>> within the node. How do I replace a single brick on a node and force a sync
>> from one of the replicas?
>>
>> I have two nodes with 5 bricks each:
>> gluster> volume info test-a
>>
>> Volume Name: test-a
>> Type: Distributed-Replicate
>> Volume ID: e8957773-dd36-44ae-b80a-01e22c7
Re: [Gluster-users] Replacing a failed brick
This tells you that this brick isn't running. That's probably because it was formatted and lost its volume-id extended attribute. See
http://www.joejulian.name/blog/replacing-a-brick-on-glusterfs-340/

Once that's fixed, on 10.250.4.65:

gluster volume start test-a force

On 08/16/2013 08:03 AM, David Gibbons wrote:
> Brick 10.250.4.65:/localmnt/g2lv5          N/A     N       N/A
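The linked post boils down to restoring the trusted.glusterfs.volume-id xattr that the re-format wiped out. A hedged sketch of that step, using the brick paths from this thread; the value must be the one actually printed from a healthy brick, the variable below is only a placeholder:

# On a healthy brick of the same volume (here on 10.250.4.63): print the volume-id in hex
getfattr -n trusted.glusterfs.volume-id -e hex /localmnt/g1lv5

# On 10.250.4.65: apply the value printed above to the re-created brick directory
VOLID=0x00000000000000000000000000000000   # placeholder: paste the real value
setfattr -n trusted.glusterfs.volume-id -v "$VOLID" /localmnt/g2lv5

gluster volume start test-a force
gluster volume heal test-a full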
[Gluster-users] Problems with data integrity between client, volume, and replicated bricks
Hi gurus,

I've been banging my head against a test volume for about a month and a half now, and I'm having some serious problems figuring out what's going on.

I'm running on Ubuntu 12.04 amd64
I'm running Gluster 3.4.0final-ubuntu1~precise1
My cluster is made up of four machines; each machine has two 4TB HDDs (ext4), with replication
My test client has an HDD with 913GB of test data in 156,544 files

Forgive the weird path names, but I wanted to use a setup with something akin to the real data that I'd be using, and in production there's going to be weird path names aplenty. I include the path names here just in case someone sees something obvious, like "You compared the wrong files" or "You can't use path names like that with gluster!" But for your reading pleasure, I also list output below with the path names removed so that you can clearly see similarities or differences from client to volume to brick.

Disclaimer: I have done some outage tests with this volume in the past by unplugging a drive, plugging it back in, and then doing a full heal. The volume currently shows 1023 failed heals on bkupc1-b:/export/b/ (brick #2). But that was before I started this particular test. For this test all the old files and directories had been deleted from the volume beforehand so that I could start with an empty volume. And no outages -- simulated or otherwise -- have taken place for this test. (I have confirmed that every file listed by gluster as heal-failed no longer exists. And yet, even though I have deleted the volume's contents, the failed heals count remains.) I thought this might be important to disclose. If so desired I can repeat the test after deleting the volume and recreating it from scratch. However, once in production, doing this would be highly unfeasible as a solution to a problem. So if this is the cause of my angst, then I'd rather know how to fix things as they sit now as opposed to scrapping the volume and starting anew.

Here's a detailed description of my latest test:

1) The client mounts the volume with fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) as /data/bkupc1

2) I perform an rsync of the data to the volume. I have the whole test scripted and I'll list the juicy bits:

cd /export/d/eraseme/
if [ -d /data/bkupc1/BACKUPS/ ]; then
    mv /data/bkupc1/BACKUPS /data/bkupc1/BACKUPS.old
    ( /bin/rm -fr /data/bkupc1/BACKUPS.old & )
fi
mkdir /data/bkupc1/BACKUPS
rsync \
    -a \
    -v \
    --delete \
    --delete-excluded \
    --force \
    --ignore-errors \
    --one-file-system \
    --progress \
    --stats \
    --exclude '/tmp' \
    --exclude '/var/tmp' \
    --exclude '**core' \
    --partial \
    --inplace \
    ./ \
    /data/bkupc1/BACKUPS/

NOTE: If the directory /data/bkupc1/BACKUPS/ exists from a previous run of this test then I move it, and then delete it in the background while rsync is running.

Output:
...
Number of files: 156554
Number of files transferred: 147980
Total file size: 886124490325 bytes
Total transferred file size: 886124487184 bytes
Literal data: 886124487184 bytes
Matched data: 0 bytes
File list size: 20189800
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 886258975318
Total bytes received: 2845881

sent 886258975318 bytes  received 2845881 bytes  45981053.79 bytes/sec
total size is 886124490325  speedup is 1.00

3) My client has md5 checksums for its files, so next my script checks the files on the volume:

cd /data/bkupc1/BACKUPS/
md5sum -c --quiet md5sums

data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF: FAILED
md5sum: WARNING: 1 computed checksum did NOT match

a) Taking a closer look at this file:

On the client:

root@client:/export/d/eraseme# ls -ald data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
-rw-r--r-- 1 peek peek 646041328 Nov 13 2009 data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

On the volume:

root@bkupc1-a:/data/bkupc1/BACKUPS# ls -ald data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
-rw-r--r-- 1 peek peek 646041328 Nov 13 2009 data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1
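When a checksum mismatch like this shows up on a replicated volume, it helps to checksum the same file at every layer: the original source, the FUSE mount, and each brick copy, so you can see which replica actually differs. A hedged sketch; the brick mount points below are illustrative placeholders rather than the poster's exact layout, and REL is shortened:

# Hedged sketch: compare the md5 of one suspect file at each layer of the stack.
REL='data/884b9a38-0443-11e3-b8fb-f46d04e15793/<...>/gF-<...>'   # placeholder relative path

md5sum "/export/d/eraseme/$REL"        # original copy on the client
md5sum "/data/bkupc1/BACKUPS/$REL"     # through the FUSE mount
md5sum "/export/a/BACKUPS/$REL"        # directly on brick copy 1 (run on that server)
md5sum "/export/b/BACKUPS/$REL"        # directly on brick copy 2 (run on that server)

# Inspect the replication metadata on each brick copy as well:
getfattr -d -m . -e hex "/export/b/BACKUPS/$REL"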
Re: [Gluster-users] Replacing a failed brick
Ravi,

Thanks for the tips. When I run a volume status:

gluster> volume status test-a
Status of volume: test-a
Gluster process                            Port    Online  Pid
--
Brick 10.250.4.63:/localmnt/g1lv2          49152   Y       8072
Brick 10.250.4.65:/localmnt/g2lv2          49152   Y       3403
Brick 10.250.4.63:/localmnt/g1lv3          49153   Y       8081
Brick 10.250.4.65:/localmnt/g2lv3          49153   Y       3410
Brick 10.250.4.63:/localmnt/g1lv4          49154   Y       8090
Brick 10.250.4.65:/localmnt/g2lv4          49154   Y       3417
Brick 10.250.4.63:/localmnt/g1lv5          49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5          N/A     N       N/A
Brick 10.250.4.63:/localmnt/g1lv1          49156   Y       8576
Brick 10.250.4.65:/localmnt/g2lv1          49156   Y       3431
NFS Server on localhost                    2049    Y       3440
Self-heal Daemon on localhost              N/A     Y       3445
NFS Server on 10.250.4.63                  2049    Y       8586
Self-heal Daemon on 10.250.4.63            N/A     Y       8593

There are no active volume tasks
--

Attempting to start the volume results in:

gluster> volume start test-a force
volume start: test-a: failed: Failed to get extended attribute
trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data available
--

It doesn't like when I try to fire off a heal either:

gluster> volume heal test-a
Launching Heal operation on volume test-a has been unsuccessful
--

Although that did lead me to this:

gluster> volume heal test-a info
Gathering Heal info on volume test-a has been successful

Brick 10.250.4.63:/localmnt/g1lv2
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv2
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv3
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv3
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv4
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv4
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv5
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv5
Status: Brick is Not connected
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv1
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv1
Number of entries: 0
--

So perhaps I need to re-connect the brick?

Cheers,
Dave


On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N wrote:

> On 08/15/2013 10:05 PM, David Gibbons wrote:
>
> Hi There,
>
> I'm currently testing Gluster for possible production use. I haven't
> been able to find the answer to this question in the forum arch or in the
> public docs. It's possible that I don't know which keywords to search for.
>
> Here's the question (more details below): let's say that one of my
> bricks "fails" -- *not* a whole node failure but a single brick failure
> within the node. How do I replace a single brick on a node and force a sync
> from one of the replicas?
>
> I have two nodes with 5 bricks each:
> gluster> volume info test-a
>
> Volume Name: test-a
> Type: Distributed-Replicate
> Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4
> Status: Started
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: 10.250.4.63:/localmnt/g1lv2
> Brick2: 10.250.4.65:/localmnt/g2lv2
> Brick3: 10.250.4.63:/localmnt/g1lv3
> Brick4: 10.250.4.65:/localmnt/g2lv3
> Brick5: 10.250.4.63:/localmnt/g1lv4
> Brick6: 10.250.4.65:/localmnt/g2lv4
> Brick7: 10.250.4.63:/localmnt/g1lv5
> Brick8: 10.250.4.65:/localmnt/g2lv5
> Brick9: 10.250.4.63:/localmnt/g1lv1
> Brick10: 10.250.4.65:/localmnt/g2lv1
>
> I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a "failure"). What
> is the next step? I have tried various combinations of removing and
> re-adding the brick, replacing the brick, etc. I read in a previous message
> to this list that replace-brick was for planned changes which makes sense,
> so that's probably not my next step.
>
> You must first check if the 'formatted' brick 10.250.4.65:/localmnt/g2lv5
> is online using the `gluster volume status` command. If not start the
> volume using `gluster volume start force`. You can then use the
> gluster volume heal command which would copy the data from the other
> replica brick into your formatted brick.
> Hope this helps.
> -Ravi
>
>
> Cheers,
> Dave
Re: [Gluster-users] How do I know with what peer (brick) is client working now ?
The client connects with all the bricks in the volume.

Alexey Shalin wrote:

>Hello
>How do I know which peer (brick) the client is working with right now?
>
>
>Thank you
>
>---
>Senior System Administrator
>Alexey Shalin
>"Hoster kg" LLC - http://www.hoster.kg
>Akhunbaeva St 123 (BGTS building)
>h...@hoster.kg

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
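If you want to see those connections for yourself, two hedged ways to check (the volume name "storage" is assumed from the other thread; exact output varies by version):

# Hedged sketch: list the clients each brick currently has connected
# (run on any server node; "storage" is an assumed volume name).
gluster volume status storage clients

# Or, from the client itself, look at the TCP connections the FUSE client
# process holds toward the brick ports (49152 and up by default in 3.4):
netstat -tnp | grep glusterfs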
Re: [Gluster-users] Need help understanding the glusterd issue
Thanks Vijay & Prashant. I will look at the integrity of the state file you mentioned. It was failing to resolve the bricks' IP address, which I had picked up on my laptop after logging in from home. I corrected that and it works now, although I still have some questions: IMO, this should not stop glusterd from starting; it could instead leave unreachable volumes in the "STOPPED" state, since start/stop states are already supported. I may be overlooking the bigger picture / use case here; please correct me.

Thanks,
Chetan Risbud.

----- Original Message -----
From: "Vijay Bellur"
To: "Chetan Risbud"
Cc: "gluster-users Discussion List"
Sent: Friday, August 16, 2013 11:20:37 AM
Subject: Re: Need help understanding the glusterd issue

On 08/16/2013 10:32 AM, Chetan Risbud wrote:
> Hi All,
>
> I am seeing init related failures while restarting glusterd. I did restart
> glusterd as I had changed the ring files for some other swift related
> activity after adding a new volume. Is there any workaround for this problem?

CC'ing gluster-users as this is the relevant mailer for this.

> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>
> [2013-08-16 04:55:24.399286] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd:
> Started running /usr/sbin/glusterd version 3.4.0 (/usr/sbin/glusterd -p /run/glusterd.pid)
> [2013-08-16 04:55:24.404097] I [glusterd.c:962:init] 0-management: Using /var/lib/glusterd as working directory
> [2013-08-16 04:55:24.407802] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
> [2013-08-16 04:55:24.407835] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
> [2013-08-16 04:55:24.407972] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport:
> /usr/lib64/glusterfs/3.4.0/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
> [2013-08-16 04:55:24.407995] W [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport:
> volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
> [2013-08-16 04:55:24.408009] W [rpcsvc.c:1387:rpcsvc_transport_create] 0-rpc-service:
> cannot create listener, initing the transport failed
> [2013-08-16 04:55:25.867973] I [glusterd-store.c:1328:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
> [2013-08-16 04:55:25.884692] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
> [2013-08-16 04:55:25.884771] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
> [2013-08-16 04:55:26.110537] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
> [2013-08-16 04:55:26.110617] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
> [2013-08-16 04:55:26.185491] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
> [2013-08-16 04:55:26.185571] E [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
> [2013-08-16 04:55:29.250542] E [glusterd-store.c:2472:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore

You seem to have an incomplete state file in /var/lib/glusterd/vols/ and hence initialization of glusterd seems to have failed. Can you please check that out?

Regards,
Vijay

> [2013-08-16 04:55:29.250615] E [xlator.c:390:xlator_init] 0-management:
> Initialization of volume 'management' failed, review your volfile again
> [2013-08-16 04:55:29.250634] E [graph.c:292:glusterfs_graph_init]
> 0-management: initializing translator failed
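Since the root cause here was glusterd failing to resolve a brick's address during restore, a hedged sketch of the usual checks on the node where glusterd will not start (paths are the 3.4 defaults; the hostname and address below are placeholders):

# Hedged sketch: check that every brick host recorded by glusterd resolves.
# Peer definitions live in /var/lib/glusterd/peers/, brick files under
# /var/lib/glusterd/vols/<vol>/bricks/.
grep -h '^hostname' /var/lib/glusterd/peers/*
ls /var/lib/glusterd/vols/*/bricks/

# Make sure each recorded hostname/IP resolves on this node:
getent hosts some-brick-host                     # placeholder hostname
# If it does not, add a static entry (placeholder address/name):
echo '192.168.1.10 some-brick-host' >> /etc/hosts

# Then restart glusterd:
service glusterd restart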
[Gluster-users] How do I know with what peer (brick) is client working now ?
Hello

How do I know which peer (brick) the client is working with right now?

Thank you

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
Akhunbaeva St 123 (BGTS building)
h...@hoster.kg