Re: [Gluster-users] Unable to mount gluster volume via mount -t nfs

2013-08-16 Thread Harshavardhana
Alexey,

Can you try with

$ mount -vv -t nfs -o vers=3 :/
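For example, with the server and mount point from your report below, and
assuming the Gluster NFS server (which only speaks NFSv3 over TCP) is running
on 192.168.15.165, something like this should work:

mount -vv -t nfs -o vers=3,proto=tcp,nolock 192.168.15.165:/storage /storage

nolock is optional here; it just avoids depending on the NLM lock service.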




On Fri, Aug 16, 2013 at 9:17 PM, Alexey Shalin  wrote:

>
> root@ispcp:~# mount -t nfs 192.168.15.165:/storage /storage
> mount.nfs: Unknown error 521
> root@ispcp:~#
>
> [2013-08-17 04:09:46.444600] E [nfs3.c:306:__nfs3_get_volume_id]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr+0x18c)
> [0x7f33126acf5c]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr_reply+0x20)
> [0x7f33126ac930]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
> [0x7f33126ac8d1]))) 0-nfs-nfsv3: invalid argument: xl
> [2013-08-17 04:09:55.859975] E [nfs3.c:839:nfs3_getattr] 0-nfs-nfsv3: Bad
> Handle
> [2013-08-17 04:09:55.860027] W [nfs3-helpers.c:3389:nfs3_log_common_res]
> 0-nfs-nfsv3: XID: 9d86c988, GETATTR: NFS: 10001(Illegal NFS file handle),
> POSIX: 14(Bad address)
>
> How do I mount it?
>
>
> ---
> Senior System Administrator
> Alexey Shalin
> "Hoster kg" LLC - http://www.hoster.kg
> 123 Akhunbaeva St. (BGTS building)
> h...@hoster.kg
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users




-- 
*Religious confuse piety with mere ritual, the virtuous confuse regulation
with outcomes*
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Unable to mount gluster volume via mount -t nfs

2013-08-16 Thread Alexey Shalin

root@ispcp:~# mount -t nfs 192.168.15.165:/storage /storage
mount.nfs: Unknown error 521
root@ispcp:~#

[2013-08-17 04:09:46.444600] E [nfs3.c:306:__nfs3_get_volume_id] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr+0x18c)
 [0x7f33126acf5c] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_getattr_reply+0x20)
 [0x7f33126ac930] 
(-->/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
 [0x7f33126ac8d1]))) 0-nfs-nfsv3: invalid argument: xl
[2013-08-17 04:09:55.859975] E [nfs3.c:839:nfs3_getattr] 0-nfs-nfsv3: Bad Handle
[2013-08-17 04:09:55.860027] W [nfs3-helpers.c:3389:nfs3_log_common_res] 
0-nfs-nfsv3: XID: 9d86c988, GETATTR: NFS: 10001(Illegal NFS file handle), 
POSIX: 14(Bad address)

How do I mount it?


---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
123 Akhunbaeva St. (BGTS building)
h...@hoster.kg

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow on writing

2013-08-16 Thread Alexey Shalin
For example, the same client wrote a file to an NFS share:
root@ispcp:/mnt# dd if=/dev/zero of=./bigfile${i} count=1024 bs=10k
1024+0 records in
1024+0 records out
10485760 bytes (10 MB) copied, 0.133489 s, 78.6 MB/s

much faster :(

cat /etc/mtab

fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
nas.storage:/storage /storage fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072 0 0

How do I mount /storage on the client with noatime and nodiratime?
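Something like the following might work (an untested sketch: it assumes the
mount.glusterfs helper passes generic kernel options such as noatime through,
and note that FUSE mounts do not maintain strict atime in any case):

mount -t glusterfs -o noatime,nodiratime nas.storage:/storage /storage

or, as an /etc/fstab entry:

nas.storage:/storage /storage glusterfs noatime,nodiratime,_netdev 0 0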

on peers:
1) RAID 10 (hardware)
/dev/sda1 /storage ext4 rw,noatime,nodiratime,errors=remount-ro,user_xattr,noacl,barrier=1,data=ordered 0 0
2) RAID 5 (software)
/dev/md5 /storage ext4 rw,noatime,nodiratime,noacl 0 0




---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
123 Akhunbaeva St. (BGTS building)
h...@hoster.kg

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Slow on writing

2013-08-16 Thread Alexey Shalin

Hello, guys


I wrote a small script:
#!/bin/bash

# write 1000 files of random size (1-5 MB) to the Gluster volume
for i in {1..1000}; do
    size=$((RANDOM % 5 + 1))
    dd if=/dev/zero of=/storage/test/bigfile${i} count=1024 bs=${size}k
done

This script creates files of different sizes on the volume.

Here is the output:
2097152 bytes (2.1 MB) copied, 0.120632 s, 17.4 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.14548 s, 7.2 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.125532 s, 16.7 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.144503 s, 21.8 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.0994717 s, 10.5 MB/s
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.142613 s, 29.4 MB/s
1024+0 records in
1024+0 records out
4194304 bytes (4.2 MB) copied, 0.103823 s, 40.4 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.138864 s, 7.6 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.102374 s, 30.7 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.166409 s, 18.9 MB/s
1024+0 records in
1024+0 records out
1048576 bytes (1.0 MB) copied, 0.169923 s, 6.2 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.142017 s, 14.8 MB/s
1024+0 records in
1024+0 records out
2097152 bytes (2.1 MB) copied, 0.159753 s, 13.1 MB/s
1024+0 records in
1024+0 records out
3145728 bytes (3.1 MB) copied, 0.146142 s, 21.5 MB/s
^C180+0 records in
180+0 records out
737280 bytes (737 kB) copied, 0.0306554 s, 24.1 MB/s


As you can see, the speed is very slow :(

I have configured the Ethernet network on the bricks as a bond (but only on
one peer). Output of iperf:
 iperf -c 192.168.15.165

Client connecting to 192.168.15.165, TCP port 5001
TCP window size:  640 KByte (default)

[  3] local 192.168.15.159 port 37095 connected with 192.168.15.165 port 5001
[ ID] Interval   Transfer Bandwidth
[  3]  0.0-10.0 sec  1.03 GBytes   880 Mbits/sec


If I run this command directly on a peer (dd if=/dev/zero
of=/storage/test/bigfile${i} count=1024 bs=10k), the speed is very high:

root@nas:~#  dd if=/dev/zero of=/storage/test/bigfile${i} count=1024 bs=10k
1024+0 records in
1024+0 records out
10485760 bytes (10 MB) copied, 0.0114412 s, 916 MB/s
root@nas:~#

my volume config:
root@nas:~# gluster volume info

Volume Name: storage
Type: Replicate
Volume ID: 8abee05f-9aa1-41d7-9f72-363c6fd8fc74
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nas.storage:/storage
Brick2: back.storage:/storage
Options Reconfigured:
performance.cache-size: 256MB


My installation uses the defaults. Does anyone have a good guide on how to get
good performance?
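For what it is worth, these are the kinds of options I have seen suggested for
write-heavy workloads (a sketch only; option names as in GlusterFS 3.4, so
please check the output of "gluster volume set help" before applying anything):

gluster volume set storage performance.write-behind-window-size 1MB
gluster volume set storage performance.io-thread-count 16
gluster volume set storage performance.flush-behind on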

Thank you


---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
123 Akhunbaeva St. (BGTS building)
h...@hoster.kg

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How do I know with what peer (brick) is client working now ?

2013-08-16 Thread Alexey Shalin
Thank you, :)

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
123 Akhunbaeva St. (BGTS building)
h...@hoster.kg

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] standby-server

2013-08-16 Thread Ted Miller
I am looking at glusterfs for an HA application without local tech support
(don't ask; third-world country, techs are hard to find).


My current plan is to do a replica-4 + hot spare server.  Of the four in-use 
bricks, two will be on "servers" and the other two will be on a "client" 
machine and a "hot-backup" client machine.  No striping, all content on each 
local machine, each machine using its own disk for all reading.


Part of my plan is to have a cold-spare server in the rack, not powered on.  
(This server will also be a cold spare for another server).


I am wondering if this would be a viable way to set up this configuration:

Set up glusterfs as replica-5.

1. server1
2. server2
3. client
4. client-standby
5. server-spare

Initialize and set up glusterfs with all 5 bricks in the system (no file 
content).

Install system at client site, and test with all 5 bricks in system.

Shut down spare server.

Once a month, power up spare server, run full heal, shut down.
Power up server-spare for any software updates.
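For the monthly heal step, I assume the commands would be roughly the following
("vol0" is only a placeholder for whatever the volume ends up being called):

gluster volume status vol0      # confirm all five bricks are back online
gluster volume heal vol0 full
gluster volume heal vol0 info   # watch until the entry counts drop to zero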

If server1 or server2 dies (or needs maintenance), tell them to power up 
server-spare, and let it heal.


It seems to me that this would be easier than setting up a replica-4 system 
and then jumping through all the hoops to replace a server from scratch.


Comments, reactions, pot-shots welcome.
Ted Miller

--
"He is no fool who gives what he cannot keep, to gain what he cannot lose." - - 
Jim Elliot
For more information about Jim Elliot and his unusual life, see 
http://www.christianliteratureandliving.com/march2003/carolyn.html.

Ted Miller
Design Engineer
HCJB Global Technology Center
2830 South 17th St
Elkhart, IN  46517
574--970-4272 my desk
574--970-4252 receptionist


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Replacing a failed brick

2013-08-16 Thread David Gibbons
Ok, it appears that the following worked. Thanks for the nudge in the right
direction:

volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/g2lv6 commit force

then
volume heal test-a full

and monitor the progress with
volume heal test-a info

However that does not solve my problem for what to do when a brick is
corrupted somehow, if I don't have enough space to first heal it and then
replace it.

That did get me thinking though: "what if I replace the brick, forgo the
heal, replace it again and then do a heal?" That seems to work.

So if I lose one brick, here is the process that I used to recover it:
1) create a directory that exists just to temporarily trick gluster and allow
us to maintain the correct replica count: mkdir /localmnt/garbage
2) replace the dead brick with our garbage directory: volume replace-brick
test-a 10.250.4.65:/localmnt/g2lv5 10.250.4.65:/localmnt/garbage commit
force
3) fix our dead brick using whatever process is required. In this case, for
testing, we had to remove some gluster bits or it throws the "already part
of a volume" error:
setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5
setfattr -x trusted.gfid /localmnt/g2lv5
4) now that our dead brick is fixed, swap it for the garbage/temporary
brick: volume replace-brick test-a 10.250.4.65:/localmnt/garbage
10.250.4.65:/localmnt/g2lv5 commit force
5) now all that we have to do is let gluster heal the volume: volume heal
test-a full
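Collected into a single sequence, with the commands exactly as in the steps
above (prefix the volume commands with "gluster" if you run them from a shell
rather than the gluster prompt):

mkdir /localmnt/garbage
gluster volume replace-brick test-a 10.250.4.65:/localmnt/g2lv5 \
    10.250.4.65:/localmnt/garbage commit force
# repair the failed brick, then clear the leftover gluster metadata
setfattr -x trusted.glusterfs.volume-id /localmnt/g2lv5
setfattr -x trusted.gfid /localmnt/g2lv5
gluster volume replace-brick test-a 10.250.4.65:/localmnt/garbage \
    10.250.4.65:/localmnt/g2lv5 commit force
gluster volume heal test-a full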

Is there anything wrong with this procedure?

Cheers,
Dave




On Fri, Aug 16, 2013 at 11:03 AM, David Gibbons
wrote:

> Ravi,
>
> Thanks for the tips. When I run a volume status:
> gluster> volume status test-a
> Status of volume: test-a
> Gluster process                             Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick 10.250.4.63:/localmnt/g1lv2           49152   Y       8072
> Brick 10.250.4.65:/localmnt/g2lv2           49152   Y       3403
> Brick 10.250.4.63:/localmnt/g1lv3           49153   Y       8081
> Brick 10.250.4.65:/localmnt/g2lv3           49153   Y       3410
> Brick 10.250.4.63:/localmnt/g1lv4           49154   Y       8090
> Brick 10.250.4.65:/localmnt/g2lv4           49154   Y       3417
> Brick 10.250.4.63:/localmnt/g1lv5           49155   Y       8099
> Brick 10.250.4.65:/localmnt/g2lv5           N/A     N       N/A
> Brick 10.250.4.63:/localmnt/g1lv1           49156   Y       8576
> Brick 10.250.4.65:/localmnt/g2lv1           49156   Y       3431
> NFS Server on localhost                     2049    Y       3440
> Self-heal Daemon on localhost               N/A     Y       3445
> NFS Server on 10.250.4.63                   2049    Y       8586
> Self-heal Daemon on 10.250.4.63             N/A     Y       8593
>
> There are no active volume tasks
> --
>
> Attempting to start the volume results in:
> gluster> volume start test-a force
> volume start: test-a: failed: Failed to get extended attribute
> trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data
> available
> --
>
> It doesn't like when I try to fire off a heal either:
> gluster> volume heal test-a
> Launching Heal operation on volume test-a has been unsuccessful
> --
>
> Although that did lead me to this:
> gluster> volume heal test-a info
> Gathering Heal info on volume test-a has been successful
>
> Brick 10.250.4.63:/localmnt/g1lv2
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv2
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv3
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv3
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv4
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv4
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv5
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv5
> Status: Brick is Not connected
> Number of entries: 0
>
> Brick 10.250.4.63:/localmnt/g1lv1
> Number of entries: 0
>
> Brick 10.250.4.65:/localmnt/g2lv1
> Number of entries: 0
> --
>
> So perhaps I need to re-connect the brick?
>
> Cheers,
> Dave
>
>
>
> On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N wrote:
>
>>  On 08/15/2013 10:05 PM, David Gibbons wrote:
>>
>> Hi There,
>>
>>  I'm currently testing Gluster for possible production use. I haven't
>> been able to find the answer to this question in the forum arch or in the
>> public docs. It's possible that I don't know which keywords to search for.
>>
>>  Here's the question (more details below): let's say that one of my
>> bricks "fails" -- *not* a whole node failure but a single brick failure
>> within the node. How do I replace a single brick on a node and force a sync
>> from one of the replicas?
>>
>>  I have two nodes with 5 bricks each:
>>  gluster> volume info test-a
>>
>>  Volume Name: test-a
>> Type: Distributed-Replicate
>> Volume ID: e8957773-dd36-44ae-b80a-01e22c7

Re: [Gluster-users] Replacing a failed brick

2013-08-16 Thread Joe Julian
This tells you that this brick isn't running. That's probably because it 
was formatted and lost its volume-id extended attribute. See 
http://www.joejulian.name/blog/replacing-a-brick-on-glusterfs-340/
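The short version, as a sketch (brick paths taken from your status output;
see the post for the details):

# on the server with the healthy replica, read the volume id
getfattr -n trusted.glusterfs.volume-id -e hex /localmnt/g1lv5
# on 10.250.4.65, write the same hex value onto the replaced brick root
setfattr -n trusted.glusterfs.volume-id -v 0xVALUE_FROM_GETFATTR /localmnt/g2lv5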


Once that's fixed, on 10.250.4.65:

  gluster volume start test-a force


On 08/16/2013 08:03 AM, David Gibbons wrote:

Brick 10.250.4.65:/localmnt/g2lv5           N/A     N       N/A


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Problems with data integrity between client, volume, and replicated bricks

2013-08-16 Thread Michael Peek
Hi gurus,

I've been banging my head against a test volume for about a month and a
half now, and I'm having some serious problems figuring out what's going on.

I'm running on Ubuntu 12.04 amd64
I'm running Gluster 3.4.0final-ubuntu1~precise1
My cluster is made up of four machines, each machine has two 4TB HDDs
(ext4), with replication
My test client has an HDD with 913GB of test data in 156,544 files

Forgive the weird path names, but I wanted to use a setup with something
akin to the real data that I'd be using, and in production there's going
to be weird path names aplenty.  I include the path names here just in
case someone sees something obvious, like "You compared the wrong files"
or "You can't use path names like that with gluster!"  But for your
reading pleasure, I also list output below with the path names removed
so that you can clearly see similarities or differences from client to
volume to brick.

Disclaimer:  I have done some outage tests with this volume in the past
by unplugging a drive, plugging it back in, and then doing a full heal. 
The volume currently shows 1023 failed heals on bkupc1-b:/export/b/
(brick #2).  But that was before I started this particular test.  For
this test all the old files and directories had been deleted from the
volume beforehand so that I could start with an empty volume.  And no
outages -- simulated or otherwise -- have taken place for this test.  (I
have confirmed that every file listed by gluster as heal-failed no
longer exists.  And yet, even though I have deleted the volume's
contents, the failed heals count remains.)  I thought this might be
important to disclose.  If so desired I can repeat the test after
deleting the volume and recreating it from scratch.  However, once in
production, doing this would be highly unfeasible as a solution to a
problem.  So if this is the cause of my angst, then I'd rather know how
to fix things as they sit now as opposed to scrapping the volume and
starting anew.

Here's a detailed description of my latest test:

1) The client mounts the volume with fuse.glusterfs
(rw,default_permissions,allow_other,max_read=131072) as /data/bkupc1

2) I perform an rsync of the data to the volume.  I have the whole test
scripted and I'll list the juicy bits:

cd /export/d/eraseme/
if [ -d /data/bkupc1/BACKUPS/ ]; then
mv /data/bkupc1/BACKUPS /data/bkupc1/BACKUPS.old
( /bin/rm -fr /data/bkupc1/BACKUPS.old & )
fi
mkdir /data/bkupc1/BACKUPS
rsync \
-a \
-v \
--delete \
--delete-excluded \
--force \
--ignore-errors \
--one-file-system \
--progress \
--stats \
--exclude '/tmp' \
--exclude '/var/tmp' \
--exclude '**core' \
--partial \
--inplace \
./ \
/data/bkupc1/BACKUPS/

NOTE: If the directory /data/bkupc1/BACKUPS/ exists from a previous run
of this test then I move it, and then delete it in the background while
rsync is running.

Output:
...
Number of files: 156554
Number of files transferred: 147980
Total file size: 886124490325 bytes
Total transferred file size: 886124487184 bytes
Literal data: 886124487184 bytes
Matched data: 0 bytes
File list size: 20189800
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 886258975318
Total bytes received: 2845881
 
sent 886258975318 bytes  received 2845881 bytes  45981053.79 bytes/sec
total size is 886124490325  speedup is 1.00

3) My client has md5 checksums for its files, so next my script checks
the files on the volume:

cd /data/bkupc1/BACKUPS/
md5sum -c --quiet md5sums
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF:
FAILED
md5sum: WARNING: 1 computed checksum did NOT match

a) Taking a closer look at this file:

On the client:

root@client:/export/d/eraseme# ls -ald
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
-rw-r--r-- 1 peek peek 646041328 Nov 13  2009
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

On the volume:

root@bkupc1-a:/data/bkupc1/BACKUPS# ls -ald
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
-rw-r--r-- 1 peek peek 646041328 Nov 13  2009
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1

Re: [Gluster-users] Replacing a failed brick

2013-08-16 Thread David Gibbons
Ravi,

Thanks for the tips. When I run a volume status:
gluster> volume status test-a
Status of volume: test-a
Gluster process                             Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.250.4.63:/localmnt/g1lv2           49152   Y       8072
Brick 10.250.4.65:/localmnt/g2lv2           49152   Y       3403
Brick 10.250.4.63:/localmnt/g1lv3           49153   Y       8081
Brick 10.250.4.65:/localmnt/g2lv3           49153   Y       3410
Brick 10.250.4.63:/localmnt/g1lv4           49154   Y       8090
Brick 10.250.4.65:/localmnt/g2lv4           49154   Y       3417
Brick 10.250.4.63:/localmnt/g1lv5           49155   Y       8099
Brick 10.250.4.65:/localmnt/g2lv5           N/A     N       N/A
Brick 10.250.4.63:/localmnt/g1lv1           49156   Y       8576
Brick 10.250.4.65:/localmnt/g2lv1           49156   Y       3431
NFS Server on localhost                     2049    Y       3440
Self-heal Daemon on localhost               N/A     Y       3445
NFS Server on 10.250.4.63                   2049    Y       8586
Self-heal Daemon on 10.250.4.63             N/A     Y       8593

There are no active volume tasks
--

Attempting to start the volume results in:
gluster> volume start test-a force
volume start: test-a: failed: Failed to get extended attribute
trusted.glusterfs.volume-id for brick dir /localmnt/g2lv5. Reason : No data
available
--

It doesn't like when I try to fire off a heal either:
gluster> volume heal test-a
Launching Heal operation on volume test-a has been unsuccessful
--

Although that did lead me to this:
gluster> volume heal test-a info
Gathering Heal info on volume test-a has been successful

Brick 10.250.4.63:/localmnt/g1lv2
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv2
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv3
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv3
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv4
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv4
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv5
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv5
Status: Brick is Not connected
Number of entries: 0

Brick 10.250.4.63:/localmnt/g1lv1
Number of entries: 0

Brick 10.250.4.65:/localmnt/g2lv1
Number of entries: 0
--

So perhaps I need to re-connect the brick?

Cheers,
Dave



On Fri, Aug 16, 2013 at 12:43 AM, Ravishankar N wrote:

>  On 08/15/2013 10:05 PM, David Gibbons wrote:
>
> Hi There,
>
>  I'm currently testing Gluster for possible production use. I haven't
> been able to find the answer to this question in the forum arch or in the
> public docs. It's possible that I don't know which keywords to search for.
>
>  Here's the question (more details below): let's say that one of my
> bricks "fails" -- *not* a whole node failure but a single brick failure
> within the node. How do I replace a single brick on a node and force a sync
> from one of the replicas?
>
>  I have two nodes with 5 bricks each:
>  gluster> volume info test-a
>
>  Volume Name: test-a
> Type: Distributed-Replicate
> Volume ID: e8957773-dd36-44ae-b80a-01e22c78a8b4
> Status: Started
> Number of Bricks: 5 x 2 = 10
> Transport-type: tcp
> Bricks:
> Brick1: 10.250.4.63:/localmnt/g1lv2
> Brick2: 10.250.4.65:/localmnt/g2lv2
> Brick3: 10.250.4.63:/localmnt/g1lv3
> Brick4: 10.250.4.65:/localmnt/g2lv3
> Brick5: 10.250.4.63:/localmnt/g1lv4
> Brick6: 10.250.4.65:/localmnt/g2lv4
> Brick7: 10.250.4.63:/localmnt/g1lv5
> Brick8: 10.250.4.65:/localmnt/g2lv5
> Brick9: 10.250.4.63:/localmnt/g1lv1
> Brick10: 10.250.4.65:/localmnt/g2lv1
>
>  I formatted 10.250.4.65:/localmnt/g2lv5 (to simulate a "failure"). What
> is the next step? I have tried various combinations of removing and
> re-adding the brick, replacing the brick, etc. I read in a previous message
> to this list that replace-brick was for planned changes which makes sense,
> so that's probably not my next step.
>
> You must first check if the 'formatted' brick 10.250.4.65:/localmnt/g2lv5
> is online using the `gluster volume status` command. If not start the
> volume using `gluster volume start force`. You can then use the
> gluster volume heal command which would copy the data from the other
> replica brick into your formatted brick.
> Hope this helps.
> -Ravi
>
>
>  Cheers,
> Dave
>
>
> ___
> Gluster-users mailing 
> listGluster-users@gluster.orghttp://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How do I know with what peer (brick) is client working now ?

2013-08-16 Thread Joe Julian
The client connects with all the bricks in the volume.
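If you want to see those connections, you can look from either side, for
example (volume name taken from your earlier mails; assumes netstat is
installed on the client):

gluster volume status storage clients   # per-brick client list, run on a server
netstat -tnp | grep glusterfs           # open brick connections, run on the client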

Alexey Shalin  wrote:
>Hello
>How do I know with what peer (brick) is client working now ?
>
>
>Thank you
>
>---
>Старший Системный Администратор
>Алексей Шалин
>ОсОО "Хостер kg" - http://www.hoster.kg
>ул. Ахунбаева 123 (здание БГТС)
>h...@hoster.kg
>
>___
>Gluster-users mailing list
>Gluster-users@gluster.org
>http://supercolony.gluster.org/mailman/listinfo/gluster-users

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Need help understanding the glusterd issue

2013-08-16 Thread Chetan Risbud
Thanks Vijay & Prashant. 

I will check the integrity of the state file you mentioned. The failure was
that glusterd could not resolve the brick's IP address, which I had picked up
on my laptop after logging in from home. I corrected that and it works now,
although I still have a question: IMO this should not stop glusterd from
starting; it could instead leave unreachable volumes in the "STOPPED" state,
since volumes already support start/stop states. I may be overlooking the
bigger picture or use case here, so please correct me.

Thanks,
Chetan Risbud.

 

- Original Message -
From: "Vijay Bellur" 
To: "Chetan Risbud" 
Cc: "gluster-users Discussion List" 
Sent: Friday, August 16, 2013 11:20:37 AM
Subject: Re: Need help understanding the glusterd issue

On 08/16/2013 10:32 AM, Chetan Risbud wrote:
> HI All,
>
> I am seeing init-related failures while restarting glusterd. I restarted
> glusterd because I had changed the ring files for some other swift-related
> activity after adding a new volume. Is there any workaround for this problem?

CC'ing gluster-users as this is the relevant mailing list for this.

> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
>
>
>
>
> [2013-08-16 04:55:24.399286] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: 
> Started running /usr/sbin/glusterd version 3.4.0 (/usr/sbin/glusterd -p 
> /run/glusterd.pid)
> [2013-08-16 04:55:24.404097] I [glusterd.c:962:init] 0-management: Using 
> /var/lib/glusterd as working directory
> [2013-08-16 04:55:24.407802] I [socket.c:3480:socket_init] 
> 0-socket.management: SSL support is NOT enabled
> [2013-08-16 04:55:24.407835] I [socket.c:3495:socket_init] 
> 0-socket.management: using system polling thread
> [2013-08-16 04:55:24.407972] E [rpc-transport.c:253:rpc_transport_load] 
> 0-rpc-transport: /usr/lib64/glusterfs/3.4.0/rpc-transport/rdma.so: cannot 
> open shared object file: No such file or directory
> [2013-08-16 04:55:24.407995] W [rpc-transport.c:257:rpc_transport_load] 
> 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid 
> or not found on this machine
> [2013-08-16 04:55:24.408009] W [rpcsvc.c:1387:rpcsvc_transport_create] 
> 0-rpc-service: cannot create listener, initing the transport failed
> [2013-08-16 04:55:25.867973] I 
> [glusterd-store.c:1328:glusterd_restore_op_version] 0-glusterd: retrieved 
> op-version: 2
> [2013-08-16 04:55:25.884692] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-0
> [2013-08-16 04:55:25.884771] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-1
> [2013-08-16 04:55:26.110537] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-0
> [2013-08-16 04:55:26.110617] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-1
> [2013-08-16 04:55:26.185491] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-0
> [2013-08-16 04:55:26.185571] E 
> [glusterd-store.c:1845:glusterd_store_retrieve_volume] 0-: Unknown key: 
> brick-1
> [2013-08-16 04:55:29.250542] E 
> [glusterd-store.c:2472:glusterd_resolve_all_bricks] 0-glusterd: resolve brick 
> failed in restore

You seem to have an incomplete state file in 
/var/lib/glusterd/vols/ and hence initialization of glusterd 
seems to have failed. Can you please check that out?
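For example (a sketch; VOLNAME is a placeholder for your volume name), the
state glusterd reads at startup lives here:

ls /var/lib/glusterd/vols/VOLNAME/
cat /var/lib/glusterd/vols/VOLNAME/info
cat /var/lib/glusterd/vols/VOLNAME/bricks/*

The brick files should contain hostname= and path= entries that resolve on
this node.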

Regards,
Vijay

> [2013-08-16 04:55:29.250615] E [xlator.c:390:xlator_init] 0-management: 
> Initialization of volume 'management' failed, review your volfile again
> [2013-08-16 04:55:29.250634] E [graph.c:292:glusterfs_graph_init] 
> 0-management: initializing translator failed

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How do I know with what peer (brick) is client working now ?

2013-08-16 Thread Alexey Shalin
Hello
How do I know with what peer (brick) is client working now ?


Thank you

---
Senior System Administrator
Alexey Shalin
"Hoster kg" LLC - http://www.hoster.kg
123 Akhunbaeva St. (BGTS building)
h...@hoster.kg

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users