Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?

2018-02-02 Thread Alessandro Ipe
    0        completed        0:03:53
Estimated time left for rebalance to complete :   359739:51:24
volume rebalance: home: success


Thanks,


A.



On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
> What is server4? You just mentioned server1 and server2 previously.
> Can you post the output of gluster v status volname
> 
> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe  
> wrote:
> > Hi,
> > 
> > 
> > Thanks. However "gluster v heal volname full" returned the following error
> > message
> > Commit failed on server4. Please check log file for details.
> > 
> > I have checked the log files in /var/log/glusterfs on server4 (by grepping
> > heal), but did not get any match. What should I be looking for and in
> > which
> > log file, please ?
> > 
> > Note that there is currently a rebalance process running on the volume.
> > 
> > 
> > Many thanks,
> > 
> > 
> > A.
> > 
> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> >> You do not need to reset brick if brick path does not change. Replace
> >> the brick format and mount, then gluster v start volname force.
> >> To start self heal just run gluster v heal volname full.
> >> 
> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe 
> > 
> > wrote:
> >> > Hi,
> >> > 
> >> > 
> >> > My volume home is configured in replicate mode (version 3.12.4) with
> >> > the
> >> > bricks server1:/data/gluster/brick1
> >> > server2:/data/gluster/brick1
> >> > 
> >> > server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon
> >> > for
> >> > that brick on server2, unmounted it, reformatted it, remounted it and did
> >> > a
> >> > 
> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> >> server2:/data/gluster/brick1 commit force
> >> > 
> >> > I was expecting that the self-heal daemon would start copying data from
> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> >> > server2:/data/gluster/brick1, which it only did for directories, but
> >> > not
> >> > for files.
> >> > 
> >> > For the moment, I launched on the fuse mount point
> >> > 
> >> >> find . | xargs stat
> >> > 
> >> > but crawling the whole volume (100 TB) to trigger self-healing of a
> >> > single
> >> > brick of 7.4 TB is inefficient.
> >> > 
> >> > Is there any trick to only self-heal a single brick, either by setting
> >> > some attributes to its top directory, for example ?
> >> > 
> >> > 
> >> > Many thanks,
> >> > 
> >> > 
> >> > Alessandro
> >> > 
> >> > 
> >> > ___
> >> > Gluster-users mailing list
> >> > Gluster-users@gluster.org
> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
> > 
> > --
> > 
> >  Dr. Ir. Alessandro Ipe
> >  Department of Observations Tel. +32 2 373 06 31
> >  Remote Sensing from Space
> >  Royal Meteorological Institute
> >  Avenue Circulaire 3    Email:
> >  B-1180 Brussels    Belgium    alessandro@meteo.be
> >  Web: http://gerb.oma.be


-- 

 Dr. Ir. Alessandro Ipe   
 Department of Observations Tel. +32 2 373 06 31
 Remote Sensing from Space
 Royal Meteorological Institute  
 Avenue Circulaire 3    Email:
 B-1180 Brussels    Belgium    alessandro@meteo.be
 Web: http://gerb.oma.be   



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?

2018-02-01 Thread Alessandro Ipe
Hi,


Thanks. However "gluster v heal volname full" returned the following error 
message
Commit failed on server4. Please check log file for details.

I have checked the log files in /var/log/glusterfs on server4 (by grepping 
heal), but did not get any match. What should I be looking for and in which 
log file, please ?
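
(For reference, a sketch of the kind of search that could be run on server4; the
exact log file names vary with the version and setup, so the paths below are only
the usual defaults, not something I have verified there:)

  # Look for the rejected commit of the "heal ... full" command on server4.
  # The glusterd, self-heal daemon (glustershd) and brick logs are the usual suspects.
  grep -il 'commit failed\|heal' /var/log/glusterfs/*.log \
      /var/log/glusterfs/bricks/*.log 2>/dev/null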

Note that there is currently a rebalance process running on the volume.


Many thanks,


A. 


On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> You do not need to reset brick if brick path does not change. Replace
> the brick format and mount, then gluster v start volname force.
> To start self heal just run gluster v heal volname full.
> 
> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe  
wrote:
> > Hi,
> > 
> > 
> > My volume home is configured in replicate mode (version 3.12.4) with the
> > bricks server1:/data/gluster/brick1
> > server2:/data/gluster/brick1
> > 
> > server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon for
> > that brick on server2, unmounted it, reformatted it, remounted it and did a
> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> server2:/data/gluster/brick1 commit force
> > I was expecting that the self-heal daemon would start copying data from
> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> > server2:/data/gluster/brick1, which it only did for directories, but not
> > for files.
> > 
> > For the moment, I launched on the fuse mount point
> > 
> >> find . | xargs stat
> > 
> > but crawling the whole volume (100 TB) to trigger self-healing of a single
> > brick of 7.4 TB is inefficient.
> > 
> > Is there any trick to only self-heal a single brick, either by setting
> > some attributes to its top directory, for example ?
> > 
> > 
> > Many thanks,
> > 
> > 
> > Alessandro
> > 
> > 
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users


-- 

 Dr. Ir. Alessandro Ipe   
 Department of Observations Tel. +32 2 373 06 31
 Remote Sensing from Space
 Royal Meteorological Institute  
 Avenue Circulaire 3    Email:
 B-1180 Brussels    Belgium    alessandro@meteo.be
 Web: http://gerb.oma.be   



___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?

2018-02-01 Thread Alessandro Ipe
Hi,


My volume home is configured in replicate mode (version 3.12.4) with the bricks
server1:/data/gluster/brick1
server2:/data/gluster/brick1

server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon for that 
brick on server2, unmounted it, reformatted it, remounted it and did a
> gluster volume reset-brick home server2:/data/gluster/brick1 
> server2:/data/gluster/brick1 commit force
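
(For completeness, the whole sequence I went through, as a sketch; the device
name is only a placeholder, and only the reset-brick command above is verbatim:)

  # On server2; /dev/sdX1 stands in for the brick's actual device.
  pkill -f 'glusterfsd.*/data/gluster/brick1'   # stop only this brick's glusterfsd
  umount /data/gluster/brick1
  mkfs.xfs -f /dev/sdX1                         # reformat (filesystem type assumed)
  mount /dev/sdX1 /data/gluster/brick1
  gluster volume reset-brick home server2:/data/gluster/brick1 \
      server2:/data/gluster/brick1 commit force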

I was expecting that the self-heal daemon would start copying data from 
server1:/data/gluster/brick1 
(about 7.4 TB) to the empty server2:/data/gluster/brick1, which it only did for 
directories, but not for files. 

For the moment, I launched on the fuse mount point
> find . | xargs stat
but crawling the whole volume (100 TB) to trigger self-healing of a single 
brick of 7.4 TB is inefficient.
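
(If a crawl from the mount point is the only option, a null-delimited variant is
at least robust against odd file names; the mount point below is illustrative:)

  # Trigger a lookup (and hence a self-heal check) on every entry;
  # safe with spaces/newlines in names. /mnt/home is an assumed mount point.
  find /mnt/home -print0 | xargs -0 stat > /dev/null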

Is there any trick to self-heal only a single brick, for example by setting some
attributes on its top directory ?


Many thanks,


Alessandro


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Memory leak in 3.6.9

2016-04-27 Thread Alessandro Ipe
OK, great... Any plan to backport those important fixes to the 3.6 branch ?
Because I am not ready to upgrade to the 3.7 branch for a production system. My
fear is that 3.7 will bring other new issues, and all I want is a stable and
reliable branch without extra new functionalities (and new bugs) that will just
work under normal use.


Thanks,


A.


On Wednesday 27 April 2016 09:58:00 Tim wrote:



There have been a lot of fixes since 3.6.9. Specifically,
https://bugzilla.redhat.com/1311377[1] was fixed in 3.7.9.
re: https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.9.md[2]

  
Hi,  
   
   
Apparently, version 3.6.9 is suffering from a SERIOUS memory leak as illustrated
in the following logs:
2016-04-26T11:54:27.971564+00:00 tsunami1 kernel: [698635.210069] glusterfsd
invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2016-04-26T11:54:27.974133+00:00 tsunami1 kernel: [698635.210076] Pid: 28111,
comm: glusterfsd Tainted: G W O 3.7.10-1.1-desktop #1
2016-04-26T11:54:27.974136+00:00 tsunami1 kernel:[698635.210077] Call 
Trace:  
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel:[698635.210090] 
[] dump_trace+0x88/0x300  
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel:[698635.210096] 
[] dump_stack+0x69/0x6f  
2016-04-26T11:54:27.974138+00:00 tsunami1 kernel:[698635.210101] 
[]dump_header+0x70/0x200  
2016-04-26T11:54:27.974139+00:00 tsunami1 kernel:[698635.210105] 
[]oom_kill_process+0x244/0x390  
2016-04-26T11:54:28.113125+00:00 tsunami1 kernel:[698635.210111] 
[]out_of_memory+0x451/0x490  
2016-04-26T11:54:28.113142+00:00 tsunami1 kernel:[698635.210116] 
[]__alloc_pages_nodemask+0x8ae/0x9f0  
2016-04-26T11:54:28.113143+00:00 tsunami1 kernel:[698635.210122] 
[]alloc_pages_current+0xb7/0x130  
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel:[698635.210127] 
[]filemap_fault+0x283/0x440  
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel:[698635.210131] 
[] __do_fault+0x6e/0x560  
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel:[698635.210136] 
[]handle_pte_fault+0x97/0x490  
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel:[698635.210141] 
[]__do_page_fault+0x16b/0x4c0  
2016-04-26T11:54:28.113562+00:00 tsunami1 kernel:[698635.210145] 
[] page_fault+0x28/0x30  
2016-04-26T11:54:28.113565+00:00 tsunami1 kernel:[698635.210158] 
[<7fa9d8a8292b>] 0x7fa9d8a8292a  
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226243] Out of
memory: Kill process 17144 (glusterfsd) score 694 or sacrifice child
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226251] Killed
process 17144 (glusterfsd) total-vm:8956384kB, anon-rss:6670900kB, file-rss:0kB

It makes this version completely useless in production. Brick servers have 8 GB
of RAM (but will be upgraded to 16 GB).
   
gluster volume info  returns:  
Volume Name: home  
Type: Distributed-Replicate  
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766  
Status: Started  
Number of Bricks: 14 x 2 = 28  
Transport-type: tcp  
Bricks:  
Brick1: tsunami1:/data/glusterfs/home/brick1  
Brick2: tsunami2:/data/glusterfs/home/brick1  
Brick3: tsunami1:/data/glusterfs/home/brick2  
Brick4: tsunami2:/data/glusterfs/home/brick2  
Brick5: tsunami1:/data/glusterfs/home/brick3  
Brick6: tsunami2:/data/glusterfs/home/brick3  
Brick7: tsunami1:/data/glusterfs/home/brick4  
Brick8: tsunami2:/data/glusterfs/home/brick4  
Brick9: tsunami3:/data/glusterfs/home/brick1  
Brick10: tsunami4:/data/glusterfs/home/brick1  
Brick11: tsunami3:/data/glusterfs/home/brick2  
Brick12: tsunami4:/data/glusterfs/home/brick2  
Brick13: tsunami3:/data/glusterfs/home/brick3  
Brick14: tsunami4:/data/glusterfs/home/brick3  
Brick15: tsunami3:/data/glusterfs/home/brick4  
Brick16: tsunami4:/data/glusterfs/home/brick4  
Brick17: tsunami5:/data/glusterfs/home/brick1  
Brick18: tsunami6:/data/glusterfs/home/brick1  
Brick19: tsunami5:/data/glusterfs/home/brick2  
Brick20: tsunami6:/data/glusterfs/home/brick2  
Brick21: tsunami5:/data/glusterfs/home/brick3  
Brick22: tsunami6:/data/glusterfs/home/brick3  
Brick23: tsunami5:/data/glusterfs/home/brick4  
Brick24: tsunami6:/data/glusterfs/home/brick4  
Brick25: tsunami7:/data/glusterfs/home/brick1  
Brick26: tsunami8:/data/glusterfs/home/brick1  
Brick27: tsunami7:/data/glusterfs/home/brick2  
Brick28: tsunami8:/data/glusterfs/home/brick2  
Options Reconfigured:  
nfs.export-dir: /gerb-reproc/Archive  
nfs.volume-access: read-only  
cluster.ensure

[Gluster-users] Memory leak in 3.6.9

2016-04-26 Thread Alessandro Ipe
Hi,


Apparently, version 3.6.9 is suffering from a SERIOUS memory leak as 
illustrated in the following logs:
2016-04-26T11:54:27.971564+00:00 tsunami1 kernel: [698635.210069] glusterfsd 
invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2016-04-26T11:54:27.974133+00:00 tsunami1 kernel: [698635.210076] Pid: 28111, 
comm: glusterfsd Tainted: GW  O 3.7.10-1.1-desktop #1
2016-04-26T11:54:27.974136+00:00 tsunami1 kernel: [698635.210077] Call Trace:
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210090]  
[] dump_trace+0x88/0x300
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210096]  
[] dump_stack+0x69/0x6f
2016-04-26T11:54:27.974138+00:00 tsunami1 kernel: [698635.210101]  
[] dump_header+0x70/0x200
2016-04-26T11:54:27.974139+00:00 tsunami1 kernel: [698635.210105]  
[] oom_kill_process+0x244/0x390
2016-04-26T11:54:28.113125+00:00 tsunami1 kernel: [698635.210111]  
[] out_of_memory+0x451/0x490
2016-04-26T11:54:28.113142+00:00 tsunami1 kernel: [698635.210116]  
[] __alloc_pages_nodemask+0x8ae/0x9f0
2016-04-26T11:54:28.113143+00:00 tsunami1 kernel: [698635.210122]  
[] alloc_pages_current+0xb7/0x130
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210127]  
[] filemap_fault+0x283/0x440
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210131]  
[] __do_fault+0x6e/0x560
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210136]  
[] handle_pte_fault+0x97/0x490
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210141]  
[] __do_page_fault+0x16b/0x4c0
2016-04-26T11:54:28.113562+00:00 tsunami1 kernel: [698635.210145]  
[] page_fault+0x28/0x30
2016-04-26T11:54:28.113565+00:00 tsunami1 kernel: [698635.210158]  
[<7fa9d8a8292b>] 0x7fa9d8a8292a
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226243] Out of 
memory: Kill process 17144 (glusterfsd) score 694 or sacrifice child
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226251] Killed 
process 17144 (glusterfsd) total-vm:8956384kB, anon-rss:6670900kB, file-rss:0kB

It makes this version completely useless in production. Brick servers have 8
GB of RAM (but will be upgraded to 16 GB).
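
(In case it helps with tracking the leak down, a sketch of the kind of data that
could be gathered while the resident size grows; the statedump output directory
is the usual default, not verified on this setup:)

  # Dump memory-pool / allocator statistics for the brick processes, twice,
  # a few hours apart, and compare the dumps (default output: /var/run/gluster).
  gluster volume statedump home
  ls /var/run/gluster/*.dump.*
  # Track the resident size of the brick processes in between:
  ps -o pid,rss,etime,args -C glusterfsd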

gluster volume info  returns:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 14 x 2 = 28
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Brick25: tsunami7:/data/glusterfs/home/brick1
Brick26: tsunami8:/data/glusterfs/home/brick1
Brick27: tsunami7:/data/glusterfs/home/brick2
Brick28: tsunami8:/data/glusterfs/home/brick2
Options Reconfigured:
nfs.export-dir: /gerb-reproc/Archive
nfs.volume-access: read-only
cluster.ensure-durability: on
features.quota: on
performance.cache-size: 512MB
performance.io-thread-count: 32
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: off
nfs.disable: off
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: on
server.allow-insecure: on
auth.allow: localhost, 
cluster.readdir-optimize: on
performance.readdir-ahead: on
nfs.export-volumes: off

Are you aware of this issue ?


Thanks,


A.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfs fuse mount on clients

2016-04-13 Thread Alessandro Ipe
Hi,


Our gluster system is currently made of 4 replicated pairs of servers, each
holding either 2 or 4 bricks of 4 HDs in RAID 10.

We have a bunch of clients which are mounting the system through the native
fuse glusterfs client. More specifically, they are all using the same server1 to
get the config of the volume, with a backupvolfile-server=server2, i.e. in
/etc/fstab as:
*server1*:/home  /mnt/server  glusterfs  defaults,_netdev,use-readdirp=no,direct-io-mode=disable,backupvolfile-server=*server2*,log-level=ERROR,log-file=/var/log/gluster.log  0 0

Would it be better, in terms of network/disk load, if my clients alternately
used server1 to server8 ?
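
(Purely as an illustration of what I have in mind, with the same mount options
as above and server pairs chosen arbitrarily, each client would simply name a
different pair of servers in its fstab:)

  # client A
  server1:/home  /mnt/server  glusterfs  defaults,_netdev,use-readdirp=no,direct-io-mode=disable,backupvolfile-server=server2,log-level=ERROR,log-file=/var/log/gluster.log  0 0
  # client B
  server3:/home  /mnt/server  glusterfs  defaults,_netdev,use-readdirp=no,direct-io-mode=disable,backupvolfile-server=server4,log-level=ERROR,log-file=/var/log/gluster.log  0 0

My understanding is that the named server is only contacted to fetch the volume
configuration at mount time, and the actual file I/O then goes directly to all
brick servers, so this would mainly spread the mount-time/volfile load.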


Many thanks,


Alessandro.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?

2015-04-02 Thread Alessandro Ipe
Hi Nithya,


Sorry that it took so long to respond...

1. Indeed, a couple of weeks ago I added 2 bricks (in replicate mode) with
add-brick and, since then, I have never been able to complete the required
rebalance (however a rebalance fix-layout completed).
2. *home-rebalance.log*
[2015-03-13 21:32:58.066242] E [dht-rebalance.c:1328:gf_defrag_migrate_data] 
0-home-dht: /seviri/.forward lookup failed
and the same "lookup failed" log for a lot of other files
[2015-03-13 21:32:58.245795] E [dht-linkfile.c:278:dht_linkfile_setattr_cbk] 
0-home-dht: setattr of uid/gid on /seviri/.forward 
: failed (Stale NFS file handle)
[2015-03-13 21:32:58.286201] E [dht-common.c:2465:dht_vgetxattr_cbk] 
0-home-dht: Subvolume home-replicate-4 returned -1 (Stale NFS file handle)
[2015-03-13 21:32:58.286258] E [dht-rebalance.c:1336:gf_defrag_migrate_data] 
0-home-dht: Failed to get node-uuid for /seviri/.forward
and after initiating a stop command on the cli
[2015-03-19 10:34:38.484381] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 
0-home-dht: Fix layout failed for 
/seviri/MSG/2007/MSG1_20070106/HRIT_200701060115
[2015-03-19 10:34:38.487426] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 
0-home-dht: Fix layout failed for /seviri/MSG/2007/MSG1_20070106
[2015-03-19 10:34:38.487943] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 
0-home-dht: Fix layout failed for /seviri/MSG/2007
[2015-03-19 10:34:38.488361] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 
0-home-dht: Fix layout failed for /seviri/MSG
[2015-03-19 10:34:38.488801] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 
0-home-dht: Fix layout failed for /seviri
3. We are exclusively accessing the servers through the native gluster fuse
client, so no NFS mount.
4. The attributes of this specific file are given in my initial post at
 http://www.gluster.org/pipermail/gluster-users/2015-March/021175.html
 
Meanwhile, I launched a full heal and that specific file could be accessed 
normally, after a couple of days of healing... However, I now get in the client 
log (gluster.log) the following messages:
[2015-04-01 15:20:36.218425] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-7:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218555] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-5:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218630] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-2:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218770] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-4:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218840] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-6:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218915] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-9:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.218976] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-10:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.219230] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-1:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.220062] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-8:  metadata self heal  failed,   on /seviri
[2015-04-01 15:20:36.236306] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
2-home-replicate-11:  metadata self heal  failed,   on /seviri
and various other top level directories from / on the volume.
Is there a way to fix this ?


Regards,


A.


On Thursday 26 March 2015 11:36:19 Nithya Balachandran wrote:
> Hi Alessandro,
> 
> Thanks for the information. A few more questions:
> 
> 1. Did you do an add-brick or remove-brick before doing the rebalance? If
> yes, how many bricks did you add/remove? 2. Can you send us the rebalance,
> client(NFS log if you are using an NFS client only)  and brick logs? 3. It
> looks like you are using an NFS client. Can you please confirm? 4. Is
> /home/seviri/.forward the only file on which you are seeing the stale file
> handle errors? Can you please provide the following information for this
> file on all bricks - the xattrs for the parent directory (/home/seviri/) as
> well as the file on each brick - Brick1 to Brick24 - with details of which
> node it is on so we can get a clearer picture. - the ls -li output on the
> bricks for the file on each node.
> 
> 
> As far as I know, there have not been any major changes to rebalance between
> 3.5.3 and 3.6.3 but I will confirm.
> 
> Regards,
> Nithya
> 
> - Original Message -
> From: "Alessandro Ipe" 
> To: "Nithya Balachandran" 
> Cc: gluste

Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?

2015-03-25 Thread Alessandro Ipe
Hi Nithya,


Thanks for your reply. I am glad that improving the rebalance status will be
addressed in the (near) future. From my perspective, if the status gave the
total number of files to be scanned together with the files already scanned,
that would be sufficient information. The user could then see when it would
complete (by running "gluster volume rebalance status" several times and
computing differences according to the elapsed time between them).

Please find below the answers to your questions:
1. Server and client are version 3.5.3
2. Indeed, I stopped the rebalance through the associated command from the CLI,
i.e. gluster  rebalance stop
3. Very limited file operations were carried out through a single client mount
(servers were almost idle)
4. gluster volume info:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: off
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: on
server.allow-insecure: on
cluster.ensure-durability: on

For the logs, it will be more difficult because it happened several days ago, 
and they were rotated. But I can dig... By the way, do you need a specific 
logfile, because gluster produces a lot of them...

I read in some discussion on the gluster-users mailing list that rebalance on
version 3.5.x could leave the system with errors when stopped (or even when
run up to its completion ?) and that rebalance had gone through a complete
rewrite in 3.6.x.  The issue is that I will put gluster back online next week,
so my colleagues will definitely put it under high load, and I was planning to
run the rebalance again in the background. However, is it advisable ? Or should
I wait until after upgrading to 3.6.3 ?

I also noticed (a full heal is currently running on the volume) that accessing
some files on the client returned a "Transport endpoint is not connected" the
first time, but any new access was OK (probably due to self-healing). However,
is it possible to set a client or volume parameter to just wait (and make the
calling process wait) for the self-healing to complete and deliver the file the
first time without issuing an error (extremely useful in batch/operational
processing) ?


Regards,


Alessandro.


On Wednesday 25 March 2015 05:09:38 Nithya Balachandran wrote:
> Hi Alessandro,
> 
> 
> I am sorry to hear that you are facing problems with rebalance.
> 
> Currently rebalance does not have the information as to how many files exist
> on the volume and so cannot calculate/estimate the time it will take to
> complete. Improving the rebalance status output to provide that info is on
> our to-do list already and we will be working on that.
> 
> I have a few questions :
> 
> 1. Which version of Glusterfs are you using?
> 2. How did you stop the rebalance ? I assume you ran "gluster 
> rebalance stop" but just wanted confirmation. 3. What file operations were
> being performed during the rebalance? 4. Can you send the "gluster volume
> info" output as well as the gluster log files?
> 
> Regards,
> Nithya
> 
> - Original Message -
> From: "Alessandro Ipe" 
> To: gluster-users@gluster.org
> Sent: Friday, March 20, 2015 4:52:35 PM
> Subject: [Gluster-users] Is rebalance completely broken on 3.5.3 ?
> 
> 
> 
> Hi,
> 
> 
> 
> 
> 
> After lauching a "rebalance" on an idle gluster system one week ago, its
> status told me it has scanned
> 
> more than 23 millions files on eac

Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?

2015-03-23 Thread Alessandro Ipe
Hi Olav,


Thanks for the info. I read the whole thread that you sent me... and I am more
scared than ever... The fact that the developers do not have a clue about what
is causing this issue is just frightening.

Concerning my issue, apparently after two days (a full heal is ongoing on the
volume), I did not get any error messages from the client when trying to list
the incriminated files, but I got twice the same file .forward with the same
content, size, permissions and date... which is consistent with what you got
previously... I simply removed the file TWICE with rm on the client and copied
back a sane version. The one million dollar question is : are there more files
in a similar state on my 90 TB volume ? I am delaying a find on the whole
volume to find out...

What also concerns me is the absence of acknowledgement or reply from the
developers concerning this severe issue... The fact that only end-users on
production setups hit this issue, while it cannot be reproduced in labs, should
be a clear signal that this should be addressed as a priority, from my point of
view. And lab testing should also try to mimic real-life use, with brick
servers under heavy load (> 10) and with several tens of clients accessing the
gluster volume, to track down all possible issues resulting from either
network, i/o, ... timeouts.


Thanks for your help,


Alessandro.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Is rebalance completely broken on 3.5.3 ?

2015-03-20 Thread Alessandro Ipe
Hi,


After launching a "rebalance" on an idle gluster system one week ago, its
status told me it has scanned more than 23 million files on each of my 6
bricks. However, without knowing at least the total number of files to be
scanned, this status is USELESS from an end-user perspective, because it does
not allow you to know WHEN the rebalance could eventually complete (one day,
one week, one year or never). From my point of view, the total files per brick
could be obtained and maintained when activating quota, since the whole
filesystem has to be crawled...

After one week of being offline and still no clue when the rebalance would
complete, I decided to stop it... Enormous mistake... It seems that rebalance
cannot manage not to screw up some files. For example, on the only client
mounting the gluster system, "ls -la /home/seviri" returns
ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-?  ? ?  ? ?? .forward
-?  ? ?  ? ?? .forward
while this file could perfectly be accessed before (being rebalanced) and has
not been modified for at least 3 years.

Getting the extended attributes on the various bricks 3, 4, 5, 6 (3-4 
replicate, 5-6 replicate)
*Brick 3:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26  2014 
/data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 
/data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x
trusted.afr.home-client-9=0x
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x0001

*Brick 4:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26  2014 
/data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 
/data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x
trusted.afr.home-client-9=0x
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x0001

*Brick 5:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-T 2 root root 0 Mar 18 08:19 
/data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400

*Brick 6:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-T 2 root root 0 Mar 18 08:19 
/data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
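
(Decoding the linkto value above, as a sketch, shows where DHT thinks the real
data lives; the trailing 00 is just a NUL terminator:)

  echo 686f6d652d7265706c69636174652d3400 | xxd -r -p
  # -> home-replicate-4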

Looking at the results from bricks 3 & 4 shows something weird. The file exists
in 2 sub-brick storage directories, while it should only be found once on each
brick server. Or does the issue lie in the results of bricks 5 & 6 ? *How can I
fix this, please* ? By the way, the split-brain tutorial only covers BASIC
split-brain conditions and not complex (real-life) cases like this one. It
would definitely benefit from being enriched with this one.

More generally, I think the concept of gluster is promising, but if basic
commands (rebalance, absolutely needed after adding more storage) from its own
CLI allow putting the system into an unstable state, I am really starting to
question its ability to be used in a production environment. And from an
end-user perspective, I do not care about new features added, no matter how
appealing they could be, if the basic ones are not almost totally reliable.
Finally, testing gluster under high load on the brick servers (real-world
conditions) would certainly give insight to the developers on what is failing
and what therefore needs to be fixed to mitigate this and improve gluster
reliability.

Forgive my harsh words/criticisms, but 

[Gluster-users] Missing extended attributes to some files on the bricks

2015-03-13 Thread Alessandro Ipe
Hi,


Apparently, this occurred after a failed rebalance due to exhaustion of
available disk space on the bricks.

On the client, an ls on the directory gives
ls: cannot access .inputrc: No such file or directory
and displays
??  ? ?? ?? .inputrc

Getting the attributes on the bricks gives on brick server 1
NOTHING !!!

while on brick server 2
# file: data/glusterfs/home/brick1/aipe/.inputrc
trusted.afr.home-client-0=0x
trusted.afr.home-client-1=0x
trusted.gfid=0xeed4fb5048b8a0320e8632f34ed3
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0400
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001


Any clue on how to fix this, i.e. force healing from brick server 2 to brick server
1, so the file gets its correct attributes and a hardlink in the .glusterfs 
directory ?
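
(For what it is worth, a sketch of what I could check on brick server 2 before
touching anything; the gfid below is a placeholder, NOT the real one, and the
.glusterfs path is simply derived from its first two byte pairs:)

  # On brick server 2 (the copy that still has its xattrs).
  GFID=eed4fb50-48b8-4a03-20e8-632f34ed3000     # placeholder value
  BRICK=/data/glusterfs/home/brick1
  ls -li "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" "$BRICK/aipe/.inputrc"
  # The two entries should be hard links (same inode).  A heal of just this
  # file can then be attempted from a client by forcing a lookup on it:
  stat /home/aipe/.inputrc        # client mount point assumed to be /home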


Many thanks, 


Alessandro.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Rebalance issue on 3.5.3

2015-03-12 Thread Alessandro Ipe
Hi,


The extended attributes are (according to the brick number)
1. # file: data/glusterfs/home/brick1/aipe/.xinitrc.template
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3500
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

2. # file: data/glusterfs/home/brick1/aipe/.xinitrc.template
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3500
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

Stat'ing these two gives me 0-size file

3. # file: data/glusterfs/home/brick2/aipe/.xinitrc.template
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0600
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

4. # file: data/glusterfs/home/brick2/aipe/.xinitrc.template
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0600
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

These two are non-0-size files.


Thanks,


A.


On Wednesday 11 March 2015 07:51:59 Joe Julian wrote:


Those files are dht link files. Check out the extended attributes, "getfattr -m 
. -d" 



On March 10, 2015 7:30:33 AM PDT, Alessandro Ipe  
wrote:
Hi,


I launched a rebalance a couple of days ago on my gluster distribute-replicate
volume (see below) through its CLI, while allowing my users to continue using
the volume.

Yesterday, they managed to completely fill the volume. It now results in
unavailable files on the client (using fuse) with the message "Transport
endpoint is not connected". Investigating the associated files on the bricks, I
noticed that these are displayed with ls -l as
-T 2 user group 0 Jan 15 22:00 file
Performing a 
ls -lR /data/glusterfs/home/brick1/* | grep -F -- "-T"
on a single brick gave me a LOT of files in that above-mentioned state.

Why are the files in that state ?

Did I lose all these files or can they still be recovered from the replicate 
copy of 
another brick ?


Regards,


Alessandro.


gluster volume info home output:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
features.default-soft-limit: 95%
cluster.ensure-durability: off
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-12 Thread Alessandro Ipe
Hi,


In fact, going up one directory level (the root of the gluster volume), I
similarly get
1. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
2. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
3. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x00015554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
4. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x00015554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

These four bricks seem consistent, while the remaining two

5. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0002
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001aaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b
6. # file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001aaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

show two extra entries trusted.afr.md1-client-0 & trusted.afr.md1-client-1 and
inconsistency between trusted.afr.md1-client-4 & trusted.afr.md1-client-5.

Could it be this issue which propagates to all subdirectories in the volume and 
thus 
results in the error message in the client log file ?

Should I remove trusted.afr.md1-client-0 & trusted.afr.md1-client-1 from brick 5
& brick 6 ?

Meanwhile, I am performing on the client 
find /home/.md1 -type f -exec cat {} > /dev/null \;
to check if I can access the content of all files on the volume. For the 
moment, only 4 
files gave errors.
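
(A variant of that check which also records which paths fail, as a sketch:)

  # Read every file once; append unreadable paths to a list for later healing.
  find /home/.md1 -type f -print0 |
  while IFS= read -r -d '' f; do
      cat "$f" > /dev/null 2>&1 || printf '%s\n' "$f" >> /tmp/unreadable_files.txt
  done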

It is quite frustrating, because I believe that all my data is still intact on 
the bricks and 
it seems that it is only the metadata which got screwed... I am reluctant to 
perform 
something to heal by myself, because I have the feeling that it could do more 
harm 
than good. 

It's been more than 2 days now that my colleagues cannot access the data and I 
cannot make them wait much longer...


A.


On Thursday 12 March 2015 12:59:00 Alessandro Ipe wrote:


Hi,


Sorry about that, I thought I was using the -e hex... I must have removed it at 
some 
point accidentally.

Here they are
1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

4. getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x0001
trusted.afr.md1-client-3=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x
___
Gluster-users mailing list
Gluster-users@

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-12 Thread Alessandro Ipe
Hi,


Doing splitmount localhost md1 . and ls -l gives me
total 8
drwxr-xr-x 12 root root  426 Jan 19 11:04 r1
drwxr-xr-x 12 root root  426 Jan 19 11:04 r2
-rw---  1 root root 2840 Mar 12 12:08 tmp7TLytQ
-rw---  1 root root 2840 Mar 12 12:08 tmptI3gv_

Doing ls -l r1/root/bash_cmd/ gives me
total 5
-rwxr-xr-x 1 root root  212 Nov 21 17:50 ira
-rwxr-xr-x 1 root root 2311 Nov 21 17:50 listing
drwxr-xr-x 2 root root   52 Jan 19 11:24 mbl
-rwxr-xr-x 1 root root 1210 Nov 21 17:50 viewhdf

while doing ls -l r1/root/bash_cmd/mbl/ gives me
ls: cannot access r1/root/bash_cmd/mbl/mbl.c: Software caused connection abort
ls: reading directory r1/root/bash_cmd/mbl/: Transport endpoint is not connected
total 0
?? ? ? ? ?? mbl.c


A.



On Wednesday 11 March 2015 07:52:11 Joe Julian wrote:


http://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/[1]

On March 11, 2015 4:24:09 AM PDT, Alessandro Ipe  
wrote:
Well, it is even worse. Now doing an "ls -R" on the volume results in a lot of

[2015-03-11 11:18:31.957505] E
[afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2:
Unable to self-heal contents of '/library' (possible split-brain). Please
delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 2 ] [ 1 0 ] ]
[2015-03-11 11:18:31.957692] E
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status]
0-md1-replicate-2:  metadata self heal  failed,   on /library

I am desperate...


A.


On Wednesday 11 March 2015 12:05:33 you wrote:


 Hi,   When trying to access a file on a gluster client (through fuse), I get 
an 
"Input/output error" message.  Getting the attributes for the file gives me for 
the first
brick # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2 
trusted.afr.md1-
client-2=0s trusted.afr.md1-client-3=0sAAABdAAA 
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==  while for the second (replicate) 
brick # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2 
trusted.afr.md1-
client-2=0sAAABJAAA trusted.afr.md1-client-3=0s 
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==  It seems that I have a split-brain. 
How can I solve this issue by resetting the attributes, please ?   Thanks,   
Alessandro.  
== gluster volume info md1  Volume Name: md1 Type: 
Distributed-Replicate Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b Status: 
Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: 
tsunami1:/data/glusterfs/md1/brick1


Brick2: tsunami2:/data/glusterfs/md1/brick1 Brick3: 
tsunami3:/data/glusterfs/md1/brick1 Brick4: tsunami4:/data/glusterfs/md1/brick1 
Brick5: tsunami5:/data/glusterfs/md1/brick1 Brick6: 
tsunami6:/data/glusterfs/md1/brick1 Options Reconfigured: 
server.allow-insecure: on 
cluster.read-hash-mode: 2 features.quota: off performance.write-behind: on 
performance.write-behind-window-size: 4MB performance.flush-behind: off 
performance.io[2]-thread-count: 64 performance.cache-size: 512MB nfs.disable: 
on 
cluster.lookup-unhashed: off






http://www.gluster.org/mailman/listinfo/gluster-users[3]





-- 

 Dr. Ir. Alessandro Ipe   
 Department of Observations Tel. +32 2 373 06 31
 Remote Sensing from Space  Fax. +32 2 374 67 88  
 Royal Meteorological Institute  
 Avenue Circulaire 3Email:  
 B-1180 BrusselsBelgium alessandro@meteo.be 
 Web: http://gerb.oma.be   



[1] http://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/
[2] http://performance.io
[3] http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-12 Thread Alessandro Ipe
Hi,


Sorry about that, I thought I was using the -e hex... I must have removed it at 
some 
point accidentally.

Here they are
1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

4. getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x0001
trusted.afr.md1-client-3=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9


Thanks for your help,


A.


On Thursday 12 March 2015 07:51:40 Krutika Dhananjay wrote:


Hi,




Could you provide the xattrs in hex format?


You can execute `getfattr -d -m . -e hex 
`


-Krutika



*From: *"Alessandro Ipe" 

*To: *"Krutika Dhananjay" 

*Cc: *gluster-users@gluster.org

*Sent: *Thursday, March 12, 2015 5:15:08 PM

*Subject: *Re: [Gluster-users] Input/output error when trying to access a file 
on client




Hi,


Actually, my gluster volume is distribute-replicate so I should provide the 
attributes on 
all the bricks. Here they are:
1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

4. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAEA
trusted.afr.md1-client-3=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0s
trusted.afr.md1-client-5=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

6. # file: data/glusterfs/md1/brick1/root
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-12 Thread Alessandro Ipe
Hi,


Actually, my gluster volume is distribute-replicate so I should provide the 
attributes on 
all the bricks. Here they are:
1. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

2. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

3. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

4. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAEA
trusted.afr.md1-client-3=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

5. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0s
trusted.afr.md1-client-5=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

6. # file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAEA
trusted.afr.md1-client-5=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

so it seems in fact that there are discrepancies between 3-4 and 5-6 (replicate 
pairs).


A.


On Thursday 12 March 2015 11:33:00 Alessandro Ipe wrote:


Hi,


"gluster volume heal md1 info split-brain" returns approximatively 2000 files 
(already 
divided by 2
 due to replicate volume). So manually repairing each split-brain is 
unfeasable. Before 
scripting some
 procedure, I need to be sure that I will not harm further the gluster system.

Moreover, I noticed that the messages printed in the logs are all about 
directories,
 e.g.
[2015-03-12 10:06:53.423856] E [afr-self-heal-
common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: Unable to 
self-heal 
contents of '/root' (possible split-brain). Please delete the file from all but 
the preferred 
subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424005] E [afr-self-heal-
common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to 
self-heal 
contents of '/root' (possible split-brain). Please delete the file from all but 
the preferred 
subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424110] E [afr-self-heal-
common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-1:  metadata 
self heal  failed,   on /root
[2015-03-12 10:06:53.424290] E [afr-self-heal-
common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2:  metadata 
self heal  failed,   on /root

Getting the attributes of that directory on each brick gives me for the first
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

and for the second
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

so it seems that they are both rigorously identical. However, according to your
split-brain tutorial, none of them has 0x. What does 0s mean, in
fact ?

Should I change both attributes on each directory to 0x ?


Many thanks,


A.


On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote:


Hi,




Have you gone 
through 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md[1]
 ?
If not, could you go through that once and try the steps given there? Do let us 
know if 
something is not clear in the doc.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-12 Thread Alessandro Ipe
Hi,


"gluster volume heal md1 info split-brain" returns approximatively 2000 files 
(already divided by 2
 due to replicate volume). So manually repairing each split-brain is 
unfeasable. Before scripting some
 procedure, I need to be sure that I will not harm further the gluster system.

Moreover, I noticed that the messages printed in the logs are all about 
directories,
 e.g.
[2015-03-12 10:06:53.423856] E 
[afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: 
Unable to self-heal contents of '/root' (possible split-brain). Please delete 
the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 
0 ] ]
[2015-03-12 10:06:53.424005] E 
[afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: 
Unable to self-heal contents of '/root' (possible split-brain). Please delete 
the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 1 ] [ 1 
0 ] ]
[2015-03-12 10:06:53.424110] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-md1-replicate-1:  metadata self heal  failed,   on /root
[2015-03-12 10:06:53.424290] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-md1-replicate-2:  metadata self heal  failed,   on /root

Getting the attributes of that directory on each brick gives me for the first
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

and for the second
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

so it seems that they are both rigorously identical. However, according to your
split-brain tutorial, none of them has 0x. What does 0s mean, in fact ?

Should I change both attributes on each directory to 0x ?
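
(As a side note on the 0s prefix: it just means getfattr printed the value
base64-encoded, because -e hex was not given. A 12-byte all-zero value, for
instance, decodes as below; the three 4-byte counters are, in order, the data,
metadata and entry pending-operation counts:)

  # Decode a base64 xattr value as printed by getfattr without -e hex
  # (16 "A" characters = 12 zero bytes):
  echo AAAAAAAAAAAAAAAA | base64 -d | xxd -p
  # -> 000000000000000000000000
  #    first 4 bytes: data, next 4: metadata, last 4: entry pending counters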


Many thanks,


A.


On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote:


Hi,




Have you gone through 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md[1]
 ?
If not, could you go through that once and try the steps given there? Do let us 
know if something is not clear in the doc.


-Krutika


--------
*From: *"Alessandro Ipe" 

*To: *gluster-users@gluster.org

*Sent: *Wednesday, March 11, 2015 4:54:09 PM

*Subject: *Re: [Gluster-users] Input/output error when trying to access a file  
  on client




Well, it is even worse. Now when doing  a "ls -R" on the volume results in a 
lot of 




[2015-03-11 11:18:31.957505] E 
[afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: 
Unable to self-heal contents of '/library' (possible split-brain). Please 
delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 2 
] [ 1 0 ] ][2015-03-11 11:18:31.957692] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-md1-replicate-2:  metadata self heal  failed,   on /library




I am desperate...












___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users








[1] 
https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error when trying to access a file on client

2015-03-11 Thread Alessandro Ipe
Well, it is even worse. Now doing an "ls -R" on the volume results in a lot
of

[2015-03-11 11:18:31.957505] E 
[afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: 
Unable to self-heal contents of '/library' (possible split-brain). Please 
delete the file from all but the preferred subvolume.- Pending matrix:  [ [ 0 2 
] [ 1 0 ] ]
[2015-03-11 11:18:31.957692] E 
[afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 
0-md1-replicate-2:  metadata self heal  failed,   on /library

I am desperate...


A.


On Wednesday 11 March 2015 12:05:33 you wrote:
> Hi,
> 
> 
> When trying to access a file on a gluster client (through fuse), I get an
> "Input/output error" message.
> 
> Getting the attributes for the file gives me for the first brick
> # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
> trusted.afr.md1-client-2=0s
> trusted.afr.md1-client-3=0sAAABdAAA
> trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==
> 
> while for the second (replicate) brick
> # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
> trusted.afr.md1-client-2=0sAAABJAAA
> trusted.afr.md1-client-3=0s
> trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==
> 
> It seems that I have a split-brain. How can I solve this issue by resetting
> the attributes, please ?
> 
> 
> Thanks,
> 
> 
> Alessandro.
> 
> ==
> gluster volume info md1
> 
> Volume Name: md1
> Type: Distributed-Replicate
> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
> Status: Started
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: tsunami1:/data/glusterfs/md1/brick1
> Brick2: tsunami2:/data/glusterfs/md1/brick1
> Brick3: tsunami3:/data/glusterfs/md1/brick1
> Brick4: tsunami4:/data/glusterfs/md1/brick1
> Brick5: tsunami5:/data/glusterfs/md1/brick1
> Brick6: tsunami6:/data/glusterfs/md1/brick1
> Options Reconfigured:
> server.allow-insecure: on
> cluster.read-hash-mode: 2
> features.quota: off
> performance.write-behind: on
> performance.write-behind-window-size: 4MB
> performance.flush-behind: off
> performance.io-thread-count: 64
> performance.cache-size: 512MB
> nfs.disable: on
> cluster.lookup-unhashed: off

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Input/output error when trying to access a file on client

2015-03-11 Thread Alessandro Ipe
Hi,


When trying to access a file on a gluster client (through fuse), I get an 
"Input/output error" message.

Getting the attributes for the file gives me for the first brick
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAABdAAA
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

while for the second (replicate) brick
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0sAAABJAAA
trusted.afr.md1-client-3=0s
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

It seems that I have a split-brain. How can I solve this issue by resetting 
the attributes, please ?
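
One manual approach that was common on 3.x for a file in split-brain, sketched here 
under the assumption that the brick and file paths are the ones shown above and that 
the copy on one of the two bricks is to be discarded (an illustration of the usual 
steps, not an official procedure):

  BRICK=/data/glusterfs/md1/brick1
  F=kvm/hail/hail_home.qcow2

  # the gfid tells you where the second hard link of the file lives
  getfattr -n trusted.gfid -e hex "$BRICK/$F"
  # -> trusted.gfid=0x<32 hex digits>; the hard link sits at
  #    $BRICK/.glusterfs/<first 2 hex digits>/<next 2>/<gfid written as a uuid>

  # on the brick holding the bad copy only: remove the file and its gfid link
  rm "$BRICK/$F"
  rm "$BRICK/.glusterfs/<xx>/<yy>/<gfid-as-uuid>"

  # then access the file from the fuse mount so AFR heals it from the good copy
  stat /mnt/md1/kvm/hail/hail_home.qcow2   # mount point is a placeholder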


Thanks,


Alessandro.

==
gluster volume info md1
 
Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Brick5: tsunami5:/data/glusterfs/md1/brick1
Brick6: tsunami6:/data/glusterfs/md1/brick1
Options Reconfigured:
server.allow-insecure: on
cluster.read-hash-mode: 2
features.quota: off
performance.write-behind: on
performance.write-behind-window-size: 4MB
performance.flush-behind: off
performance.io-thread-count: 64
performance.cache-size: 512MB
nfs.disable: on
cluster.lookup-unhashed: off

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Rebalance issue on 3.5.3

2015-03-10 Thread Alessandro Ipe
Hi,


I launched a rebalance a couple of days ago on my gluster distributed-replicated 
volume (see below) through its CLI, while allowing my users to continue using the 
volume.

Yesterday, they managed to completely fill the volume. This now results in 
unavailable files on the client (using fuse) with the message "Transport endpoint 
is not connected". Investigating the associated files on the bricks, I noticed 
that these are displayed with ls -l as 
-T 2 user group 0 Jan 15 22:00 file
Performing a 
ls -lR /data/glusterfs/home/brick1/* | grep -F -- "-T"
on a single brick gave me a LOT of files in that above-mentioned state.

Why are the files in that state ?

Did I lose all these files or can they still be recovered from the replicate 
copy of 
another brick ?
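
For what it is worth, a quick way to tell whether those zero-byte mode-1000 entries 
are only DHT link-to pointers (so the data may still live on another replica pair) is 
to look at their xattrs on a brick; the paths below are the ones from this volume and 
the file name is a placeholder:

  # list candidate link-to files on one brick, skipping the .glusterfs tree
  find /data/glusterfs/home/brick1 -path '*/.glusterfs' -prune -o \
       -type f -perm 1000 -size 0 -print | head

  # a real link-to file carries a pointer to the subvolume that holds the data
  getfattr -n trusted.glusterfs.dht.linkto -e text /data/glusterfs/home/brick1/<some-file>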


Regards,


Alessandro.


gluster volume info home output:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
features.default-soft-limit: 95%
cluster.ensure-durability: off
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: on
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: off
server.allow-insecure: on


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"

2015-02-23 Thread Alessandro Ipe
gluster volume rebalance md1 status gives :
     Node  Rebalanced-files     size   scanned  failures  skipped     status  run time in secs
---------  ----------------  -------  --------  --------  -------  ---------  ----------------
localhost              3837    6.3GB    163881         0        0  completed            365.00
 tsunami5               179  343.8MB    163882         0        0  completed            353.00
 tsunami3              6786    4.7GB    163882         0        0  completed            416.00
 tsunami6                 0   0Bytes    163882         0        0  completed            353.00
 tsunami4                 0   0Bytes    163882         0        0  completed            353.00
 tsunami2                 0   0Bytes    163882         0        0  completed            353.00
volume rebalance: md1: success:

but there is no change on the bricks for that directory: it is still empty except on 2 bricks. 
Should I remove the files in the .glusterfs directory on the 2 bricks associated with 
these "---T" files ?
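
Before deleting anything, it may help to confirm what those entries are; a sketch, 
assuming the brick path from the volume status above and using one of the listed 
file names as an example:

  B=/data/glusterfs/md1/brick1
  F=linux/suse/12.1/KDE4.7.4/i586/bovo-4.7.4-3.12.7.i586.rpm

  stat -c '%h %s %a' "$B/$F"               # link count, size, mode (link-to files: 2, 0, 1000)
  getfattr -n trusted.gfid -e hex "$B/$F"  # gfid -> second hard link under $B/.glusterfs/xx/yy/

Removing a stale link-to file then normally means removing both the file and its 
.glusterfs hard link on that brick, but only once the real data is confirmed to exist 
(or to be expendable) on the other subvolumes.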


Thanks,


A.  



On Monday 23 February 2015 21:40:41 Ravishankar N wrote:



On 02/23/2015 09:19 PM, Alessandro Ipe  wrote:
  
On 4 of the 6 bricks, it is empty. However, on tsunami 3-4, ls -lsa gives
total 16  
d- 2 root root 61440 Feb 23 15:42 .  
drwxrwxrwx 3 gerb users 61 Feb 22 21:10 ..  
-T 2 gerb users 0 Apr 16 2014
akonadi-googledata-1.2.0-2.5.2.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
bluedevil-debugsource-1.2.2-1.8.3.i586.rpm  
-T 2 gerb users 0 Apr 16 2014bovo-4.7.4-3.12.7.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
digikam-debugsource-2.2.0-3.12.9.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
dolphin-debuginfo-4.7.4-4.22.6.i586.rpm  
-T 2 gerb users 0 Apr 16 2014freetds-doc-0.91-2.5.1.i586.rpm
  
-T 2 gerb users 0 Apr 16 2014
kanagram-debuginfo-4.7.4-2.10.2.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
kdebase4-runtime-4.7.4-3.17.7.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
kdebindings-smokegen-debuginfo-4.7.4-2.9.1.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
kdesdk4-strigi-debuginfo-4.7.4-3.12.5.i586.rpm  
-T 2 gerb users 0 Apr 16 2014kradio-4.0.2-9.9.7.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
kremotecontrol-4.7.4-2.12.9.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
kreversi-debuginfo-4.7.4-3.12.7.i586.rpm  
-T 2 gerb users 0 Apr 16 2014krfb-4.7.4-2.13.6.i586.rpm  
-T 2 gerb users 0 Apr 16 2014krusader-doc-2.0.0-23.9.7.i586.rpm 
 
-T 2 gerb users 0 Apr 16 2014
libalkimia-devel-4.3.1-2.5.1.i586.rpm  
-T 2 gerb users 0 Apr 16 2014libdmtx0-0.7.4-2.1.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
libdmtx0-debuginfo-0.7.4-2.1.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
libkdegames4-debuginfo-4.7.4-3.12.7.i586.rpm  
-T 2 gerb users 0 Apr 16 2014libksane0-4.7.4-2.10.1.i586.rpm
  
-T 2 gerb users 0 Apr 16 2014
libkvkontakte-debugsource-1.0.0-2.2.i586.rpm  
-T 2 gerb users 0 Apr 16 2014
libmediawiki-debugsource-2.5.0-4.6.1.i586.rpm  
-T 2 gerb users 0 Apr 16 2014libsmokeqt-4.7.4-2.10.2.i586.rpm   
   
-T 2 gerb users 0 Apr 16 2014
NetworkManager-vpnc-kde4-0.9.1git20111027-1.11.5.i586.rpm  
-T 2 gerb users 0 Apr 16 2014qtcurve-kde4-1.8.8-3.6.2.i586.rpm  

-T 2 gerb users 0 Apr 16 2014
QtZeitgeist-devel-0.7.0-7.4.2.i586.rpm  
-T 2 gerb users 0 Apr 16 2014umbrello-4.7.4-3.12.5.i586.rpm 
 
  
so that might be the reason for the error. How can I fix this ?
  

The 'T' files are DHT link-to files. The actual files must be present on the 
other distribute subvolumes (tsunami 1-2 or tsunami 5-6) in the same path. But 
since that doesn't seem to be the case, something went wrong with the re-balance 
process. You could run `gluster volume rebalance start+status` again and see if 
they disappear.
 
  
Thanks,  
  
  
A.  
  
  
On Monday 23 February 2015 21:06:58 Ravishankar N wrote:
 Just noticed that your `gluster volume status`

Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"

2015-02-23 Thread Alessandro Ipe
On 4 of the 6 bricks, it is empty. However, on tsunami 3-4, ls -lsa gives
total 16
d- 2 root root  61440 Feb 23 15:42 .
drwxrwxrwx 3 gerb users61 Feb 22 21:10 ..
-T 2 gerb users 0 Apr 16  2014 
akonadi-googledata-1.2.0-2.5.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
bluedevil-debugsource-1.2.2-1.8.3.i586.rpm
-T 2 gerb users 0 Apr 16  2014 bovo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
digikam-debugsource-2.2.0-3.12.9.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
dolphin-debuginfo-4.7.4-4.22.6.i586.rpm
-T 2 gerb users 0 Apr 16  2014 freetds-doc-0.91-2.5.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
kanagram-debuginfo-4.7.4-2.10.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
kdebase4-runtime-4.7.4-3.17.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 kdebindings-smokegen-
debuginfo-4.7.4-2.9.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 kdesdk4-strigi-
debuginfo-4.7.4-3.12.5.i586.rpm
-T 2 gerb users 0 Apr 16  2014 kradio-4.0.2-9.9.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 kremotecontrol-4.7.4-2.12.9.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
kreversi-debuginfo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 krfb-4.7.4-2.13.6.i586.rpm
-T 2 gerb users 0 Apr 16  2014 krusader-doc-2.0.0-23.9.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libalkimia-devel-4.3.1-2.5.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libdmtx0-0.7.4-2.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libdmtx0-debuginfo-0.7.4-2.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libkdegames4-
debuginfo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libksane0-4.7.4-2.10.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
libkvkontakte-debugsource-1.0.0-2.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libmediawiki-
debugsource-2.5.0-4.6.1.i586.rpm
-T 2 gerb users 0 Apr 16  2014 libsmokeqt-4.7.4-2.10.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 NetworkManager-vpnc-
kde4-0.9.1git20111027-1.11.5.i586.rpm
-T 2 gerb users 0 Apr 16  2014 qtcurve-kde4-1.8.8-3.6.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 
QtZeitgeist-devel-0.7.0-7.4.2.i586.rpm
-T 2 gerb users 0 Apr 16  2014 umbrello-4.7.4-3.12.5.i586.rpm

so that might be the reason for the error. How can I fix this ?


Thanks,


A.


On Monday 23 February 2015 21:06:58 Ravishankar N wrote:


Just noticed that your `gluster volume status` shows that rebalance was 
triggered. Maybe DHT developers can help out. I see a similar bug [1] has been 
fixed some time back. FWIW, can you check if "/linux/suse/12.1/KDE4.7.4/i586" on 
all 6 bricks is indeed empty?
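
A throwaway loop for that check, assuming root ssh access to the six brick nodes 
named in this thread and the brick path shown in the volume status:

  for h in tsunami1 tsunami2 tsunami3 tsunami4 tsunami5 tsunami6; do
      echo "== $h =="
      ssh "$h" 'ls -la /data/glusterfs/md1/brick1/linux/suse/12.1/KDE4.7.4/i586'
  done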
On 02/23/2015 08:15 PM, Alessandro Ipe  wrote:
  
Hi,  
  
  
Gluster version is 3.5.3-1.  
/var/log/gluster.log (client log) gives, during the rm -rf, the following logs:  
[2015-02-23 14:42:50.180091] W
[client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-
md1-client-2:remote operation failed: Directory not empty  
[2015-02-23 14:42:50.180134] W
[client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-
md1-client-3:remote operation failed: Directory not empty  
[2015-02-23 14:42:50.180740] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-5:remote operation failed: File exists. Path:
/linux/suse/12.1/KDE4.7.4/i586  
[2015-02-23 14:42:50.180772] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-4:remote operation failed: File exists. Path:
/linux/suse/12.1/KDE4.7.4/i586  
[2015-02-23 14:42:50.181129] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-3:remote operation failed: File exists. Path:
/linux/suse/12.1/KDE4.7.4/i586  
[2015-02-23 14:42:50.181160] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-2:remote operation failed: File exists. Path:
/linux/suse/12.1/KDE4.7.4/i586  
[2015-02-23 14:42:50.319213] W
[client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-
md1-client-3:remote operation failed: Directory not empty  
[2015-02-23 14:42:50.319762] W
[client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-
md1-client-2:remote operation failed: Directory not empty  
[2015-02-23 14:42:50.320501] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-0:remote operation failed: File exists. Path:
/linux/suse/12.1/src-
oss/suse/src  
[2015-02-23 14:42:50.320552] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-1:remote operation failed: File exists. Path:
/linux/suse/12.1/src-
oss/suse/src  
[2015-02-23 14:42:50.320842] W
[client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-
md1-client-2:remote operation fa

Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"

2015-02-23 Thread Alessandro Ipe
Hi,


Gluster version is 3.5.3-1.
/var/log/gluster.log (client log) gives during the rm -rf the  following logs:
[2015-02-23 14:42:50.180091] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-2: remote operation failed: Directory not empty
[2015-02-23 14:42:50.180134] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-3: remote operation failed: Directory not empty
[2015-02-23 14:42:50.180740] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-5: remote operation failed: File exists. Path: 
/linux/suse/12.1/KDE4.7.4/i586
[2015-02-23 14:42:50.180772] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-4: remote operation failed: File exists. Path: 
/linux/suse/12.1/KDE4.7.4/i586
[2015-02-23 14:42:50.181129] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-3: remote operation failed: File exists. Path: 
/linux/suse/12.1/KDE4.7.4/i586
[2015-02-23 14:42:50.181160] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-2: remote operation failed: File exists. Path: 
/linux/suse/12.1/KDE4.7.4/i586
[2015-02-23 14:42:50.319213] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-3: remote operation failed: Directory not empty
[2015-02-23 14:42:50.319762] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-2: remote operation failed: Directory not empty
[2015-02-23 14:42:50.320501] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-0: remote operation failed: File exists. Path: 
/linux/suse/12.1/src-oss/suse/src
[2015-02-23 14:42:50.320552] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-1: remote operation failed: File exists. Path: 
/linux/suse/12.1/src-oss/suse/src
[2015-02-23 14:42:50.320842] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-2: remote operation failed: File exists. Path: 
/linux/suse/12.1/src-oss/suse/src
[2015-02-23 14:42:50.320884] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-3: remote operation failed: File exists. Path: 
/linux/suse/12.1/src-oss/suse/src
[2015-02-23 14:42:50.438982] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-3: remote operation failed: Directory not empty
[2015-02-23 14:42:50.439347] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-2: remote operation failed: Directory not empty
[2015-02-23 14:42:50.440235] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-0: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/noarch
[2015-02-23 14:42:50.440344] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-1: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/noarch
[2015-02-23 14:42:50.440603] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-2: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/noarch
[2015-02-23 14:42:50.440665] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-3: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/noarch
[2015-02-23 14:42:50.680827] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-2: remote operation failed: Directory not empty
[2015-02-23 14:42:50.681721] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 
0-md1-
client-3: remote operation failed: Directory not empty
[2015-02-23 14:42:50.682482] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-3: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/i586
[2015-02-23 14:42:50.682517] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 
0-md1-
client-2: remote operation failed: File exists. Path: 
/linux/suse/12.1/oss/suse/i586


Thanks,


A.


On Monday 23 February 2015 20:06:17 Ravishankar N wrote:



On 02/23/2015 07:04 PM, Alessandro Ipe  wrote:
  
Hi Ravi,  
  
  
gluster volume status md1 returns  
Status of volume: md1  
Gluster process Port Online Pid  
--  

Brick tsunami1:/data/glusterfs/md1/brick1 49157 Y 2260  
Brick tsunami2:/data/glusterfs/md1/brick1 49152 Y 2320  
Brick tsunami3:/data/glusterfs/md1/brick1 49156 Y 20715  
Brick tsunami4:/data/glusterfs/md1/brick1 49156 Y 10544  
Brick tsunami5:/data/glusterfs/md1/brick1 49152 Y 12588  
Brick tsunami6:/data/glusterfs/md1/brick1 49152 Y 12242  
Self-heal Daemon on localhost N/A Y 2336  
Self-heal Daemon on tsunami2 N/A Y 2359  
Self-heal Daemon on tsunami5 N/A Y 27619  
Self-heal Daemon on tsunami4 N/A Y 12318  
Self-heal Daemon on tsunami3 N/A Y 19118  
Self-heal Daemon on tsunami6 N/A Y 27650  
   
Task Status of Volume md1  
--  

Task : Rebalance   
ID : 9dfee1a2-49ac-4766-bdb6-00de5e5883f6  
Status : completed   
so it seems that all brick server are up.  
  
gluster volume heal md1 info returns

Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"

2015-02-23 Thread Alessandro Ipe
Hi Ravi,


gluster volume status md1 returns
Status of volume: md1
Gluster process PortOnline  Pid
--
Brick tsunami1:/data/glusterfs/md1/brick1   49157   Y   2260
Brick tsunami2:/data/glusterfs/md1/brick1   49152   Y   2320
Brick tsunami3:/data/glusterfs/md1/brick1   49156   Y   20715
Brick tsunami4:/data/glusterfs/md1/brick1   49156   Y   10544
Brick tsunami5:/data/glusterfs/md1/brick1   49152   Y   12588
Brick tsunami6:/data/glusterfs/md1/brick1   49152   Y   12242
Self-heal Daemon on localhost   N/A Y   2336
Self-heal Daemon on tsunami2N/A Y   2359
Self-heal Daemon on tsunami5N/A Y   27619
Self-heal Daemon on tsunami4N/A Y   12318
Self-heal Daemon on tsunami3N/A Y   19118
Self-heal Daemon on tsunami6N/A Y   27650
 
Task Status of Volume md1
--
Task : Rebalance   
ID   : 9dfee1a2-49ac-4766-bdb6-00de5e5883f6
Status   : completed   
so it seems that all brick server are up.

gluster volume heal md1 info returns
Brick tsunami1.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Brick tsunami2.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Brick tsunami3.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Brick tsunami4.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Brick tsunami5.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Brick tsunami6.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Should I run "gluster volume heal md1 full" ?
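
For reference, the usual sequence when the heal-info counters are all zero but the 
bricks still disagree is roughly the following (nothing here beyond the commands 
already mentioned in this thread, plus the split-brain listing):

  gluster volume heal md1 full                 # ask the self-heal daemons to crawl the whole volume
  gluster volume heal md1 info                 # entries still queued for heal
  gluster volume heal md1 info split-brain     # entries the daemons refuse to heal on their own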


Thanks,


A.


On Monday 23 February 2015 18:12:43 Ravishankar N wrote:



On 02/23/2015 05:42 PM, Alessandro Ipe  wrote:
  
Hi,  
  
  
We have a "md1" volume under gluster 3.5.3over 6 servers configured as 
distributed and replicated. Whentrying on a client, thourgh fuse mount 
(which 
turns out to bealso a brick server) to delete (as root) recursively a 
directory
with "rm -rf /home/.md1/linux/suse/12.1", I get the errormessages  
  
rm: cannot remove‘/home/.md1/linux/suse/12.1/KDE4.7.4/i586’: Directory 
not 
empty  
rm: cannot remove‘/home/.md1/linux/suse/12.1/src-oss/suse/src’: 
Directory not
empty  
rm: cannot remove‘/home/.md1/linux/suse/12.1/oss/suse/noarch’: 
Directory not
empty  
rm: cannot remove‘/home/.md1/linux/suse/12.1/oss/suse/i586’: Directory 
not 
empty  
(the same occurs as an unprivileged user, but with "Permission denied".) 
 
  
while a "ls -Ral /home/.md1/linux/suse/12.1"gives me  
/home/.md1/linux/suse/12.1:  
total 0  
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .  
drwxr-xr-x 6 gerb users 245 Feb 23 12:55 ..  
drwxrwxrwx 3 gerb users 95 Feb 23 13:03KDE4.7.4  
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 oss  
drwxrwxrwx 3 gerb users 86 Feb 20 16:20src-oss  
  
/home/.md1/linux/suse/12.1/KDE4.7.4:  
total 28  
drwxrwxrwx 3 gerb users 95 Feb 23 13:03 .  
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..  
d- 2 root root 61452 Feb 23 13:03i586  
  
/home/.md1/linux/suse/12.1/KDE4.7.4/i586:  
total 28  
d- 2 root root 61452 Feb 23 13:03 .  
drwxrwxrwx 3 gerb users 95 Feb 23 13:03 ..  
  
/home/.md1/linux/suse/12.1/oss:  
total 0  
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 .  
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..  
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 suse  
  
/home/.md1/linux/suse/12.1/oss/suse:  
total 536  
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 .  
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 ..  
d- 2 root root 368652 Feb 23 13:03i586  
d- 2 root root 196620 Feb 23 13:03noarch  
  
/home/.md1/linux/suse/12.1/oss/suse/i586:  
total 360  
d- 2 root root 368652 Feb 23 13:03 .  
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 ..  
  
/home/.md1/linux/suse/12.1/oss/suse/noarch:
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] rm -rf some_dir results in "Directory not empty"

2015-02-23 Thread Alessandro Ipe
Hi,


We have a "md1" volume under gluster 3.5.3 over 6 servers configured as 
distributed and replicated. When trying on a client, through a fuse mount (which 
turns out to be also a brick server) to delete (as root) recursively a 
directory with "rm -rf /home/.md1/linux/suse/12.1", I get the error messages

rm: cannot remove ‘/home/.md1/linux/suse/12.1/KDE4.7.4/i586’: Directory not 
empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/src-oss/suse/src’: Directory not 
empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/noarch’: Directory not 
empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/i586’: Directory not 
empty
(the same occurs as unprivileged user but with "Permission denied".)

while a "ls -Ral /home/.md1/linux/suse/12.1" gives me
/home/.md1/linux/suse/12.1:
total 0
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .
drwxr-xr-x 6 gerb users 245 Feb 23 12:55 ..
drwxrwxrwx 3 gerb users  95 Feb 23 13:03 KDE4.7.4
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 oss
drwxrwxrwx 3 gerb users  86 Feb 20 16:20 src-oss

/home/.md1/linux/suse/12.1/KDE4.7.4:
total 28
drwxrwxrwx 3 gerb users95 Feb 23 13:03 .
drwxrwxrwx 5 gerb users   151 Feb 20 16:22 ..
d- 2 root root  61452 Feb 23 13:03 i586

/home/.md1/linux/suse/12.1/KDE4.7.4/i586:
total 28
d- 2 root root  61452 Feb 23 13:03 .
drwxrwxrwx 3 gerb users95 Feb 23 13:03 ..

/home/.md1/linux/suse/12.1/oss:
total 0
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 .
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..
drwxrwxrwx 4 gerb users  90 Feb 23 13:03 suse

/home/.md1/linux/suse/12.1/oss/suse:
total 536
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 .
drwxrwxrwx 3 gerb users311 Feb 20 16:57 ..
d- 2 root root  368652 Feb 23 13:03 i586
d- 2 root root  196620 Feb 23 13:03 noarch

/home/.md1/linux/suse/12.1/oss/suse/i586:
total 360
d- 2 root root  368652 Feb 23 13:03 .
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 ..

/home/.md1/linux/suse/12.1/oss/suse/noarch:
total 176
d- 2 root root  196620 Feb 23 13:03 .
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 ..

/home/.md1/linux/suse/12.1/src-oss:
total 0
drwxrwxrwx 3 gerb users  86 Feb 20 16:20 .
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..
drwxrwxrwx 3 gerb users  48 Feb 23 13:03 suse

/home/.md1/linux/suse/12.1/src-oss/suse:
total 220
drwxrwxrwx 3 gerb users 48 Feb 23 13:03 .
drwxrwxrwx 3 gerb users 86 Feb 20 16:20 ..
d- 2 root root  225292 Feb 23 13:03 src

/home/.md1/linux/suse/12.1/src-oss/suse/src:
total 220
d- 2 root root  225292 Feb 23 13:03 .
drwxrwxrwx 3 gerb users 48 Feb 23 13:03 ..


Is there a cure, such as manually forcing a heal on that directory ?
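
A hedged sketch of what "forcing a heal" usually amounts to here, with the mount point 
and volume name taken from this thread: a lookup from the fuse mount makes the replicas 
re-examine the directory, and a full heal crawls everything, but neither will remove 
stale DHT link-to entries left on the bricks, which may still need manual cleanup:

  stat /home/.md1/linux/suse/12.1/KDE4.7.4/i586          # lookup on the problem directory
  find /home/.md1/linux/suse/12.1 -exec stat {} \; > /dev/null
  gluster volume heal md1 full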


Many thanks,


Alessandro.


gluster volume info md1 outputs:
Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Brick5: tsunami5:/data/glusterfs/md1/brick1
Brick6: tsunami6:/data/glusterfs/md1/brick1
Options Reconfigured:
performance.write-behind: on
performance.write-behind-window-size: 4MB
performance.flush-behind: off
performance.io-thread-count: 64
performance.cache-size: 512MB
nfs.disable: on
features.quota: off
cluster.read-hash-mode: 2
server.allow-insecure: on
cluster.lookup-unhashed: off

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] volume replace-brick start is not working

2015-01-07 Thread Alessandro Ipe
Hi,


I suspect that the first issue could come from the fact that I made a typo in 
/etc/hosts on one of the peers, by mistake associating 2 IP addresses with the 
same name (tsunami5 instead of tsunami6)...

Apologies for my stupidity, and thanks for the help.  


A.


On Wednesday 07 January 2015 18:06:44 Atin Mukherjee wrote:
> On 01/07/2015 05:09 PM, Alessandro Ipe wrote:
> > Hi,
> > 
> > 
> > The corresponding logs in
> > /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS is openSuSE 12.3)
> > [2015-01-06 12:32:14.596601] I
> > [glusterd-replace-brick.c:98:__glusterd_handle_replace_brick]
> > 0-management: Received replace brick req [2015-01-06 12:32:14.596633] I
> > [glusterd-replace-brick.c:153:__glusterd_handle_replace_brick]
> > 0-management: Received replace brick status request [2015-01-06
> > 12:32:14.596991] E [glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk]
> > 0-management: Received lock RJT from uuid:
> > b1aa773f-f9f4-491c-9493-00b23d5ee380
> Here is the problem, lock request is rejected by peer
> b1aa773f-f9f4-491c-9493-00b23d5ee380, I feel this is one of the peer
> which you added as part of your use case. Was the peer probe successful?
> Can you please provide the peer status output?
> 
> ~Atin
> 
> > [2015-01-06 12:32:14.598100] E
> > [glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management:
> > Received unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380
> > 
> > However, several reboots managed to cancel the replace-brick command.
> > Moreover, I read that this command could still have issues (obviously) in
> > 3.5, so I managed to find a workaround for it.
> > 
> > 
> > A.
> > 
> > On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote:
> >> On 01/06/2015 06:05 PM, Alessandro Ipe wrote:
> >>> Hi,
> >>> 
> >>> 
> >>> We have set up a "md1" volume using gluster 3.4.2 over 4 servers
> >>> configured as distributed and replicated. Then, we upgraded smoothly to
> >>> 3.5.3, since it was mentioned that the command "volume replace-brick"
> >>> is
> >>> broken on 3.4.x. We added two more peers (after having read that the
> >>> quota feature needed to be turned off for this command to succeed...).
> >>> 
> >>> We have then issued an
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 abort
> >>> because nothing was happening.
> >>> 
> >>> However when trying to monitor the previous command by
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs
> >>> volume replace-brick: failed: Another transaction could be in progress.
> >>> Please try again after sometime. and the following lines are written in
> >>> cli.log
> >>> [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs:
> >>> SSL support is NOT enabled [2015-01-06 12:32:14.595434] I
> >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> >>> [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs:
> >>> SSL support is NOT enabled [2015-01-06 12:32:14.595606] I
> >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> >>> [2015-01-06 12:32:14.596013] I
> >>> [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not
> >>> installed [2015-01-06 12:32:14.602165] I
> >>> [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to
> >>> replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-:
> >>> Exiting with: -1
> >>> 
> >>> What am I doing wrong ?
> >> 
> >> Can you please share the glusterd log?
> >> 
> >> ~Atin
> >> 
> >>> Many thanks,
> >>> 
> >>> 
> >>> Alessandro.
> >>> 
> >>> 
> >>> gluster volume info md1 outputs:
> >>> Volume Name: md1
> >>> Type: Distributed-Replicate
> >>> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
> >>> Status: Started
> >>> Number of Bricks: 2 x 2 = 4
> >>> Tran

Re: [Gluster-users] volume replace-brick start is not working

2015-01-07 Thread Alessandro Ipe
Indeed, I did a
gluster peer probe tsunami5
which gave me "Peer Probe Sent (Connected)"

After searching on the internet, I found some information suggesting that under 
3.5(.3), "peer probe" could fail if quota was activated...


A.


On Wednesday 07 January 2015 18:06:44 Atin Mukherjee wrote:
> On 01/07/2015 05:09 PM, Alessandro Ipe wrote:
> > Hi,
> > 
> > 
> > The corresponding logs in
> > /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS is openSuSE 12.3)
> > [2015-01-06 12:32:14.596601] I
> > [glusterd-replace-brick.c:98:__glusterd_handle_replace_brick]
> > 0-management: Received replace brick req [2015-01-06 12:32:14.596633] I
> > [glusterd-replace-brick.c:153:__glusterd_handle_replace_brick]
> > 0-management: Received replace brick status request [2015-01-06
> > 12:32:14.596991] E [glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk]
> > 0-management: Received lock RJT from uuid:
> > b1aa773f-f9f4-491c-9493-00b23d5ee380
> Here is the problem, lock request is rejected by peer
> b1aa773f-f9f4-491c-9493-00b23d5ee380, I feel this is one of the peer
> which you added as part of your use case. Was the peer probe successful?
> Can you please provide the peer status output?
> 
> ~Atin
> 
> > [2015-01-06 12:32:14.598100] E
> > [glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management:
> > Received unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380
> > 
> > However, several reboots managed to cancel the replace-brick command.
> > Moreover, I read that this command could still have issues (obviously) in
> > 3.5, so I managed to find a workaround for it.
> > 
> > 
> > A.
> > 
> > On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote:
> >> On 01/06/2015 06:05 PM, Alessandro Ipe wrote:
> >>> Hi,
> >>> 
> >>> 
> >>> We have set up a "md1" volume using gluster 3.4.2 over 4 servers
> >>> configured as distributed and replicated. Then, we upgraded smoothly to
> >>> 3.5.3, since it was mentioned that the command "volume replace-brick"
> >>> is
> >>> broken on 3.4.x. We added two more peers (after having read that the
> >>> quota feature needed to be turned off for this command to succeed...).
> >>> 
> >>> We have then issued an
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 abort
> >>> because nothing was happening.
> >>> 
> >>> However when trying to monitor the previous command by
> >>> gluster volume replace-brick md1
> >>> 193.190.249.113:/data/glusterfs/md1/brick1
> >>> 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs
> >>> volume replace-brick: failed: Another transaction could be in progress.
> >>> Please try again after sometime. and the following lines are written in
> >>> cli.log
> >>> [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs:
> >>> SSL support is NOT enabled [2015-01-06 12:32:14.595434] I
> >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> >>> [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs:
> >>> SSL support is NOT enabled [2015-01-06 12:32:14.595606] I
> >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> >>> [2015-01-06 12:32:14.596013] I
> >>> [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not
> >>> installed [2015-01-06 12:32:14.602165] I
> >>> [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to
> >>> replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-:
> >>> Exiting with: -1
> >>> 
> >>> What am I doing wrong ?
> >> 
> >> Can you please share the glusterd log?
> >> 
> >> ~Atin
> >> 
> >>> Many thanks,
> >>> 
> >>> 
> >>> Alessandro.
> >>> 
> >>> 
> >>> gluster volume info md1 outputs:
> >>> Volume Name: md1
> >>> Type: Distributed-Replicate
> >>> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
> >>> Status: Started
> >>> Number of Bricks: 2 x 2 = 4
> >>> Transport-type: tcp
>

Re: [Gluster-users] volume replace-brick start is not working

2015-01-07 Thread Alessandro Ipe
Hi,


The corresponding logs in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS 
is openSuSE 12.3)
[2015-01-06 12:32:14.596601] I 
[glusterd-replace-brick.c:98:__glusterd_handle_replace_brick] 0-management: 
Received replace brick req
[2015-01-06 12:32:14.596633] I 
[glusterd-replace-brick.c:153:__glusterd_handle_replace_brick] 0-management: 
Received replace brick status request
[2015-01-06 12:32:14.596991] E 
[glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk] 0-management: Received 
lock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380
[2015-01-06 12:32:14.598100] E 
[glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management: Received 
unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380

However, several reboots managed to cancel the replace-brick command. Moreover, 
I read that this command could still have issues (obviously) in 3.5, so I 
managed to find a workaround for it.


A.


On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote:
> On 01/06/2015 06:05 PM, Alessandro Ipe wrote:
> > Hi,
> > 
> > 
> > We have set up a "md1" volume using gluster 3.4.2 over 4 servers
> > configured as distributed and replicated. Then, we upgraded smoothly to
> > 3.5.3, since it was mentioned that the command "volume replace-brick" is
> > broken on 3.4.x. We added two more peers (after having read that the
> > quota feature needed to be turned off for this command to succeed...).
> > 
> > We have then issued an
> > gluster volume replace-brick md1
> > 193.190.249.113:/data/glusterfs/md1/brick1
> > 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an
> > gluster volume replace-brick md1
> > 193.190.249.113:/data/glusterfs/md1/brick1
> > 193.190.249.122:/data/glusterfs/md1/brick1 abort
> > because nothing was happening.
> > 
> > However when trying to monitor the previous command by
> > gluster volume replace-brick md1
> > 193.190.249.113:/data/glusterfs/md1/brick1
> > 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs
> > volume replace-brick: failed: Another transaction could be in progress.
> > Please try again after sometime. and the following lines are written in
> > cli.log
> > [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs:
> > SSL support is NOT enabled [2015-01-06 12:32:14.595434] I
> > [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> > [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs:
> > SSL support is NOT enabled [2015-01-06 12:32:14.595606] I
> > [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
> > [2015-01-06 12:32:14.596013] I
> > [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not
> > installed [2015-01-06 12:32:14.602165] I
> > [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to
> > replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-:
> > Exiting with: -1
> > 
> > What am I doing wrong ?
> 
> Can you please share the glusterd log?
> 
> ~Atin
> 
> > Many thanks,
> > 
> > 
> > Alessandro.
> > 
> > 
> > gluster volume info md1 outputs:
> > Volume Name: md1
> > Type: Distributed-Replicate
> > Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
> > Status: Started
> > Number of Bricks: 2 x 2 = 4
> > Transport-type: tcp
> > Bricks:
> > Brick1: tsunami1:/data/glusterfs/md1/brick1
> > Brick2: tsunami2:/data/glusterfs/md1/brick1
> > Brick3: tsunami3:/data/glusterfs/md1/brick1
> > Brick4: tsunami4:/data/glusterfs/md1/brick1
> > Options Reconfigured:
> > server.allow-insecure: on
> > cluster.read-hash-mode: 2
> > features.quota: off
> > nfs.disable: on
> > performance.cache-size: 512MB
> > performance.io-thread-count: 64
> > performance.flush-behind: off
> > performance.write-behind-window-size: 4MB
> > performance.write-behind: on
> > 
> > 
> > 
> > 
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users

-- 

 Dr. Ir. Alessandro Ipe   
 Department of Observations Tel. +32 2 373 06 31
 Remote Sensing from Space  Fax. +32 2 374 67 88  
 Royal Meteorological Institute  
 Avenue Circulaire 3Email:  
 B-1180 BrusselsBelgium alessandro@meteo.be 
 Web: http://gerb.oma.be   

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] volume replace-brick start is not working

2015-01-06 Thread Alessandro Ipe
Hi,


We have set up a "md1" volume using gluster 3.4.2 over 4 servers configured as 
distributed and replicated. Then, we upgraded smoothly to 3.5.3, since it was 
mentioned that the command "volume replace-brick" is broken on 3.4.x. We added 
two more peers (after having read that the quota feature needed to be turned off 
for this command to succeed...).

We have then issued an
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 
193.190.249.122:/data/glusterfs/md1/brick1 start force
Then I did an 
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 
193.190.249.122:/data/glusterfs/md1/brick1 abort
because nothing was happening.

However when trying to monitor the previous command by
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 
193.190.249.122:/data/glusterfs/md1/brick1 status
it outputs
volume replace-brick: failed: Another transaction could be in progress. Please 
try again after sometime.
and the following lines are written in cli.log
[2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs: SSL 
support is NOT enabled
[2015-01-06 12:32:14.595434] I [socket.c:3660:socket_init] 0-glusterfs: using 
system polling thread
[2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs: SSL 
support is NOT enabled
[2015-01-06 12:32:14.595606] I [socket.c:3660:socket_init] 0-glusterfs: using 
system polling thread
[2015-01-06 12:32:14.596013] I [cli-cmd-volume.c:1706:cli_check_gsync_present] 
0-: geo-replication not installed
[2015-01-06 12:32:14.602165] I [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 
0-cli: Received resp to replace brick
[2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-: Exiting with: -1

What am I doing wrong ?
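
For completeness, the workaround that is often suggested for this situation is to skip 
the start/status/abort sequence entirely and do a forced commit, letting self-heal 
repopulate the new brick afterwards; a sketch only, reusing the brick paths from above, 
and it assumes the surviving replica is healthy enough to heal from:

  gluster volume replace-brick md1 \
      193.190.249.113:/data/glusterfs/md1/brick1 \
      193.190.249.122:/data/glusterfs/md1/brick1 commit force
  gluster volume heal md1 full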


Many thanks,


Alessandro.


gluster volume info md1 outputs:
Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Options Reconfigured:
server.allow-insecure: on
cluster.read-hash-mode: 2
features.quota: off
nfs.disable: on
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Error message: getxattr failed on : user.virtfs.rdev (No data available)

2014-09-02 Thread Alessandro Ipe
Hi,


We have set up a "home" volume using gluster 3.4.2 over 4 servers configured as 
distributed and replicated. On each server, 4 ext4 bricks are mounted with the 
following options:
defaults,noatime,nodiratime

This "home" volume is mounted using FUSE on a client server with the following 
options:
defaults,_netdev,noatime,direct-io-mode=disable,backupvolfile-server=tsunami2,log-level=ERROR,log-file=/var/log/gluster.log

This client (host) also runs a virtual machine (qemu-kvm guest) with a 
"Filesystem Passthrough" from the host using "Mapped" mode to the guest. The 
filesystem exported from the host is located on the gluster "home" volume. This 
filesystem is mounted inside the guest using the fstab line:
home  /home  9p  trans=virtio,version=9p2000.L,rw,noatime  0  0

On my guest, userid mapping is working correctly and I can copy files on /home. 
However, doing so results in my bricks' logs (/var/log/glusterfs/bricks on the 
4 gluster servers) filling with error messages similar to:
E [posix.c:2668:posix_getxattr] 0-home-posix: getxattr failed on 
/data/glusterfs/home/brick1/hail/mailman/mailman: user.virtfs.rdev (No data 
available)
for all files being copied on /home.

For the moment, a quick fix to avoid filling my system partition holding the 
logs was to set the volume's parameter "diagnostics.brick-log-level" to 
CRITICAL, but this will prevent me from seeing other important error messages that 
could occur.

Is there a cleaner way (use of ACLs ?) to prevent these error messages from filling 
my logs and my system partition, please ?
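
Two possible mitigations, both sketched as assumptions rather than confirmed fixes: 
the "mapped" 9p security model is what makes qemu store its metadata in user.virtfs.* 
xattrs, so exporting the share with security_model=passthrough (or none) should avoid 
the getxattr calls the bricks complain about, and logrotate can at least keep the log 
volume bounded:

  # qemu command-line form of a passthrough export (libvirt has an equivalent
  # accessmode="passthrough" setting); passthrough needs qemu to run with enough
  # privilege to create files with the guest's ownership
  -virtfs local,path=/home,mount_tag=home,security_model=passthrough,id=home

  # /etc/logrotate.d/glusterfs-bricks (illustrative values)
  /var/log/glusterfs/bricks/*.log {
      daily
      rotate 7
      compress
      missingok
  }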


Many thanks,


Alessandro.


gluster volume info home outputs:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Options Reconfigured:
diagnostics.brick-log-level: CRITICAL
cluster.read-hash-mode: 2
features.limit-usage: 
features.quota: on
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users