Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
... 0 0 completed 0:03:53
Estimated time left for rebalance to complete : 359739:51:24
volume rebalance: home: success

Thanks,

A.

On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
> What is server4? You just mentioned server1 and server2 previously.
> Can you post the output of gluster v status volname
>
> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe wrote:
> > Hi,
> >
> > Thanks. However, "gluster v heal volname full" returned the following error message:
> > Commit failed on server4. Please check log file for details.
> >
> > I have checked the log files in /var/log/glusterfs on server4 (by grepping for
> > "heal"), but did not get any match. What should I be looking for, and in which
> > log file, please?
> >
> > Note that there is currently a rebalance process running on the volume.
> >
> > Many thanks,
> >
> > A.
> >
> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> >> You do not need to reset-brick if the brick path does not change. Replace
> >> the brick, format and mount, then gluster v start volname force.
> >> To start self heal just run gluster v heal volname full.
> >>
> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe wrote:
> >> > Hi,
> >> >
> >> > My volume home is configured in replicate mode (version 3.12.4) with the bricks
> >> > server1:/data/gluster/brick1
> >> > server2:/data/gluster/brick1
> >> >
> >> > server2:/data/gluster/brick1 was corrupted, so I killed the gluster daemon for
> >> > that brick on server2, unmounted it, reformatted it, remounted it and did a
> >> >
> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> >> server2:/data/gluster/brick1 commit force
> >> >
> >> > I was expecting that the self-heal daemon would start copying data from
> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> >> > server2:/data/gluster/brick1, which it only did for directories, but not
> >> > for files.
> >> >
> >> > For the moment, I launched on the fuse mount point
> >> >
> >> >> find . | xargs stat
> >> >
> >> > but crawling the whole volume (100 TB) to trigger self-healing of a single
> >> > brick of 7.4 TB is inefficient.
> >> >
> >> > Is there any trick to self-heal only a single brick, for example by setting
> >> > some attributes on its top directory?
> >> >
> >> > Many thanks,
> >> >
> >> > Alessandro
> >> >
> >> > ___
> >> > Gluster-users mailing list
> >> > Gluster-users@gluster.org
> >> > http://lists.gluster.org/mailman/listinfo/gluster-users

--
Dr. Ir. Alessandro Ipe
Department of Observations     Tel. +32 2 373 06 31
Remote Sensing from Space
Royal Meteorological Institute
Avenue Circulaire 3            Email: alessandro@meteo.be
B-1180 Brussels, Belgium
Web: http://gerb.oma.be

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
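For a "Commit failed on <server>" error from "gluster v heal ... full", the failure is usually reported by the management daemon rather than by a brick, so a reasonable place to start on the failing node is the glusterd and self-heal-daemon logs. A minimal sketch, assuming default log locations (file names vary slightly between GlusterFS releases; on older ones the glusterd log is etc-glusterfs-glusterd.vol.log):

    # on server4
    grep -iE 'heal|commit' /var/log/glusterfs/glusterd.log | tail -n 50
    grep -iE 'error|failed' /var/log/glusterfs/glustershd.log | tail -n 50

    # confirm the self-heal daemon is actually online on every node
    gluster volume status home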
Re: [Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
Hi,

Thanks. However, "gluster v heal volname full" returned the following error message:
Commit failed on server4. Please check log file for details.

I have checked the log files in /var/log/glusterfs on server4 (by grepping for "heal"), but did not get any match. What should I be looking for, and in which log file, please?

Note that there is currently a rebalance process running on the volume.

Many thanks,

A.

On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
> You do not need to reset-brick if the brick path does not change. Replace
> the brick, format and mount, then gluster v start volname force.
> To start self heal just run gluster v heal volname full.
>
> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe wrote:
> > Hi,
> >
> > My volume home is configured in replicate mode (version 3.12.4) with the bricks
> > server1:/data/gluster/brick1
> > server2:/data/gluster/brick1
> >
> > server2:/data/gluster/brick1 was corrupted, so I killed the gluster daemon for
> > that brick on server2, unmounted it, reformatted it, remounted it and did a
> >
> >> gluster volume reset-brick home server2:/data/gluster/brick1
> >> server2:/data/gluster/brick1 commit force
> >
> > I was expecting that the self-heal daemon would start copying data from
> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
> > server2:/data/gluster/brick1, which it only did for directories, but not
> > for files.
> >
> > For the moment, I launched on the fuse mount point
> >
> >> find . | xargs stat
> >
> > but crawling the whole volume (100 TB) to trigger self-healing of a single
> > brick of 7.4 TB is inefficient.
> >
> > Is there any trick to self-heal only a single brick, for example by setting
> > some attributes on its top directory?
> >
> > Many thanks,
> >
> > Alessandro
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users

--
Dr. Ir. Alessandro Ipe
Department of Observations     Tel. +32 2 373 06 31
Remote Sensing from Space
Royal Meteorological Institute
Avenue Circulaire 3            Email: alessandro@meteo.be
B-1180 Brussels, Belgium
Web: http://gerb.oma.be

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] How to trigger a resync of a newly replaced empty brick in replicate config ?
Hi,

My volume home is configured in replicate mode (version 3.12.4) with the bricks
server1:/data/gluster/brick1
server2:/data/gluster/brick1

server2:/data/gluster/brick1 was corrupted, so I killed the gluster daemon for that brick on server2, unmounted it, reformatted it, remounted it and did a

> gluster volume reset-brick home server2:/data/gluster/brick1
> server2:/data/gluster/brick1 commit force

I was expecting that the self-heal daemon would start copying data from server1:/data/gluster/brick1 (about 7.4 TB) to the empty server2:/data/gluster/brick1, which it only did for directories, but not for files.

For the moment, I launched on the fuse mount point

> find . | xargs stat

but crawling the whole volume (100 TB) to trigger self-healing of a single brick of 7.4 TB is inefficient.

Is there any trick to self-heal only a single brick, for example by setting some attributes on its top directory?

Many thanks,

Alessandro

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
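For readers following the same path, a minimal sketch of replacing a corrupted brick in place (same brick path), assembled from the steps discussed in this thread; the device name, filesystem type and mkfs options below are illustrative assumptions, not part of the original report:

    # on server2: take the brick offline cleanly
    gluster volume reset-brick home server2:/data/gluster/brick1 start

    # recreate the brick filesystem (device and fs type are assumptions)
    umount /data/gluster/brick1
    mkfs.xfs -f -i size=512 /dev/sdb1
    mount /dev/sdb1 /data/gluster/brick1

    # bring the (now empty) brick back under the same path
    gluster volume reset-brick home server2:/data/gluster/brick1 \
        server2:/data/gluster/brick1 commit force

    # queue a full self-heal and watch its progress
    gluster volume heal home full
    gluster volume heal home info

The "reset-brick ... start / ... commit force" pair is the sequence described in the GlusterFS documentation (3.9 and later) for reusing the same brick path after reformatting.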
Re: [Gluster-users] Memory leak in 3.6.9
OK, great... Any plan to backport those important fixes to the 3.6 branch? Because I am not ready to upgrade to the 3.7 branch for a production system. My fear is that 3.7 will bring other new issues; all I want is a stable and reliable branch, without extra new functionality (and new bugs), that will just work under normal use.

Thanks,

A.

On Wednesday 27 April 2016 09:58:00 Tim wrote:
There have been a lot of fixes since 3.6.9. Specifically,
https://bugzilla.redhat.com/1311377 was fixed in 3.7.9.
re: https://github.com/gluster/glusterfs/blob/release-3.7/doc/release-notes/3.7.9.md

Hi,

Apparently, version 3.6.9 is suffering from a SERIOUS memory leak as illustrated in the following logs:

2016-04-26T11:54:27.971564+00:00 tsunami1 kernel: [698635.210069] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2016-04-26T11:54:27.974133+00:00 tsunami1 kernel: [698635.210076] Pid: 28111, comm: glusterfsd Tainted: G W O 3.7.10-1.1-desktop #1
2016-04-26T11:54:27.974136+00:00 tsunami1 kernel: [698635.210077] Call Trace:
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210090] [] dump_trace+0x88/0x300
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210096] [] dump_stack+0x69/0x6f
2016-04-26T11:54:27.974138+00:00 tsunami1 kernel: [698635.210101] [] dump_header+0x70/0x200
2016-04-26T11:54:27.974139+00:00 tsunami1 kernel: [698635.210105] [] oom_kill_process+0x244/0x390
2016-04-26T11:54:28.113125+00:00 tsunami1 kernel: [698635.210111] [] out_of_memory+0x451/0x490
2016-04-26T11:54:28.113142+00:00 tsunami1 kernel: [698635.210116] [] __alloc_pages_nodemask+0x8ae/0x9f0
2016-04-26T11:54:28.113143+00:00 tsunami1 kernel: [698635.210122] [] alloc_pages_current+0xb7/0x130
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210127] [] filemap_fault+0x283/0x440
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210131] [] __do_fault+0x6e/0x560
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210136] [] handle_pte_fault+0x97/0x490
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210141] [] __do_page_fault+0x16b/0x4c0
2016-04-26T11:54:28.113562+00:00 tsunami1 kernel: [698635.210145] [] page_fault+0x28/0x30
2016-04-26T11:54:28.113565+00:00 tsunami1 kernel: [698635.210158] [<7fa9d8a8292b>] 0x7fa9d8a8292a
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226243] Out of memory: Kill process 17144 (glusterfsd) score 694 or sacrifice child
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226251] Killed process 17144 (glusterfsd) total-vm:8956384kB, anon-rss:6670900kB, file-rss:0kB

It makes this version completely useless in production. Brick servers have 8 GB of RAM (but will be upgraded to 16 GB).
gluster volume info returns:

Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 14 x 2 = 28
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Brick25: tsunami7:/data/glusterfs/home/brick1
Brick26: tsunami8:/data/glusterfs/home/brick1
Brick27: tsunami7:/data/glusterfs/home/brick2
Brick28: tsunami8:/data/glusterfs/home/brick2
Options Reconfigured:
nfs.export-dir: /gerb-reproc/Archive
nfs.volume-access: read-only
cluster.ensure
[Gluster-users] Memory leak in 3.6.9
Hi,

Apparently, version 3.6.9 is suffering from a SERIOUS memory leak as illustrated in the following logs:

2016-04-26T11:54:27.971564+00:00 tsunami1 kernel: [698635.210069] glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
2016-04-26T11:54:27.974133+00:00 tsunami1 kernel: [698635.210076] Pid: 28111, comm: glusterfsd Tainted: G W O 3.7.10-1.1-desktop #1
2016-04-26T11:54:27.974136+00:00 tsunami1 kernel: [698635.210077] Call Trace:
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210090] [] dump_trace+0x88/0x300
2016-04-26T11:54:27.974137+00:00 tsunami1 kernel: [698635.210096] [] dump_stack+0x69/0x6f
2016-04-26T11:54:27.974138+00:00 tsunami1 kernel: [698635.210101] [] dump_header+0x70/0x200
2016-04-26T11:54:27.974139+00:00 tsunami1 kernel: [698635.210105] [] oom_kill_process+0x244/0x390
2016-04-26T11:54:28.113125+00:00 tsunami1 kernel: [698635.210111] [] out_of_memory+0x451/0x490
2016-04-26T11:54:28.113142+00:00 tsunami1 kernel: [698635.210116] [] __alloc_pages_nodemask+0x8ae/0x9f0
2016-04-26T11:54:28.113143+00:00 tsunami1 kernel: [698635.210122] [] alloc_pages_current+0xb7/0x130
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210127] [] filemap_fault+0x283/0x440
2016-04-26T11:54:28.113144+00:00 tsunami1 kernel: [698635.210131] [] __do_fault+0x6e/0x560
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210136] [] handle_pte_fault+0x97/0x490
2016-04-26T11:54:28.113145+00:00 tsunami1 kernel: [698635.210141] [] __do_page_fault+0x16b/0x4c0
2016-04-26T11:54:28.113562+00:00 tsunami1 kernel: [698635.210145] [] page_fault+0x28/0x30
2016-04-26T11:54:28.113565+00:00 tsunami1 kernel: [698635.210158] [<7fa9d8a8292b>] 0x7fa9d8a8292a
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226243] Out of memory: Kill process 17144 (glusterfsd) score 694 or sacrifice child
2016-04-26T11:54:28.120811+00:00 tsunami1 kernel: [698635.226251] Killed process 17144 (glusterfsd) total-vm:8956384kB, anon-rss:6670900kB, file-rss:0kB

It makes this version completely useless in production. Brick servers have 8 GB of RAM (but will be upgraded to 16 GB).
gluster volume info returns:

Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 14 x 2 = 28
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Brick25: tsunami7:/data/glusterfs/home/brick1
Brick26: tsunami8:/data/glusterfs/home/brick1
Brick27: tsunami7:/data/glusterfs/home/brick2
Brick28: tsunami8:/data/glusterfs/home/brick2
Options Reconfigured:
nfs.export-dir: /gerb-reproc/Archive
nfs.volume-access: read-only
cluster.ensure-durability: on
features.quota: on
performance.cache-size: 512MB
performance.io-thread-count: 32
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: off
nfs.disable: off
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: on
server.allow-insecure: on
auth.allow: localhost,
cluster.readdir-optimize: on
performance.readdir-ahead: on
nfs.export-volumes: off

Are you aware of this issue?

Thanks,

A.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
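When chasing a leak like the one reported above, it helps to record the resident size of each brick process over time so the growth rate is visible before the OOM killer fires. A small illustrative monitoring loop (the log path and one-minute interval are arbitrary choices, not part of the original report):

    #!/bin/bash
    # log the RSS (kB) of every glusterfsd brick process once a minute
    while true; do
        ts=$(date --iso-8601=seconds)
        ps -C glusterfsd -o pid=,rss=,args= | while read -r pid rss args; do
            echo "$ts pid=$pid rss_kb=$rss ${args%% *}"
        done >> /var/log/glusterfsd-rss.log
        sleep 60
    done

GlusterFS can also dump its own allocator statistics with "gluster volume statedump <volname>", which writes per-translator memory usage under /var/run/gluster/ by default and is typically what developers ask for in memory-leak reports.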
[Gluster-users] glusterfs fuse mount on clients
Hi,

Our gluster system is currently made of 4 replicated pairs of servers, holding either 2 or 4 bricks of 4 HDs in RAID 10. We have a bunch of clients which mount the system through the native fuse glusterfs client; more specifically, they are all using the same server1 to get the config of the volume, with backupvolfile-server=server2, i.e. in /etc/fstab as:

*server1*:/home /mnt/server glusterfs defaults,_netdev,use-readdirp=no,direct-io-mode=disable,backupvolfile-server=*server2*,log-level=ERROR,log-file=/var/log/gluster.log 0 0

Would it be better, in terms of network/disk load, if my clients alternated between server1 and server8?

Many thanks,

Alessandro.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
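With the native fuse client, the server named in fstab is only contacted to fetch the volume file at mount time; once mounted, the client talks to all bricks directly, so the choice of volfile server has little effect on steady-state network or disk load, and spreading it mainly removes a single point of failure at mount time. A sketch of an fstab entry that lists several fallback volfile servers (the backup-volfile-servers option is available in recent 3.x releases of the mount helper; the server names simply follow this thread's convention):

    server1:/home  /mnt/server  glusterfs  defaults,_netdev,use-readdirp=no,direct-io-mode=disable,backup-volfile-servers=server2:server3:server4,log-level=ERROR,log-file=/var/log/gluster.log  0 0

Different clients can name a different primary server in the first field to round-robin the mount-time load by hand.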
Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?
Hi Nithya,

Sorry that it took so long to respond...

1. Indeed, a couple of weeks ago I added 2 bricks (in replicate mode) with add-brick, and since then I have never been able to complete the required rebalance (however, a rebalance fix-layout completed).

2. *home-rebalance.log*
[2015-03-13 21:32:58.066242] E [dht-rebalance.c:1328:gf_defrag_migrate_data] 0-home-dht: /seviri/.forward lookup failed
and the same "lookup failed" entry for a lot of other files
[2015-03-13 21:32:58.245795] E [dht-linkfile.c:278:dht_linkfile_setattr_cbk] 0-home-dht: setattr of uid/gid on /seviri/.forward : failed (Stale NFS file handle)
[2015-03-13 21:32:58.286201] E [dht-common.c:2465:dht_vgetxattr_cbk] 0-home-dht: Subvolume home-replicate-4 returned -1 (Stale NFS file handle)
[2015-03-13 21:32:58.286258] E [dht-rebalance.c:1336:gf_defrag_migrate_data] 0-home-dht: Failed to get node-uuid for /seviri/.forward
and, after initiating a stop command from the CLI,
[2015-03-19 10:34:38.484381] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 0-home-dht: Fix layout failed for /seviri/MSG/2007/MSG1_20070106/HRIT_200701060115
[2015-03-19 10:34:38.487426] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 0-home-dht: Fix layout failed for /seviri/MSG/2007/MSG1_20070106
[2015-03-19 10:34:38.487943] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 0-home-dht: Fix layout failed for /seviri/MSG/2007
[2015-03-19 10:34:38.488361] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 0-home-dht: Fix layout failed for /seviri/MSG
[2015-03-19 10:34:38.488801] E [dht-rebalance.c:1622:gf_defrag_fix_layout] 0-home-dht: Fix layout failed for /seviri

3. We are exclusively accessing the servers through the native gluster fuse client, so no NFS mount.

4. The attributes of this specific file are given in my initial post at http://www.gluster.org/pipermail/gluster-users/2015-March/021175.html

Meanwhile, I launched a full heal and that specific file could be accessed normally, after a couple of days of healing...
However, I now get the following messages in the client log (gluster.log):

[2015-04-01 15:20:36.218425] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-7: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218555] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-5: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218630] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-2: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218770] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-4: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218840] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-6: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218915] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-9: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.218976] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-10: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.219230] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-1: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.220062] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-8: metadata self heal failed, on /seviri
[2015-04-01 15:20:36.236306] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 2-home-replicate-11: metadata self heal failed, on /seviri

and the same for various other top-level directories of the volume. Is there a way to fix this?

Regards,

A.

On Thursday 26 March 2015 11:36:19 Nithya Balachandran wrote:
> Hi Alessandro,
>
> Thanks for the information. A few more questions:
>
> 1. Did you do an add-brick or remove-brick before doing the rebalance? If yes, how many bricks did you add/remove?
> 2. Can you send us the rebalance, client (NFS log if you are using an NFS client only) and brick logs?
> 3. It looks like you are using an NFS client. Can you please confirm?
> 4. Is /home/seviri/.forward the only file on which you are seeing the stale file handle errors? Can you please provide the following information for this file on all bricks - the xattrs for the parent directory (/home/seviri/) as well as the file on each brick - Brick1 to Brick24 - with details of which node it is on so we can get a clearer picture. - the ls -li output on the bricks for the file on each node.
>
> As far as I know, there have not been any major changes to rebalance between 3.5.3 and 3.6.3 but I will confirm.
>
> Regards,
> Nithya
>
> - Original Message -
> From: "Alessandro Ipe"
> To: "Nithya Balachandran"
> Cc: gluste
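When a directory such as /seviri repeatedly logs "metadata self heal failed", comparing its metadata on every brick usually shows which copy disagrees (ownership, mode or trusted.afr.* pending counters). A small illustrative loop, assuming passwordless ssh to the brick servers and the brick paths used elsewhere in this thread:

    for host in tsunami{1..6}; do
      for brick in /data/glusterfs/home/brick{1..4}; do
        echo "== $host:$brick/seviri =="
        ssh "$host" "stat -c '%U:%G %a' $brick/seviri; \
                     getfattr -d -m . -e hex $brick/seviri 2>/dev/null"
      done
    done

Copies that hold non-zero trusted.afr.* counters against each other are typically the ones the self-heal daemon cannot reconcile on its own.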
Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?
Hi Nithya,

Thanks for your reply. I am glad that improving the rebalance status will be addressed in the (near) future. From my perspective, if the status gave the total number of files to be scanned together with the files already scanned, that would be sufficient information. Indeed, the user could then see when it would complete (by running "gluster volume rebalance status" several times and computing differences according to the elapsed time between them).

Please find below the answers to your questions:

1. Server and client are version 3.5.3.
2. Indeed, I stopped the rebalance through the associated command from the CLI, i.e. gluster rebalance stop
3. Very limited file operations were carried out through a single client mount (servers were almost idle).
4. gluster volume info:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: off
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: on
server.allow-insecure: on
cluster.ensure-durability: on

For the logs, it will be more difficult because it happened several days ago and they were rotated. But I can dig... By the way, do you need a specific log file, because gluster produces a lot of them...

I read in some discussion on the gluster-users mailing list that a rebalance on version 3.5.x could leave the system with errors when stopped (or even when run to completion?) and that rebalance had undergone a complete rewrite in 3.6.x. The issue is that I will put gluster back online next week, so my colleagues will definitely put it under high load, and I was planning to run the rebalance again in the background. However, is it advisable? Or should I wait until after upgrading to 3.6.3?

I also noticed (a full heal is currently underway on the volume) that accessing some files on the client returned a "Transport endpoint is not connected" the first time, but any new access was OK (probably due to self-healing).
However, is it possible to set a client or volume parameter to just wait (and make the calling process wait) for the self-healing to complete and deliver the file the first time without issuing an error (extremely useful in batch/operational processing)?

Regards,

Alessandro.

On Wednesday 25 March 2015 05:09:38 Nithya Balachandran wrote:
> Hi Alessandro,
>
> I am sorry to hear that you are facing problems with rebalance.
>
> Currently rebalance does not have the information as to how many files exist on the volume and so cannot calculate/estimate the time it will take to complete. Improving the rebalance status output to provide that info is on our to-do list already and we will be working on that.
>
> I have a few questions:
>
> 1. Which version of Glusterfs are you using?
> 2. How did you stop the rebalance? I assume you ran "gluster rebalance stop" but just wanted confirmation.
> 3. What file operations were being performed during the rebalance?
> 4. Can you send the "gluster volume info" output as well as the gluster log files?
>
> Regards,
> Nithya
>
> - Original Message -
> From: "Alessandro Ipe"
> To: gluster-users@gluster.org
> Sent: Friday, March 20, 2015 4:52:35 PM
> Subject: [Gluster-users] Is rebalance completely broken on 3.5.3 ?
>
> Hi,
>
> After launching a "rebalance" on an idle gluster system one week ago, its status told me it had scanned
> more than 23 million files on eac
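The first-access failure described above can also be worked around on the batch side by retrying, since a lookup or stat from the client is what schedules the heal of that entry. A small illustrative helper (retry count and sleep interval are arbitrary assumptions):

    # read a file, giving self-heal a chance to finish before giving up
    read_with_retry() {
        local f=$1 tries=10
        for i in $(seq 1 "$tries"); do
            if cat "$f" > /dev/null 2>&1; then
                return 0
            fi
            stat "$f" > /dev/null 2>&1   # a lookup typically queues a heal of this entry
            sleep 30
        done
        echo "giving up on $f" >&2
        return 1
    }

Usage would simply be read_with_retry /home/seviri/somefile inside the batch script before the real processing step.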
Re: [Gluster-users] Is rebalance completely broken on 3.5.3 ?
Hi Olav,

Thanks for the info. I read the whole thread that you sent me... and I am more scared than ever... The fact that the developers do not have a clue about what is causing this issue is just frightening.

Concerning my issue, apparently after two days (a full heal is ongoing on the volume), I did not get any error messages from the client when trying to list the files in question, but I got the same file .forward twice, with the same content, size, permissions and date... which is consistent with what you got previously... I simply removed the file TWICE with rm on the client and copied back a sane version. The one-million-dollar question is: are there more files in a similar state on my 90 TB volume? I am delaying a find on the whole volume to find out...

What also concerns me is the absence of acknowledgement or reply from the developers concerning this severe issue... The fact that only end users on production setups hit this issue, while it cannot be reproduced in labs, should be a clear signal that this should be addressed as a priority, from my point of view. And lab testing should also try to mimic real-life use, with brick servers under heavy load (> 10) and with several tens of clients accessing the gluster volume, to track down all possible issues resulting from network, I/O, ... timeouts.

Thanks for your help,

Alessandro.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Is rebalance completely broken on 3.5.3 ?
Hi,

After launching a "rebalance" on an idle gluster system one week ago, its status told me it had scanned more than 23 million files on each of my 6 bricks. However, without knowing at least the total number of files to be scanned, this status is USELESS from an end-user perspective, because it does not allow you to know WHEN the rebalance could eventually complete (one day, one week, one year or never). From my point of view, the total files per brick could be obtained and maintained when activating quota, since the whole filesystem has to be crawled...

After one week of being offline and still no clue when the rebalance would complete, I decided to stop it... Enormous mistake... It seems that rebalance cannot manage not to screw up some files. For example, on the only client mounting the gluster system, "ls -la /home/seviri" returns

ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-? ? ? ? ?? .forward
-? ? ? ? ?? .forward

while this file could perfectly well be accessed before (being rebalanced) and has not been modified for at least 3 years.

Getting the extended attributes on the various bricks 3, 4, 5, 6 (3-4 replicate, 5-6 replicate):

*Brick 3:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x
trusted.afr.home-client-9=0x
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x0001

*Brick 4:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x
trusted.afr.home-client-9=0x
trusted.gfid=0xc1d268beb17443a39d914de917de123a
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x0001

*Brick 5:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400

*Brick 6:*
ls -l /data/glusterfs/home/brick?/seviri/.forward
-T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400

Looking at the results from bricks 3 & 4 shows something weird.
The file exists in 2 brick storage directories, while it should only be found once on each brick server. Or does the issue lie in the results of bricks 5 & 6?

*How can I fix this, please*?

By the way, the split-brain tutorial only covers BASIC split-brain conditions and not complex (real-life) cases like this one. It would definitely benefit from being enriched with this one.

More generally, I think the concept of gluster is promising, but if basic commands (rebalance, absolutely needed after adding more storage) from its own CLI allow putting the system into an unstable state, I am really starting to question its ability to be used in a production environment. And from an end-user perspective, I do not care about new features being added, no matter how appealing they could be, if the basic ones are not almost totally reliable.

Finally, testing gluster under high load on the brick servers (real-world conditions) would certainly give the developers insight into what is failing and what therefore needs to be fixed to mitigate this and improve gluster's reliability.

Forgive my harsh words/criticisms, but
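For what it's worth, the trusted.glusterfs.dht.linkto value is simply a NUL-terminated string naming the subvolume that DHT thinks holds the real file, so it can be read in text form directly. A small illustrative check, using the path from the post above:

    # print the linkto target of a zero-byte DHT link file
    getfattr -n trusted.glusterfs.dht.linkto --only-values \
        /data/glusterfs/home/brick2/seviri/.forward; echo

    # the hex value shown above decodes the same way:
    echo 686f6d652d7265706c69636174652d3400 | xxd -r -p

Here it decodes to "home-replicate-4", i.e. the name of one of the volume's replica subvolumes, which helps work out which copy DHT currently considers the real one.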
[Gluster-users] Missing extended attributes to some files on the bricks
Hi,

Apparently, this occurred after a failed rebalance due to exhaustion of the available disk space on the bricks.

On the client, an ls on the directory gives
ls: cannot access .inputrc: No such file or directory
and displays
?? ? ?? ?? .inputrc

Getting the attributes on the bricks gives, on brick server 1, NOTHING !!! while on brick server 2:
# file: data/glusterfs/home/brick1/aipe/.inputrc
trusted.afr.home-client-0=0x
trusted.afr.home-client-1=0x
trusted.gfid=0xeed4fb5048b8a0320e8632f34ed3
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0400
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

Any clue on how to fix this, i.e. force healing from brick server 2 to brick server 1, so the file gets its correct attributes and a hard link in the .glusterfs directory?

Many thanks,

Alessandro.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
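One low-risk first step in that situation is to force a fresh named lookup of the exact path from a client mount (after copying the healthy brick file somewhere safe), since a named lookup is what recreates the missing entry and its .glusterfs hard link on the empty replica; a full heal will also eventually crawl it. A minimal sketch; the client mount point /home and the per-user directory are assumptions based on other messages in this archive:

    # on brick server 2: keep a safety copy of the healthy brick copy
    cp -a /data/glusterfs/home/brick1/aipe/.inputrc /root/.inputrc.backup

    # on a client: force lookups of the parent directory and the file
    ls -l /home/aipe > /dev/null
    stat /home/aipe/.inputrc

    # check whether the entry shows up as pending/being healed
    gluster volume heal home info | grep -A2 aipe

If the lookup keeps failing, the procedure described in the 3.5/3.6 split-brain documentation (removing the stale copy and its .glusterfs/<xx>/<yy>/<gfid> hard link on the bad brick only, then re-stat'ing from the client) is the usual next step, but it is worth double-checking which side is actually bad before removing anything.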
Re: [Gluster-users] Rebalance issue on 3.5.3
Hi,

The extended attributes are (according to the brick number):

1.
# file: data/glusterfs/home/brick1/aipe/.xinitrc.template
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3500
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

2.
# file: data/glusterfs/home/brick1/aipe/.xinitrc.template
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3500
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

Stat'ing these two gives me 0-size files.

3.
# file: data/glusterfs/home/brick2/aipe/.xinitrc.template
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0600
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

4.
# file: data/glusterfs/home/brick2/aipe/.xinitrc.template
trusted.afr.home-client-10=0x
trusted.afr.home-client-11=0x
trusted.gfid=0x67bf3db057474c0a892f459b6c622ee8
trusted.glusterfs.quota.c7ee612b-0dfe-4832-9efe-531040c696fd.contri=0x0600
trusted.pgfid.c7ee612b-0dfe-4832-9efe-531040c696fd=0x0001

These two are non-0-size files.

Thanks,

A.

On Wednesday 11 March 2015 07:51:59 Joe Julian wrote:
Those files are dht link files. Check out the extended attributes, "getfattr -m . -d"

On March 10, 2015 7:30:33 AM PDT, Alessandro Ipe wrote:
Hi,

I launched a rebalance on my gluster distribute-replicate volume (see below) a couple of days ago through its CLI, while allowing my users to continue using the volume. Yesterday, they managed to fill the volume completely. It now results in unavailable files on the client (using fuse) with the message "Transport endpoint is not connected". Investigating the associated files on the bricks, I noticed that these are displayed by ls -l as
-T 2 user group 0 Jan 15 22:00 file

Performing a
ls -lR /data/glusterfs/home/brick1/* | grep -F -- "-T"
on a single brick gave me a LOT of files in that above-mentioned state. Why are the files in that state? Did I lose all these files, or can they still be recovered from the replicate copy of another brick?

Regards,

Alessandro.
gluster volume info home output:

Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
features.default-soft-limit: 95%
cluster.ensure-durability: off
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
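As Joe Julian's reply above explains, the "---T" entries are DHT link-to files: zero-byte placeholders whose only permission bit is the sticky bit and which carry a trusted.glusterfs.dht.linkto xattr naming the subvolume holding the real data. A sketch for inventorying them on one brick (the brick path follows this thread):

    # zero-byte files whose mode is exactly 1000 (sticky bit only)
    find /data/glusterfs/home/brick1 -path '*/.glusterfs' -prune -o \
        -type f -size 0 -perm 1000 -print

    # confirm a candidate really is a link file and see where it points
    getfattr -n trusted.glusterfs.dht.linkto --only-values <path>; echo

As long as the real file exists on the subvolume named by that xattr, such placeholders are harmless; they only become a problem when the data copy is missing, which is what this thread is about.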
Re: [Gluster-users] Input/output error when trying to access a file on client
Hi,

In fact, going up one directory level (to the root of the gluster volume), I similarly get

1.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

2.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

3.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x00015554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

4.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x00015554
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

These four bricks seem consistent, while the remaining two

5.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0002
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001aaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

6.
# file: data/glusterfs/md1/brick1/
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x
trusted.gfid=0x0001
trusted.glusterfs.dht=0x0001aaa9
trusted.glusterfs.volume-id=0x6da4b9151def4df4a41c2f3300ebf16b

show two extra entries, trusted.afr.md1-client-0 & trusted.afr.md1-client-1, and an inconsistency between trusted.afr.md1-client-4 & trusted.afr.md1-client-5.

Could it be this issue which propagates to all subdirectories in the volume and thus results in the error messages in the client log file? Should I remove trusted.afr.md1-client-0 & trusted.afr.md1-client-1 from brick5 & brick6?

Meanwhile, I am running on the client
find /home/.md1 -type f -exec cat {} > /dev/null \;
to check whether I can access the content of all files on the volume. For the moment, only 4 files gave errors.

It is quite frustrating, because I believe that all my data is still intact on the bricks and it seems that it is only the metadata which got screwed up... I am reluctant to attempt a heal by myself, because I have the feeling that it could do more harm than good. It has been more than 2 days now that my colleagues cannot access the data and I cannot make them wait much longer...

A.

On Thursday 12 March 2015 12:59:00 Alessandro Ipe wrote:
Hi,

Sorry about that, I thought I was using -e hex... I must have removed it at some point accidentally. Here they are:

1.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

2.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

3.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

4.
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x0001
trusted.afr.md1-client-3=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

5.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9

6.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x

___
Gluster-users mailing list
Gluster-users@
Re: [Gluster-users] Input/output error when trying to access a file on client
Hi,

Doing
splitmount localhost md1 .
and ls -l gives me
total 8
drwxr-xr-x 12 root root  426 Jan 19 11:04 r1
drwxr-xr-x 12 root root  426 Jan 19 11:04 r2
-rw--- 1 root root 2840 Mar 12 12:08 tmp7TLytQ
-rw--- 1 root root 2840 Mar 12 12:08 tmptI3gv_

Doing ls -l r1/root/bash_cmd/ gives me
total 5
-rwxr-xr-x 1 root root  212 Nov 21 17:50 ira
-rwxr-xr-x 1 root root 2311 Nov 21 17:50 listing
drwxr-xr-x 2 root root   52 Jan 19 11:24 mbl
-rwxr-xr-x 1 root root 1210 Nov 21 17:50 viewhdf

while doing ls -l r1/root/bash_cmd/mbl/ gives me
ls: cannot access r1/root/bash_cmd/mbl/mbl.c: Software caused connection abort
ls: reading directory r1/root/bash_cmd/mbl/: Transport endpoint is not connected
total 0
?? ? ? ? ?? mbl.c

A.

On Wednesday 11 March 2015 07:52:11 Joe Julian wrote:
http://joejulian.name/blog/glusterfs-split-brain-recovery-made-easy/

On March 11, 2015 4:24:09 AM PDT, Alessandro Ipe wrote:
Well, it is even worse. Now doing a "ls -R" on the volume results in a lot of

[2015-03-11 11:18:31.957505] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/library' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 1 0 ] ]
[2015-03-11 11:18:31.957692] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /library

I am desperate...

A.

On Wednesday 11 March 2015 12:05:33 you wrote:
Hi,

When trying to access a file on a gluster client (through fuse), I get an "Input/output error" message.

Getting the attributes for the file gives me for the first brick
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAABdAAA
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

while for the second (replicate) brick
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0sAAABJAAA
trusted.afr.md1-client-3=0s
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

It seems that I have a split-brain. How can I solve this issue by resetting the attributes, please?

Thanks,

Alessandro.

==
gluster volume info md1

Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Brick5: tsunami5:/data/glusterfs/md1/brick1
Brick6: tsunami6:/data/glusterfs/md1/brick1
Options Reconfigured:
server.allow-insecure: on
cluster.read-hash-mode: 2
features.quota: off
performance.write-behind: on
performance.write-behind-window-size: 4MB
performance.flush-behind: off
performance.io-thread-count: 64
performance.cache-size: 512MB
nfs.disable: on
cluster.lookup-unhashed: off

http://www.gluster.org/mailman/listinfo/gluster-users

--
Dr. Ir. Alessandro Ipe
Department of Observations     Tel. +32 2 373 06 31
Remote Sensing from Space      Fax. +32 2 374 67 88
Royal Meteorological Institute
Avenue Circulaire 3            Email: alessandro@meteo.be
B-1180 Brussels, Belgium
Web: http://gerb.oma.be

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Input/output error when trying to access a file on client
Hi,

Sorry about that, I thought I was using -e hex... I must have removed it at some point accidentally. Here they are:

1.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

2.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0x
trusted.afr.md1-client-1=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001

3.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x
trusted.afr.md1-client-3=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

4.
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0x0001
trusted.afr.md1-client-3=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x00015554

5.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x
trusted.afr.md1-client-5=0x0001
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9

6.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0x0001
trusted.afr.md1-client-5=0x
trusted.gfid=0xdc398cbd2ab440ec9fed3d5937654f4b
trusted.glusterfs.dht=0x0001aaa9

Thanks for your help,

A.

On Thursday 12 March 2015 07:51:40 Krutika Dhananjay wrote:
Hi,

Could you provide the xattrs in hex format? You can execute `getfattr -d -m . -e hex `

-Krutika

*From: *"Alessandro Ipe"
*To: *"Krutika Dhananjay"
*Cc: *gluster-users@gluster.org
*Sent: *Thursday, March 12, 2015 5:15:08 PM
*Subject: *Re: [Gluster-users] Input/output error when trying to access a file on client

Hi,

Actually, my gluster volume is distribute-replicate, so I should provide the attributes on all the bricks. Here they are:

1.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

2.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

3.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

4.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAEA
trusted.afr.md1-client-3=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

5.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0s
trusted.afr.md1-client-5=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

6.
# file: data/glusterfs/md1/brick1/root

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Input/output error when trying to access a file on client
Hi,

Actually, my gluster volume is distribute-replicate, so I should provide the attributes on all the bricks. Here they are:

1.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

2.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

3.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

4.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-2=0sAAEA
trusted.afr.md1-client-3=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQAAVA==

5.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0s
trusted.afr.md1-client-5=0sAAEA
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

6.
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-4=0sAAEA
trusted.afr.md1-client-5=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQBVqQ==

so it seems, in fact, that there are discrepancies between 3-4 and 5-6 (replicate pairs).

A.

On Thursday 12 March 2015 11:33:00 Alessandro Ipe wrote:
Hi,

"gluster volume heal md1 info split-brain" returns approximately 2000 files (already divided by 2 due to the replicate volume), so manually repairing each split-brain is unfeasible. Before scripting some procedure, I need to be sure that I will not harm the gluster system further.

Moreover, I noticed that the messages printed in the logs are all about directories, e.g.

[2015-03-12 10:06:53.423856] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424005] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ]
[2015-03-12 10:06:53.424110] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-1: metadata self heal failed, on /root
[2015-03-12 10:06:53.424290] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /root

Getting the attributes of that directory on each brick gives me, for the first,
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==
and, for the second,
# file: data/glusterfs/md1/brick1/root
trusted.afr.md1-client-0=0s
trusted.afr.md1-client-1=0s
trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw==
trusted.glusterfs.dht=0sAQCq/w==

so it seems that both are rigorously identical. However, according to your split-brain tutorial, none of them has 0x. What does 0s mean, in fact? Should I change both attributes on each directory to 0x?

Many thanks,

A.

On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote:
Hi,

Have you gone through https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md ? If not, could you go through that once and try the steps given there?
Do let us know if something is not clear in the doc.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Input/output error when trying to access a file on client
Hi, "gluster volume heal md1 info split-brain" returns approximatively 2000 files (already divided by 2 due to replicate volume). So manually repairing each split-brain is unfeasable. Before scripting some procedure, I need to be sure that I will not harm further the gluster system. Moreover, I noticed that the messages printed in the logs are all about directories, e.g. [2015-03-12 10:06:53.423856] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-1: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ] [2015-03-12 10:06:53.424005] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/root' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 1 ] [ 1 0 ] ] [2015-03-12 10:06:53.424110] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-1: metadata self heal failed, on /root [2015-03-12 10:06:53.424290] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /root Getting the attributes of that directory on each brick gives me for the first # file: data/glusterfs/md1/brick1/root trusted.afr.md1-client-0=0s trusted.afr.md1-client-1=0s trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw== trusted.glusterfs.dht=0sAQCq/w== and for the second # file: data/glusterfs/md1/brick1/root trusted.afr.md1-client-0=0s trusted.afr.md1-client-1=0s trusted.gfid=0s3DmMvSq0QOyf7T1ZN2VPSw== trusted.glusterfs.dht=0sAQCq/w== so it seems that there are both rigorously identical. However, according to your split -brain tutorial, none of them has 0x. What 0s means in fact ? Should I change both attributes on each directory to 0x ? Many thanks, A. On Wednesday 11 March 2015 08:02:56 Krutika Dhananjay wrote: Hi, Have you gone through https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md[1] ? If not, could you go through that once and try the steps given there? Do let us know if something is not clear in the doc. -Krutika -------- *From: *"Alessandro Ipe" *To: *gluster-users@gluster.org *Sent: *Wednesday, March 11, 2015 4:54:09 PM *Subject: *Re: [Gluster-users] Input/output error when trying to access a file on client Well, it is even worse. Now when doing a "ls -R" on the volume results in a lot of [2015-03-11 11:18:31.957505] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/library' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 1 0 ] ][2015-03-11 11:18:31.957692] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /library I am desperate... ___Gluster-users mailing listGluster-users@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-users [1] https://github.com/gluster/glusterfs/blob/master/doc/debugging/split-brain.md ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Input/output error when trying to access a file on client
Well, it is even worse. Now doing a "ls -R" on the volume results in a lot of

[2015-03-11 11:18:31.957505] E [afr-self-heal-common.c:233:afr_sh_print_split_brain_log] 0-md1-replicate-2: Unable to self-heal contents of '/library' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 1 0 ] ]
[2015-03-11 11:18:31.957692] E [afr-self-heal-common.c:2868:afr_log_self_heal_completion_status] 0-md1-replicate-2: metadata self heal failed, on /library

I am desperate...

A.

On Wednesday 11 March 2015 12:05:33 you wrote:
> Hi,
>
> When trying to access a file on a gluster client (through fuse), I get an
> "Input/output error" message.
>
> Getting the attributes for the file gives me for the first brick
> # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
> trusted.afr.md1-client-2=0s
> trusted.afr.md1-client-3=0sAAABdAAA
> trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==
>
> while for the second (replicate) brick
> # file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
> trusted.afr.md1-client-2=0sAAABJAAA
> trusted.afr.md1-client-3=0s
> trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==
>
> It seems that I have a split-brain. How can I solve this issue by resetting
> the attributes, please ?
>
> Thanks,
>
> Alessandro.
>
> ==
> gluster volume info md1
>
> Volume Name: md1
> Type: Distributed-Replicate
> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
> Status: Started
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: tsunami1:/data/glusterfs/md1/brick1
> Brick2: tsunami2:/data/glusterfs/md1/brick1
> Brick3: tsunami3:/data/glusterfs/md1/brick1
> Brick4: tsunami4:/data/glusterfs/md1/brick1
> Brick5: tsunami5:/data/glusterfs/md1/brick1
> Brick6: tsunami6:/data/glusterfs/md1/brick1
> Options Reconfigured:
> server.allow-insecure: on
> cluster.read-hash-mode: 2
> features.quota: off
> performance.write-behind: on
> performance.write-behind-window-size: 4MB
> performance.flush-behind: off
> performance.io-thread-count: 64
> performance.cache-size: 512MB
> nfs.disable: on
> cluster.lookup-unhashed: off

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Input/output error when trying to access a file on client
Hi,

When trying to access a file on a gluster client (through fuse), I get an "Input/output error" message.

Getting the attributes for the file gives me, for the first brick,
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0s
trusted.afr.md1-client-3=0sAAABdAAA
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

while for the second (replicate) brick
# file: data/glusterfs/md1/brick1/kvm/hail/hail_home.qcow2
trusted.afr.md1-client-2=0sAAABJAAA
trusted.afr.md1-client-3=0s
trusted.gfid=0sOCFPGCdrQ9uyq2yTTPCKqQ==

It seems that I have a split-brain. How can I solve this issue by resetting the attributes, please?

Thanks,

Alessandro.

==
gluster volume info md1

Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Brick5: tsunami5:/data/glusterfs/md1/brick1
Brick6: tsunami6:/data/glusterfs/md1/brick1
Options Reconfigured:
server.allow-insecure: on
cluster.read-hash-mode: 2
features.quota: off
performance.write-behind: on
performance.write-behind-window-size: 4MB
performance.flush-behind: off
performance.io-thread-count: 64
performance.cache-size: 512MB
nfs.disable: on
cluster.lookup-unhashed: off

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
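For readers decoding the trusted.afr values above: in the 3.x AFR format each one is a 12-byte changelog made of three big-endian 32-bit counters, for pending data, metadata and entry operations respectively, counted against the brick named in the xattr key. A hypothetical value, annotated for illustration (not one of the actual values from this thread, which are shown truncated and base64-encoded):

    trusted.afr.md1-client-3 = 0x 00000174 00000000 00000000
    #                             data     metadata entry
    # 0x174 = 372 pending data operations recorded against client-3

A copy is considered healthy when every other brick's counters for it are zero; when two bricks each hold non-zero counters against the other for the same part (data or metadata), that part is in split-brain, which is what the logs in this thread report.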
[Gluster-users] Rebalance issue on 3.5.3
Hi,

A couple of days ago I launched a rebalance on my gluster distributed-replicated volume (see below) through its CLI, while allowing my users to continue using the volume. Yesterday, they managed to fill the volume completely. This now results in unavailable files on the client (using fuse) with the message "Transport endpoint is not connected". Investigating the associated files on the bricks, I noticed that they are displayed by ls -l as

-T 2 user group 0 Jan 15 22:00 file

Performing a
ls -lR /data/glusterfs/home/brick1/* | grep -F -- "-T"
on a single brick gave me a LOT of files in that state. Why are the files in that state? Did I lose all these files, or can they still be recovered from the replicate copy on another brick?

Regards,

Alessandro.

gluster volume info home output:
Volume Name: home
Type: Distributed-Replicate
Volume ID: 501741ed-4146-4022-af0b-41f5b1297766
Status: Started
Number of Bricks: 12 x 2 = 24
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/home/brick1
Brick2: tsunami2:/data/glusterfs/home/brick1
Brick3: tsunami1:/data/glusterfs/home/brick2
Brick4: tsunami2:/data/glusterfs/home/brick2
Brick5: tsunami1:/data/glusterfs/home/brick3
Brick6: tsunami2:/data/glusterfs/home/brick3
Brick7: tsunami1:/data/glusterfs/home/brick4
Brick8: tsunami2:/data/glusterfs/home/brick4
Brick9: tsunami3:/data/glusterfs/home/brick1
Brick10: tsunami4:/data/glusterfs/home/brick1
Brick11: tsunami3:/data/glusterfs/home/brick2
Brick12: tsunami4:/data/glusterfs/home/brick2
Brick13: tsunami3:/data/glusterfs/home/brick3
Brick14: tsunami4:/data/glusterfs/home/brick3
Brick15: tsunami3:/data/glusterfs/home/brick4
Brick16: tsunami4:/data/glusterfs/home/brick4
Brick17: tsunami5:/data/glusterfs/home/brick1
Brick18: tsunami6:/data/glusterfs/home/brick1
Brick19: tsunami5:/data/glusterfs/home/brick2
Brick20: tsunami6:/data/glusterfs/home/brick2
Brick21: tsunami5:/data/glusterfs/home/brick3
Brick22: tsunami6:/data/glusterfs/home/brick3
Brick23: tsunami5:/data/glusterfs/home/brick4
Brick24: tsunami6:/data/glusterfs/home/brick4
Options Reconfigured:
features.default-soft-limit: 95%
cluster.ensure-durability: off
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
nfs.disable: on
features.quota: on
cluster.read-hash-mode: 2
diagnostics.brick-log-level: CRITICAL
cluster.lookup-unhashed: off
___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
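Those zero-byte, sticky-bit ("---------T") entries are normally DHT link-to pointers rather than the data itself; the real file should live on the distribute subvolume named in their trusted.glusterfs.dht.linkto attribute, so on their own they do not mean data loss. A quick, hedged way to check on one brick, using the brick path from the volume info in this message:

# list sticky, zero-byte files on a brick and show where their linkto attribute points
find /data/glusterfs/home/brick1 -path '*/.glusterfs' -prune -o \
     -type f -perm -1000 -size 0 -print0 |
  xargs -0 -r getfattr -n trusted.glusterfs.dht.linkto 2>/dev/null

If the subvolume named in the output still holds the real file in the same path, nothing is lost and the pointers can be left to DHT; only pointers whose target no longer exists anywhere are a problem.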
Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"
gluster volume rebalance md1 status gives:

Node        Rebalanced-files     size   scanned  failures  skipped     status  run time in secs
---------   ----------------  -------   -------  --------  -------  ---------  ----------------
localhost               3837    6.3GB    163881         0        0  completed            365.00
tsunami5                 179  343.8MB    163882         0        0  completed            353.00
tsunami3                6786    4.7GB    163882         0        0  completed            416.00
tsunami6                   0   0Bytes    163882         0        0  completed            353.00
tsunami4                   0   0Bytes    163882         0        0  completed            353.00
tsunami2                   0   0Bytes    163882         0        0  completed            353.00
volume rebalance: md1: success:

but no change on the bricks for the directory, still empty except on 2 bricks. Should I remove the files in the .glusterfs directory on the 2 bricks associated with these "---T" files?

Thanks,

A.

On Monday 23 February 2015 21:40:41 Ravishankar N wrote:
On 02/23/2015 09:19 PM, Alessandro Ipe wrote:
On 4 of the 6 bricks, it is empty. However, on tsunami 3-4, ls -lsa gives
total 16
d- 2 root root 61440 Feb 23 15:42 .
drwxrwxrwx 3 gerb users 61 Feb 22 21:10 ..
-T 2 gerb users 0 Apr 16 2014 akonadi-googledata-1.2.0-2.5.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 bluedevil-debugsource-1.2.2-1.8.3.i586.rpm
-T 2 gerb users 0 Apr 16 2014 bovo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 digikam-debugsource-2.2.0-3.12.9.i586.rpm
-T 2 gerb users 0 Apr 16 2014 dolphin-debuginfo-4.7.4-4.22.6.i586.rpm
-T 2 gerb users 0 Apr 16 2014 freetds-doc-0.91-2.5.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kanagram-debuginfo-4.7.4-2.10.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kdebase4-runtime-4.7.4-3.17.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kdebindings-smokegen-debuginfo-4.7.4-2.9.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kdesdk4-strigi-debuginfo-4.7.4-3.12.5.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kradio-4.0.2-9.9.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kremotecontrol-4.7.4-2.12.9.i586.rpm
-T 2 gerb users 0 Apr 16 2014 kreversi-debuginfo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 krfb-4.7.4-2.13.6.i586.rpm
-T 2 gerb users 0 Apr 16 2014 krusader-doc-2.0.0-23.9.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libalkimia-devel-4.3.1-2.5.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libdmtx0-0.7.4-2.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libdmtx0-debuginfo-0.7.4-2.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libkdegames4-debuginfo-4.7.4-3.12.7.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libksane0-4.7.4-2.10.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libkvkontakte-debugsource-1.0.0-2.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libmediawiki-debugsource-2.5.0-4.6.1.i586.rpm
-T 2 gerb users 0 Apr 16 2014 libsmokeqt-4.7.4-2.10.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 NetworkManager-vpnc-kde4-0.9.1git20111027-1.11.5.i586.rpm
-T 2 gerb users 0 Apr 16 2014 qtcurve-kde4-1.8.8-3.6.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 QtZeitgeist-devel-0.7.0-7.4.2.i586.rpm
-T 2 gerb users 0 Apr 16 2014 umbrello-4.7.4-3.12.5.i586.rpm
so that might be the reason of the error. How can I fix this?

The 'T' files are DHT link-to files. The actual files must be present on the other distribute subvolumes (tsunami 1-2 or tsunami 5-6) in the same path. But since that doesn't seem to be the case, something went wrong with the re-balance process. You could run `gluster volume rebalance start+status` again and see if they disappear.

Thanks,

A.

On Monday 23 February 2015 21:06:58 Ravishankar N wrote:
Just noticed that your `gluster volume status`
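If a given linkto file really is stale (the data copy no longer exists on any other replica pair), it is generally removed directly on the brick together with its hard link under .glusterfs; otherwise an orphaned gfid entry is left behind. Purely as an illustrative sketch for one file on one of the two bricks (tsunami3 or tsunami4 here, and one of the file names from the listing), after verifying that the data copy is indeed gone or not needed:

BRICK=/data/glusterfs/md1/brick1
F=$BRICK/linux/suse/12.1/KDE4.7.4/i586/bovo-4.7.4-3.12.7.i586.rpm   # example file from the listing
# delete the gfid hard link pointing at the same inode, then the linkto file itself
find "$BRICK/.glusterfs" -samefile "$F" -print -delete
rm "$F"

Using find -samefile avoids having to compute the gfid path by hand; it simply locates the other hard link to the same inode inside .glusterfs.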
Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"
On 4 of the 6 bricks, it is empty. However, on tsunami 3-4, ls -lsa gives total 16 d- 2 root root 61440 Feb 23 15:42 . drwxrwxrwx 3 gerb users61 Feb 22 21:10 .. -T 2 gerb users 0 Apr 16 2014 akonadi-googledata-1.2.0-2.5.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 bluedevil-debugsource-1.2.2-1.8.3.i586.rpm -T 2 gerb users 0 Apr 16 2014 bovo-4.7.4-3.12.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 digikam-debugsource-2.2.0-3.12.9.i586.rpm -T 2 gerb users 0 Apr 16 2014 dolphin-debuginfo-4.7.4-4.22.6.i586.rpm -T 2 gerb users 0 Apr 16 2014 freetds-doc-0.91-2.5.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 kanagram-debuginfo-4.7.4-2.10.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 kdebase4-runtime-4.7.4-3.17.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 kdebindings-smokegen- debuginfo-4.7.4-2.9.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 kdesdk4-strigi- debuginfo-4.7.4-3.12.5.i586.rpm -T 2 gerb users 0 Apr 16 2014 kradio-4.0.2-9.9.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 kremotecontrol-4.7.4-2.12.9.i586.rpm -T 2 gerb users 0 Apr 16 2014 kreversi-debuginfo-4.7.4-3.12.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 krfb-4.7.4-2.13.6.i586.rpm -T 2 gerb users 0 Apr 16 2014 krusader-doc-2.0.0-23.9.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 libalkimia-devel-4.3.1-2.5.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 libdmtx0-0.7.4-2.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 libdmtx0-debuginfo-0.7.4-2.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 libkdegames4- debuginfo-4.7.4-3.12.7.i586.rpm -T 2 gerb users 0 Apr 16 2014 libksane0-4.7.4-2.10.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 libkvkontakte-debugsource-1.0.0-2.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 libmediawiki- debugsource-2.5.0-4.6.1.i586.rpm -T 2 gerb users 0 Apr 16 2014 libsmokeqt-4.7.4-2.10.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 NetworkManager-vpnc- kde4-0.9.1git20111027-1.11.5.i586.rpm -T 2 gerb users 0 Apr 16 2014 qtcurve-kde4-1.8.8-3.6.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 QtZeitgeist-devel-0.7.0-7.4.2.i586.rpm -T 2 gerb users 0 Apr 16 2014 umbrello-4.7.4-3.12.5.i586.rpm so that might be the reason of error. How can I fix this ? Thanks, A. On Monday 23 February 2015 21:06:58 Ravishankar N wrote: Just noticed that your `gluster volume status` shows that rebalancewas triggered. Maybe DHT developers can help out. I see a similar bug[1]has been fixed some time back.FWIW, can you check if " /linux/suse/12.1/KDE4.7.4/i586" on all 6 bricks is indeed empty? On 02/23/2015 08:15 PM, Alessandro Ipe wrote: Hi, Gluster version is 3.5.3-1. /var/log/gluster.log (client log) givesduring the rm -rf the following logs: [2015-02-23 14:42:50.180091] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0- md1-client-2:remote operation failed: Directory not empty [2015-02-23 14:42:50.180134] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0- md1-client-3:remote operation failed: Directory not empty [2015-02-23 14:42:50.180740] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-5:remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.180772] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-4:remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.181129] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-3:remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.181160] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-2:remote operation failed: File exists. 
Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.319213] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0- md1-client-3:remote operation failed: Directory not empty [2015-02-23 14:42:50.319762] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0- md1-client-2:remote operation failed: Directory not empty [2015-02-23 14:42:50.320501] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-0:remote operation failed: File exists. Path: /linux/suse/12.1/src- oss/suse/src [2015-02-23 14:42:50.320552] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-1:remote operation failed: File exists. Path: /linux/suse/12.1/src- oss/suse/src [2015-02-23 14:42:50.320842] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0- md1-client-2:remote operation fa
Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"
Hi, Gluster version is 3.5.3-1. /var/log/gluster.log (client log) gives during the rm -rf the following logs: [2015-02-23 14:42:50.180091] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-2: remote operation failed: Directory not empty [2015-02-23 14:42:50.180134] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-3: remote operation failed: Directory not empty [2015-02-23 14:42:50.180740] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-5: remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.180772] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-4: remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.181129] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-3: remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.181160] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-2: remote operation failed: File exists. Path: /linux/suse/12.1/KDE4.7.4/i586 [2015-02-23 14:42:50.319213] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-3: remote operation failed: Directory not empty [2015-02-23 14:42:50.319762] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-2: remote operation failed: Directory not empty [2015-02-23 14:42:50.320501] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-0: remote operation failed: File exists. Path: /linux/suse/12.1/src-oss/suse/src [2015-02-23 14:42:50.320552] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-1: remote operation failed: File exists. Path: /linux/suse/12.1/src-oss/suse/src [2015-02-23 14:42:50.320842] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-2: remote operation failed: File exists. Path: /linux/suse/12.1/src-oss/suse/src [2015-02-23 14:42:50.320884] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-3: remote operation failed: File exists. Path: /linux/suse/12.1/src-oss/suse/src [2015-02-23 14:42:50.438982] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-3: remote operation failed: Directory not empty [2015-02-23 14:42:50.439347] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-2: remote operation failed: Directory not empty [2015-02-23 14:42:50.440235] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-0: remote operation failed: File exists. Path: /linux/suse/12.1/oss/suse/noarch [2015-02-23 14:42:50.440344] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-1: remote operation failed: File exists. Path: /linux/suse/12.1/oss/suse/noarch [2015-02-23 14:42:50.440603] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-2: remote operation failed: File exists. Path: /linux/suse/12.1/oss/suse/noarch [2015-02-23 14:42:50.440665] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-3: remote operation failed: File exists. Path: /linux/suse/12.1/oss/suse/noarch [2015-02-23 14:42:50.680827] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-2: remote operation failed: Directory not empty [2015-02-23 14:42:50.681721] W [client-rpc-fops.c:696:client3_3_rmdir_cbk] 0-md1- client-3: remote operation failed: Directory not empty [2015-02-23 14:42:50.682482] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-3: remote operation failed: File exists. Path: /linux/suse/12.1/oss/suse/i586 [2015-02-23 14:42:50.682517] W [client-rpc-fops.c:322:client3_3_mkdir_cbk] 0-md1- client-2: remote operation failed: File exists. 
Path: /linux/suse/12.1/oss/suse/i586 Thanks, A. On Monday 23 February 2015 20:06:17 Ravishankar N wrote: On 02/23/2015 07:04 PM, Alessandro Ipe wrote: Hi Ravi, gluster volume status md1 returns Status of volume: md1 Gluster process Port Online Pid -- Brick tsunami1:/data/glusterfs/md1/brick149157 Y 2260 Brick tsunami2:/data/glusterfs/md1/brick149152 Y 2320 Brick tsunami3:/data/glusterfs/md1/brick149156 Y 20715 Brick tsunami4:/data/glusterfs/md1/brick149156 Y 10544 Brick tsunami5:/data/glusterfs/md1/brick149152 Y 12588 Brick tsunami6:/data/glusterfs/md1/brick149152 Y 12242 Self-heal Daemon on localhost N/A Y 2336 Self-heal Daemon on tsunami2 N/A Y 2359 Self-heal Daemon on tsunami5 N/A Y 27619 Self-heal Daemon on tsunami4 N/A Y 12318 Self-heal Daemon on tsunami3 N/A Y 19118 Self-heal Daemon on tsunami6 N/A Y 27650 Task Status of Volume md1 -- Task : Rebalance ID : 9dfee1a2-49ac-4766-bdb6-00de5e5883f6 Status : completed so it seems that all brick server are up. gluster volume heal md1 info returns
Re: [Gluster-users] rm -rf some_dir results in "Directory not empty"
Hi Ravi,

gluster volume status md1 returns

Status of volume: md1
Gluster process                                 Port    Online  Pid
--------------------------------------------------------------------
Brick tsunami1:/data/glusterfs/md1/brick1       49157   Y       2260
Brick tsunami2:/data/glusterfs/md1/brick1       49152   Y       2320
Brick tsunami3:/data/glusterfs/md1/brick1       49156   Y       20715
Brick tsunami4:/data/glusterfs/md1/brick1       49156   Y       10544
Brick tsunami5:/data/glusterfs/md1/brick1       49152   Y       12588
Brick tsunami6:/data/glusterfs/md1/brick1       49152   Y       12242
Self-heal Daemon on localhost                   N/A     Y       2336
Self-heal Daemon on tsunami2                    N/A     Y       2359
Self-heal Daemon on tsunami5                    N/A     Y       27619
Self-heal Daemon on tsunami4                    N/A     Y       12318
Self-heal Daemon on tsunami3                    N/A     Y       19118
Self-heal Daemon on tsunami6                    N/A     Y       27650

Task Status of Volume md1
--------------------------------------------------------------------
Task   : Rebalance
ID     : 9dfee1a2-49ac-4766-bdb6-00de5e5883f6
Status : completed

so it seems that all brick servers are up.

gluster volume heal md1 info returns

Brick tsunami1.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0
Brick tsunami2.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0
Brick tsunami3.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0
Brick tsunami4.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0
Brick tsunami5.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0
Brick tsunami6.oma.be:/data/glusterfs/md1/brick1/
Number of entries: 0

Should I run "gluster volume heal md1 full"?

Thanks,

A.

On Monday 23 February 2015 18:12:43 Ravishankar N wrote:
On 02/23/2015 05:42 PM, Alessandro Ipe wrote:
Hi, We have a "md1" volume under gluster 3.5.3 over 6 servers configured as distributed and replicated. When trying on a client, through fuse mount (which turns out to be also a brick server), to delete (as root) recursively a directory with "rm -rf /home/.md1/linux/suse/12.1", I get the error messages
rm: cannot remove ‘/home/.md1/linux/suse/12.1/KDE4.7.4/i586’: Directory not empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/src-oss/suse/src’: Directory not empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/noarch’: Directory not empty
rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/i586’: Directory not empty
(the same occurs as unprivileged user but with "Permission denied".)
while a "ls -Ral /home/.md1/linux/suse/12.1" gives me
/home/.md1/linux/suse/12.1:
total 0
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .
drwxr-xr-x 6 gerb users 245 Feb 23 12:55 ..
drwxrwxrwx 3 gerb users 95 Feb 23 13:03 KDE4.7.4
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 oss
drwxrwxrwx 3 gerb users 86 Feb 20 16:20 src-oss
/home/.md1/linux/suse/12.1/KDE4.7.4:
total 28
drwxrwxrwx 3 gerb users 95 Feb 23 13:03 .
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..
d- 2 root root 61452 Feb 23 13:03 i586
/home/.md1/linux/suse/12.1/KDE4.7.4/i586:
total 28
d- 2 root root 61452 Feb 23 13:03 .
drwxrwxrwx 3 gerb users 95 Feb 23 13:03 ..
/home/.md1/linux/suse/12.1/oss:
total 0
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 .
drwxrwxrwx 5 gerb users 151 Feb 20 16:22 ..
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 suse
/home/.md1/linux/suse/12.1/oss/suse:
total 536
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 .
drwxrwxrwx 3 gerb users 311 Feb 20 16:57 ..
d- 2 root root 368652 Feb 23 13:03 i586
d- 2 root root 196620 Feb 23 13:03 noarch
/home/.md1/linux/suse/12.1/oss/suse/i586:
total 360
d- 2 root root 368652 Feb 23 13:03 .
drwxrwxrwx 4 gerb users 90 Feb 23 13:03 ..
/home/.md1/linux/suse/12.1/oss/suse/noarch:
___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
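Since heal info reports zero entries everywhere, a full crawl is the usual next step and can be watched from any of the servers; roughly, and only as a sketch:

gluster volume heal md1 full
gluster volume heal md1 info               # entries still queued for healing
gluster volume heal md1 info split-brain   # entries the self-heal daemon refuses to touch
# the self-heal daemon log, /var/log/glusterfs/glustershd.log on each server, gives more detail

A full heal crawls every brick, so on a large volume it can take a while; the info commands can be repeated periodically to see the counts change.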
[Gluster-users] rm -rf some_dir results in "Directory not empty"
Hi, We have a "md1" volume under gluster 3.5.3 over 6 servers configured as distributed and replicated. When trying on a client, thourgh fuse mount (which turns out to be also a brick server) to delete (as root) recursively a directory with "rm -rf /home/.md1/linux/suse/12.1", I get the error messages rm: cannot remove ‘/home/.md1/linux/suse/12.1/KDE4.7.4/i586’: Directory not empty rm: cannot remove ‘/home/.md1/linux/suse/12.1/src-oss/suse/src’: Directory not empty rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/noarch’: Directory not empty rm: cannot remove ‘/home/.md1/linux/suse/12.1/oss/suse/i586’: Directory not empty (the same occurs as unprivileged user but with "Permission denied".) while a "ls -Ral /home/.md1/linux/suse/12.1" gives me /home/.md1/linux/suse/12.1: total 0 drwxrwxrwx 5 gerb users 151 Feb 20 16:22 . drwxr-xr-x 6 gerb users 245 Feb 23 12:55 .. drwxrwxrwx 3 gerb users 95 Feb 23 13:03 KDE4.7.4 drwxrwxrwx 3 gerb users 311 Feb 20 16:57 oss drwxrwxrwx 3 gerb users 86 Feb 20 16:20 src-oss /home/.md1/linux/suse/12.1/KDE4.7.4: total 28 drwxrwxrwx 3 gerb users95 Feb 23 13:03 . drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .. d- 2 root root 61452 Feb 23 13:03 i586 /home/.md1/linux/suse/12.1/KDE4.7.4/i586: total 28 d- 2 root root 61452 Feb 23 13:03 . drwxrwxrwx 3 gerb users95 Feb 23 13:03 .. /home/.md1/linux/suse/12.1/oss: total 0 drwxrwxrwx 3 gerb users 311 Feb 20 16:57 . drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .. drwxrwxrwx 4 gerb users 90 Feb 23 13:03 suse /home/.md1/linux/suse/12.1/oss/suse: total 536 drwxrwxrwx 4 gerb users 90 Feb 23 13:03 . drwxrwxrwx 3 gerb users311 Feb 20 16:57 .. d- 2 root root 368652 Feb 23 13:03 i586 d- 2 root root 196620 Feb 23 13:03 noarch /home/.md1/linux/suse/12.1/oss/suse/i586: total 360 d- 2 root root 368652 Feb 23 13:03 . drwxrwxrwx 4 gerb users 90 Feb 23 13:03 .. /home/.md1/linux/suse/12.1/oss/suse/noarch: total 176 d- 2 root root 196620 Feb 23 13:03 . drwxrwxrwx 4 gerb users 90 Feb 23 13:03 .. /home/.md1/linux/suse/12.1/src-oss: total 0 drwxrwxrwx 3 gerb users 86 Feb 20 16:20 . drwxrwxrwx 5 gerb users 151 Feb 20 16:22 .. drwxrwxrwx 3 gerb users 48 Feb 23 13:03 suse /home/.md1/linux/suse/12.1/src-oss/suse: total 220 drwxrwxrwx 3 gerb users 48 Feb 23 13:03 . drwxrwxrwx 3 gerb users 86 Feb 20 16:20 .. d- 2 root root 225292 Feb 23 13:03 src /home/.md1/linux/suse/12.1/src-oss/suse/src: total 220 d- 2 root root 225292 Feb 23 13:03 . drwxrwxrwx 3 gerb users 48 Feb 23 13:03 .. Is there a cure such as manually forcing a healing on that directory ? Many thanks, Alessandro. gluster volume info md1 outputs: Volume Name: md1 Type: Distributed-Replicate Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b Status: Started Number of Bricks: 3 x 2 = 6 Transport-type: tcp Bricks: Brick1: tsunami1:/data/glusterfs/md1/brick1 Brick2: tsunami2:/data/glusterfs/md1/brick1 Brick3: tsunami3:/data/glusterfs/md1/brick1 Brick4: tsunami4:/data/glusterfs/md1/brick1 Brick5: tsunami5:/data/glusterfs/md1/brick1 Brick6: tsunami6:/data/glusterfs/md1/brick1 Options Reconfigured: performance.write-behind: on performance.write-behind-window-size: 4MB performance.flush-behind: off performance.io-thread-count: 64 performance.cache-size: 512MB nfs.disable: on features.quota: off cluster.read-hash-mode: 2 server.allow-insecure: on cluster.lookup-unhashed: off ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] volume replace-brick start is not working
Hi, I suspect that the first issue could come from the fact that I made a typo in /etc/hosts on one of the peers associating by mistake 2 IP addresses to the same name (tsunami5 instead of tsunami6)... Apologize for my stupidity and thanks for the help. A. On Wednesday 07 January 2015 18:06:44 Atin Mukherjee wrote: > On 01/07/2015 05:09 PM, Alessandro Ipe wrote: > > Hi, > > > > > > The corresponding logs in > > /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS is openSuSE 12.3) > > [2015-01-06 12:32:14.596601] I > > [glusterd-replace-brick.c:98:__glusterd_handle_replace_brick] > > 0-management: Received replace brick req [2015-01-06 12:32:14.596633] I > > [glusterd-replace-brick.c:153:__glusterd_handle_replace_brick] > > 0-management: Received replace brick status request [2015-01-06 > > 12:32:14.596991] E [glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk] > > 0-management: Received lock RJT from uuid: > > b1aa773f-f9f4-491c-9493-00b23d5ee380 > Here is the problem, lock request is rejected by peer > b1aa773f-f9f4-491c-9493-00b23d5ee380, I feel this is one of the peer > which you added as part of your use case. Was the peer probe successful? > Can you please provide the peer status output? > > ~Atin > > > [2015-01-06 12:32:14.598100] E > > [glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management: > > Received unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380 > > > > However, several reboots managed to cancel the replace-brick command. > > Moreover, I read that this command could still have issues (obviously) in > > 3.5, so I managed to find a workaround for it. > > > > > > A. > > > > On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote: > >> On 01/06/2015 06:05 PM, Alessandro Ipe wrote: > >>> Hi, > >>> > >>> > >>> We have set up a "md1" volume using gluster 3.4.2 over 4 servers > >>> configured as distributed and replicated. Then, we upgraded smoohtly to > >>> 3.5.3, since it was mentionned that the command "volume replace-brick" > >>> is > >>> broken on 3.4.x. We added two more peers (after having read that the > >>> quota feature neede to be turn off for this command to succeed...). > >>> > >>> We have then issued an > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 abort > >>> because nothing was happening. > >>> > >>> However wheh trying to monitor the previous command by > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs > >>> volume replace-brick: failed: Another transaction could be in progress. > >>> Please try again after sometime. 
and the following lines are written in > >>> cli.log > >>> [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs: > >>> SSL support is NOT enabled [2015-01-06 12:32:14.595434] I > >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > >>> [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs: > >>> SSL support is NOT enabled [2015-01-06 12:32:14.595606] I > >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > >>> [2015-01-06 12:32:14.596013] I > >>> [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not > >>> installed [2015-01-06 12:32:14.602165] I > >>> [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to > >>> replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-: > >>> Exiting with: -1 > >>> > >>> What am I doing wrong ? > >> > >> Can you please share the glusterd log? > >> > >> ~Atin > >> > >>> Many thanks, > >>> > >>> > >>> Alessandro. > >>> > >>> > >>> gluster volume info md1 outputs: > >>> Volume Name: md1 > >>> Type: Distributed-Replicate > >>> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b > >>> Status: Started > >>> Number of Bricks: 2 x 2 = 4 > >>> Tran
Re: [Gluster-users] volume replace-brick start is not working
Indeed, I did a gluster peer probe tsunami5 which gave me "Peer Probe Sent (Connected)" After searching on the internet, I found some info telling that under 3.5(.3), "peer probe" was failing if quota was activated... A. On Wednesday 07 January 2015 18:06:44 Atin Mukherjee wrote: > On 01/07/2015 05:09 PM, Alessandro Ipe wrote: > > Hi, > > > > > > The corresponding logs in > > /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS is openSuSE 12.3) > > [2015-01-06 12:32:14.596601] I > > [glusterd-replace-brick.c:98:__glusterd_handle_replace_brick] > > 0-management: Received replace brick req [2015-01-06 12:32:14.596633] I > > [glusterd-replace-brick.c:153:__glusterd_handle_replace_brick] > > 0-management: Received replace brick status request [2015-01-06 > > 12:32:14.596991] E [glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk] > > 0-management: Received lock RJT from uuid: > > b1aa773f-f9f4-491c-9493-00b23d5ee380 > Here is the problem, lock request is rejected by peer > b1aa773f-f9f4-491c-9493-00b23d5ee380, I feel this is one of the peer > which you added as part of your use case. Was the peer probe successful? > Can you please provide the peer status output? > > ~Atin > > > [2015-01-06 12:32:14.598100] E > > [glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management: > > Received unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380 > > > > However, several reboots managed to cancel the replace-brick command. > > Moreover, I read that this command could still have issues (obviously) in > > 3.5, so I managed to find a workaround for it. > > > > > > A. > > > > On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote: > >> On 01/06/2015 06:05 PM, Alessandro Ipe wrote: > >>> Hi, > >>> > >>> > >>> We have set up a "md1" volume using gluster 3.4.2 over 4 servers > >>> configured as distributed and replicated. Then, we upgraded smoohtly to > >>> 3.5.3, since it was mentionned that the command "volume replace-brick" > >>> is > >>> broken on 3.4.x. We added two more peers (after having read that the > >>> quota feature neede to be turn off for this command to succeed...). > >>> > >>> We have then issued an > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 abort > >>> because nothing was happening. > >>> > >>> However wheh trying to monitor the previous command by > >>> gluster volume replace-brick md1 > >>> 193.190.249.113:/data/glusterfs/md1/brick1 > >>> 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs > >>> volume replace-brick: failed: Another transaction could be in progress. > >>> Please try again after sometime. 
and the following lines are written in > >>> cli.log > >>> [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs: > >>> SSL support is NOT enabled [2015-01-06 12:32:14.595434] I > >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > >>> [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs: > >>> SSL support is NOT enabled [2015-01-06 12:32:14.595606] I > >>> [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > >>> [2015-01-06 12:32:14.596013] I > >>> [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not > >>> installed [2015-01-06 12:32:14.602165] I > >>> [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to > >>> replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-: > >>> Exiting with: -1 > >>> > >>> What am I doing wrong ? > >> > >> Can you please share the glusterd log? > >> > >> ~Atin > >> > >>> Many thanks, > >>> > >>> > >>> Alessandro. > >>> > >>> > >>> gluster volume info md1 outputs: > >>> Volume Name: md1 > >>> Type: Distributed-Replicate > >>> Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b > >>> Status: Started > >>> Number of Bricks: 2 x 2 = 4 > >>> Transport-type: tcp >
Re: [Gluster-users] volume replace-brick start is not working
Hi, The corresponding logs in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log (OS is openSuSE 12.3) [2015-01-06 12:32:14.596601] I [glusterd-replace-brick.c:98:__glusterd_handle_replace_brick] 0-management: Received replace brick req [2015-01-06 12:32:14.596633] I [glusterd-replace-brick.c:153:__glusterd_handle_replace_brick] 0-management: Received replace brick status request [2015-01-06 12:32:14.596991] E [glusterd-rpc-ops.c:602:__glusterd_cluster_lock_cbk] 0-management: Received lock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380 [2015-01-06 12:32:14.598100] E [glusterd-rpc-ops.c:675:__glusterd_cluster_unlock_cbk] 0-management: Received unlock RJT from uuid: b1aa773f-f9f4-491c-9493-00b23d5ee380 However, several reboots managed to cancel the replace-brick command. Moreover, I read that this command could still have issues (obviously) in 3.5, so I managed to find a workaround for it. A. On Wednesday 07 January 2015 15:41:07 Atin Mukherjee wrote: > On 01/06/2015 06:05 PM, Alessandro Ipe wrote: > > Hi, > > > > > > We have set up a "md1" volume using gluster 3.4.2 over 4 servers > > configured as distributed and replicated. Then, we upgraded smoohtly to > > 3.5.3, since it was mentionned that the command "volume replace-brick" is > > broken on 3.4.x. We added two more peers (after having read that the > > quota feature neede to be turn off for this command to succeed...). > > > > We have then issued an > > gluster volume replace-brick md1 > > 193.190.249.113:/data/glusterfs/md1/brick1 > > 193.190.249.122:/data/glusterfs/md1/brick1 start force Then I did an > > gluster volume replace-brick md1 > > 193.190.249.113:/data/glusterfs/md1/brick1 > > 193.190.249.122:/data/glusterfs/md1/brick1 abort > > because nothing was happening. > > > > However wheh trying to monitor the previous command by > > gluster volume replace-brick md1 > > 193.190.249.113:/data/glusterfs/md1/brick1 > > 193.190.249.122:/data/glusterfs/md1/brick1 status it outputs > > volume replace-brick: failed: Another transaction could be in progress. > > Please try again after sometime. and the following lines are written in > > cli.log > > [2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs: > > SSL support is NOT enabled [2015-01-06 12:32:14.595434] I > > [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > > [2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs: > > SSL support is NOT enabled [2015-01-06 12:32:14.595606] I > > [socket.c:3660:socket_init] 0-glusterfs: using system polling thread > > [2015-01-06 12:32:14.596013] I > > [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not > > installed [2015-01-06 12:32:14.602165] I > > [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to > > replace brick [2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-: > > Exiting with: -1 > > > > What am I doing wrong ? > > Can you please share the glusterd log? > > ~Atin > > > Many thanks, > > > > > > Alessandro. 
> > > > > > gluster volume info md1 outputs: > > Volume Name: md1 > > Type: Distributed-Replicate > > Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b > > Status: Started > > Number of Bricks: 2 x 2 = 4 > > Transport-type: tcp > > Bricks: > > Brick1: tsunami1:/data/glusterfs/md1/brick1 > > Brick2: tsunami2:/data/glusterfs/md1/brick1 > > Brick3: tsunami3:/data/glusterfs/md1/brick1 > > Brick4: tsunami4:/data/glusterfs/md1/brick1 > > Options Reconfigured: > > server.allow-insecure: on > > cluster.read-hash-mode: 2 > > features.quota: off > > nfs.disable: on > > performance.cache-size: 512MB > > performance.io-thread-count: 64 > > performance.flush-behind: off > > performance.write-behind-window-size: 4MB > > performance.write-behind: on > > > > > > > > > > ___ > > Gluster-users mailing list > > Gluster-users@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-users -- Dr. Ir. Alessandro Ipe Department of Observations Tel. +32 2 373 06 31 Remote Sensing from Space Fax. +32 2 374 67 88 Royal Meteorological Institute Avenue Circulaire 3Email: B-1180 BrusselsBelgium alessandro@meteo.be Web: http://gerb.oma.be ___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
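For the record, the cluster-wide lock behind "Another transaction could be in progress" is held in glusterd's memory, so when it goes stale after an aborted command, restarting glusterd on the peer shown in the "lock RJT" message normally releases it without disturbing the bricks. A hedged sketch, run on that peer:

gluster peer status                 # map the uuid from the log to a host
/etc/init.d/glusterd restart        # or: systemctl restart glusterd
# brick (glusterfsd) processes keep serving data while glusterd restarts

This is what the "several reboots" above achieved; restarting only the management daemon is the lighter-weight version of the same thing.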
[Gluster-users] volume replace-brick start is not working
Hi,

We have set up a "md1" volume using gluster 3.4.2 over 4 servers configured as distributed and replicated. Then we upgraded smoothly to 3.5.3, since it was mentioned that the command "volume replace-brick" is broken on 3.4.x. We added two more peers (after having read that the quota feature needed to be turned off for this command to succeed...).

We then issued
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 193.190.249.122:/data/glusterfs/md1/brick1 start force
Then I did
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 193.190.249.122:/data/glusterfs/md1/brick1 abort
because nothing was happening.

However, when trying to monitor the previous command with
gluster volume replace-brick md1 193.190.249.113:/data/glusterfs/md1/brick1 193.190.249.122:/data/glusterfs/md1/brick1 status
it outputs
volume replace-brick: failed: Another transaction could be in progress. Please try again after sometime.
and the following lines are written in cli.log
[2015-01-06 12:32:14.595387] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-01-06 12:32:14.595434] I [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
[2015-01-06 12:32:14.595590] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2015-01-06 12:32:14.595606] I [socket.c:3660:socket_init] 0-glusterfs: using system polling thread
[2015-01-06 12:32:14.596013] I [cli-cmd-volume.c:1706:cli_check_gsync_present] 0-: geo-replication not installed
[2015-01-06 12:32:14.602165] I [cli-rpc-ops.c:2162:gf_cli_replace_brick_cbk] 0-cli: Received resp to replace brick
[2015-01-06 12:32:14.602248] I [input.c:36:cli_batch] 0-: Exiting with: -1

What am I doing wrong?

Many thanks,

Alessandro.

gluster volume info md1 outputs:
Volume Name: md1
Type: Distributed-Replicate
Volume ID: 6da4b915-1def-4df4-a41c-2f3300ebf16b
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: tsunami1:/data/glusterfs/md1/brick1
Brick2: tsunami2:/data/glusterfs/md1/brick1
Brick3: tsunami3:/data/glusterfs/md1/brick1
Brick4: tsunami4:/data/glusterfs/md1/brick1
Options Reconfigured:
server.allow-insecure: on
cluster.read-hash-mode: 2
features.quota: off
nfs.disable: on
performance.cache-size: 512MB
performance.io-thread-count: 64
performance.flush-behind: off
performance.write-behind-window-size: 4MB
performance.write-behind: on
___ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users
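Replace-brick with data migration ("start") is the mode that was later deprecated; the workaround most often suggested for 3.5.x, and presumably the one referred to above, is to switch the brick without migration and let replication copy the data back. Roughly, with the same bricks and assuming the new brick directory exists and is empty on 193.190.249.122:

gluster volume replace-brick md1 \
    193.190.249.113:/data/glusterfs/md1/brick1 \
    193.190.249.122:/data/glusterfs/md1/brick1 commit force
gluster volume heal md1 full
gluster volume heal md1 info    # repeat until the entry counts drop to zero

This relies on the replica partner of the old brick being healthy, since it becomes the heal source for everything on the replaced brick.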
[Gluster-users] Error message: getxattr failed on : user.virtfs.rdev (No data available)
Hi, We have set up a "home" volume using gluster 3.4.2 over 4 servers configured as distributed and replicated. On each server, 4 ext4 bricks are mounted with the following options: defaults,noatime,nodiratime This "home" volume is mounted using FUSE on a client server with the following options: defaults,_netdev,noatime,direct-io-mode=disable,backupvolfile-server=tsunami2,log-level=ERROR,log-file=/var/log/gluster.log This client (host) also runs a virtual machine (qemu-kvm guest) with a "Filesystem Passthrough" from the host using "Mapped" mode to the guest. The filesystem exported from the host is located on the gluster "home" volume. This filesystem is mounted inside the guest using the fstab line: home /home9p trans=virtio,version=9p2000.L,rw,noatime0 0 On my guest, userid mapping is working correctly and I can copy files on /home. However, doing so results in my bricks' logs (/var/log/glusterfs/bricks on the 4 gluster servers) filling with error messages similar to: E [posix.c:2668:posix_getxattr] 0-home-posix: getxattr failed on /data/glusterfs/home/brick1/hail/mailman/mailman: user.virtfs.rdev (No data available) for all files being copied on /home. For the moment, a quick fix to avoid filling my system partition holding the logs was to set the volume's parameter "diagnostics.brick-log-level" to CRITICAL, but this will prevent me to see other important error messages which could occur. Is there a cleaner way (use of ACL ?) to prevent these error messages filling my logs and my system partition, please ? Many thanks, Alessandro. gluster volume info home outputs: Volume Name: home Type: Distributed-Replicate Volume ID: 501741ed-4146-4022-af0b-41f5b1297766 Status: Started Number of Bricks: 8 x 2 = 16 Transport-type: tcp Bricks: Brick1: tsunami1:/data/glusterfs/home/brick1 Brick2: tsunami2:/data/glusterfs/home/brick1 Brick3: tsunami1:/data/glusterfs/home/brick2 Brick4: tsunami2:/data/glusterfs/home/brick2 Brick5: tsunami1:/data/glusterfs/home/brick3 Brick6: tsunami2:/data/glusterfs/home/brick3 Brick7: tsunami1:/data/glusterfs/home/brick4 Brick8: tsunami2:/data/glusterfs/home/brick4 Brick9: tsunami3:/data/glusterfs/home/brick1 Brick10: tsunami4:/data/glusterfs/home/brick1 Brick11: tsunami3:/data/glusterfs/home/brick2 Brick12: tsunami4:/data/glusterfs/home/brick2 Brick13: tsunami3:/data/glusterfs/home/brick3 Brick14: tsunami4:/data/glusterfs/home/brick3 Brick15: tsunami3:/data/glusterfs/home/brick4 Brick16: tsunami4:/data/glusterfs/home/brick4 Options Reconfigured: diagnostics.brick-log-level: CRITICAL cluster.read-hash-mode: 2 features.limit-usage: features.quota: on performance.cache-size: 512MB performance.io-thread-count: 64 performance.flush-behind: off performance.write-behind-window-size: 4MB performance.write-behind: on nfs.disable: on ___ Gluster-users mailing list Gluster-users@gluster.org http://supercolony.gluster.org/mailman/listinfo/gluster-users