[Gluster-users] Structure needs cleaning on some files
Hi All,

When reading some files we get this error:

md5sum: /path/to/file.xml: Structure needs cleaning

In /var/log/glusterfs/mnt-sharedfs.log we see these errors:

[2013-12-10 08:07:32.256910] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-0: remote operation failed: No such file or directory
[2013-12-10 08:07:32.257436] W [client-rpc-fops.c:526:client3_3_stat_cbk] 1-testvolume-client-1: remote operation failed: No such file or directory
[2013-12-10 08:07:32.259356] W [fuse-bridge.c:705:fuse_attr_cbk] 0-glusterfs-fuse: 8230: STAT() /path/to/file.xml = -1 (Structure needs cleaning)

We are using gluster 3.4.1-3 on CentOS6. Our servers are 64-bit, our clients 32-bit (we are already using --enable-ino32 on the mountpoint).

This is my gluster configuration:

Volume Name: testvolume
Type: Replicate
Volume ID: ca9c2f87-5d5b-4439-ac32-b7c138916df7
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: SRV-1:/gluster/brick1
Brick2: SRV-2:/gluster/brick2
Options Reconfigured:
performance.force-readdirp: on
performance.stat-prefetch: off
network.ping-timeout: 5

And this is how the applications work:
We have 2 client nodes which both have a fuse.glusterfs mountpoint.
On 1 client node we have an application which writes files.
On the other client node we have an application which reads these files.

On the node where the files are written we don't see any problem and can read the file without problems.
On the other node we have problems (error messages above) reading that file.
The problem occurs when we perform an md5sum on that exact file; when we perform an md5sum on all files in that directory, there is no problem.

How can we solve this problem, as this is annoying? The problem occurs after some time (can be days); an umount and mount of the mountpoint solves it for some days. Once it occurs (and we don't remount) it occurs every time.

I hope someone can help me with this problem.

Thanks,
Johan Huysmans
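[For reference, a minimal sketch of the 32-bit client mount described above. The mount point /mnt/sharedfs is an assumption based on the log file name mnt-sharedfs.log, and the enable-ino32 mount option is assumed to map to the --enable-ino32 flag mentioned in the report; check mount.glusterfs(8) on your build before relying on it.]

# mount the replicated volume on a 32-bit client, forcing 32-bit inode numbers
mount -t glusterfs -o enable-ino32 SRV-1:/testvolume /mnt/sharedfs

# equivalent /etc/fstab entry
SRV-1:/testvolume  /mnt/sharedfs  glusterfs  defaults,_netdev,enable-ino32  0 0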
Re: [Gluster-users] replace-brick failing - transport.address-family not specified
On 10.12.2013 06:39:47, Vijay Bellur wrote:
On 12/08/2013 07:06 PM, Nguyen Viet Cuong wrote:
Thanks for sharing. Btw, I do believe that GlusterFS 3.2.x is much more stable than 3.4.x in production.

This is quite contrary to what we have seen in the community. From a development perspective too, we feel much better about 3.4.1. Are there specific instances that worked well with 3.2.x which do not work fine for you in 3.4.x?

987555 - is that fixed in 3.5? Or did it even make it into 3.4.2? Couldn't find a note on that. Show stopper for moving from 3.2.x to anywhere for me!

cheers
b

Cheers,
Vijay

--
Bernhard Glomm
IT Administration
Phone: +49 (30) 86880 134
Fax: +49 (30) 86880 100
Skype: bernhard.glomm.ecologic
Ecologic Institut gemeinnützige GmbH | Pfalzburger Str. 43/44 | 10717 Berlin | Germany
GF: R. Andreas Kraemer | AG: Charlottenburg HRB 57947 | USt/VAT-IdNr.: DE811963464
Ecologic™ is a Trade Mark (TM) of Ecologic Institut gemeinnützige GmbH
Re: [Gluster-users] Structure needs cleaning on some files
I could reproduce this problem while my mount point is running in debug mode. The logfile is attached.

gr.
Johan Huysmans

On 10-12-13 09:30, Johan Huysmans wrote:
[...]
[2013-12-10 08:37:58.532425] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.532493] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.532513] D [afr-self-heal-common.c:887:afr_mark_sources] 0-testvolume-replicate-0: Number of sources: 0
[2013-12-10 08:37:58.532530] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-testvolume-replicate-0: returning read_child: 1
[2013-12-10 08:37:58.532546] D [afr-common.c:1380:afr_lookup_select_read_child] 0-testvolume-replicate-0: Source selected as 1 for /
[2013-12-10 08:37:58.532564] D [afr-common.c:1117:afr_lookup_build_response_params] 0-testvolume-replicate-0: Building lookup response from 1
[2013-12-10 08:37:58.533041] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-testvolume-replicate-0: /path: failed to get the gfid from dict
[2013-12-10 08:37:58.540362] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.540395] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.540412] D [afr-self-heal-common.c:887:afr_mark_sources] 0-testvolume-replicate-0: Number of sources: 0
[2013-12-10 08:37:58.540428] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-testvolume-replicate-0: returning read_child: 0
[2013-12-10 08:37:58.540443] D [afr-common.c:1380:afr_lookup_select_read_child] 0-testvolume-replicate-0: Source selected as 0 for /path
[2013-12-10 08:37:58.540460] D [afr-common.c:1117:afr_lookup_build_response_params] 0-testvolume-replicate-0: Building lookup response from 0
[2013-12-10 08:37:58.540804] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-testvolume-replicate-0: /path/to: failed to get the gfid from dict
[2013-12-10 08:37:58.541377] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.541408] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-testvolume-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-10 08:37:58.541425] D [afr-self-heal-common.c:887:afr_mark_sources] 0-testvolume-replicate-0: Number of sources: 0
[2013-12-10 08:37:58.541440] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-testvolume-replicate-0: returning read_child: 1
[2013-12-10 08:37:58.541455] D [afr-common.c:1380:afr_lookup_select_read_child] 0-testvolume-replicate-0: Source selected as 1 for /path/to
[2013-12-10 08:37:58.541473] D
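[A minimal sketch of how a client-side debug log like the one above can be captured, using the log-level=DEBUG mount option that also appears later in this digest; the SRV-1:/testvolume volume name is taken from the report and the /mnt/sharedfs mount point is an assumption based on the log file name.]

# remount the client with debug logging; messages land in /var/log/glusterfs/mnt-sharedfs.log
umount /mnt/sharedfs
mount -t glusterfs -o log-level=DEBUG SRV-1:/testvolume /mnt/sharedfs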
Re: [Gluster-users] Gluster infrastructure question
Hi guys,

thanks for all these reports. Well, I think I'll change my Raid level to 6, let the Raid controller build and rebuild all Raid members, and replicate again with glusterFS. I get more capacity, but I need to check whether the write throughput is acceptable. I think I can't take advantage of using glusterFS with a lot of bricks, because I've found more cons than pros in my case.

@Ben thx for this very detailed document!

Cheers and Thanks
Heiko

On 10.12.2013 00:38, Dan Mons wrote:
On 10 December 2013 08:09, Joe Julian j...@julianfamily.org wrote:
Replicas are defined in the order bricks are listed in the volume create command. So

gluster volume create myvol replica 2 server1:/data/brick1 server2:/data/brick1 server3:/data/brick1 server4:/data/brick1

will replicate between server1 and server2, and replicate between server3 and server4. Bricks added to a replica 2 volume after it's been created will require pairs of bricks. The best way to force replication to happen on another server is to just define it that way.

Yup, that's understood. The problem is when (for argument's sake):

* We've defined 4 hosts with 10 disks each
* Each individual disk is a brick
* Replication is defined correctly when creating the volume initially
* I'm on holidays, my employer buys a single node, configures it brick-per-disk, and the IT junior adds it to the cluster

All good up until that final point, and then I've got that fifth node at the end replicating to itself. Node goes down some months later, chaos ensues. Not a GlusterFS/technology problem, but a problem with what frequently happens at a human level. As a sysadmin, these are also things I need to work around, even if it means deviating from best practices. :)

-Dan

--
Anynines.com
Avarteq GmbH
B.Sc. Informatik
Heiko Krämer
CIO
Twitter: @anynines
Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbrücken
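[To make Joe's ordering point concrete: a sketch of growing the replica 2 volume above without ending up with a node replicating to itself. The server5/server6 hostnames are hypothetical, following the naming in Joe's example.]

# bricks are consumed in the order given, in groups of the replica count,
# so a new replica pair should span two servers
gluster volume add-brick myvol server5:/data/brick1 server6:/data/brick1

# adding two bricks from the same new server instead would pair them with each
# other - exactly the "fifth node replicating to itself" situation Dan describes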
Re: [Gluster-users] Errors from PHP stat() on files and directories in a glusterfs mount
Hi,

It seems I have a related problem (just posted this on the mailing list). Do you already have a solution for this problem?

gr.
Johan Huysmans

On 05-12-13 20:05, Bill Mair wrote:
Hi,

I'm trying to use glusterfs to mirror the ownCloud data area between 2 servers. They are using debian jessie due to some dependencies that I have for other components.

This is where my issue rears its ugly head. This is failing because I can't stat the files and directories on my glusterfs mount.

/var/www/owncloud/data is where I am mounting the volume, and I can reproduce the error using a simple php test application, so I don't think that it is apache or owncloud related.

I'd be grateful for any pointers on how to resolve this problem.

Thanks,
Bill

Attached is the simple.php test and the results of executing strace php5 simple.php twice, once with the glusterfs mounted (simple.php.strace-glusterfs) and once against the file system when unmounted (simple.php.strace-unmounted).

Here is what I get in the gluster log when I run the test (as root):

/var/log/glusterfs/var-www-owncloud-data.log
[2013-12-05 18:33:50.802250] D [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: returning as transport is already disconnected OR there are no frames (0 || 0)
[2013-12-05 18:33:50.825132] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-05 18:33:50.825322] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-05 18:33:50.825393] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0: Number of sources: 0
[2013-12-05 18:33:50.825456] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gv-ocdata-replicate-0: returning read_child: 0
[2013-12-05 18:33:50.825511] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gv-ocdata-replicate-0: Source selected as 0 for /
[2013-12-05 18:33:50.825579] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gv-ocdata-replicate-0: Building lookup response from 0
[2013-12-05 18:33:50.827069] D [afr-common.c:131:afr_lookup_xattr_req_prepare] 0-gv-ocdata-replicate-0: /check.txt: failed to get the gfid from dict
[2013-12-05 18:33:50.829409] D [client-handshake.c:185:client_start_ping] 0-gv-ocdata-client-0: returning as transport is already disconnected OR there are no frames (0 || 0)
[2013-12-05 18:33:50.836719] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-05 18:33:50.836870] D [afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gv-ocdata-replicate-0: pending_matrix: [ 0 0 ]
[2013-12-05 18:33:50.836941] D [afr-self-heal-common.c:887:afr_mark_sources] 0-gv-ocdata-replicate-0: Number of sources: 0
[2013-12-05 18:33:50.837002] D [afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type] 0-gv-ocdata-replicate-0: returning read_child: 0
[2013-12-05 18:33:50.837058] D [afr-common.c:1380:afr_lookup_select_read_child] 0-gv-ocdata-replicate-0: Source selected as 0 for /check.txt
[2013-12-05 18:33:50.837129] D [afr-common.c:1117:afr_lookup_build_response_params] 0-gv-ocdata-replicate-0: Building lookup response from 0

Other bits of information:

root@bbb-1:/var/www/owncloud# uname -a
Linux bbb-1 3.8.13-bone30 #1 SMP Thu Nov 14 02:59:07 UTC 2013 armv7l GNU/Linux

root@bbb-1:/var/www/owncloud# dpkg -l glusterfs-*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/
Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version   Architecture  Description
+++--===-===-==
ii  glusterfs-client  3.4.1-1   armhf         clustered file-system (client package)
ii  glusterfs-common  3.4.1-1   armhf         GlusterFS common libraries and translator modules
ii  glusterfs-server  3.4.1-1   armhf         clustered file-system (server package)

mount
bbb-1:gv-ocdata on /var/www/owncloud/data type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

/etc/fstab
UUID=---- /sdhc ext4 defaults 0 0
bbb-1:gv-ocdata /var/www/owncloud/data glusterfs defaults,_netdev,log-level=DEBUG 0 0

ls -al on the various paths
root@bbb-1:/var/log/glusterfs# ll -d /sdhc/
drwxrwxr-x 7 root root 4096 Nov 28 19:15 /sdhc/
root@bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/
drwxrwx--- 5 www-data www-data 4096 Dec 5 00:50 /sdhc/gv-ocdata/
root@bbb-1:/var/log/glusterfs# ll -d /sdhc/gv-ocdata/check.txt
-rw-r--r-- 2 root root 10 Dec 5 00:50
Re: [Gluster-users] Gluster infrastructure question
Hi Ben,

For glusterfs would you recommend the enterprise-storage or throughput-performance tuned profile?

Thanks,
Andrew

On Tue, Dec 10, 2013 at 6:31 AM, Ben Turner btur...@redhat.com wrote:

----- Original Message -----
From: Ben Turner btur...@redhat.com
To: Heiko Krämer hkrae...@anynines.de
Cc: gluster-users@gluster.org List gluster-users@gluster.org
Sent: Monday, December 9, 2013 2:26:45 PM
Subject: Re: [Gluster-users] Gluster infrastructure question

----- Original Message -----
From: Heiko Krämer hkrae...@anynines.de
To: gluster-users@gluster.org List gluster-users@gluster.org
Sent: Monday, December 9, 2013 8:18:28 AM
Subject: [Gluster-users] Gluster infrastructure question

Heyho guys,

I've been running glusterfs for years in a small environment without big problems. Now I'm going to use glusterFS for a bigger cluster, but I have some questions :)

Environment:
* 4 Servers
* 20 x 2TB HDD, each
* Raidcontroller
* Raid 10
* 4x bricks = Replicated, Distributed volume
* Gluster 3.4

1) I'm wondering whether I can delete the raid10 on each server and create a separate brick for each HDD. In this case the volume would have 80 bricks, i.e. 4 servers x 20 HDDs. Is there any experience about the write throughput in a production system with this many bricks? In addition I'd get double the HDD capacity.

Have a look at: http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf

That one was from 2012, here is the latest:
http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf

-b

Specifically:
● RAID arrays
● More RAID LUNs for better concurrency
● For RAID6, 256-KB stripe size

I use a single RAID 6 that is divided into several LUNs for my bricks. For example, on my Dell servers (with PERC6 RAID controllers) each server has 12 disks that I put into raid 6. Then I break the RAID 6 into 6 LUNs and create a new PV/VG/LV for each brick. From there I follow the recommendations listed in the presentation.

HTH!

-b

2) I've heard a talk about glusterFS and scaling out. The main point was that if more bricks are in use, the scale-out process will take a long time. The problem was/is the hash algorithm. So I'm wondering which is faster and whether there are any issues: one very big brick (Raid10 20TB on each server) or many more bricks? Is there any experience?

3) Failover of an HDD is not a big deal for a raid controller with a HotSpare HDD. Glusterfs will rebuild automatically if a brick fails and no data is present; this action will cause a lot of network traffic between the mirror bricks, but it will handle it just like the raid controller, right?

Thanks and cheers
Heiko

--
Anynines.com
Avarteq GmbH
B.Sc. Informatik
Heiko Krämer
CIO
Twitter: @anynines
Geschäftsführer: Alexander Faißt, Dipl.-Inf.(FH) Julian Fischer
Handelsregister: AG Saarbrücken HRB 17413, Ust-IdNr.: DE262633168
Sitz: Saarbrücken
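[A minimal sketch of the brick layout Ben describes: one RAID 6 array carved into several logical volumes, one per brick. The device name, VG name, LV sizes and mount paths are hypothetical, and the XFS inode size follows the general recommendations in the linked presentations rather than anything stated in this thread.]

# /dev/sdb is the RAID 6 virtual disk exposed by the controller (hypothetical)
pvcreate /dev/sdb
vgcreate vg_bricks /dev/sdb

# carve out one logical volume per brick, e.g. 6 bricks of 2TB each
for i in $(seq 1 6); do
    lvcreate -L 2T -n brick$i vg_bricks
    mkfs.xfs -i size=512 /dev/vg_bricks/brick$i
    mkdir -p /bricks/brick$i
    mount /dev/vg_bricks/brick$i /bricks/brick$i
done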
Re: [Gluster-users] replace-brick failing - transport.address-family not specified
On 12/10/2013 02:26 PM, Bernhard Glomm wrote:
[...]
987555 - is that fixed in 3.5? Or did it even make it into 3.4.2? Couldn't find a note on that. Show stopper for moving from 3.2.x to anywhere for me!

Yes, this will be part of 3.4.2. Note that the original problem was due to libvirt being rigid about the ports that it needs to use for migrations. AFAIK this has been addressed in upstream libvirt as well.

Through this bug fix, glusterfs provides a mechanism where it can use a separate range of ports for bricks. This configuration can be enabled to work with other applications that do not adhere to the guidelines laid out by IANA.

Cheers,
Vijay
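[As a rough illustration of the mechanism Vijay mentions (shifting the port range glusterd assigns to bricks away from ports other applications insist on): in later releases this is an option in glusterd's own volfile. The option name and value below are assumptions based on those later releases, not on the 3.4.2 release notes, so verify against your version before using it.]

# /etc/glusterfs/glusterd.vol (option name assumed; restart glusterd after editing)
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option base-port 50152    # hand out brick ports from here instead of the default range
end-volume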
Re: [Gluster-users] Gluster - replica - Unable to self-heal contents of '/' (possible split-brain)
On 12/09/2013 07:21 PM, Alexandru Coseru wrote:

[2013-12-09 13:20:52.066978] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:52.067386] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:52.067452] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error
[2013-12-09 13:20:53.092039] E [afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-stor1-replicate-0: Unable to self-heal contents of '/' (possible split-brain). Please delete the file from all but the preferred subvolume.- Pending matrix: [ [ 0 2 ] [ 2 0 ] ]
[2013-12-09 13:20:53.092497] E [afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-stor1-replicate-0: background meta-data self-heal failed on /
[2013-12-09 13:20:53.092559] E [mount3.c:290:mnt3svc_lookup_mount_cbk] 0-nfs: error=Input/output error

What am I doing wrong?

Looks like there is a metadata split-brain on /. The split-brain resolution document at [1] can possibly be of help here.

-Vijay

[1] https://github.com/gluster/glusterfs/blob/master/doc/split-brain.md

PS: Volume stor_fast works like a charm.

Good to know, thanks!
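[For a metadata split-brain on the volume root like the one above, a common first diagnostic step is to compare the AFR changelog extended attributes on the brick roots of both replicas. A minimal sketch; the brick paths are placeholders borrowed from elsewhere in this digest, not Alexandru's actual bricks.]

# run on each server against its own brick root; non-zero trusted.afr.*-client-*
# values on both sides indicate that each replica blames the other
getfattr -d -m . -e hex /gluster/brick1
getfattr -d -m . -e hex /gluster/brick2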
[Gluster-users] Error after crash of Virtual Machine during migration
Greetings,

Legend:
storage-gfs-3-prd - the first gluster.
storage-1-saas - the new gluster to which the first gluster had to be migrated.
storage-gfs-4-prd - the second gluster (which had to be migrated later).

I've started the replace-brick command:
'gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared start'

During that, the Virtual Machine (Xen) crashed. Now I can't abort the migration and continue it again. When I try:
'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'
the command lasts about 5 minutes and then finishes with no results. Apart from that, Gluster starts to behave very strangely after that command. For example I can't do '# gluster volume heal sa_bookshelf info' because it lasts about 5 minutes and returns a blank screen (the same as abort). Then I restart the Gluster server and Gluster returns to normal work, except for the replace-brick commands. When I do:
'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared status'
I get:
Number of files migrated = 0
Current file=

I can do 'volume heal info' commands etc. until I call the command:
'# gluster volume replace-brick sa_bookshelf storage-gfs-3-prd:/ydp/shared storage-1-saas:/ydp/shared abort'.

# gluster --version
glusterfs 3.3.1 built on Oct 22 2012 07:54:24
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. http://www.gluster.com
GlusterFS comes with ABSOLUTELY NO WARRANTY. You may redistribute copies of GlusterFS under the terms of the GNU General Public License.

Brick (/ydp/shared) logs (the same repeats constantly):

[2013-12-06 11:29:44.790299] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790402] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:44.790465] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 11:29:47.791037] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791141] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:47.791174] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 11:29:50.791775] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2bb) [0x7ff4a5d3d4ab]))) 0-dict: data is NULL
[2013-12-06 11:29:50.791986] W [dict.c:995:data_to_str] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_connect+0xab) [0x7ff4a5d35fcb] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x15d) [0x7ff4a5d3d64d] (--/usr/lib/glusterfs/3.3.1/rpc-transport/socket.so(client_fill_address_family+0x2c6) [0x7ff4a5d3d4b6]))) 0-dict: data is NULL
[2013-12-06 11:29:50.792046] E [name.c:141:client_fill_address_family] 0-sa_bookshelf-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options

# gluster volume info
Volume Name: sa_bookshelf
Type: Distributed-Replicate
Volume ID: 74512f52-72ec-4538-9a54-4e50c4691722
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: storage-gfs-3-prd:/ydp/shared
Brick2:
Re: [Gluster-users] Structure needs cleaning on some files
Hi All,

It seems I can easily reproduce the problem:

* on node 1, create a file (touch, cat, ...)
* on node 2, take an md5sum of the file directly (md5sum /path/to/file)
* on node 1, move the file to another name (mv file file1)
* on node 2, take an md5sum of the file directly (md5sum /path/to/file); this still works although the file is not really there
* on node 1, change the file content
* on node 2, take an md5sum of the file directly (md5sum /path/to/file); this still works and shows a changed md5sum

This is really strange behaviour. Is this normal? Can this be altered with a setting?

Thanks for any info,

gr.
Johan

On 10-12-13 10:02, Johan Huysmans wrote:
[...]
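[Johan's steps translate into a small shell sequence; the /mnt/sharedfs path below is a placeholder for the fuse.glusterfs mount point on each client node, and the expected-failure comment is an interpretation of the report, not a verified result.]

# on node 1 (writer)
echo hello > /mnt/sharedfs/file

# on node 2 (reader) - this primes the client-side lookup state
md5sum /mnt/sharedfs/file

# on node 1 - rename the file and change its content
mv /mnt/sharedfs/file /mnt/sharedfs/file1
echo changed >> /mnt/sharedfs/file1

# on node 2 - one would expect ENOENT here, but per the report the old path
# still answers (with the new checksum), and later degrades into the
# "Structure needs cleaning" errors described earlier in this thread
md5sum /mnt/sharedfs/file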
Re: [Gluster-users] replace-brick failing - transport.address-family not specified
Hi Vijay,

Thank you for your prompt and accurate reply, it is appreciated... I was starting to worry about which version I should run!

I've logged https://bugzilla.redhat.com/show_bug.cgi?id=1039954 for this issue.

It would be good to sort out some of the documentation on this issue. If I get a chance I might start knocking some information up... It's a shame the documentation is so lacking...

Thanks again

Alex

----- Original Message -----
From: Vijay Bellur vbel...@redhat.com
To: Alex Pearson a...@apics.co.uk
Cc: gluster-users Discussion List Gluster-users@gluster.org
Sent: Tuesday, 10 December, 2013 5:32:52 AM
Subject: Re: [Gluster-users] replace-brick failing - transport.address-family not specified

On 12/08/2013 05:44 PM, Alex Pearson wrote:
Hi All,
Just to assist anyone else having this issue, and so people can correct me if I'm wrong... It would appear that replace-brick is 'horribly broken' and should not be used in Gluster 3.4. Instead a combination of remove-brick ... count X ... start should be used to remove the resilience from a volume and the brick, then add-brick ... count X to add the new brick.

This does beg the question of why the hell a completely broken command was left in the 'stable' release of the software. This sort of thing really hurts Gluster's credibility.

A mention of replace-brick not being functional was made in the release note for 3.4.0:
https://github.com/gluster/glusterfs/blob/release-3.4/doc/release-notes/3.4.0.md

Ref: http://www.gluster.org/pipermail/gluster-users/2013-August/036936.html

This discussion happened after the release of GlusterFS 3.4. However, I do get the point you are trying to make here. We can have an explicit warning in CLI when operations considered broken are attempted. There is a similar plan to add a warning for rdma volumes:
https://bugzilla.redhat.com/show_bug.cgi?id=1017176

There is a patch under review currently to remove the replace-brick command from CLI:
http://review.gluster.org/6031

This is intended for master. If you can open a bug report indicating an appropriate warning message that you would like to see when replace-brick is attempted, I would be happy to get such a fix into both 3.4 and 3.5.

Thanks,
Vijay

Cheers

Alex

----- Original Message -----
From: Alex Pearson a...@apics.co.uk
To: gluster-users@gluster.org
Sent: Friday, 6 December, 2013 5:25:43 PM
Subject: [Gluster-users] replace-brick failing - transport.address-family not specified

Hello,
I have what I think is a fairly basic Gluster setup, however when I try to carry out a replace-brick operation it consistently fails...

Here are the command line options:

root@osh1:~# gluster volume info media
Volume Name: media
Type: Replicate
Volume ID: 4c290928-ba1c-4a45-ac05-85365b4ea63a
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: osh1.apics.co.uk:/export/sdc/media
Brick2: osh2.apics.co.uk:/export/sdb/media

root@osh1:~# gluster volume replace-brick media osh1.apics.co.uk:/export/sdc/media osh1.apics.co.uk:/export/WCASJ2055681/media start
volume replace-brick: success: replace-brick started successfully
ID: 60bef96f-a5c7-4065-864e-3e0b2773d7bb

root@osh1:~# gluster volume replace-brick media osh1.apics.co.uk:/export/sdc/media osh1.apics.co.uk:/export/WCASJ2055681/media status
volume replace-brick: failed: Commit failed on localhost. Please check the log file for more details.
root@osh1:~# tail /var/log/glusterfs/bricks/export-sdc-media.log
[2013-12-06 17:24:54.795754] E [name.c:147:client_fill_address_family] 0-media-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 17:24:57.796422] W [dict.c:1055:data_to_str] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x200) [0x7fb826e39f50]))) 0-dict: data is NULL
[2013-12-06 17:24:57.796494] W [dict.c:1055:data_to_str] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(+0x528b) [0x7fb826e3428b] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(socket_client_get_remote_sockaddr+0x4e) [0x7fb826e3a25e] (--/usr/lib/x86_64-linux-gnu/glusterfs/3.4.1/rpc-transport/socket.so(client_fill_address_family+0x20b) [0x7fb826e39f5b]))) 0-dict: data is NULL
[2013-12-06 17:24:57.796519] E [name.c:147:client_fill_address_family] 0-media-replace-brick: transport.address-family not specified. Could not guess default value from (remote-host:(null) or transport.unix.connect-path:(null)) options
[2013-12-06 17:25:00.797153] W [dict.c:1055:data_to_str]
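[A rough sketch of the remove-brick/add-brick route Alex describes above, applied to his 1x2 'media' volume, using the new brick path from his own commands. One common ordering is to add the new brick first and remove the old one afterwards; the exact syntax and the need for 'force' vary between releases, so treat this as an outline to check against your version's documentation rather than a verified procedure.]

# grow the replica set onto the new brick first
gluster volume add-brick media replica 3 osh1.apics.co.uk:/export/WCASJ2055681/media

# wait until self-heal has populated the new brick
gluster volume heal media info

# then drop the old brick, shrinking back to replica 2
gluster volume remove-brick media replica 2 osh1.apics.co.uk:/export/sdc/media force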
Re: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from?
On Tuesday, December 10, 2013 12:49:25 PM Sharuzzaman Ahmat Raslan wrote:
Hi Harry,
Did you setup ntp on each of the nodes, and sync the time to one single source?

Yes, this is done by ROCKS and all the nodes have the identical time. (2 admins have checked repeatedly)

Thanks.

On Tue, Dec 10, 2013 at 12:44 PM, harry mangalam harry.manga...@uci.edu wrote:
Admittedly I should search the source, but I wonder if anyone knows this offhand.

Background: of our 84 ROCKS (6.1) -provisioned compute nodes, 4 have picked up an 'advanced date' in the /var/log/glusterfs/gl.log file - that date string is running about 5-6 hours ahead of the system date and all the Gluster servers (which are identical and correct). The time advancement does not appear to be identical, though it's hard to tell since it only shows on errors and those update irregularly.

All the clients are the same version and all the servers are the same (gluster v 3.4.0-8.el6.x86_64).

This would not be of interest except that those 4 clients are losing files, unable to reliably do IO, etc on the gluster fs. They don't appear to be having problems with NFS mounts, nor with a Fraunhofer FS that is also mounted on each node. Rebooting 2 of them has no effect - they come right back with an advanced date.

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
Re: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from?
On Tuesday, December 10, 2013 10:42:28 AM Vijay Bellur wrote:
On 12/10/2013 10:14 AM, harry mangalam wrote:
[...]

The timestamps in the log file are by default in UTC. That could possibly explain why the timestamps look advanced in the log file.

That seems to make sense. The advanced time on the 4 problem nodes looks to be the correct UTC time, but the others are using /local time/ in their logs, for some reason. And the localtime nodes are the ones NOT having problems. ...??!

However, this looks to be more of a ROCKS / config problem than a general gluster problem at this point. All the nodes have the md5-identical /etc/localtime, but they seem to be behaving differently as to the logging.

Thanks for the pointer.
hjm

This would not be of interest except that those 4 clients are losing files, unable to reliably do IO, etc on the gluster fs. They don't appear to be having problems with NFS mounts, nor with a Fraunhofer FS that is also mounted on each node.

Do you observe anything in the client log files of these machines that indicates I/O problems?

Yes.

Thanks,
Vijay

---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
[m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---
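[A quick way to confirm the UTC explanation on a given client is to put the local clock, the UTC clock and the newest log timestamp side by side; the log path is the gl.log file discussed above.]

# local time, UTC time, and the most recent glusterfs log entry
date; date -u; tail -n 1 /var/log/glusterfs/gl.log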
Re: [Gluster-users] Where does the 'date' string in '/var/log/glusterfs/gl.log' come from?
On 12/10/2013 10:57 AM, harry mangalam wrote:
[...]

If I were to hazard a guess: since the timestamp is not configurable and *is* UTC in 3.4, it would seem that any server that's logging in local time must not be running 3.4. Sure, it's installed, but the application hasn't been restarted since it was installed. That's the only thing I can think of that would allow that behavior.
Re: [Gluster-users] Gluster infrastructure question
----- Original Message -----
From: Andrew Lau and...@andrewklau.com
To: Ben Turner btur...@redhat.com
Cc: gluster-users@gluster.org List gluster-users@gluster.org
Sent: Tuesday, December 10, 2013 5:03:36 AM
Subject: Re: [Gluster-users] Gluster infrastructure question

Hi Ben,

For glusterfs would you recommend the enterprise-storage or throughput-performance tuned profile?

I usually use rhs-high-throughput. I am not sure which versions of tuned have this profile, but I am running tuned-0.2.19-11.el6.1.noarch; I am pretty sure you can grab the srpm here:
http://ftp.redhat.com/redhat/linux/enterprise/6Server/en/os/SRPMS/

HTH,

-b

Thanks,
Andrew

On Tue, Dec 10, 2013 at 6:31 AM, Ben Turner btur...@redhat.com wrote:
[...]
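[For anyone wanting to try Ben's suggestion, applying a tuned profile is a one-liner. The rhs-high-throughput profile only exists where the Red Hat Storage tuned profiles are installed, so the fallback to throughput-performance (one of the profiles Andrew asked about) is an assumption, not something stated in the thread.]

# list available profiles, then apply one
tuned-adm list
tuned-adm profile rhs-high-throughput || tuned-adm profile throughput-performance

# confirm which profile is active
tuned-adm active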
[Gluster-users] Introducing gluster @ Belgian Post ... and some hiccups.
Hello,

Apologies for being so direct, but I need some help.

We're a 30,000-person company, mostly doing postal and financial things in Belgium. In fact, we are the largest company in Belgium. We're under ever greater pressure on budgets (aren't we all), and we're facing a 5 year old VTL near end of life.

Our biggest consumption is Oracle (give or take 143 Tb of backups / archives per week). Archives are written to SAN disks first and then offloaded to the VTL. That box is 128 Tb (128 Tb on 2 datacentres) in a redundant setup. Our backup retention is about 3 days due to this... After those 3 days, the VTL offloads to a tape robot (a very large StorageTek SL8500 with all expansions).

What we would like to do is introduce gluster (in the form of Red Hat Storage, for support) to replace the VTL. I know this is possible; in my free time I organize FOSDEM ( https://fosdem.org/2014/ ), and I've seen test setups which were similar but smaller.

Then our management asked for reference cases. Believe it or not, Red Hat couldn't give one customer that used gluster for Oracle backups or archives, and I can't find any trace online.

However .. I did find this: http://www.gestas.net/files/webinar_follow_up.pdf

Where I find this statement:

Do you have anyone using this as a backup target for RMAN (Oracle backups) and replicating those backups/logs?
=> Absolutely. Backup-to-disk is a great use case for gluster, lots of groups use us for that. We work well with RMAN, Netbackup, Commvault, etc.

Well, we need exactly that, and we use netbackup to orchestrate the Oracle backups...

Can we have some kind of testimonial, someone who has done this (and how)?

Mvg,
Bart

Weijs Bart
Senior Sysadmin unix/linux
M. +32 498 40 96 53
Muntcentrum, 1000 Brussel
bart.we...@bpost.be