Re: [Gluster-users] Very slow directory listing and high CPU usage on replicated volume
That's a very good point. I've been evaluating GlusterFS since version 1.0 and refused to use it for one and only one reason: the split-brain problem. With version 3.3 I finally switched to GlusterFS, but after a few months of production use I'm thinking of going back to separate servers with big RAIDs.

/home/freecloud# time echo * | wc -w
87926

real    16m42.242s
user    0m0.384s
sys     0m0.072s

I just don't get it. Until version 3.3, why would I need OpenStack and qemu support, etc., when after one simple reboot I would lose part of my data?

On 11/6/12 11:35 AM, Fernando Frediani (Qube) wrote:
Joe, I don't think we have to accept this, as this is not an acceptable thing. I have seen countless people complaining about this problem for a while, and it seems no improvements have been made. The ramdisk idea, although it might help, looks more like chewing gum. I have seen other distributed filesystems that don't suffer from the same problem, so why does Gluster have to?

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
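A useful sanity check before blaming the disks is to run the same count directly against a brick's back-end directory on one of the servers and compare it with the listing through the mount. A rough sketch (the brick path below is a placeholder for whatever directory the brick actually exports):

    # through the Gluster mount: every entry involves network round trips
    cd /home/freecloud && time echo * | wc -w

    # directly on the brick back-end, on the server: a purely local readdir
    # (on a distributed volume each brick holds only the entries that hashed to it)
    cd /path/to/brick && time echo * | wc -w

If the brick-side listing comes back in seconds while the mount takes minutes, the time is going into per-entry lookups and self-heal checks across the replicas rather than into the disks.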
Re: [Gluster-users] rdma transport on 3.3?
I have the same experience. All communication goes through ethernet. I think the documentation should be changed to "NOT SUPPORTED AT ALL!", because with my broken English I figured it only meant there was no commercial support for rdma, but that the code was there.

On 10/19/12 9:00 PM, Bartek Krawczyk wrote:
The funny thing is I was able to mount using transport rdma on 3.3.0, but there wasn't any speed difference. I'm not sure there is any difference in 3.3.1.
Regards,

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
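For anyone else trying to confirm which link the traffic really uses, watching the interface byte counters on a server while pushing data through the mount is a quick test. A sketch, assuming the ethernet interface is eth0 and the InfiniBand/IPoIB interface is ib0 (take the real names from "ip link"):

    # snapshot the counters
    cat /sys/class/net/eth0/statistics/rx_bytes /sys/class/net/ib0/statistics/rx_bytes
    # ... read or write a few GB through the gluster mount from a client ...
    # snapshot again and see which counter grew
    cat /sys/class/net/eth0/statistics/rx_bytes /sys/class/net/ib0/statistics/rx_bytes

If the ethernet counter is the one growing by gigabytes, the rdma transport is not actually being used, which is exactly the behaviour described above.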
Re: [Gluster-users] rdma transport on 3.3?
I have an existing volume configured to use GbE and just got two InfiniBand cards. How can I reconfigure the peers to use IPoIB?

On 10/19/12 2:48 PM, Bartek Krawczyk wrote:
Due to the lack of rdma support in 3.3.x we decided to stick with plain IPoIB.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
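Since the bricks and peers in a volume are usually addressed by hostname, one workaround (a sketch only, not an official procedure; try it on a test volume first) is to repoint those hostnames at the IPoIB addresses, so the existing volume starts talking over the InfiniBand interfaces without being recreated. The hostnames and addresses below are made up for illustration:

    # on every server and client, resolve the peer names to the IPoIB addresses
    echo "10.0.0.1  glt1" >> /etc/hosts
    echo "10.0.0.2  glt2" >> /etc/hosts

    # restart the management daemon and the volume so connections
    # are re-established over the new addresses (brief outage)
    service glusterd restart
    gluster volume stop VOLUME
    gluster volume start VOLUME

    # remount clients afterwards

If the bricks were defined by their ethernet IP addresses rather than by hostname, this trick does not apply and the volume has to be recreated, as discussed in the rest of this thread.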
Re: [Gluster-users] Gluster 3.3.0 on CentOS 6 - GigabitEthernet vs InfiniBand
On 10/18/12 10:48 AM, Bartek Krawczyk wrote:
On 18 October 2012 08:44, Ling Ho wrote:
If your volume is created with both tcp and rdma, my experience is that rdma does not work under 3.3.0 and it will always fall back to tcp.

I just converted from GbE to InfiniBand and spent the entire last week cursing about this. Please, devs: make sure we can set the transport type between peers!

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Change transport type on volume from tcp to rdma
http://community.gluster.org/q/how-to-change-transport-type-on-active-volume---glusterfs-3-3/

On 10/11/12 4:41 PM, John Mark Walker wrote:
Cool - can you add this to http://community.gluster.org/ ?
-JM

- Original Message -
What I did was:

gluster volume stop VOLUME
gluster volume delete VOLUME

On each peer, on each brick, I did:

setfattr -x trusted.glusterfs.volume-id /mnt/brick1
setfattr -x trusted.gfid /mnt/brick1
setfattr -x trusted.glusterfs.volume-id /mnt/brick2
setfattr -x trusted.gfid /mnt/brick2
rm -r /mnt/brick1/.glusterfs/
rm -r /mnt/brick2/.glusterfs/

gluster volume create VOLUME replica 2 transport rdma,tcp peer1:brick1 peer2:brick1 peer1:brick2 peer2:brick2

Now I was able to mount with -o transport=rdma where I have InfiniBand cards, and with -o transport=tcp where I have only ethernet.

Best Regards
Ivan Dimitrov

On 10/10/12 4:59 PM, Ivan Dimitrov wrote:
So there is no manual way to change the transport right now? I need the transport between peers to be rdma and the transport between clients/peers to be tcp.
Regards
Ivan Dimitrov

On 10/10/12 3:57 PM, Amar Tumballi wrote:
On 10/10/2012 04:47 PM, Ivan Dimitrov wrote:
Hello, I have two peers set up and working with two bricks each. They have been working via tcp for the last 4-5 months. I just got two InfiniBand cards and put them in the peers. I want to change the transport type to rdma instead of tcp, but I don't see an easy way to do this. Can you please help me with proper instructions?

Hi Ivan,
You are asking for a feature which just got merged upstream (http://review.gluster.org/4008). This will make it to the 3.4.0 release; until then the functionality you are asking for will not be available.
Regards, Amar

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Change transport type on volume from tcp to rdma
What I did was:

gluster volume stop VOLUME
gluster volume delete VOLUME

On each peer, on each brick, I did:

setfattr -x trusted.glusterfs.volume-id /mnt/brick1
setfattr -x trusted.gfid /mnt/brick1
setfattr -x trusted.glusterfs.volume-id /mnt/brick2
setfattr -x trusted.gfid /mnt/brick2
rm -r /mnt/brick1/.glusterfs/
rm -r /mnt/brick2/.glusterfs/

gluster volume create VOLUME replica 2 transport rdma,tcp peer1:brick1 peer2:brick1 peer1:brick2 peer2:brick2

Now I was able to mount with -o transport=rdma where I have InfiniBand cards, and with -o transport=tcp where I have only ethernet.

Best Regards
Ivan Dimitrov

On 10/10/12 4:59 PM, Ivan Dimitrov wrote:
So there is no manual way to change the transport right now? I need the transport between peers to be rdma and the transport between clients/peers to be tcp.
Regards
Ivan Dimitrov

On 10/10/12 3:57 PM, Amar Tumballi wrote:
On 10/10/2012 04:47 PM, Ivan Dimitrov wrote:
Hello, I have two peers set up and working with two bricks each. They have been working via tcp for the last 4-5 months. I just got two InfiniBand cards and put them in the peers. I want to change the transport type to rdma instead of tcp, but I don't see an easy way to do this. Can you please help me with proper instructions?

Hi Ivan,
You are asking for a feature which just got merged upstream (http://review.gluster.org/4008). This will make it to the 3.4.0 release; until then the functionality you are asking for will not be available.
Regards, Amar

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
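For reference, the client side of that last step looks like this (a sketch: the mount point is arbitrary, and the volume and peer names are the placeholders used above):

    # on a client with an InfiniBand card
    mount -t glusterfs -o transport=rdma peer1:/VOLUME /mnt/gluster

    # on an ethernet-only client
    mount -t glusterfs -o transport=tcp peer1:/VOLUME /mnt/gluster

Both mounts use the same volume, which works because it was created with "transport rdma,tcp".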
Re: [Gluster-users] Change transport type on volume from tcp to rdma
So there is no manual way to change the transport right now? I need the transport between peers to be rdma and the transport between clients/peers to be tcp.

Regards
Ivan Dimitrov

On 10/10/12 3:57 PM, Amar Tumballi wrote:
On 10/10/2012 04:47 PM, Ivan Dimitrov wrote:
Hello, I have two peers set up and working with two bricks each. They have been working via tcp for the last 4-5 months. I just got two InfiniBand cards and put them in the peers. I want to change the transport type to rdma instead of tcp, but I don't see an easy way to do this. Can you please help me with proper instructions?

Hi Ivan,
You are asking for a feature which just got merged upstream (http://review.gluster.org/4008). This will make it to the 3.4.0 release; until then the functionality you are asking for will not be available.
Regards, Amar

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
[Gluster-users] Change transport type on volume from tcp to rdma
Hello,
I have two peers set up and working with two bricks each. They have been working via tcp for the last 4-5 months. I just got two InfiniBand cards and put them in the peers. I want to change the transport type to rdma instead of tcp, but I don't see an easy way to do this. Can you please help me with proper instructions?

Best Regards
Ivan Dimitrov

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Gluster and maildir
I agree on the fewer bricks. Also see if you can use InfiniBand.

Best Regards
Ivan

On 10/2/12 11:29 PM, Robert Hajime Lanning wrote:
On 10/02/12 13:01, ja...@combatyoga.net wrote:
Basically, I'm trying to figure out whether Gluster will perform better with more storage nodes in the storage block, or whether I would be better off consolidating the storage onto a few of the systems and freeing up resources for the email services on the remaining systems. I've had mixed results testing this in a KVM virtual environment, but it's getting down to the time where I need to make some decisions on ordering hardware. I do know that RAID1 and RAID5 are not an apples-to-apples performance comparison; I'm looking for thoughts from the community as to which way you would set it up.

With maildir, I believe that fewer bricks would perform better. The maildir format tends to be readdir() heavy. Since Gluster does not have a master index of directory entries, it has to hit every brick in the volume.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
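To see why readdir() is the expensive part, compare the brick back-ends of a distributed volume: each brick holds only the directory entries that hashed to it, so the client has to query every brick and merge the results on each listing. A small illustration with made-up paths (two bricks exported from /home/sda3 and /home/sdb1, one user's maildir):

    # on a storage node: each brick holds only part of the mailbox
    ls /home/sda3/mail/user1/cur | wc -l
    ls /home/sdb1/mail/user1/cur | wc -l
    # a client-side listing is the union of all bricks' entries,
    # so its cost grows with the number of bricks as well as the mailbox size

That is the argument for fewer, larger bricks when the workload is maildir-style.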
Re: [Gluster-users] Gluster speed sooo slow
I have a low-traffic free hosting service and converted some x,000 users to glusterfs a few months ago. I'm not impressed at all and would probably not convert any more users. It works OK for now, but with only 88GB used of a 2TB volume it's kind of pointless for now... :( I'm researching a way to convert my paid hosting users, but I can't find any system suitable for the job.

Fernando, what gluster structure are you talking about?

Best Regards
Ivan Dimitrov

On 8/13/12 2:16 PM, Fernando Frediani (Qube) wrote:
I heard from a large ISP (talking to someone who works there) that they were trying to use GlusterFS for Maildir and had hell because of the many small files, with customers complaining all the time. Latency is acceptable on a networked filesystem, but the results people are reporting are beyond any latency problems; they are due to the way Gluster is structured, and that was already confirmed by some people on this list, so changes are indeed needed in the code. If you take even a gigabit network, the round trip isn't that much really (not more than a quarter of a ms), so it shouldn't be a big thing. Yes, FUSE might also contribute to decreased performance, but still the performance problems are in the architecture of the filesystem. One thing that is new to Gluster and that in my opinion could contribute to increased performance is Distributed-Striped volumes, but that still doesn't work for all environments. So as it stands, for multimedia or archive files, fine; for other usages I wouldn't bet my chips and would rather test thoroughly first.

-Original Message-
From: Brian Candler [mailto:b.cand...@pobox.com]
Sent: 13 August 2012 11:00
To: Fernando Frediani (Qube)
Cc: 'Ivan Dimitrov'; 'gluster-users@gluster.org'
Subject: Re: [Gluster-users] Gluster speed sooo slow

On Mon, Aug 13, 2012 at 09:40:49AM +0000, Fernando Frediani (Qube) wrote:
> I think Gluster, as it stands now at the current level of development, is more for multimedia and archival files, not for small files nor for running virtual machines. It still requires a fair amount of development, which hopefully RedHat will put in place.

I know a large ISP is using gluster successfully for Maildir storage - or at least was a couple of years ago when I last spoke to them about it - which means very large numbers of small files.

I think you need to be clear on the difference between throughput and latency. Any networked filesystem is going to have latency, and gluster maybe suffers more than most because of the FUSE layer at the client. This will show as poor throughput if a single client is sequentially reading or writing lots of small files, because it has to wait a round trip for each request. However, if you have multiple clients accessing at the same time, you can still have high total throughput. This is because the "wasted" time between requests from one client is used to service other clients.

If gluster were to do aggressive client-side caching then it might be able to make responses appear faster to a single client, but this would be at the risk of data loss (e.g. responding that a file has been committed to disk when in fact it hasn't). But this would make no difference to total throughput with multiple clients, which depends on the available bandwidth into the disk drives and across the network.

So it all depends on your overall usage pattern. Only make your judgement based on a single-threaded benchmark if that's what your usage pattern is really going to be like: i.e. are you really going to have a single user accessing the filesystem, whose application reads or writes one file after the other rather than multiple files concurrently?

Regards, Brian.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
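To put rough numbers on the latency-versus-throughput point, take the figures reported elsewhere in these threads (32768 files of about 16kb taking over 3 hours through a single rsync, with roughly a quarter-millisecond gigabit round trip):

    32768 files x 16kb        = ~512MB of data -> well under a minute of gigabit bandwidth
    ~3 hours / 32768 files    = ~330ms spent per file
    330ms / 0.25ms round trip = on the order of a thousand round-trip times per file

So the copy is nowhere near bandwidth-limited; nearly all of the time is the single-threaded client waiting out synchronous per-file operations (lookups and xattr checks on each replica, create, write, close) one after another. That is exactly the single-client sequential case described above, and also why several concurrent clients can still achieve good aggregate throughput.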
Re: [Gluster-users] Gluster speed sooo slow
There is a big difference between working with small files (around 16kb) and big files (2mb). Performance is much better with big files. Which is too bad for me ;(

On 8/11/12 2:15 AM, Gandalf Corvotempesta wrote:
What do you mean by "small files"? 16k? 160k? 16mb? Do you know any workaround or any other software for this? Me too, I'm trying to create a clustered storage for many small files.

2012/8/10 Philip Poten <philip.po...@gmail.com>:
Hi Ivan,
that's because Gluster has really bad "many small files" performance due to its architecture. On all stat() calls (which rsync is doing plenty of), all replicas are being checked for integrity.
regards, Philip

2012/8/10 Ivan Dimitrov <dob...@amln.net>:
> So I stopped a node to check the BIOS and after it went up, the rebalance kicked in. I was looking for that kind of speed on a normal write. The rebalance is much faster than my rsync/cp.
> https://dl.dropbox.com/u/282332/Screen%20Shot%202012-08-10%20at%202.04.09%20PM.png
>
> On 8/10/12 1:23 PM, Ivan Dimitrov wrote:
>> Hello
>> What am I doing wrong?!?
>> I have a test setup with 4 identical servers with 2 disks each in distribute-replicate 2. All servers are connected to a GB switch.
>> I am experiencing really slow speeds at anything I do. Slow write, slow read, not to mention random write/reads.
Re: [Gluster-users] Gluster speed sooo slow
So I stopped a node to check the BIOS, and after it came back up the rebalance kicked in. I was looking for that kind of speed on a normal write. The rebalance is much faster than my rsync/cp.

https://dl.dropbox.com/u/282332/Screen%20Shot%202012-08-10%20at%202.04.09%20PM.png

Best Regards
Ivan Dimitrov

On 8/10/12 1:23 PM, Ivan Dimitrov wrote:
Hello
What am I doing wrong?!?
I have a test setup with 4 identical servers with 2 disks each in distribute-replicate 2. All servers are connected to a GB switch.
I am experiencing really slow speeds at anything I do. Slow write, slow read, not to mention random write/reads.
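The rebalance is fast because the servers keep many operations in flight, while a single rsync through the FUSE mount issues one synchronous operation at a time. A workaround worth trying (a sketch using the paths from this thread, no guarantees) is to split the copy into several parallel streams so the per-file round trips overlap:

    cd /root/speedtest/random-files
    ls > /tmp/filelist
    # split the 32768-entry list into 8 chunks of 4096 names each
    split -l 4096 /tmp/filelist /tmp/chunk.
    # run one rsync per chunk in parallel
    for c in /tmp/chunk.*; do
        rsync -a --files-from="$c" . /home/gltvolume/ &
    done
    wait

Aggregate throughput usually rises with the number of streams until the disks or the network saturate.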
[Gluster-users] Gluster speed sooo slow
Hello
What am I doing wrong?!?

I have a test setup with 4 identical servers with 2 disks each in distribute-replicate 2. All servers are connected to a GB switch.

I am experiencing really slow speeds at anything I do. Slow write, slow read, not to mention random write/reads.

Here is an example: random-files is a directory with 32768 files with average size 16kb.

[root@gltclient]:~# rsync -a /root/speedtest/random-files/ /home/gltvolume/
^^ This will take more than 3 hours.

On any of the servers, if I do "iostat" the disks are not loaded at all:
https://dl.dropbox.com/u/282332/Screen%20Shot%202012-08-10%20at%201.08.54%20PM.png
This is a similar result for all servers.

Here is an example of a simple "ls" command on the content.

[root@gltclient]:~# unalias ls
[root@gltclient]:~# /usr/bin/time -f "%e seconds" ls /home/gltvolume/ | wc -l
2.81 seconds
5393

Almost 3 seconds to display 5000 files?!?! When they are 32,000, the ls will take around 35-45 seconds.

This directory is on local disk:

[root@gltclient]:~# /usr/bin/time -f "%e seconds" ls /root/speedtest/random-files/ | wc -l
1.45 seconds
32768

[root@gltclient]:~# /usr/bin/time -f "%e seconds" cat /home/gltvolume/* >/dev/null
190.50 seconds

[root@gltclient]:~# /usr/bin/time -f "%e seconds" du -sh /home/gltvolume/
126M    /home/gltvolume/
75.23 seconds

Here is the volume information.

[root@glt1]:~# gluster volume info

Volume Name: gltvolume
Type: Distributed-Replicate
Volume ID: 16edd852-8d23-41da-924d-710b753bb374
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 1.1.74.246:/home/sda3
Brick2: glt2.network.net:/home/sda3
Brick3: 1.1.74.246:/home/sdb1
Brick4: glt2.network.net:/home/sdb1
Brick5: glt3.network.net:/home/sda3
Brick6: gltclient.network.net:/home/sda3
Brick7: glt3.network.net:/home/sdb1
Brick8: gltclient.network.net:/home/sdb1
Options Reconfigured:
performance.io-thread-count: 32
performance.cache-size: 256MB
cluster.self-heal-daemon: on

[root@glt1]:~# gluster volume status all detail
Status of volume: gltvolume
--
Brick: Brick 1.1.74.246:/home/sda3
Port: 24009  Online: Y  Pid: 1479  File System: ext4  Device: /dev/sda3
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 179.3GB  Total Disk Space: 179.7GB
Inode Count: 11968512  Free Inodes: 11901550
--
Brick: Brick glt2.network.net:/home/sda3
Port: 24009  Online: Y  Pid: 1589  File System: ext4  Device: /dev/sda3
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 179.3GB  Total Disk Space: 179.7GB
Inode Count: 11968512  Free Inodes: 11901550
--
Brick: Brick 1.1.74.246:/home/sdb1
Port: 24010  Online: Y  Pid: 1485  File System: ext4  Device: /dev/sdb1
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 228.8GB  Total Disk Space: 229.2GB
Inode Count: 15269888  Free Inodes: 15202933
--
Brick: Brick glt2.network.net:/home/sdb1
Port: 24010  Online: Y  Pid: 1595  File System: ext4  Device: /dev/sdb1
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 228.8GB  Total Disk Space: 229.2GB
Inode Count: 15269888  Free Inodes: 15202933
--
Brick: Brick glt3.network.net:/home/sda3
Port: 24009  Online: Y  Pid: 28963  File System: ext4  Device: /dev/sda3
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 179.3GB  Total Disk Space: 179.7GB
Inode Count: 11968512  Free Inodes: 11906058
--
Brick: Brick gltclient.network.net:/home/sda3
Port: 24009  Online: Y  Pid: 3145  File System: ext4  Device: /dev/sda3
Mount Options: rw,noatime  Inode Size: 256
Disk Space Free: 179.3GB  Total Disk Space: 179.7GB
Inode Count: 11968512  Free Inodes: 11906058
--
Brick: Brick glt3.network.net:/home/sdb1
Port
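When the disks look this idle, it helps to let Gluster itself show where the time goes. GlusterFS 3.3 has a built-in profiler; a minimal sketch against the volume from this post:

    gluster volume profile gltvolume start
    # ... repeat the slow rsync or ls from the client ...
    gluster volume profile gltvolume info
    gluster volume profile gltvolume stop

The per-brick output lists each file operation (LOOKUP, CREATE, WRITE, ...) with call counts and latencies; for small-file workloads it is typically the metadata calls, not the reads and writes, that dominate.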
[Gluster-users] Transport endpoint is not connected
Hi group,

I've been in production with gluster for the last 2 weeks. No problems until today. As of today I get the "Transport endpoint is not connected" problem on the client, maybe once every hour.

df: `/services/users/6': Transport endpoint is not connected

Here is my setup: I have 1 client and 2 servers with 2 disks each for bricks. Glusterfs 3.3 compiled from source.

# gluster volume info

Volume Name: freecloud
Type: Distributed-Replicate
Volume ID: 1cf4804f-12aa-4cd1-a892-cec69fc2cf22
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15
Brick2: YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886
Brick3: XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72
Brick4: YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e
Options Reconfigured:
cluster.self-heal-daemon: on
performance.cache-size: 256MB
performance.io-thread-count: 32
features.quota: on

Quota is ON but not used.

# gluster volume status all detail
Status of volume: freecloud
--
Brick: Brick XX.25.137.252:/mnt/35be42b4-afb3-48a2-8b3c-17a422fd1e15
Port: 24009  Online: Y  Pid: 29221  File System: xfs  Device: /dev/sdd1
Mount Options: rw  Inode Size: 256
Disk Space Free: 659.7GB  Total Disk Space: 698.3GB
Inode Count: 732571968  Free Inodes: 730418928
--
Brick: Brick YY.40.3.216:/mnt/7ee4f117-8aee-4cae-b08c-5e441b703886
Port: 24009  Online: Y  Pid: 15496  File System: xfs  Device: /dev/sdc1
Mount Options: rw  Inode Size: 256
Disk Space Free: 659.7GB  Total Disk Space: 698.3GB
Inode Count: 732571968  Free Inodes: 730410396
--
Brick: Brick XX.25.137.252:/mnt/9ee7c816-085d-4c5c-9276-fd3dadac6c72
Port: 24010  Online: Y  Pid: 29227  File System: xfs  Device: /dev/sdc1
Mount Options: rw  Inode Size: 256
Disk Space Free: 659.9GB  Total Disk Space: 698.3GB
Inode Count: 732571968  Free Inodes: 730417864
--
Brick: Brick YY.40.3.216:/mnt/311399bc-4d55-445d-8480-286c56cf493e
Port: 24010  Online: Y  Pid: 15502  File System: xfs  Device: /dev/sdb1
Mount Options: rw  Inode Size: 256
Disk Space Free: 659.9GB  Total Disk Space: 698.3GB
Inode Count: 732571968  Free Inodes: 730409337

On server1 I mount the volume and start copying files to it. Server1 is used as storage.

209.25.137.252:freecloud  1.4T  78G  1.3T  6%  /home/freecloud

One thing to mention is that I have a large list of subdirectories in the main directory, and the list keeps getting bigger:

client1# ls | wc -l
42424

I have one client server that mounts glusterfs and uses the files directly, as the files are for low-traffic web sites. On the client there is no gluster daemon, just the mount:

client1# mount -t glusterfs rscloud1.domain.net:/freecloud /services/users/6/

This all worked fine for the last 2-3 weeks.
Here is a log from the crash (client1:/var/log/glusterfs/services-users-6-.log):

pending frames:
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-07-12 14:51:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.3.0
/lib/x86_64-linux-gnu/libc.so.6(+0x32480)[0x7f1e0e9f0480]
/services/glusterfs//lib/libglusterfs.so.0(uuid_unpack+0x0)[0x7f1e0f79d760]
/services/glusterfs//lib/libglusterfs.so.0(+0x4c526)[0x7f1e0f79d526]
/services/glusterfs//lib/libglusterfs.so.0(uuid_utoa+0x26)[0x7f1e0f77ca66]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/features/quota.so(quota_rename_cbk+0x308)[0x7f1e09b940c8]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/distribute.so(dht_rename_unlink_cbk+0x454)[0x7f1e09dad264]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_unwind+0xf7)[0x7f1e09ff23c7]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/cluster/replicate.so(afr_unlink_wind_cbk+0xb6)[0x7f1e09ff43d6]
/services/glusterfs//lib/glusterfs/3.3.0/xlator/protocol/cli
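"Transport endpoint is not connected" here means the client-side glusterfs process died (the signal 11 above), leaving a dead FUSE mount point behind. Until the crash itself is fixed, the usual way to bring the mount back (a sketch, reusing the mount command from this post) is:

    # detach the stale mount even though it is busy
    umount -l /services/users/6
    # remount the volume
    mount -t glusterfs rscloud1.domain.net:/freecloud /services/users/6/

Since the backtrace goes through the quota translator during a RENAME, and quota is enabled but not actually used, it may also be worth testing with quota disabled (gluster volume quota freecloud disable) and reporting the crash with the full client log attached.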