Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On 07/17/2011 11:19 PM, Ken Randall wrote:
> Joe,
>
> Thank you for your response. After seeing what you wrote, I bumped the
> performance.cache-size up to 4096MB, the maximum allowed, and ran into
> the same wall.

Hmmm ...

> I wouldn't think that any SMB caching would help in this case, since the
> same Samba server on top of the raw Gluster data wasn't exhibiting any
> trouble, or am I deceived?

Samba could cache better so it didn't have to hit Gluster so hard.

> I haven't used strace before, but I ran it on the glusterfs process, and
> saw a lot of:
>
>   epoll_wait(3, {{EPOLLIN, {u32=9, u64=9}}}, 257, 4294967295) = 1
>   readv(9, [{"\200\0\16,", 4}], 1) = 4
>   readv(9, [{"\0\n;\227\0\0\0\1", 8}], 1) = 8
>   readv(9, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\31\0\0\0\0\0\0\0\1\0\0\0\0"..., 3620}], 1) = 1436
>   readv(9, 0xa90b1b8, 1) = -1 EAGAIN (Resource temporarily unavailable)

Interesting ... I am not sure why it's reporting an EAGAIN for readv,
other than it can't fill the vector from the read.

> And when I ran it on smbd, I saw a constant stream of this kind of
> activity:
>
>   getdents(29, /* 25 entries */, 32768) = 840
>   getdents(29, /* 25 entries */, 32768) = 856
>   getdents(29, /* 25 entries */, 32768) = 848
>   getdents(29, /* 24 entries */, 32768) = 856
>   getdents(29, /* 25 entries */, 32768) = 864
>   getdents(29, /* 24 entries */, 32768) = 832
>   getdents(29, /* 25 entries */, 32768) = 832
>   getdents(29, /* 24 entries */, 32768) = 856
>   getdents(29, /* 25 entries */, 32768) = 840
>   getdents(29, /* 24 entries */, 32768) = 832
>   getdents(29, /* 25 entries */, 32768) = 784
>   getdents(29, /* 25 entries */, 32768) = 824
>   getdents(29, /* 25 entries */, 32768) = 808
>   getdents(29, /* 25 entries */, 32768) = 840
>   getdents(29, /* 25 entries */, 32768) = 864
>   getdents(29, /* 25 entries */, 32768) = 872
>   getdents(29, /* 25 entries */, 32768) = 832
>   getdents(29, /* 24 entries */, 32768) = 832
>   getdents(29, /* 25 entries */, 32768) = 840
>   getdents(29, /* 25 entries */, 32768) = 824
>   getdents(29, /* 25 entries */, 32768) = 824
>   getdents(29, /* 24 entries */, 32768) = 864
>   getdents(29, /* 25 entries */, 32768) = 848
>   getdents(29, /* 24 entries */, 32768) = 840

Get directory entries. This is the stuff that NTFS is caching for its web
server, and it appears Samba is not. Try

  aio read size = 32768
  csc policy = documents
  dfree cache time = 60
  directory name cache size = 10
  fake oplocks = yes
  getwd cache = yes
  level2 oplocks = yes
  max stat cache size = 16384

> That chunk would get repeated over and over again as fast as the screen
> could go, and only occasionally (every 5-10 seconds or so) would you see
> anything that you'd normally expect to see, such as:
>
>   close(29) = 0
>   stat("Storage/01", 0x7fff07dae870) = -1 ENOENT (No such file or directory)
>   write(23, "\0\0\0#\377SMB24\0\0\300\210A\310\0\0\0\0\0\0\0\0\0\0\0\0\1\0d\233"..., 39) = 39
>   select(38, [5 20 23 27 30 31 35 36 37], [], NULL, {60, 0}) = 1 (in [23], left {60, 0})
>   read(23, "\0\0\0x", 4) = 4
>   read(23, "\377SMB2\0\0\0\0\30\7\310\0\0\0\0\0\0\0\0\0\0\0\0\1\0\250P\273\0[8"..., 120) = 120
>   stat("Storage", {st_mode=S_IFDIR|0755, st_size=1581056, ...}) = 0
>   stat("Storage/011235", 0x7fff07dad470) = -1 ENOENT (No such file or directory)
>   stat("Storage/011235", 0x7fff07dad470) = -1 ENOENT (No such file or directory)
>   open("Storage", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 29
>   fcntl(29, F_SETFD, FD_CLOEXEC) = 0
>
> (The "No such file or directory" part is expected, since some of the
> image references don't exist.)

Ok. It looks like Samba is pounding on GlusterFS metadata (getdents).
GlusterFS doesn't really do a great job in this case ... you have to give
it help and cache pretty aggressively here. Samba can do this caching to
some extent. You might want to enable stat-cache and fast lookups. These
have been problematic for us in the past though, and I'd recommend
caution.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

___
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
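[Editor's note: Joe's suggested Samba tuning options above, collected into smb.conf form for convenience. The share name is hypothetical and the values are his suggestions, a starting point to benchmark rather than a drop-in fix; note that "fake oplocks" is unsafe if anything other than this Samba instance (e.g. the Gluster mount on the Linux side) also writes to the same files.]

```ini
[gluster]   ; hypothetical share name
    aio read size = 32768
    csc policy = documents
    dfree cache time = 60
    directory name cache size = 10
    fake oplocks = yes
    getwd cache = yes
    level2 oplocks = yes
    max stat cache size = 16384
```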
Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On Sun, Jul 17, 2011 at 10:19:00PM -0500, Ken Randall wrote:
> (The no such file or directory part is expected since some of the image
> references don't exist.)

Wild guess on that: Gluster may work harder at files it doesn't find than
files it finds. It's going to look on one side or the other of the
replicated file at first, and if it finds the file, deliver it. But if it
doesn't find the file, wouldn't it then check the other side of the
replicated storage to make sure this wasn't a replication error?

Might be interesting to run a version of the test where all the images
referenced do exist, to see if it's the missing files that are driving up
the CPU cycles.

Whit
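[Editor's note: a minimal sketch of Whit's suggested test. The idea is to generate a page that references only images which actually exist on the share, so ENOENT lookups are taken out of the picture. The directory and file names here are stand-ins, not paths from the thread.]

```shell
share=/tmp/page-demo                       # stand-in for the SMB share root
mkdir -p "$share"
touch "$share/a.jpg" "$share/b.jpg"        # stand-ins for real images

# Build a page referencing only files find() can actually see:
find "$share" -name '*.jpg' | sort |
  awk '{ printf "<img src=\"%s\">\n", $0 }' > "$share/allfound.html"

wc -l < "$share/allfound.html"             # one <img> tag per existing file
```

If CPU load drops sharply when serving this page, the missing-file lookups (and the extra replica checks they may trigger) are the likely culprit.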
Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
Joe,

Thank you for your response. After seeing what you wrote, I bumped the
performance.cache-size up to 4096MB, the maximum allowed, and ran into the
same wall.

I wouldn't think that any SMB caching would help in this case, since the
same Samba server on top of the raw Gluster data wasn't exhibiting any
trouble, or am I deceived?

I haven't used strace before, but I ran it on the glusterfs process, and
saw a lot of:

  epoll_wait(3, {{EPOLLIN, {u32=9, u64=9}}}, 257, 4294967295) = 1
  readv(9, [{"\200\0\16,", 4}], 1) = 4
  readv(9, [{"\0\n;\227\0\0\0\1", 8}], 1) = 8
  readv(9, [{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\31\0\0\0\0\0\0\0\1\0\0\0\0"..., 3620}], 1) = 1436
  readv(9, 0xa90b1b8, 1) = -1 EAGAIN (Resource temporarily unavailable)

And when I ran it on smbd, I saw a constant stream of this kind of
activity:

  getdents(29, /* 25 entries */, 32768) = 840
  getdents(29, /* 25 entries */, 32768) = 856
  getdents(29, /* 25 entries */, 32768) = 848
  getdents(29, /* 24 entries */, 32768) = 856
  getdents(29, /* 25 entries */, 32768) = 864
  getdents(29, /* 24 entries */, 32768) = 832
  getdents(29, /* 25 entries */, 32768) = 832
  getdents(29, /* 24 entries */, 32768) = 856
  getdents(29, /* 25 entries */, 32768) = 840
  getdents(29, /* 24 entries */, 32768) = 832
  getdents(29, /* 25 entries */, 32768) = 784
  getdents(29, /* 25 entries */, 32768) = 824
  getdents(29, /* 25 entries */, 32768) = 808
  getdents(29, /* 25 entries */, 32768) = 840
  getdents(29, /* 25 entries */, 32768) = 864
  getdents(29, /* 25 entries */, 32768) = 872
  getdents(29, /* 25 entries */, 32768) = 832
  getdents(29, /* 24 entries */, 32768) = 832
  getdents(29, /* 25 entries */, 32768) = 840
  getdents(29, /* 25 entries */, 32768) = 824
  getdents(29, /* 25 entries */, 32768) = 824
  getdents(29, /* 24 entries */, 32768) = 864
  getdents(29, /* 25 entries */, 32768) = 848
  getdents(29, /* 24 entries */, 32768) = 840

That chunk would get repeated over and over again as fast as the screen
could go, and only occasionally (every 5-10 seconds or so) would you see
anything that you'd normally expect to see, such as:

  close(29) = 0
  stat("Storage/01", 0x7fff07dae870) = -1 ENOENT (No such file or directory)
  write(23, "\0\0\0#\377SMB24\0\0\300\210A\310\0\0\0\0\0\0\0\0\0\0\0\0\1\0d\233"..., 39) = 39
  select(38, [5 20 23 27 30 31 35 36 37], [], NULL, {60, 0}) = 1 (in [23], left {60, 0})
  read(23, "\0\0\0x", 4) = 4
  read(23, "\377SMB2\0\0\0\0\30\7\310\0\0\0\0\0\0\0\0\0\0\0\0\1\0\250P\273\0[8"..., 120) = 120
  stat("Storage", {st_mode=S_IFDIR|0755, st_size=1581056, ...}) = 0
  stat("Storage/011235", 0x7fff07dad470) = -1 ENOENT (No such file or directory)
  stat("Storage/011235", 0x7fff07dad470) = -1 ENOENT (No such file or directory)
  open("Storage", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 29
  fcntl(29, F_SETFD, FD_CLOEXEC) = 0

(The "No such file or directory" part is expected, since some of the image
references don't exist.)

Ken
Re: [Gluster-users] Tools for the admin
On Mon, Jul 18, 2011 at 03:48:11AM +0100, Dan Bretherton wrote:
> I had a closer look at this. It is the output of gfid-mismatch
> causing the problem; paths are shown with a trailing colon as in
> GlusterFS log files. The "cut -f1 -d:" to extract the paths
> obviously removes all the colons. I'm sure there is an easy way to
> remove the trailing ':' from filenames but I can't think of one off
> hand (and it is 3:30AM).

Something along the lines of "sed 's/.$//'", as in:

  dog="doggy:"; echo $dog | sed 's/.$//'

That would remove any last character. To remove only a trailing ':':

  dog="doggy:"; echo $dog | sed 's/:$//'

(No, I didn't know that. I googled.)

Whit
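[Editor's note: an equivalent without spawning sed, using POSIX shell parameter expansion — a small addition, not from the original mail.]

```shell
dog="doggy:"
echo "${dog%:}"              # "%:" strips one trailing ':' if present -> doggy
# Unlike 's/.$//', anchoring on ':' is also safe for names with no
# trailing colon at all:
echo "doggy" | sed 's/:$//'  # unchanged -> doggy
```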
Re: [Gluster-users] Tools for the admin
On 18/07/11 02:05, Dan Bretherton wrote:
> --
> Message: 1
> Date: Fri, 8 Jul 2011 15:54:19 -0700
> From: Vikas Gorur
> Subject: [Gluster-users] GFID mismatches and tools to fix them
> To: gluster-users@gluster.org
> Cc: gluster-de...@gluster.com
> Message-ID:
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi everyone,
>
> As some of you may know, the presence of files/directories which have
> different GFIDs on different backends can cause the GlusterFS client to
> throw up errors, or even hang. Among others, we've had users of Enomaly
> affected by this issue.
>
> A little background on GFIDs: Each file/directory on a Gluster volume
> has a unique 128-bit number associated with it called the GFID. This is
> true regardless of Gluster configuration (distribute or
> distribute/replicate). One inode, one GFID. The GFID is stored on the
> backend as the value of the extended attribute "trusted.gfid".
>
> Under normal circumstances, the value of this attribute is the same on
> all the backend bricks. However, certain conditions can cause the value
> on one or more of the bricks to differ from that on the other bricks.
> This causes the GlusterFS client to become confused and throw up errors.
> This applies to both the 3.1.5 and 3.2.1 versions of the filesystem, and
> previous versions in those series. In a future bugfix release GlusterFS
> will fix this issue automatically when it detects it.
>
> Until then, if you encounter this problem, please use the following set
> of tools to manually fix it on the backends:
>
> https://github.com/vikasgorur/gfid
>
> The repository contains the tools as well as a README that explains how
> to use them. Your questions and comments are welcome.
>
> Dear Vikas-
>
> Thanks for providing these tools. Unfortunately I think I have found a
> problem with the procedure outlined in the README - I don't think it
> works for files with names containing the colon character. I still have
> a lot of gfid errors in my logs after running the gfid tools on one
> volume, and all the filenames have one or more ':' characters. There are
> 1677 files still affected with "gfid different", so I don't think it can
> be a coincidence.
>
> Regards
> -Dan.

I had a closer look at this. It is the output of gfid-mismatch causing the
problem; paths are shown with a trailing colon as in GlusterFS log files.
The "cut -f1 -d:" used to extract the paths obviously removes all the
colons. I'm sure there is an easy way to remove the trailing ':' from
filenames but I can't think of one off hand (and it is 3:30AM).

-Dan.
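[Editor's note: Dan's diagnosis can be reproduced with a tiny demo. The file names below are made up; the point is that `cut -f1 -d:` truncates any path that itself contains a colon, while stripping only the trailing colon leaves the name intact.]

```shell
# Hypothetical sample in the style of gfid-mismatch output: paths end in
# ':' as in the GlusterFS logs, and the first name contains colons itself.
printf '%s\n' 'data/ocean:2011:07.nc:' 'data/plain.nc:' > /tmp/mismatch-demo.txt

cut -f1 -d: /tmp/mismatch-demo.txt   # loses everything after the first ':'
                                     #   data/ocean
                                     #   data/plain.nc

sed 's/:$//' /tmp/mismatch-demo.txt  # strips only the trailing ':'
                                     #   data/ocean:2011:07.nc
                                     #   data/plain.nc
```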
Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
Thanks for the reply, Whit!

Perfectly reasonable first question. The websites have user-generated
content (think CMS), where people could put in that kind of content. The
likelihood of such a scenario is slim-to-none, but I'd rather not have
that kind of vulnerability in the first place. And yes, we could also add
in validation and/or stripping of content that is outside the bounds of
normal, but the main reason I bring up this Page of Death scenario is that
I worry it may be indicative of a weakness in the system, and that a
different kind of load pattern could trigger this kind of hang.

To answer the second question, running top on the Linux side during the
Page of Death (with nothing else running), I get a CPU spike of anywhere
between 80-110% on glusterfsd and 20% on glusterfs, with close to 22GB of
memory free. The machines are 16-core apiece, though. On the Windows side
there is next to no effect on CPU, memory, or network utilization.

Ken

On Sun, Jul 17, 2011 at 8:06 PM, Whit Blauvelt wrote:
> On Sun, Jul 17, 2011 at 07:56:57PM -0500, Ken Randall wrote:
>
> > However, as a part of a different suite of tests is a Page of Death,
> > which contains tens of thousands of image references on a single page.
>
> Off topic response: Is there ever in real production any page, anywhere,
> that contains tens of thousands of image references? I'm all for testing
> at the extreme, and capacity that goes far beyond what's needed for
> practical purposes. Is that what this is, or do you anticipate real-life
> Page o' Death scenarios?
>
> Closer to the topic: What's going on with the load on the various
> systems? On the Linux side, have you watched each of them with something
> like htop?
>
> Whit
Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On Sun, Jul 17, 2011 at 07:56:57PM -0500, Ken Randall wrote:
> However, as a part of a different suite of tests is a Page of Death,
> which contains tens of thousands of image references on a single page.

Off topic response: Is there ever in real production any page, anywhere,
that contains tens of thousands of image references? I'm all for testing
at the extreme, and capacity that goes far beyond what's needed for
practical purposes. Is that what this is, or do you anticipate real-life
Page o' Death scenarios?

Closer to the topic: What's going on with the load on the various systems?
On the Linux side, have you watched each of them with something like htop?

Whit
Re: [Gluster-users] Tools for the admin
> --
> Message: 1
> Date: Fri, 8 Jul 2011 15:54:19 -0700
> From: Vikas Gorur
> Subject: [Gluster-users] GFID mismatches and tools to fix them
> To: gluster-users@gluster.org
> Cc: gluster-de...@gluster.com
> Message-ID:
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi everyone,
>
> As some of you may know, the presence of files/directories which have
> different GFIDs on different backends can cause the GlusterFS client to
> throw up errors, or even hang. Among others, we've had users of Enomaly
> affected by this issue.
>
> A little background on GFIDs: Each file/directory on a Gluster volume
> has a unique 128-bit number associated with it called the GFID. This is
> true regardless of Gluster configuration (distribute or
> distribute/replicate). One inode, one GFID. The GFID is stored on the
> backend as the value of the extended attribute "trusted.gfid".
>
> Under normal circumstances, the value of this attribute is the same on
> all the backend bricks. However, certain conditions can cause the value
> on one or more of the bricks to differ from that on the other bricks.
> This causes the GlusterFS client to become confused and throw up errors.
> This applies to both the 3.1.5 and 3.2.1 versions of the filesystem, and
> previous versions in those series. In a future bugfix release GlusterFS
> will fix this issue automatically when it detects it.
>
> Until then, if you encounter this problem, please use the following set
> of tools to manually fix it on the backends:
>
> https://github.com/vikasgorur/gfid
>
> The repository contains the tools as well as a README that explains how
> to use them. Your questions and comments are welcome.

Dear Vikas-

Thanks for providing these tools. Unfortunately I think I have found a
problem with the procedure outlined in the README - I don't think it works
for files with names containing the colon character. I still have a lot of
gfid errors in my logs after running the gfid tools on one volume, and all
the filenames have one or more ':' characters. There are 1677 files still
affected with "gfid different", so I don't think it can be a coincidence.

Regards
-Dan.
Re: [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On 07/17/2011 08:56 PM, Ken Randall wrote:
> You may be asking, why am I asking here instead of on a Samba group, or
> even a Windows group? Here's why: My control is that I have a Windows
> file server that I can swap in Gluster's place, and I'm able to load
> that page without it blinking an eye (it actually becomes a test of the
> computer that the browser is on). It does not affect any of the web
> servers in the slightest. My second control is that I have exported the
> raw Gluster data directory as an SMB share (with the same exact Samba
> configuration as the Gluster one), and it performs equally as well as
> the Windows file server. I can load the Page of Death with no
> consequence.

NTFS with SMB sharing caches everything. The first page load may take a
bit of time, but subsequent loads will be served from data stored in RAM.
You can adjust SMB caching and Gluster caching as needed.

> I've pushed IO-threads all the way to the maximum 64 without any
> benefit. I can't see anything noteworthy in the Gluster or Samba logs,
> but perhaps I am not sure what to look for.

Not likely your issue. More probably it's a Gluster cache size coupled
with some CIFS tuning you need.

> Thank you to anybody who can point me the right direction. I am hoping I
> don't have to dive into Wireshark or tcpdump territory, but I'm open if
> you can guide the way! ;)

You might need to strace -p the slow servers. Would help to know what
calls they are stuck on.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
[Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
I'll try to keep it brief. I've been testing GlusterFS for the last month
or so. My production setup will be more complex than what I'm listing
below, but I've whittled things down to where the below setup will cause
the problem to happen.

I'm running GlusterFS 3.2.2 on two CentOS 5.6 boxes in a replicated
volume. I am connecting to it with a Windows Server 2008 R2 box over an
SMB share. Basically, the web app portion runs locally on the Windows box,
but content (e.g. HTML templates, images, CSS files, JS, etc.) is being
pulled from the Gluster volume.

I've performed a fair degree of load testing on the setup so far, scaling
up the load to nearly four times what our normal production environment
sees in primetime, and it seems to handle it fine. We run tens of
thousands of websites, so it's pretty significant that it's able to handle
that.

However, part of a different suite of tests is a Page of Death, which
contains tens of thousands of image references on a single page. All I
have to do is load that page for a few seconds, and it will grind my web
server's SMB connection to a near-complete standstill. I can close the
browser after just a few seconds, and it still takes several minutes for
the web server to respond to any requests at all. Connecting to the share
over Explorer is extremely slow from that same machine. (I can connect to
that same share from another machine, which is an export of the same exact
GlusterFS mount, and it is just fine. Similarly, accessing the Gluster
mount on the Linux boxes shows zero problems at all; it's as happy to
respond to requests as ever.) Even if I scale it out to a swath of web
servers, loading that single page, one time, for just a few seconds will
freeze every single web server, making every website on the system
inaccessible.

You may be asking, why am I asking here instead of on a Samba group, or
even a Windows group? Here's why: My control is that I have a Windows file
server that I can swap in Gluster's place, and I'm able to load that page
without it blinking an eye (it actually becomes a test of the computer
that the browser is on). It does not affect any of the web servers in the
slightest. My second control is that I have exported the raw Gluster data
directory as an SMB share (with the same exact Samba configuration as the
Gluster one), and it performs equally as well as the Windows file server.
I can load the Page of Death with no consequence.

I've pushed IO-threads all the way to the maximum of 64 without any
benefit. I can't see anything noteworthy in the Gluster or Samba logs, but
perhaps I am not sure what to look for.

Thank you to anybody who can point me in the right direction. I am hoping
I don't have to dive into Wireshark or tcpdump territory, but I'm open if
you can guide the way! ;)

Ken
Re: [Gluster-users] hardware raid controller
Hello again,

I think handling your RAID is the job of your OS and controller; Gluster
just serves from your file system, nothing else. Gluster 3.2 is working on
my RAID controller (RAID 5, 1 spare disk) without any problems.

On Fri, 15 Jul 2011 10:55:11 +0200, Derk Roesink wrote:
> Hello!
>
> I'm trying to install my first Gluster Storage Platform server.
>
> It has a Jetway JNF99FL-525-LF motherboard with an internal RAID
> controller (based on an Intel ICH9R chipset) which has 4x 1TB drives for
> data that I would like to run in a RAID5 configuration.
>
> It seems Gluster doesn't support the RAID controller, because I still
> see the 4 disks as 'servers' in the WebUI.
>
> Any ideas?!
>
> Kind Regards,
>
> Derk
Re: [Gluster-users] glusterfs and pacemaker
On Friday 15 July 2011 13:12:02 Marcel Pennewiß wrote:
> > My idea is that pacemaker starts and monitors the glusterfs
> > mountpoints and migrates some resources to the remaining node if one
> > or more mountpoint(s) fail.
>
> For using mountpoints, please have a look at the OCF Filesystem agent.

Uwe informed me (via PM) that this didn't work - we had not used this
until now. After some investigation you'll see that ocf::Filesystem does
not detect/work with glusterfs shares :(

A few changes are necessary to create basic support for glusterfs.

@Uwe: Please have a look at [1] and try to patch your "Filesystem" OCF
script (which may be located in /usr/lib/ocf/resource.d/heartbeat).

[1] http://subversion.fem.tu-ilmenau.de/repository/fem-overlay/trunk/sys-cluster/resource-agents/files/filesystem-glusterfs-support.patch

best regards
Marcel
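[Editor's note: once the agent is patched, a glusterfs mount can be managed like any other Filesystem resource. A sketch in crmsh syntax follows; the server, volume, and mount-point names are hypothetical.]

```
primitive p_fs_gluster ocf:heartbeat:Filesystem \
        params device="server1:/myvol" directory="/mnt/glusterfs" \
               fstype="glusterfs" \
        op monitor interval="20s" timeout="40s"
```

Pacemaker can then monitor the mountpoint and, with the resource grouped or colocated with the services that depend on it, migrate them to the remaining node when the mount fails - the behaviour described above.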