Re: [Gluster-users] [Gluster-devel] Freenode takeover and GlusterFS IRC channels
And why don't you just do the right thing: drop this semi-closed-source stuff and use XMPP, about the only free/GPL messenger service? It has:
- no central provider
- no central servers
- server software free for anyone to install
- lots of free XMPP services around
- encrypted or unencrypted, at the user's choice
-> https://xmpp.org/

You use email, why?

--
Regards,
Stephan

On Mon, 7 Jun 2021 21:41:11 +0530 Amar Tumballi wrote:
> We (at least many developers and some users) actively use Slack at
> https://gluster.slack.com
>
> While I agree that it's not a free/open alternative to IRC, it does get
> many questions answered, and communication related to the project does
> happen there.
>
> Regards,
> Amar
>
> On Mon, 7 Jun, 2021, 9:27 pm Jordan Erickson,
> <jerick...@logicalnetworking.net> wrote:
> > I'm relatively new to the community but I would vote for having a point
> > of presence on libera.chat, or OFTC, as some other F/OSS projects are
> > moving there as an alternative. I use IRC daily for supporting my own
> > projects as well as related projects such as GlusterFS. Personally I
> > hadn't heard of Matrix until the whole Freenode fiasco happened, so I
> > would imagine others may be in the same boat. Anyway, just my $0.02 :)
> >
> > Cheers,
> > Jordan Erickson
> >
> > On 6/7/21 5:51 AM, Anoop C S wrote:
> > > Hi all,
> > >
> > > I hope many of us are aware of the recent changes that happened at
> > > the Freenode IRC network (in case you are not, feel free to look
> > > into the details, starting with the resignation letters from
> > > long-time Freenode staff [1]). In light of this takeover, many open
> > > source communities have moved over to its replacement, libera.chat
> > > [2].
> > >
> > > Now I would like to open this up to the GlusterFS community to think
> > > about moving forward with our current IRC channels (#gluster,
> > > #gluster-dev and #gluster-meeting) on Freenode. How important are
> > > those channels for the GlusterFS project? How about moving over to
> > > libera.chat in case we stick to IRC communication?
> > >
> > > Let's discuss and conclude on the way forward.
> > >
> > > Note: the Matrix [3] platform is also an option nowadays and we do
> > > have a Gluster room (#gluster:matrix.org) there! welcome..welcome :-)
> > >
> > > Regards,
> > > Anoop C S
> > >
> > > [1] https://fuchsnet.ch/freenode-resign-letter.txt
> > > [2] https://libera.chat/
> > > [3] https://matrix.org/
>
> --
> Jordan Erickson (PGP: 0x78DD41CB)
> Logical Networking Solutions, 707-636-5678

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replica 3 scale out and ZFS bricks
And Joe is the only man on the planet who thinks that NFS is fast because it fell from heaven, and did not get better when it was moved from userspace to kernel. Of course that was done only because someone had lots of spare time to waste...

How long will it take until it is accepted that none of the people programming glusterfs has the skills to do it right, and that this is the simple truth of why this project is lost? Ask yourself why Red Hat dumped it.

On Thu, 17 Sep 2020 04:18:20 -0700 Joe Julian wrote:
> He's a troll that has wasted 10 years trying to push his unfounded
> belief that moving to an in-kernel driver would give significantly more
> performance.
>
> On September 17, 2020 3:21:01 AM PDT, Alexander Iliev wrote:
> > On 9/17/20 3:37 AM, Stephan von Krawczynski wrote:
> > > Nevertheless you will break performance anyway by deploying
> > > user-space crawling-slow glusterfs... outcome of 10 wasted years of
> > > development in the wrong direction.
> >
> > Genuinely asking - what would you recommend instead of GlusterFS for
> > a highly available, horizontally scalable storage system?
> >
> > Best regards,
> > --
> > alexander iliev

Community Meeting Calendar:
Schedule - Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] Replica 3 scale out and ZFS bricks
On Thu, 17 Sep 2020 12:21:01 +0200 Alexander Iliev wrote:
> On 9/17/20 3:37 AM, Stephan von Krawczynski wrote:
> > Nevertheless you will break performance anyway by deploying user-space
> > crawling-slow glusterfs... outcome of 10 wasted years of development
> > in the wrong direction.
>
> Genuinely asking - what would you recommend instead of GlusterFS for a
> highly available, horizontally scalable storage system?

I was a glusterfs user for years, waiting for significant performance improvements. But they never arrived. Instead the software went from an fs driver to a userspace collection of tools emulating an fs, with completely bogus configs and without the slightest path to a working, overall usable setup. IOW, it was developed into a dead end.

And honestly, I would be the first to deploy it again if it came back to what it once was: a network fs exporting a linux fs, with nothing extra on top. The day this changed to something that requires copying every file over glusterfs to work "properly" was the first day of its death. The original idea was great; the implementation is useless.

And yes, there is a lack of a true HA network fs (which is indeed one, not something like ceph). This is why I feel even more sorry about the whole ongoings. It could have been a big hit. But it failed miserably.

My last trust is in Matt Dillon and Hammer2. Yes, that is a long-term belief...

--
Regards,
Stephan

> Best regards,
> --
> alexander iliev
Re: [Gluster-users] State of Gluster project
On Wed, 17 Jun 2020 00:06:33 +0300 Mahdi Adnan wrote:
> [gluster going down]

I have been following this project for quite some years now, probably longer than most of the people nowadays on the list. The project started with the brilliant idea of making an fs on top of classical fs's, distributed over several pieces of hardware, without the need to re-copy data for entering or leaving the gluster. (I thought) it started as a proof-of-concept fs on fuse, with the intention to move into kernel space as soon as possible to get the performance that an fs should have.

After five years of waiting (and using) I declared the project dead for our use (about five years ago), because it evolved more and more into bloatware. And I do think that Red Hat finally understood that too, and what you mentioned is just the outcome of that.

I really hate the path this project took, because for me it was visible from the very start that it is a dead end. After all those years it is bloatware on fuse. And I feel very sorry for the brilliant idea that it once was.

_FS IN USERSPACE IS SH*T_ - understand that.

--
Regards,
Stephan
Re: [Gluster-users] State of Gluster project
On Thu, 18 Jun 2020 13:27:19 -0400 Alvin Starr wrote:
> > [me]
> This is an amazingly unreasonable comment.
> First off, ALL distributed file systems are slower than non-distributed
> file systems.

Obviously you fail to understand my point: the design of glusterfs implies that it can be as fast as a non-distributed fs. If you have not understood this by now, you should stay another 10 years on this list. As glusterfs should only read from a single node and write concurrently to all nodes, it can only be slower than a non-distributed fs if your network is not designed according to the needed paths. Glusterfs being slow is not by design but by implementation.

> Second, ALL network file systems are slower than local hardware.

Uh, really no comment.

> Kernel inclusion does not make for a radically faster implementation.
> I have worked with kernel-included NFS and userspace NFS
> implementations, and the performance differences have not been all that
> amazingly radical.

Probably you are not talking about Linux-based NFS experience. The last time we used userspace NFS on Linux (very long ago) it was slow _and_ amazingly buggy. Many thanks to Neil for making kernel NFS what it is today.

> If you're so convinced that a kernel-included file system is the
> answer, you are free to implement a solution.

Well, people were _paid_ for years now and came up with this mess. And you want me to implement it for free, in what time? If you give me the money that was paid over the last decade, you can be sure the solution would be a lot better, easier to configure, and thought through.

> I am sure the project maintainers would love to have someone come along
> and improve the code.

Yes, we perfectly agree on that.

--
Regards,
Stephan
Re: [Gluster-users] State of Gluster project
On Thu, 18 Jun 2020 07:40:36 -0700 Joe Julian wrote:
> You're still here and still hurt about that? It was never intended to
> be in kernel. It was always intended to run in userspace. After all
> these years I thought you'd be over that by now.

Top Poster ;-)

And in fact, it's not true. The clear message to me back then was: "we are not able to make a kernel version." Which I understood as: we do not have the knowledge to do that. Since that was quite some time before Red Hat stepped in, there was still hope that some day someone capable might come along...

Since 2009, when I joined the list, there has not been a single month without complaints about gluster being slow. I wonder if, after 11 years and with the project now near death, you could accept that I was right from the very first day.

--
Regards,
Stephan
Re: [Gluster-users] State of Gluster project
On Thu, 18 Jun 2020 13:06:51 +0400 Dmitry Melekhov wrote:
> 18.06.2020 12:54, Stephan von Krawczynski wrote:
> >
> > _FS IN USERSPACE IS SH*T_ - understand that.
>
> we use qemu and it uses gfapi... :-)

And exactly this kind of "insight" is the basis of my criticism. gfapi is _userspace_ on the client (granted, without fuse), but it does not at all address the basic glusterfs problem: the need to go through _userspace_ on the _server_. Simply look at the docs here and understand where the work should have been done:

https://www.humblec.com/libgfapi-interface-glusterfs/

On the server you have to go from kernel-space networking to userspace glusterfs and back to the kernel-space underlying fs. So gfapi only eliminates one of the two major problems. Comparing performance to NFS on ZFS shows the flaw. If it were implemented as it should be, you would see almost _no_ difference, because you would be able to split the two network paths to the gluster servers (in a setup with two) across different switches and network cards on the client. So reading should be just as fast; for writing you may calculate a (very) small loss.

--
Regards,
Stephan
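For context on the gfapi path Dmitry means: qemu can open a disk image directly through libgfapi instead of going through a fuse mount. A sketch of the two invocations (hostname, volume, and image names are hypothetical); note that only the client-side fuse hop disappears, the server-side glusterfsd process is still userspace:

```shell
# fuse path: image sits on a glusterfs fuse mount; every I/O crosses
# userspace twice (fuse client on this host, glusterfsd on the server)
qemu-system-x86_64 -drive file=/mnt/gluster/vm.qcow2,format=qcow2

# gfapi path: qemu links libgfapi and talks to the gluster servers
# directly over the network, skipping the local fuse layer only
qemu-system-x86_64 -drive file=gluster://server1/myvol/vm.qcow2,format=qcow2
```

The `gluster://server[:port]/volname/image` URL form is qemu's native gluster block driver syntax; both commands assume a running volume and are illustrative, not a tested configuration.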
Re: [Gluster-users] FW: Performance with Gluster+Fuse is 60x slower then Gluster+NFS ?
On Thu, 18 Feb 2016 10:14:59 +1000 Dan Mons wrote:
> Without knowing the details, I'm putting my money on cache.
>
> Choosing how to mount Gluster is workload dependent. If you're doing
> a lot of small files with single-threaded writes, I suggest NFS. Your
> client's nfscache will dramatically improve performance from the
> end-user's point of view.
>
> If you're doing heavy multi-threaded reads and writes, and you have
> very good bandwidth from your client (e.g. 10GbE), FUSE+GlusterFS is
> better, as it allows your client to talk to all Gluster nodes.
> [...]

Dan, forgive my jumping into this matter, which is obvious to everyone who has used glusterfs for years: fuse+glusterfs is simply sh*t in terms of performance. There is absolutely nobody whose setup wouldn't be at least two (to several hundred) times faster using simple NFS. So Stefan's numbers are no surprise. I really cannot believe you are trying to argue for fuse.

It is completely clear that fuse is only used because of the inability to write a kernel-space driver (and this was said years ago by the people who originally wrote the whole lot). You can probably find this answer to my question in the archives of this (or the devel) list years back. And because of this I pretty much stopped writing here; I mean, you cannot blame someone for not being skilled enough to produce the right code in a GPL situation. The basic concept is good, the implementation is just a mess. And that's it.

Regards,
Stephan

> If you are using FUSE+GlusterFS, on the gluster nodes themselves,
> experiment with the "performance.write-behind-window-size" and
> "performance.cache-size" options. Note that these will affect the
> cache used by the clients, so don't set them so high as to exhaust the
> RAM of any client connecting (or, for low-memory clients, use NFS
> instead).
>
> Gluster ships with conservative defaults for cache, which is a good
> thing. It's up to the user to tweak for their optimal needs.
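The two options Dan names are set per volume with the gluster CLI. A hypothetical example (volume name and sizes are illustrative, not recommendations; `volume get` requires a reasonably recent GlusterFS release):

```shell
# Enlarge the per-file write-behind buffer (default is around 1MB)
gluster volume set myvol performance.write-behind-window-size 4MB

# Enlarge the io-cache read cache (default is around 32MB)
gluster volume set myvol performance.cache-size 256MB

# Inspect the current value of an option
gluster volume get myvol performance.cache-size
```

These commands run on any gluster server in the pool and take effect without remounting clients; as Dan warns, size them against the RAM of the smallest connecting client.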
> > There's no right or wrong answer here. Experiment with NFS and > various cache allocations with FUSE+GlusterFS, and see how you go. > And again, consider your workloads, and whether or not they're taking > full advantage of the FUSE client's ability to deal with highly > parallel workloads. > > -Dan > > Dan Mons - VFX Sysadmin > Cutting Edge > http://cuttingedge.com.au > > > On 18 February 2016 at 08:56, Stefan Jakobs wrote: > > Van Renterghem Stijn: > >> Interval2 > >> Block Size: 1b+ 16b+ > >> 32b+ > >> No. of Reads:0 0 > >> 0 No. of Writes: 34225 > >>575 > >> > >>Block Size: 64b+ 128b+ > >> 256b+ No. of Reads:0 0 > >>0 No. of Writes: 143 898 > >> 118 > >> > >>Block Size:512b+1024b+ > >> 2048b+ No. of Reads:1 4 > >>11 No. of Writes: 82 0 > >> 0 > >> > >>Block Size: 4096b+8192b+ > >> 16384b+ No. of Reads: 1131 > >> 39 No. of Writes:0 0 > >> 0 > >> > >>Block Size: 32768b+ 65536b+ > >> 131072b+ No. of Reads: 59 148 > >> 555 No. of Writes:0 0 > >> 0 > >> > >> %-latency Avg-latency Min-Latency Max-Latency No. of calls > >> Fop - --- --- --- > >> 0.00 0.00 us 0.00 us 0.00 us 1 > >> FORGET 0.00 0.00 us 0.00 us 0.00 us201 > >> RELEASE 0.00 0.00 us 0.00 us 0.00 us 54549 > >> RELEASEDIR 0.00 47.00 us 47.00 us 47.00 us 1 > >> REMOVEXATTR 0.00 94.00 us 74.00 us 114.00 us 2 > >> XATTROP 0.00 191.00 us 191.00 us 191.00 us 1 > >> TRUNCATE 0.00 53.50 us 35.00 us 74.00 us 4 > >> STATFS 0.00 79.67 us 70.00 us 91.00 us 3 > >> RENAME 0.00 37.33 us 27.00 us 68.00 us 15 > >> INODELK 0.00 190.67 us 116.00 us 252.00 us 3 > >> UNLINK 0.00 28.83 us 8.00 us 99.00 us 30 > >> ENTRYLK 0.00 146.33 us 117.00 us 188.00 us 6 > >> CREATE 0.00 37.63 us 12.00 us 73.00 us 84 > >> READDIR 0.00 23.75 us 8.00 us 75.00 us198 > >> FLUSH 0.00 65.33 us 42.00 us 141.00 us204 > >> OPEN 0.01 45.78 us 11.00 us
Re: [Gluster-users] 40 gig ethernet
On Fri, 14 Jun 2013 14:35:26 -0700 Bryan Whitehead <dri...@megahappy.net> wrote:
> GigE is slower. Here is ping from the same boxes but using the 1GigE
> cards:
>
> [root@node0.cloud ~]# ping -c 10 10.100.0.11
> PING 10.100.0.11 (10.100.0.11) 56(84) bytes of data.
> 64 bytes from 10.100.0.11: icmp_seq=1 ttl=64 time=0.628 ms
> 64 bytes from 10.100.0.11: icmp_seq=2 ttl=64 time=0.283 ms
> 64 bytes from 10.100.0.11: icmp_seq=3 ttl=64 time=0.307 ms
> 64 bytes from 10.100.0.11: icmp_seq=4 ttl=64 time=0.275 ms
> 64 bytes from 10.100.0.11: icmp_seq=5 ttl=64 time=0.313 ms
> 64 bytes from 10.100.0.11: icmp_seq=6 ttl=64 time=0.278 ms
> 64 bytes from 10.100.0.11: icmp_seq=7 ttl=64 time=0.309 ms
> 64 bytes from 10.100.0.11: icmp_seq=8 ttl=64 time=0.197 ms
> 64 bytes from 10.100.0.11: icmp_seq=9 ttl=64 time=0.267 ms
> 64 bytes from 10.100.0.11: icmp_seq=10 ttl=64 time=0.187 ms
>
> --- 10.100.0.11 ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 9000ms
> rtt min/avg/max/mdev = 0.187/0.304/0.628/0.116 ms
>
> Note: The Infiniband interfaces have a constant load of traffic from
> glusterfs. The NICs comparatively have very little traffic.

Uh, you should throw away your GigE switch. Example:

# ping 192.168.83.1
PING 192.168.83.1 (192.168.83.1) 56(84) bytes of data.
64 bytes from 192.168.83.1: icmp_seq=1 ttl=64 time=0.310 ms
64 bytes from 192.168.83.1: icmp_seq=2 ttl=64 time=0.199 ms
64 bytes from 192.168.83.1: icmp_seq=3 ttl=64 time=0.119 ms
64 bytes from 192.168.83.1: icmp_seq=4 ttl=64 time=0.115 ms
64 bytes from 192.168.83.1: icmp_seq=5 ttl=64 time=0.099 ms
64 bytes from 192.168.83.1: icmp_seq=6 ttl=64 time=0.082 ms
64 bytes from 192.168.83.1: icmp_seq=7 ttl=64 time=0.091 ms
64 bytes from 192.168.83.1: icmp_seq=8 ttl=64 time=0.096 ms
64 bytes from 192.168.83.1: icmp_seq=9 ttl=64 time=0.097 ms
64 bytes from 192.168.83.1: icmp_seq=10 ttl=64 time=0.095 ms
64 bytes from 192.168.83.1: icmp_seq=11 ttl=64 time=0.097 ms
64 bytes from 192.168.83.1: icmp_seq=12 ttl=64 time=0.102 ms
64 bytes from 192.168.83.1: icmp_seq=13 ttl=64 time=0.103 ms
64 bytes from 192.168.83.1: icmp_seq=14 ttl=64 time=0.108 ms
64 bytes from 192.168.83.1: icmp_seq=15 ttl=64 time=0.098 ms
64 bytes from 192.168.83.1: icmp_seq=16 ttl=64 time=0.093 ms
64 bytes from 192.168.83.1: icmp_seq=17 ttl=64 time=0.099 ms
64 bytes from 192.168.83.1: icmp_seq=18 ttl=64 time=0.102 ms
64 bytes from 192.168.83.1: icmp_seq=19 ttl=64 time=0.092 ms
64 bytes from 192.168.83.1: icmp_seq=20 ttl=64 time=0.111 ms
64 bytes from 192.168.83.1: icmp_seq=21 ttl=64 time=0.112 ms
64 bytes from 192.168.83.1: icmp_seq=22 ttl=64 time=0.099 ms
64 bytes from 192.168.83.1: icmp_seq=23 ttl=64 time=0.092 ms
64 bytes from 192.168.83.1: icmp_seq=24 ttl=64 time=0.102 ms
64 bytes from 192.168.83.1: icmp_seq=25 ttl=64 time=0.108 ms
^C
--- 192.168.83.1 ping statistics ---
25 packets transmitted, 25 received, 0% packet loss, time 23999ms
rtt min/avg/max/mdev = 0.082/0.112/0.310/0.047 ms

That is _loaded_.

--
Regards,
Stephan

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-users] 40 gig ethernet
On Fri, 14 Jun 2013 12:13:53 -0700 Bryan Whitehead <dri...@megahappy.net> wrote:
> I'm using 40G Infiniband with IPoIB for gluster. Here are some ping
> times (from host 172.16.1.10):
>
> [root@node0.cloud ~]# ping -c 10 172.16.1.11
> PING 172.16.1.11 (172.16.1.11) 56(84) bytes of data.
> 64 bytes from 172.16.1.11: icmp_seq=1 ttl=64 time=0.093 ms
> 64 bytes from 172.16.1.11: icmp_seq=2 ttl=64 time=0.113 ms
> 64 bytes from 172.16.1.11: icmp_seq=3 ttl=64 time=0.163 ms
> 64 bytes from 172.16.1.11: icmp_seq=4 ttl=64 time=0.125 ms
> 64 bytes from 172.16.1.11: icmp_seq=5 ttl=64 time=0.125 ms
> 64 bytes from 172.16.1.11: icmp_seq=6 ttl=64 time=0.125 ms
> 64 bytes from 172.16.1.11: icmp_seq=7 ttl=64 time=0.198 ms
> 64 bytes from 172.16.1.11: icmp_seq=8 ttl=64 time=0.171 ms
> 64 bytes from 172.16.1.11: icmp_seq=9 ttl=64 time=0.194 ms
> 64 bytes from 172.16.1.11: icmp_seq=10 ttl=64 time=0.115 ms
>
> --- 172.16.1.11 ping statistics ---
> 10 packets transmitted, 10 received, 0% packet loss, time 8999ms
> rtt min/avg/max/mdev = 0.093/0.142/0.198/0.035 ms

What you seem to be saying is that there is no significant difference compared to GigE, right? Does anyone have a ping between two kvm-qemu virtio-net cards at hand?

--
Regards,
Stephan
Re: [Gluster-users] Rebalancing with no new bricks.
On Wed, 12 Jun 2013 09:04:30 -0400 Jeff Darcy <jda...@redhat.com> wrote:
> [...] that need to be moved, it shouldn't be too hard to combine this
> little bag of tricks into a solution that meets your needs. Just let
> me know if you'd like me to assist.

The true question is indeed: why does he need tricks at all to achieve something obvious to humans - a way of distributing files over the glusterfs so that "full" really means all bricks are full? It cannot be the right way to design software (for humans) so that they have to adapt to the software. Instead, the software should adapt to the users' needs and situation. It is very obvious today that bricks can be of different sizes.

In fact I always thought it would be a big advantage of glusterfs to be able to use what's already there and make more out of it (just as linux did from the first day on). Which means for me:

1) It must be easy to deploy on an already-filled fileserver, i.e. no need to copy data over onto the new glusterfs (soft migration).
2) Whatever layout the bricks have, glusterfs must be able to follow the obvious rule: if there is space left, use it.
3) The data must remain accessible even if glusterfs is no longer used on the bricks - without copying back.

--
Regards,
Stephan
Re: [Gluster-users] Rebalancing with no new bricks.
On Wed, 12 Jun 2013 09:57:15 -0400 Jeff Darcy <jda...@redhat.com> wrote:
> On 06/12/2013 09:46 AM, Stephan von Krawczynski wrote:
> > The true question is indeed: why does he need tricks at all to achieve
> > something obvious to humans: a way of distributing files over the
> > glusterfs so that full means really all bricks are full.
>
> That's my view too. I keep trying.
>
> > 3) The data must remain accessible even if glusterfs is no longer
> > used on the bricks - without copying back.
>
> This part is already true in practically all cases. You can ignore the
> .glusterfs directory and extra xattrs, or nuke them, and you have a
> perfectly normal file/directory structure that's usable as-is. The
> exceptions are if you use striping or erasure coding, but (like RAID)
> those are fundamentally ways of slicing and dicing data across storage
> units, so some reassembly would be necessary.

I only tried to list _the_ major advantages glusterfs could/should have over almost all of its competitors. Points 1) and 3) are of special importance, because they allow everyone to go and try it without having to fiddle around with tons of data. 2) is convenience - something a good piece of software should deliver :-)

--
Regards,
Stephan
Re: [Gluster-users] Fwd: [Gluster-devel] glusterfs-3.3.2qa1 released
On Sat, 13 Apr 2013 23:47:21 +0530 Vijay Bellur <vbel...@redhat.com> wrote:
> On 04/13/2013 08:11 PM, Stephan von Krawczynski wrote:
> > On Sat, 13 Apr 2013 10:37:23 -0400 (EDT) John Walker
> > <jowal...@redhat.com> wrote:
> > > Try the new qa build for 3.3.2. We're hopeful that this will solve
> > > some lingering problems out there.
> >
> > The ext4 fix, too?
>
> Not in this qa release. A subsequent qa release will have the ext4 fix.
>
> -Vijay

Great news, thank you.

--
Regards,
Stephan
Re: [Gluster-users] Performance for KVM images (qcow)
On Tue, 09 Apr 2013 03:13:10 -0700 Robert Hajime Lanning <lann...@lanning.cc> wrote:
> On 04/09/13 01:17, Eyal Marantenboim wrote:
> > Hi Bryan,
> > We have 1G nics on all our servers. Do you think that changing our
> > design to distribute-replicate will improve the performance?
> > Anything in the gluster performance settings that you think I should
> > change?
>
> With GlusterFS, almost all the processing is on the client side. This
> includes replication. So, when you have replica 4, the client will be
> duplicating all transactions 4 times, synchronously. Your 1G ethernet
> just became 256M.

Let me add that nobody in their right mind does it this way. Obviously one would give the client more physical network cards - ideally as many as there are replicas - and do the subnetting accordingly.

--
Regards,
Stephan
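Robert's "1G just became 256M" is simple division: with client-side replication every write leaves the client once per replica over the same link. A one-line sanity check of that rough model (protocol overhead ignored, numbers illustrative):

```shell
# Usable client write bandwidth = link rate / replica count
link_mbit=1024   # 1 GbE, counted as 1024 Mbit/s
replicas=4
echo "$(( link_mbit / replicas )) Mbit/s usable for writes"   # -> 256 Mbit/s
```

Stephan's suggestion of one NIC per replica effectively raises `link_mbit` by the replica count, bringing the model back to full line rate per replica.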
Re: [Gluster-users] Slow read performance
I really do wonder whether this bug in _glusterfs_ will ever be fixed. It really makes no sense to ship an implementation that breaks on the most-used fs on linux. And just as you said: don't wait on btrfs, it will never be production-ready. And xfs is no solution, it is just a bad workaround.

On Fri, 8 Mar 2013 10:43:41 -0800 Bryan Whitehead <dri...@megahappy.net> wrote:
> Here are some details about ext4 changes in the kernel screwing up
> glusterfs:
>
> http://www.gluster.org/2012/08/glusterfs-bit-by-ext4-structure-change/
> https://bugzilla.redhat.com/show_bug.cgi?id=838784
>
> I thought I read there was a workaround in recent versions of gluster,
> but I think it came at a cost somewhere. I'm not sure, since I've been
> using xfs since the 1.x days of gluster and only see random ext3/4
> problems bubble up on these mailing lists. In general, ext4 was just a
> stopgap for the wait on btrfs getting fleshed out. That said, I don't
> see ext4 going away for a long long time. :-/
>
> NOTE: I don't even know if this is your problem. You might try updating
> 2 bricks that are replica pairs to use xfs, then do some performance
> tests on files living on them to confirm. For example, you have 20 some
> servers/bricks. If hostD and hostE are replica pairs for some subset of
> files: shut down glusterd on hostD, change the fs to xfs, fire glusterd
> back up, let it resync and recover all the files, do the same on hostE
> (once hostD is good), then see if there is a read-speed improvement for
> files living on those two host pairs.
>
> On Fri, Mar 8, 2013 at 6:40 AM, Thomas Wakefield <tw...@cola.iges.org> wrote:
> > I am still confused how ext4 is suddenly slow to read when it's
> > behind Gluster, but plenty fast reading standalone? And it writes
> > really fast from both the server and client.
> >
> > On Mar 8, 2013, at 4:07 AM, Jon Tegner <teg...@renget.se> wrote:
> > > We had issues with ext4 a bit less than a year ago; at that time I
> > > upgraded the servers to CentOS-6.2. But that gave us large problems
> > > (more than slow reads).
Since I didn't want to reformat the disks at that time (and switch to XFS) I went back to CentOS-5.5 (which we had used before). On some link (think it was https://bugzilla.redhat.com/show_bug.cgi?id=713546 but can't seem to reach that now) it was stated that the ext4-issue was present even on later versions of CentOS-5 (I _think_ 5.8 was affected). Are there hope that the ext4-issue will be solved in later kernels/versions of gluster? If not, it seems one is eventually forced to switch to XFS. Regards, /jon On Mar 8, 2013 03:27 Thomas Wakefield tw...@iges.org tw...@iges.orgwrote: inode size is 256. Pretty stuck with these settings and ext4. I missed the memo that Gluster started to prefer xfs, back in the 2.x days xfs was not the preferred filesystem. At this point it's a 340TB filesystem with 160TB used. I just added more space, and was doing some followup testing and wasn't impressed with the results. But I am sure I was happier before with the performance. Still running CentOS 5.8 Anything else I could look at? Thanks, Tom On Mar 7, 2013, at 5:04 PM, Bryan Whitehead dri...@megahappy.net wrote: I'm sure you know, but xfs is the recommended filesystem for glusterfs. Ext4 has a number of issues. (Particularly on CentOS/Redhat6). The default inode size for ext4 (and xfs) is small for the number of extended attributes glusterfs uses. This causes a minor hit in performance on xfs if theextended attributes grow more than 265 (xfs default size). In xfs, this is fixed by setting the size of an inode to 512. How big the impact is on ext4 is something I don't know offhand. But looking at a couple of boxes I have it looks like some ext4 filesystems have 128 inode size and some have 256 inode size (both of which are too small for glusterfs). The performance hit is everytimeextended attributes need to be read several inodes need to be seeked and found. run dumpe2fs -h blockdevice | grep size on your ext4 mountpoints. 
If it is not too much of a bother - I'd try xfs as your filesystem for the bricks mkfs.xfs -i size=512 blockdevice Please see this for more detailed info: https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Storage/2.0/ht ml-single/Administration_Guide/index.html#chap-User_Guide-Setting_Volu mes On Thu, Mar 7, 2013 at 12:08 PM, Thomas Wakefield tw...@cola.iges.org wrote: Everything is built as ext4, no options other than lazy_itable_init=1 when I built the filesystems. Server mount example: LABEL=disk2a /storage/disk2a ext4defaults 0 0 Client mount: fs-disk2:/shared /shared glusterfs defaults 0 0 Remember, the slow reads are only from gluster clients, the disks are really fast when I am
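Bryan's one-brick-at-a-time conversion, written out as an ops sketch. The brick path and volume name are taken from Thomas's mounts above; the block device is hypothetical, and the commands assume the era's sysvinit `glusterd` service with the replica partner left online throughout:

```shell
# On the brick server being converted (e.g. hostD), while its replica
# partner keeps serving data:
service glusterd stop                 # take this node out of the cluster
umount /storage/disk2a                # unmount the ext4 brick
mkfs.xfs -i size=512 /dev/sdb1        # reformat with 512-byte inodes for xattrs
mount /dev/sdb1 /storage/disk2a       # remount (update /etc/fstab as well)
service glusterd start                # rejoin the cluster

# Trigger self-heal and wait for it to finish before converting the
# partner node:
gluster volume heal shared full
gluster volume heal shared info
```

Only after `heal info` shows no pending entries is it safe to repeat the procedure on the partner (hostE in Bryan's example); converting both replicas at once would lose the data.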
Re: [Gluster-users] NFS availability
On Wed, 30 Jan 2013 20:44:52 -0800 harry mangalam <harry.manga...@uci.edu> wrote:
> On Thursday, January 31, 2013 11:28:04 AM glusterzhxue wrote:
> > Hi all,
> > As is known to us all, gluster provides NFS mounts. However, if the
> > mount point fails, clients will lose their connection to Gluster,
> > while if we use the gluster native client, this failure will have no
> > effect on clients. For example:
> >
> > mount -t glusterfs host1:/vol1 /mnt
> >
> > If host1 goes down for some reason, the client still works; it does
> > not notice the failure (assuming we have multiple gluster servers).
>
> The client will still fail (in most cases) since host1 (if I follow
> you) is part of the gluster groupset. Certainly if it's
> distributed-only; maybe not if it's a dist/repl gluster. But if host1
> goes down, the client will not be able to find a gluster volume to
> mount.

For sure it will not fail if replication is used.

> > However, if we use the following:
> >
> > mount -t nfs -o vers=3 host1:/vol1 /mnt
> >
> > If host1 fails, the client will lose its connection to the gluster
> > servers.
>
> If the client was mounting the glusterfs via a re-export from an
> intermediate host, you might be able to fail over to another
> intermediate NFS server, but if it was a gluster host, it would fail
> due to the reasons above.

Kernel NFS _may_ fail over from server A to server B if B takes over the original server's IP and some requirements are met. You don't need an intermediate (re-exporting) server for this.

--
Regards,
Stephan
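The failover Stephan describes is a floating-IP takeover. A hypothetical sketch of the manual steps on the standby server B (interface, addresses, and the failure-detection mechanism are all made up; assumes iputils `arping` and an NFSv3 export matching server A's):

```shell
# On server B, after detecting that A (192.168.83.10) is down:
ip addr add 192.168.83.10/24 dev eth0    # claim A's service IP
arping -c 3 -U -I eth0 192.168.83.10     # gratuitous ARP so clients and
                                         # switches update their caches
exportfs -ra                             # ensure the exports are live
```

This works for NFSv3 because the protocol is stateless enough that clients simply retry against the same IP; in practice the detection and takeover would be driven by an HA tool rather than run by hand.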
Re: [Gluster-users] NFS availability
On Thu, 31 Jan 2013 12:47:30 + Brian Candler b.cand...@pobox.com wrote:
> On Thu, Jan 31, 2013 at 09:18:26AM +0100, Stephan von Krawczynski wrote:
> > The client will still fail (in most cases) since host1 (if I follow you) is part of the gluster groupset. Certainly if it's distributed-only; maybe not if it's a dist/repl gluster. But if host1 goes down, the client will not be able to find a gluster vol to mount. For sure it will not fail if replication is used.
>
> Aside: it will *fail* if the client reboots, /etc/fstab has server1:/volname, and server1 is the one which failed.

Well, this is exactly the reason we generally refuse to fetch the volfile from the server. This whole idea is obvious nonsense for exactly the reason you described.

--
Regards,
Stephan
Re: [Gluster-users] NFS availability
On Thu, 31 Jan 2013 09:07:50 -0800 Joe Julian j...@julianfamily.org wrote:
> On 01/31/2013 08:38 AM, Stephan von Krawczynski wrote:
> > [...] Well, this is exactly the reason we generally refuse to fetch the volfile from the server. This whole idea is obvious nonsense for exactly the reason you described.
>
> That doesn't lend me much confidence in your expertise with regard to your other recommendations, Stephan. There are two good ways to make this work even if a server is down:
>
> * Round robin DNS. A hostname (e.g. glusterfs.domain.dom) with multiple A records that point to all your servers. Using that hostname in fstab will allow the client to roll over to the additional servers in the event the first one it gets is not available (e.g. glusterfs.domain.dom:myvol /mnt/myvol glusterfs defaults 0 0).

You don't want to use DNS in an environment where security is your first rule. If your DNS drops dead, your setup is dead. Not very promising... The basic goal of glusterfs has been to secure data by replicating it; data distribution is really not interesting for us. Now you say: go and replicate your data for security, but use DNS to secure your setup? You really seem to like domino setups. DNS dead = everything dead.

> * The mount option backupvolfile-server. An fstab entry like server1:myvol /mnt/myvol glusterfs backupvolfile-server=server2 0 0 will allow the mount command to try server2 if server1 does not mount successfully.

And how many backup servers do you want to name in your fstab? In fact you have to name all your servers, because otherwise there will always be at least one situation where you are busted.

> This whole idea is obvious experience and forethought, not nonsense. By having a management service that provides configuration, on-the-fly configuration changes are possible. If one refuses to fetch the volfile, one cripples the cluster's flexibility.

I don't know what kind of setups you drive. In our environment we don't want to fiddle around with fs configs; we want them to work as expected even if other parts of the total setup fall apart. Flexibility in our world means you can do widespread types of configurations. It does not mean we switch the running configs every day only because gluster is so flexible.

--
Regards,
Stephan
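The two approaches Joe describes can be written down side by side; all hostnames and addresses here are placeholders:

```sh
# Round-robin DNS: one name, several A records (zone-file fragment):
#   glusterfs.domain.dom.  IN A 192.168.1.1
#   glusterfs.domain.dom.  IN A 192.168.1.2

# /etc/fstab using the round-robin name -- the client retries the
# remaining A records if the first server is unreachable:
#   glusterfs.domain.dom:myvol  /mnt/myvol  glusterfs  defaults  0 0

# /etc/fstab naming an explicit backup volfile server instead:
#   server1:myvol  /mnt/myvol  glusterfs  backupvolfile-server=server2  0 0
```

These are config fragments, not commands; the mount-option spelling follows the form quoted in the thread.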
Re: [Gluster-users] NFS availability
On Thu, 31 Jan 2013 14:17:32 -0500 Jeff Darcy jda...@redhat.com wrote:
> There is *always* at least one situation, however unlikely, where you're busted. Designing reliable systems is always about probabilities. If none of the solutions mentioned so far suffice for you, there are still others that don't involve sacrificing the advantages of dynamic configuration. If your network is so FUBAR that you have trouble reaching any server to fetch a volfile, then it probably wouldn't do you any good to have one locally, because you wouldn't be able to reach those servers for I/O anyway. You'd just be asking for split brain and other problems. Redesigning the mount is likely to yield less benefit than redesigning the network that's susceptible to such failures.

You are asking in the wrong direction. The simple question is: is there any dynamic configuration as safe as a local config file? If your local fs is dead, then you are really dead. But if it is alive, you have a config. And that's about it. You need no working DNS, no poisoned cache and no special server that must not fail. Everything with less security is unacceptable. There is no probability: either you are a dead client or a working one.

And if you really want to include the network in the question: I would expect the gluster client-server and server-server protocols to accept network failure as a default case. It wouldn't be useful to release a network filesystem that drops dead in case of network errors. If there is some chance to survive, it should be able to do so and still work. Most common network errors are not a matter of design, but of dead iron.

--
Regards,
Stephan
Re: [Gluster-users] NFS availability
On Thu, 31 Jan 2013 16:00:38 -0500 Jeff Darcy jda...@redhat.com wrote:
> > Most common network errors are not a matter of design, but of dead iron.
>
> It's usually both - a design that is insufficiently tolerant of component failure, plus a combination of component failures that exceeds that tolerance. You seem to have a very high standard for filesystems continuing to maintain 100% functionality - and I suppose 100% performance as well - if there's any possibility whatsoever that they could do so. Why don't you apply that same standard to the part of the system that you're responsible for designing? Running any distributed system on top of a deficient network infrastructure will lead only to disappointment.

I am sorry that glusterfs is part of the design and of your criticism. Everyone working sufficiently long with networks of all kinds of sizes and components can tell you that in the end you want a design for a file service that works as long as possible. This means it should survive even if there is only one client, one server and one network path left. At least that is what is expected from glusterfs. Unfortunately, sometimes you get disappointed. We saw just about everything happen when switching off all but one reliable network path, including network hangs and server hangs (the last one) (read the list for examples by others). At the other end of the story, clients see servers go offline if you increase the non-gluster traffic on the network; the main (but not only) reason is the very low default ping timeout (read the list for examples by others).

All these observed effects show clearly that no one ever tested this to the extent I would have when writing this kind of software. After all, this is a piece of software whose sole purpose is surviving dead servers and networks. It is not a question of design, because on paper everything looks promising.

Sometimes your arguments let me believe you want glusterfs working like a Ford car: a lot of technical gameplay built in, but the idea that a car should be a good car in the first place got lost on the way somewhere. Quite a lot of the features built in lately have the quality of an mp3 player in your Ford. Nice to have, but it does not help you a lot when driving 200 and a rabbit crosses. And this is why I am requesting the equivalent of a BMW.

--
Regards,
Stephan
Re: [Gluster-users] Self healing metadata info
Hi Patric,

your paper shows clearly you are infected by the fs-programmer virus :-)
No one else would give you tags/gfids/inode numbers of a file inside a logfile instead of the full true filename, simply because when looking at the logfile days/months/years later you know exactly nothing about the files affected by e.g. a self-heal. Can you explain why a fs cannot give the user/admin the name of the file currently being fiddled with in a logfile, instead of a cryptic number?

For completeness, in the split-brain case I would probably do a

    gluster volume heal repvol prefer <brick> <filename>

command which prefers the file's copy on <brick> and triggers the self-heal for that file. As an addition, you could allow

    gluster volume heal repvol prefer <brick>

(without filename) to generally prefer files on <brick> and trigger self-heal for all files. There are cases where admins do not care about the actual copy but more about the accessibility of the file per se.

Everything around self-heal/split-brain is easy if you deal with 5 affected files. But dealing with 5000 files instead shows you that no admin can possibly look at every single file. So he should be able to choose some general option like

    gluster volume heal repvol prefer <tag>

where <tag> can be:
- brickname (as above)
- length, always choose the longest file
- date, always choose the latest file date
- delete, simply remove all affected files
- name-one ...

Regards,
Stephan

On Fri, 25 Jan 2013 10:11:07 +0100 Patric Uebele pueb...@redhat.com wrote:
> Hi JPro,
> perhaps the attached doc does explain it a bit.
> Best regards, Patric
>
> On Fri, 2013-01-25 at 01:26 -0500, Java Pro wrote:
> > Hi,
> > If a brick is down and comes back up later, how does Glusterfs know which files in this brick need to be 'self-healed'? Since the metadata of whether to 'heal' is stored as an xattr in a replica on other bricks, does Glusterfs scan these files on the other bricks to see if one is accusing its replica and therefore needs to heal its replica? In short, does Glusterfs keep a record of writes to a brick when a brick is down and apply these writes to the brick when it's back up?
> > Thanks, JPro
>
> --
> Patric Uebele
> Solution Architect Storage
> Red Hat GmbH
> Technopark II, Haus C, Werner-von-Siemens-Ring 14, 85630 Grasbrunn, Germany
> Office: +49 89 205071-162  Cell: +49 172 669 14 99
> mailto:patric.ueb...@redhat.com
> gpg keyid: 48E64CC1
> gpg fingerprint: C63E 6320 A03B 4410 D208 4EE7 12FC D0E6 48E6 4CC1
> Reg. Adresse: Red Hat GmbH, Werner-von-Siemens-Ring 14, 85630 Grasbrunn
> Handelsregister: Amtsgericht Muenchen HRB 153243
> Geschaeftsfuehrer: Mark Hegarty, Charlie Peters, Michael Cunningham, Charles Cachera
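The `prefer <tag>` commands Stephan proposes do not exist as gluster CLI options; as a sketch of the selection rules only, picking a surviving copy by length or date could look like this (function name and flags are mine; `stat -c` assumes GNU coreutils):

```shell
# pick_survivor TAG file1 file2 ...
# TAG=length -> the longest copy wins; TAG=date -> the newest mtime wins.
pick_survivor() {
  tag=$1; shift
  case $tag in
    length) key='%s' ;;   # stat format: size in bytes
    date)   key='%Y' ;;   # stat format: mtime as epoch seconds
  esac
  best=''; bestval=-1
  for f in "$@"; do
    v=$(stat -c "$key" "$f")
    if [ "$v" -gt "$bestval" ]; then bestval=$v; best=$f; fi
  done
  printf '%s\n' "$best"
}

# Usage idea: resolve a split-brain file across two brick paths by rule:
#   pick_survivor date /bricks/brick1/data/file /bricks/brick2/data/file
```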
Re: [Gluster-users] Meta
On Tue, 22 Jan 2013 09:05:56 -0500 Whit Blauvelt whit.glus...@transpect.com wrote:
> On Tue, Jan 22, 2013 at 08:37:03AM -0500, F. Ozbek wrote: [...]
>
> We've not only got freedom of speech, we've got freedom of guns. Still, walking into the meeting with your gun drawn will get you viewed as rude or worse. We're supposed to be data pros here, not cowboys. So, data please.
> Best, Whit

Whit, just for the sake of it: Jeff's method of discussion is to lengthen every idea/opinion/fact into an academic epic. This is why sometimes you simply don't have the time to argue with him, especially if you are not paid but spending your own spare time.

Additionally, one very basic fact should be accepted: people are on different levels of experience on this _user_ list. Some have tested the software for years and experienced the shortcomings and dead ends; some have not. Quite a few of the pro-arguers do not accept experiences unless you hard-prove them within a lengthy article that starts by defining the alphabet used. It is not really helpful to hit everyone who writes two sentences with "data please". Quite some data can be found if you really care. But even the long PDF someone posted lately with comparison data has significant gaps in its presentation.

I would love to see some acceptance of the major problems the software currently has, because without acceptance there is no way to a true solution. Again, the design is impressive; only the implementation does not keep up. Don't trust my words - look at the comparisons and judge for yourself.

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Tue, 08 Jan 2013 07:04:48 -0500 Jeff Darcy jda...@redhat.com wrote:
> Timestamps are totally unreliable as a conflict resolution mechanism. Even if one were to accept the dependency on time synchronization, there's still the possibility of drift as yet uncorrected by the synchronization protocol. The change logs used by self-heal are the *only* viable solution here. If you want to participate constructively, we could have a discussion about how those change logs should be set and checked, and whether a brick should be allowed to respond to requests for a file between coming up and completion of at least one self-heal check (Mario's example would be a good one to follow), but insisting on even less reliable methods isn't going to help.

Nobody besides you is talking about timestamps. I would simply choose an increasing stamp, incremented by every write-touch of the file. A trivial comparison then assures you choose the latest copy of the file. There is really no time needed at all, and therefore no time synchronisation issues.

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Mon, 07 Jan 2013 20:21:25 -0800 Joe Julian j...@julianfamily.org wrote:
> > > I don't know the answer. I know that they want this problem to be solved, but right now the best solution is hardware. The lower the latency, the less of a problem you'll have.
> > The only solution is correct programming, no matter what the underlying hardware looks like. The only outcome of good or bad hardware is how _fast_ the _correct_ answer reaches the fs client.
>
> Yes, if you can control the programming of your application, that would be a better solution. Unfortunately most of us use pre-packaged software like apache, php, etc. Since most of us don't have the chance to use the correct programming solution, you're going to need to decrease latency if you're going to open thousands of fds for every operation and are unsatisfied with the results.

I am _not_ talking about the application software. I am talking about the fact that everybody using glusterfs has seen glusterfs choosing the _wrong_ (i.e. old) version of a file from a brick just coming back from a down state into the replicated unit. In fact I have already seen just about every possibility you can think of when accessing files, be it a simple ls or writing or reading a file. I verified files being absent when opened although shown in ls. I saw outdated file content although the timestamp in ls was up to date. I saw file content being new although ls showed an outdated file date _and_ length. Please don't tell me the fs has no inherent confusion about the various stats on different bricks. I don't claim this happens with every file; I'm just saying it does happen. Am I the only one with these kinds of experiences?

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Tue, 08 Jan 2013 07:54:05 -0500 Jeff Darcy jda...@redhat.com wrote:
> On 1/8/13 7:11 AM, Stephan von Krawczynski wrote:
> > Nobody besides you is talking about timestamps. I would simply choose an increasing stamp, incremented by every write-touch of the file. A trivial comparison then assures you choose the latest copy of the file. There is really no time needed at all, and therefore no time synchronisation issues.
>
> When you dismiss change logs and then say "latest" without elaboration, it's not unreasonable to assume you mean timestamps. Perhaps you should try to write more clearly. Versions are certainly an improvement over timestamps, but they're not as simple as you say either - and I've actually used versioning in a functional replication translator[1], so I'm not just idly speculating about work other people might do. If two replicas are both at (integer) version X but are partitioned from one another, then writes to both could result in two copies each with version X+1 but with different data.

This can only happen with a broken versioning. Obviously one would use (very rough explanation) at least a two-shot concept: you increase the version by one when starting the file modification process, and again by one when the process has completed without error. You end up knowing that versions 1, 3, 5, ... are intermediate/incomplete versions and 2, 4, 6, ... are files with completed operations. Now you can tell at any time, throughout any stat comparison, which file is truly current and which one is in an intermediate state. If you want, you can even await the completion of an ongoing modification before returning a result to the requesting app. Yes, this would result in inherent locking.

--
Regards,
Stephan
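Stephan's two-shot counter can be illustrated in a few lines; the variable and function names are mine, and a real implementation would keep the counter as a per-replica xattr rather than a shell variable:

```shell
# Two-phase version counter: odd = write in progress (intermediate),
# even = write completed without error.
ver=0
begin_write() { ver=$((ver + 1)); }            # enter odd (intermediate) state
end_write()   { ver=$((ver + 1)); }            # enter even (complete) state
is_complete() { [ $((ver % 2)) -eq 0 ]; }

begin_write                                    # ver=1: readers see an in-flight copy
is_complete || echo "intermediate"
end_write                                      # ver=2: this copy is complete
is_complete && echo "complete at version $ver"
# prints: intermediate / complete at version 2
```

A reader comparing replicas would prefer the copy with the highest even version, which is the "trivial comparison" from the earlier mail with the in-flight case made distinguishable.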
Re: [Gluster-users] glusterfs performance issues
On Tue, 8 Jan 2013 08:01:16 -0500 Whit Blauvelt whit.glus...@transpect.com wrote:
> On Tue, Jan 08, 2013 at 01:11:24PM +0100, Stephan von Krawczynski wrote:
> > Nobody besides you is talking about timestamps. I would simply choose an increasing stamp, incremented by every write-touch of the file. A trivial comparison then assures you choose the latest copy of the file. There is really no time needed at all, and therefore no time synchronisation issues.
>
> So rather than the POSIX attribute of a time stamp, which is I'm pretty sure what we all thought you were talking about, you're asking for a new xattribute? And you want that to be simply iterative? Okay, so in a split-brain, a file gets touched 5 times on one side, and actually written to just once, not touched at all, on the other. Then the system's brought back together. Your trivial comparison will choose the wrong file version.

What a dead-end argument. _Nothing_ will save you in case of a split-brain. Let's clarify that a split-brain is a situation where your replication unit is torn into bricks and these are used independently of each other. There is no way at all to join such a situation again with respect to equal files being written to. You cannot blame a versioning scheme for another, really bad conceptual problem. Since there is no automated solution to a split brain, you can either decide to give the user access to two different file versions, none of which has the original file name (to prevent irritation), or live with lost data by choosing one of the available file versions. Let's spell it this way: either you want maximum availability and accept a split brain in the worst case (indeed very acceptable in the case of read-only data), or you prevent split brain and accept downtime for _some_ of the clients by choosing which brick is master in this special case.

> That's the thing about complex systems. Trivial solutions are usually both simple and wrong. Some work most of the time, but there are corner cases. As we see with Gluster, even complex solutions tend to have corner cases; but at least in complex solutions the corners can be whittled down.

Can they? I'd rather say if it is non-trivial it is broken most of the time. Ask btrfs for confirmation.

> Regards, Whit

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Tue, 08 Jan 2013 07:55:41 -0500 Jeff Darcy jda...@redhat.com wrote:
> On 1/8/13 7:35 AM, Stephan von Krawczynski wrote:
> > In fact I already saw just about every possibility you can think of when accessing files, be it a simple ls or writing or reading a file.
>
> Would you mind citing the bug IDs for the problems you found?

Yes, I mind. The problem with this kind of bug is that you cannot describe a reproduction, which makes them pretty useless as bug reports. They can therefore only contain the information that such situations are seen, but not much else. And I and others have said that continuously over the years on the lists. Take 4 physical boxes and check out some damage situations (switch bricks off and on); you will see the described problems within a day. You only need bonnie and ls to find out.

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Tue, 8 Jan 2013 09:25:05 -0500 Whit Blauvelt whit.glus...@transpect.com wrote:
> On Tue, Jan 08, 2013 at 02:42:49PM +0100, Stephan von Krawczynski wrote:
> > What a dead-end argument. _Nothing_ will save you in case of a split-brain.
>
> So then, to your mind, there's _nothing_ Gluster can do to heal after a split brain? Some non-trivial portion of the error scenarios discussed in this thread result from a momentary or longer split-brain situation. I'm using split-brain in the broad sense of any situation where two sides of a replicated system are out of touch for some period and thus get out of sync. Isn't that exactly what we're discussing, how to heal from that? Sure, you can have instances of specific files beyond algorithmic treatment. But aren't we discussing how to ensure that the largest possible portion of the set of files amenable to algorithmic treatment are so handled?

Really, about the only important thing regarding split brain (in our common sense) is that you are notified that it happened at all. I would never recommend to joe-average-user/admin a setup that does real split-brain, instead of tearing down every brick besides the master configured for the split-brain situation. There is no good way to avoid a lot of nasty problems. It is not really sufficient to tell people that _most_ of the split brain can be healed. If not all, then it is better to tear down, or at least to switch all bricks but one to read-only.

> > > That's the thing about complex systems. Trivial solutions are usually both simple and wrong. Some work most of the time, but there are corner cases. As we see with Gluster, even complex solutions tend to have corner cases; but at least in complex solutions the corners can be whittled down.
> > Can they? I'd rather say if it is non-trivial it is broken most of the time. Ask btrfs for confirmation.
>
> Pointing out that a complex system can go wrong doesn't invalidate complex systems as a class. It's well established in ecological science that more complex natural systems are far more resilient than simple ones. A rich, complex local ecosystem has a higher rate of stability and survival than a simple, poorer one. That's assuming the systems are evolved and have niches well-fitted with organisms - that the complexity is organic, not just random.

That is a good example of excluded corner cases, just like the current split-brain discussion. All I need to do to your complex natural system to invalidate it is to throw a big stone on it. Ask the dinosaurs for real-life experience after that. People really tend to think what you think. But most of the complexity that you think might help is in fact worthless. If you did not solve the basic questions completely, then added complexity won't help. In split-brain you have to solve only one question: who is the survivor for writes? Every other problem or question is just a consequence of this unresolved issue.

> Computer software, hardware, and the human culture that supports them also form complex, evolved ecosystems. Can there be simple solutions that help optimize such complex systems? Sure. But to look only for simple solutions is to be like the proverbial drunk looking for his keys under the streetlight, even though he heard them drop a half-block away, because "the light is better here". When people try to apply simple solutions to complex, evolved ecosystems, the law of unintended consequences is more the rule than the exception. Solutions that appear simple and obvious should always be suspect. Granted, complex, obscure ones also require scrutiny. It's just, the simple stuff should never get a pass.

Where's the guy that said "keep it simple"? ;-)

> Best, Whit

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues - meta
On Tue, 8 Jan 2013 11:44:15 -0500 Whit Blauvelt whit.glus...@transpect.com wrote:
> On Tue, Jan 08, 2013 at 04:49:30PM +0100, Stephan von Krawczynski wrote:
> > > Pointing out that a complex system can go wrong doesn't invalidate complex systems as a class. It's well established in ecological science that more complex natural systems are far more resilient than simple ones. A rich, complex local ecosystem has a higher rate of stability and survival than a simple, poorer one. That's assuming the systems are evolved and have niches well-fitted with organisms - that the complexity is organic, not just random.
> > That is a good example of excluded corner cases, just like the current split-brain discussion. All I need to do to your complex natural system to invalidate it is to throw a big stone on it. Ask the dinosaurs for real-life experience after that.
>
> Throw a big enough stone and anything can be totally crushed. The question is one of resilience when the stone is less than totally crushing. The ecosystem the big stone was thrown at, which included the dinosaurs, survived, because in its complexity it also included little mammals - which themselves were more complex organisms than the dinosaurs. Not that some simpler organisms didn't make it through the extinction event too. Plenty did. The chicken I ate for dinner is a descendant of feathered dinosaurs.
>
> Take two local ecosystems, one more complex than the other. Throw in some big disturbance, the same size of disruption in each. On average, the complex local ecosystem is more likely to survive and bounce back, while the simple one is more likely to go into terminal decline. This is field data, not mere conjecture. Your argument here could be that technological systems don't obey the same laws as ecosystems. But work in complexity theory shows that the right sorts of complexity produce greater stability across a broad range of systems, not just biological ones. Free, open source software's particular advantage is that it advances in a more evolutionary manner than closed software, since there is evolutionary pressure from many directions on each part of it, at every scale. Evolutionary pressure produces complexity, the _right sort_ of complexity. That's why Linux systems are more complex, and at the same time more stable and manageable, than Windows systems. Simplicity does not have the advantage. Even when smashing things with rocks, the more complex thing is more likely to survive the assault, if it has the right sort of complexity.

Listen, I don't really want to lengthen the discussion about complexity issues in ecosystems. But let me please point out that the fundamental flaw in your example, as you turn it now, is that a natural ecosystem has no _goal_ of existence, whereas programmed code should at least have _some_. Which means you can take it as a negative example for reasons why something does not work, but you cannot elaborate it as a positive example of why something should work. Glusterfs(d) has a clearly stated goal of being; an ecosystem has not. So you cannot say that, only because _something_ survived a crashing ecosystem, an equally complex piece of code will do something useful after an equally complex crash. In fact it most certainly will not.

The contrary is true. You should strip down complexity to the lowest possible level to make the code more obvious and therefore more debuggable and readable to a larger number of people. That will have a positive effect on its stability. But if you throw in more and more code for fragile corner cases, instead of drawing a clear line between clearly working and clearly failing, you will not end up at the desired state where everything works stably. This path is as wrong as it was to release a complete fileserver installation image back in the old days of glusterfs.

In the end, everything boils down to the question of where efforts are best invested in order to make the project more successful. And it's really not that hard to find out what the biggest show-stopper is: simply count the articles on the list dealing with performance issues and strange effects of not-synced files during _normal_ operation. There is not much left. Read the fs comparison between NFS, Samba, Ceph and Glusterfs in a German Linux magazine lately? Guess who's last...

> Best, Whit

--
Regards,
Stephan
Re: [Gluster-users] glusterfs performance issues
On Mon, 07 Jan 2013 13:19:49 -0800 Joe Julian j...@julianfamily.org wrote:
> You have a replicated filesystem, brick1 and brick2. Brick2 goes down and you edit a 4k file, appending data to it. That change, and the fact that there is a pending change, is stored on brick1. Brick2 returns to service. Your app wants to append to the file again. It calls stat on the file. Brick2 answers first, stating that the file is 4k long. Your app seeks to 4k and writes. Now the data you wrote before is gone.

Forgive my ignorance, but it is obvious that this implementation of stat on a replicating fs is shit. Of course a stat should await _all_ returning local stats, choose the stat of the _latest_ file version, and note that the file needs self-heal.

> This is one of the processes by which stale stat data can cause data loss. That's why each lookup() (which precedes the stat) causes a self-heal check, and why it's a problem that hasn't been resolved in the last two years.

Self-heal is no answer to this question. The only valid answer is choosing the _latest_ file version, no matter whether self-heal is necessary or not.

> I don't know the answer. I know that they want this problem to be solved, but right now the best solution is hardware. The lower the latency, the less of a problem you'll have.

The only solution is correct programming, no matter what the underlying hardware looks like. The only outcome of good or bad hardware is how _fast_ the _correct_ answer reaches the fs client. Your description is a satire, not?

> On 01/07/2013 12:59 PM, Dennis Jacobfeuerborn wrote:
> > On 01/07/2013 06:11 PM, Jeff Darcy wrote:
> > > On 01/07/2013 12:03 PM, Dennis Jacobfeuerborn wrote:
> > > > The gm convert processes make almost no progress even though on a regular filesystem each call takes only a fraction of a second.
> > > Can you run gm_convert under strace? That will give us a more accurate idea of what kind of I/O it's generating. I recommend both -t and -T to get timing information as well. Also, it never hurts to file a bug so we can track/prioritize/etc. Thanks.
> > > https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
> > Thanks for the strace hint. As it turned out, the gm convert call was issued on the filename with a [0] appended, which apparently led gm to stat() all (!) files in the directory. While this particular problem isn't really a glusterfs problem, is there a way to improve the stat() performance in general?
> > Regards, Dennis

--
Regards,
Stephan
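The stale-stat data loss Joe describes can be reproduced locally without gluster at all; this sketch (file and variable names are mine) scales the 4k file down to 4 bytes and shows how a stale size leads the app to overwrite its own earlier append:

```shell
# The file already holds "AAAABBBB": "BBBB" is the append made while
# brick2 was down. A lagging brick still reports the pre-append size.
printf 'AAAABBBB' > f
stale_size=4                      # what the just-returned brick answered to stat

# The app "appends" by seeking to the reported end of file and writing:
printf 'CCCC' | dd of=f bs=1 seek=$stale_size conv=notrunc 2>/dev/null

cat f                             # prints AAAACCCC: the first append is destroyed
```

The point of the demonstration is that the damage is done by the stale metadata alone; no data on either replica was corrupt before the second write.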
Re: [Gluster-users] Turning GlusterFS into something else (was Re: how well will this work)
On Sun, 30 Dec 2012 10:13:52 -0500 Jeff Darcy jda...@redhat.com wrote: On 12/27/12 3:36 PM, Stephan von Krawczynski wrote: And the same goes for glusterfs. It _could_ be the greatest fs on earth, but only if you accept: 1) Throw away all non-linux code. Because this war has long been over.

Sorry, but we do have non-Linux users already and won't abandon them. We wouldn't save all that much time even if we did, so it just doesn't make sense.

Jeff, really, if you argue, please state your argument openly. You don't want this point because its next logical step would be my point 2), the kernel implementation. As long as you hold up dead boxes like Oracle-owned Solaris you have a good point in not doing 2). Success needs focusing. If you try to be everybody's darling you may well end up being dropped by everybody because you are not good enough.

2) Make a kernel-based client/server implementation. Because it is the only way to acceptable performance.

That's an easy thing to state, but a bit harder to prove.

Come on, how old are you? Can you remember userspace NFS? In case you cannot: it had just about the same problems glusterfs has today, and guess why it is gone... [a lot of bad examples deleted] Really, you cannot prove you are right by naming some examples that are even more horrible.

3) Implement a true undelete feature. Make delete a move to a deleted-files area.

Some people want that, some people do not.

Haha! A good argument for a config parameter :-) - I would have suggested that anyway.

Some are even precluded from using it e.g. for compliance reasons. It's hardly a must-have feature. In any case, it already exists - called landfill I believe, though I'm not sure of its support status or configurability via the command line. If it didn't exist, it would still be easy to create - which wouldn't be the case at all if we followed your advice to put this in the kernel.

Now I wonder how you argue about this. Let me bring in an analogy you will probably hate.
Linux MM uses free memory to cache just about anything you can think of. This drives W*indows users crazy when they use Android. They always install the latest kill-all-not-needed-apps tool to let them read a big number in the free space statistics. They do not understand that free memory is in fact wasted memory. And the same thing goes for disk space. If I delete something on a disk that is far from being full, it is just plain dumb to really erase this data from the disk. It won't help anyone. It will only hurt you if you deleted it accidentally. Read my lips: free disk space is wasted space, just like free mem is wasted mem. And _that_ is the true reason for undelete. It won't hurt anybody, and will help some. And since it is the true goal of a fs to organise data on a drive, it is most obvious that undelete (you may call it lazy-delete) is a very basic fs feature and _not_ an add-on patched onto it.

If it's a priority for you and existing facilities do not suffice, then I suggest adding a feature page on the wiki and/or an enhancement-request bug report, so that we can incorporate that feedback into our planning process. Thank you for your help making GlusterFS better.

[politics end] Jeff, this is really no technical question we are talking about. It's more a question of a management decision. If Red Hat wants a truly successful glusterfs, someone has to decide to follow my steps. If the stuff was only bought because it looked interesting and no one else should use its true potential, well then go ahead. -- Regards, Stephan
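The delete-as-move (lazy-delete) idea being argued over fits in a few lines. This is a minimal sketch in plain Python, assuming a per-volume trash directory; the `.deleted` name and the timestamp suffix are invented for the example, and this is not how GlusterFS's landfill feature actually works:

```python
# Minimal sketch of "delete as move" (lazy delete) as argued for above.
# The trash-directory name and timestamp suffix are invented for this
# example; this is NOT how GlusterFS's "landfill" actually works.
import os
import shutil
import tempfile
import time

def lazy_unlink(path, trash_dir):
    """Move `path` into `trash_dir` instead of erasing its data."""
    os.makedirs(trash_dir, exist_ok=True)
    # A timestamp suffix avoids collisions when the same name is
    # deleted more than once.
    dest = os.path.join(trash_dir, f"{os.path.basename(path)}.{time.time_ns()}")
    shutil.move(path, dest)
    return dest

root = tempfile.mkdtemp()
f = os.path.join(root, "report.txt")
with open(f, "w") as fh:
    fh.write("important data")

moved = lazy_unlink(f, os.path.join(root, ".deleted"))
assert not os.path.exists(f)                    # gone from the namespace
assert open(moved).read() == "important data"   # but still recoverable
```

A real implementation would also need a reaper that actually erases trashed files once the disk approaches full, which is exactly the "free space is wasted space" trade-off the paragraph above describes.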
Re: [Gluster-users] Turning GlusterFS into something else (was Re: how well will this work)
On Sun, 30 Dec 2012 12:29:53 -0800 Joe Julian j...@julianfamily.org wrote: Here's where you're getting labeled as a Troll. You have a tendency to do this on just about every mailing list except LKML (not sure why they get your love over others, but to each their own).

There is one basic difference between LKML and almost every other project where you probably saw me posting. The kernel project has _one_ head who has proven to take real _management_ decisions in his project. Sometimes they look rude, sometimes they are a bit late, very often they are just-in-time or even early. And if you read the archives you probably notice one or two times where I requested a _decision_ on fundamental strategies. Probably you remember me being laughed at when I suggested making CPUs hot-pluggable years ago. Nobody thought of the implications back then. Nowadays CPU hotplug is in every ARM-driven multicore Android phone. I am not Jesus. Only sometimes I can read the writing on the wall a bit earlier than others do, that's all.

You come in, spout some diatribe claiming how you know better than everybody else, to the point of being told that this is the last post I'm going to make on this subject. You don't work with the developers, you antagonize them. I still don't see the features you're asking for on the wiki, nor in bugzilla. You obviously have some knowledge of C judging by your analysis of issues in LKML and patch offers relating to the same. Why not offer your abilities in a constructive way by using the tools we make publicly available?

From writing lots of lines of code in C and quite a bunch of other languages for about the last 30 years, I can tell you that the biggest effect of things I did is not based on released code but on exactly this kind of discussion. One of the fundamental problems in open source is that quite some good projects die because nobody has the guts to say that the basic direction needs correction.
I know that most people do not want to hear that; nevertheless, someone has to stand up and say this is sh*t if it really is. And if nobody else does, I do. At the end of the day most people may hate me for that, but if the project got better, I don't give a damn. I am no team player; I believe in one man, one vision. -- Regards, Stephan
Re: [Gluster-users] how well will this work
On Wed, 26 Dec 2012 22:04:09 -0800 Joe Julian j...@julianfamily.org wrote: It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features nor were there any problems with performance. That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that.

Well, then the list of users obviously does not contain me ;-) The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps in your face. Why on earth do they think Linux has NFS as a kernel implementation? -- Regards, Stephan
Re: [Gluster-users] how well will this work
Dear JM, unfortunately one has to say openly that the whole concept being tried here is simply wrong. The problem is not the next-bug-to-fix. The problem is the client strategy in user space. It is broken by design. You can either believe this or go ahead ignoring it and never really get a good and stable setup. Really, the whole we-close-our-eyes-and-hope-it-will-turn-out-well strategy looks just like btrfs. Read the archives, I told them years ago it will not work out in our lifetime. And today, still they have no ready-for-production fs, and believe me: it never will be there. And the same goes for glusterfs. It _could_ be the greatest fs on earth, but only if you accept: 1) Throw away all non-linux code. Because this war has long been over. 2) Make a kernel-based client/server implementation. Because it is the only way to acceptable performance. 3) Implement a true undelete feature. Make delete a move to a deleted-files area. These are the minimal steps to take for a real success; everything else is just beating the dead horse. Regards, Stephan

On Thu, 27 Dec 2012 10:03:10 -0500 (EST) John Mark Walker johnm...@redhat.com wrote: Look, fuse has its issues that we all know about. Either it works for you or it doesn't. If fuse bothers you that much, look into libgfapi. Re: NFS - I'm trying to help track this down. Please either add your comment to an existing bug or create a new ticket. Either way, ranting won't solve your problem or inspire anyone to fix it. -JM

Stephan von Krawczynski sk...@ithnet.com wrote: On Wed, 26 Dec 2012 22:04:09 -0800 Joe Julian j...@julianfamily.org wrote: It would probably be better to ask this with end-goal questions instead of with an unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features nor were there any problems with performance.
That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that. Well, then the list of users obviously does not contain me ;-) The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps in your face. Why on earth do they think Linux has NFS as a kernel implementation? -- Regards, Stephan
Re: [Gluster-users] how well will this work
On Thu, 27 Dec 2012 13:24:55 -0800 Dan Cyr d...@truenorthmanagement.com wrote: I also don’t think this is a rant. I, as well, have been following this list for a few years, and have been waiting for GlusterFS to stabilize for VM deployment. I hope this discussion helps the devs understand areas that people are waiting for. We have 2 SAN servers with Infiniband connections to a Blade Center. I would like all the KVM VMs hosted on the SAN with the ability to add more SAN servers in the future. – Currently Gluster allows this via NFS but I’ve read about performance issues. – So, right now, after 2 years of not deploying this gear (and running the VM images on each blade), I am looking for an expandable solution for the backend storage so I can stop manually babying this network and install OpenNebula, so I’m not the only person in our office who can manage our VM infrastructure. This does fit into the OP’s question because I would love to see GlusterFS work like this. Miles - As it is right now, GlusterFS is not what you want for backend VM storage. Question: “how well will this work” Answer: “horribly” Dan

From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of John Mark Walker Sent: Thursday, December 27, 2012 12:39 PM To: Stephan von Krawczynski Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] how well will this work Stephan, I'm going to make this as simple as possible. Every message to this list should follow these rules: 1. be helpful 2. be constructive 3. be respectful I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it. This is your 2nd warning. -JM

Hola JM, are you aware that your above message arrived at my side neither through the list nor through personal mail? Does this mean I got deleted from the list by you?
-- Regards, Stephan
Re: [Gluster-users] Client-side GlusterFS
I have been following both lists for quite some time longer than you or Red Hat have been here. The basic idea of the project is good; the implementation idea is mostly wrong. Quite some users follow the lists and are still hoping for better times. Very few told you so now - again. If you are not acting politically here, then ask yourself why userspace NFS on Linux went dead years ago, long before you re-implemented it in glusterfs. I really don't want to teach you things everyone who knows the past should have understood. I am only a reminder. Don't revive the dinosaurs; they are extinct for good reasons. The future will not become better if you follow a path only because it's easy. I honor the original decision to use userspace, because it made implementation a lot easier and the goal was to show the whole thing is possible at all. But years have gone by, and the time to become production-ready came some time ago. And production readiness needs kernel modules. I made my point. Let's see how things look in a year or two. I will remind you - again. Regards, Stephan

On Thu, 27 Dec 2012 18:33:10 -0500 (EST) John Mark Walker johnm...@redhat.com wrote: If you feel that our strategy on the client side is broken, while I respect that opinion, it's kind of a pointless discussion. We made the architectural decisions we made understanding the tradeoffs as they were - which have been enumerated on this list numerous times. In any case, if you want to have an architectural discussion or debate, that's better directed towards gluster-devel, and that's a discussion we welcome. However, this list is gluster-users, which as the name implies, is about users of the software as it exists today, warts and all. Feel free to use the wiki to develop any thoughts you may have regarding ideal architectures. Even better if you can round up developers to implement said architecture.
-JM Stephan von Krawczynski sk...@ithnet.com wrote: Dear JM, unfortunately one has to tell openly that the whole concept that is tried here is simply wrong. The problem is not the next-bug-to-fix. The problem is the client strategy in user space. It is broken by design. You can either believe this or go ahead ignoring it and never really get a good and stable setup. Really, the whole we-close-our-eyes-and-hope-it-will-turn-out-well strategy looks just like btrfs. Read the archives, I told them years ago it will not work out in our life time. And today, still they have no ready-for-production fs, and believe me: it never will be there. And the same goes for glusterfs. It _could_ be the greatest fs on earth, but only if you accept: 1) Throw away all non-linux code. Because this war is over since long. 2) Make a kernel based client/server implementation. Because it is the only way to acceptable performance. 3) Implement true undelete feature. Make delete a move to a deleted-files area. These are the minimal steps to take for a real success, everything else is just beating the dead horse. Regards, Stephan On Thu, 27 Dec 2012 10:03:10 -0500 (EST) John Mark Walker johnm...@redhat.com wrote: Look, fuse its issues that we all know about. Either it works for you or it doesn't. If fuse bothers you that much, look into libgfapi. Re: NFS - I'm trying to help track this down. Please either add your comment to an existing bug or create a new ticket. Either way, ranting won't solve your problem or inspire anyone to fix it. -JM Stephan von Krawczynski sk...@ithnet.com wrote: On Wed, 26 Dec 2012 22:04:09 -0800 Joe Julian j...@julianfamily.org wrote: It would probably be better to ask this with end-goal questions instead of with a unspecified critical feature list and performance problems. 6 months ago, for myself and quite an extensive (and often impressive) list of users there were no missing critical features nor was there any problems with performance. 
That's not to say that they did not meet your design specifications, but without those specs you're the only one who could evaluate that. Well, then the list of users obviously does not contain me ;-) The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps in your face. Why on earth do they think Linux has NFS as a kernel implementation? -- Regards, Stephan
Re: [Gluster-users] Meta-discussion
Sorry, JM, forgive my ignorance, but what you say simply does not match up. First you say: In general, I don't recommend any distributed filesystems for VM images, but I can also see that this is the wave of the future. Which means you do not believe at all in one major goal of this fs. Huh? And then: I am sorry that you haven't been able to deploy glusterfs in production. Discussing how and why glusterfs works - or doesn't work - for particular use cases is welcome on this list. Starting off a discussion about how the entire approach is unworkable is kind of counter-productive and not exactly helpful to those of us who just want to use the thing. Now how can you expect productive input on a question where you yourself do not believe an answer is possible at all? I mean, you expect it to fail anyway, but nevertheless want people to spend their time? Most of us are _not_ paid for debugging glusterfs. Are you paid for it? And you do not believe in the project anyway (you said so above)? I am astonished ... Regards, Stephan

Sean Fulton s...@gcnpublishing.com wrote: I didn't think his message violated any of your rules. Seems to me he has some disagreements with the approach being used to develop Gluster. I think you should listen to people who disagree with you. Having monitored this list for more than a year and tried--unsuccessfully--to put Gluster into production use, I think there are a lot of people who have problems with stability. So please, can you respond to his comments with why his suggestions are invalid? sean

On 12/27/2012 03:39 PM, John Mark Walker wrote: Stephan, I'm going to make this as simple as possible. Every message to this list should follow these rules: 1. be helpful 2. be constructive 3. be respectful I will not tolerate ranting that serves no purpose. If your message doesn't follow any of the rules above, then you shouldn't be posting it. This is your 2nd warning.
-JM

-- Sean Fulton GCN Publishing, Inc. Internet Design, Development and Consulting For Today's Media Companies http://www.gcnpublishing.com (203) 665-6211, x203
Re: [Gluster-users] Renaming a file in a distributed volume
On Sat, 13 Oct 2012 15:52:56 +0100 Brian Candler b.cand...@pobox.com wrote: In a distributed volume (glusterfs 3.3), files within a directory are assigned to a brick by a hash of their filename, correct? So what happens if you do mv foo bar? Does the file get copied to another brick? Is this no longer an atomic operation? Thanks, Brian.

In fact it has never been atomic. Take a look at my corresponding bug report from back then... You can use a small script to show it is not. -- Regards, Stephan
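A sketch of the kind of "small script" mentioned above, assuming it is pointed at the volume under test (here it runs on a local temp directory). One thread renames a file back and forth; another repeatedly snapshots the directory and counts instants where not exactly one of the two names is visible. On a local POSIX filesystem rename(2) is atomic, so no anomaly should appear; on a volume where a rename is effectively a copy plus delete, the counter can go up:

```python
# Probe for rename atomicity: one thread renames foo <-> bar while
# another snapshots the directory listing. `tempfile.mkdtemp()` is a
# stand-in -- point `d` at a GlusterFS mount to test the real thing.
import os
import tempfile
import threading

d = tempfile.mkdtemp()
src, dst = os.path.join(d, "foo"), os.path.join(d, "bar")
with open(src, "w") as fh:
    fh.write("x")

stop = threading.Event()
anomalies = []

def renamer():
    for _ in range(1000):
        os.rename(src, dst)
        os.rename(dst, src)
    stop.set()

def observer():
    while not stop.is_set():
        names = set(os.listdir(d))          # one snapshot per check
        if len(names & {"foo", "bar"}) != 1:
            anomalies.append(names)

t1 = threading.Thread(target=renamer)
t2 = threading.Thread(target=observer)
t1.start(); t2.start(); t1.join(); t2.join()

print("anomalies observed:", len(anomalies))
```

Note the observer takes a single directory snapshot per check rather than calling `os.path.exists()` twice, since two separate existence checks would race against the renamer and report false anomalies even on an atomic filesystem.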
Re: [Gluster-users] Throughput over infiniband
On Mon, 10 Sep 2012 08:48:03 +0100 Brian Candler b.cand...@pobox.com wrote: On Sun, Sep 09, 2012 at 09:28:47PM +0100, Andrei Mikhailovsky wrote: While trying to figure out the cause of the bottleneck I've realised that the bottleneck is coming from the client side, as running a concurrent test from two clients would give me about 650mb/s per client. Yes - so in workloads where you have many concurrent clients, this isn't a problem. It's only a problem if you have a single client doing a lot of sequential operations.

That is not correct for most cases. GlusterFS always has a problem on clients with high workloads. This obviously derives from the fact that the FS is userspace-based. If other userspace applications eat lots of CPU, your FS comes to a crawl.

[...] Have you tried doing exactly the same test but over NFS? I didn't see that in your posting (you only mentioned NFS in the context of KVM)

And as I said above, NFS (the kernel version) has no problem at all in these scenarios. Nor does it have the GlusterFS problems with multiple concurrent FS actions on the same client. Nor is there a problem with maximum bandwidth. -- Regards, Stephan ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] XFS and MD RAID
On Mon, 10 Sep 2012 09:39:18 +0100 Brian Candler b.cand...@pobox.com wrote: On Mon, Sep 10, 2012 at 09:29:25AM +0800, Jack Wang wrote: the patch below should fix your bug. Thank you Jack - that was a very quick response! I'm building a new kernel with this patch now and will report back. However, I think the existence of this bug suggests that Linux with software RAID is unsuitable for production use. There has obviously been no testing of basic critical functionality like hot-plugging drives, and serious regressions are introduced into supposedly stable kernels.

Brian, please re-think this. What you call a stable kernel (Ubuntu 3.2.0-30) is indeed very old. If you want to check an MD raid you should really use a stock kernel from kernel.org (probably 3.4.10). _That_ is the latest stable kernel.

So I'm now on the lookout for a 24-port SATA RAID controller with good Linux support. What are my options? Googling I have found: * 3ware 9650SE-24 * Areca ARC-1280ML * LSI MegaRAID 9280-24i (newer SAS/SATA) * Areca ARC-1882ix-24 (newer SAS/SATA)

I can tell you that I just had to throw away Areca because it had exactly the problem you don't like: drives going offline for no good reason. I went back to MD with the very same drives in the very same box, online, using the onboard SATA (6 ports), which works flawlessly. My impression is that Areca has trouble with new big drives of 2 TB and above. The 1TB drives worked ok. I have some 3ware too, but did not check them with 2TB drives so far. I must say I would probably drop them anyway, because current processors are faster with MD. I just built a box with a XEON E3-1280v2 with MD raid 4x2TB and I am impressed by the performance. -- Regards, Stephan
Re: [Gluster-users] Throughput over infiniband
On Mon, 10 Sep 2012 09:44:26 +0100 Brian Candler b.cand...@pobox.com wrote: On Mon, Sep 10, 2012 at 10:03:14AM +0200, Stephan von Krawczynski wrote: Yes - so in workloads where you have many concurrent clients, this isn't a problem. It's only a problem if you have a single client doing a lot of sequential operations. That is not correct for most cases. GlusterFS always has a problem on clients with high workloads. This obviously derives from the fact that the FS is userspace-based. If other userspace applications eat lots of CPU, your FS comes to a crawl. It's only obvious if your application is CPU-bound, rather than I/O-bound.

I think one can drop the 5% market share that uses storage only for storing _big_ files from client boxes with zero load. This is about the only case where GlusterFS works ok, if you don't mind the throughput problem of FUSE at high rates. If you have small files you are busted, if you have workload on the clients you are busted, and if you have lots of concurrent FS action on the client you are busted. Which leaves you with test cases nowhere near real life. I replaced NFS servers with glusterfs and I know what's going on in these setups afterwards. If you're lucky you reach something like 1/3 of the NFS performance. -- Regards, Stephan
Re: [Gluster-users] Throughput over infiniband
On Mon, 10 Sep 2012 08:06:51 -0400 Whit Blauvelt whit.glus...@transpect.com wrote: On Mon, Sep 10, 2012 at 11:13:11AM +0200, Stephan von Krawczynski wrote: [...] If you're lucky you reach something like 1/3 of the NFS performance. [Gluster NFS Client] Whit

There is a reason why one would switch from NFS to GlusterFS, and mostly it is redundancy. If you start using an NFS-client type you cut yourself off from the complete solution. As said elsewhere, you can just as well export GlusterFS via the kernel NFS server. But honestly, it is a patch. It would be better by far if things were done right: a native glusterfs client in kernel space. And remember, generally there should be no big difference between NFS and GlusterFS with bricks spread over several networks - if it is done the way it should be, without userspace. -- MfG, Stephan
Re: [Gluster-users] Throughout over infiniband
Ok, now you can see why I am talking about dropping the long-gone unix versions (BSD/Solaris/name-one) and concentrating on a linux-kernel module for glusterfs without FUSE overhead. It is the _only_ way to make this project a really successful one. Everything happening now is just a project pre-test environment. And saying that openly is the reason why quite some people dislike my comments... Please stop riding dead horses, guys. -- Regards, Stephan
Re: [Gluster-users] Ownership changed to root
On Mon, 27 Aug 2012 18:43:27 +0100 Brian Candler b.cand...@pobox.com wrote: On Mon, Aug 27, 2012 at 03:08:21PM +0200, Stephan von Krawczynski wrote: The gluster version is 2.X and cannot be changed. Ah, that's the important bit. If you have a way to replicate the problem with current code it will be easier to get someone to look at it.

Again, let me note two things: - the current code has a lot more (other) problems than the 2.X tree; that is why we won't use it. - if one has to look at the code to find out the basic problem, he is not the target person of our question. -- Regards, Stephan
Re: [Gluster-users] Ownership changed to root
On Tue, 28 Aug 2012 09:21:57 +0100 Brian Candler b.cand...@pobox.com wrote: On Tue, Aug 28, 2012 at 10:01:16AM +0200, Stephan von Krawczynski wrote: Again, let me note two things: - the current code has a lot more (other) problems than the 2.X tree, that is why we won't use that. - if one has to look at the code to find out the basic problem he is not the target person of our question. To which I would suggest that if such a fundamental problem were known about, it would have been fixed long ago. IMO your best bet is to raise a bug report in bugzilla.

It is obvious I cannot do that, because the only answer will be to update to a current version and re-file the report if the problem still persists. I am well aware, though, that the problem is quite fundamental for a fs... -- Regards, Stephan
Re: [Gluster-users] [Gluster-devel] FeedBack Requested : Changes to CLI output of 'peer status'
Top posting and kidding is a bit exaggerated for one posting ... You are not seriously talking about 80-char terminals for output that is commonly used by scripts and stuff like nagios, are you?

On Tue, 28 Aug 2012 08:46:22 -0400 (EDT) Pranith Kumar Karampuri pkara...@redhat.com wrote: hi Amar, This is the format we considered initially, but we did not go with this because it may exceed 80 chars and wrap over for small terminals if we want to add more fields in future. Pranith.

- Original Message - From: Amar Tumballi ama...@redhat.com To: Gluster Devel gluster-de...@nongnu.org, gluster-users gluster-users@gluster.org Sent: Tuesday, August 28, 2012 4:36:07 PM Subject: [Gluster-users] FeedBack Requested : Changes to CLI output of 'peer status'

Hi, Wanted to check if any one is using the gluster CLI output of 'peer status' in their scripts/programs? If yes, let me know. If not, we are trying to make it more script-friendly. For example, the current output would look something like:

-
Hostname: 10.70.36.7
Uuid: c7283ee7-0e8d-4cb8-8552-a63ab05deaa7
State: Peer in Cluster (Connected)

Hostname: 10.70.36.6
Uuid: 5a2fdeb3-e63e-4e56-aebe-8b68a5abfcef
State: Peer in Cluster (Connected)
-

New changes would make it look like:

---
UUID                                  Hostname    Status
c7283ee7-0e8d-4cb8-8552-a63ab05deaa7  10.70.36.7  Connected
5a2fdeb3-e63e-4e56-aebe-8b68a5abfcef  10.70.36.6  Connected
---

If anyone has a better format, or wants more information, let us know now. I would keep the timeout for this mail as 3 more working days, and without any response, we will go ahead with the change. Regards, Amar

___ Gluster-devel mailing list gluster-de...@nongnu.org https://lists.nongnu.org/mailman/listinfo/gluster-devel -- Regards, Stephan
Re: [Gluster-users] [Gluster-devel] FeedBack Requested : Changes to CLI output of 'peer status'
Ok, maybe I didn't explain the true nature of the issue in detail: the number of fields and the formatting are beside the point. Nobody wants to read the output. Instead, it is read by scripts most of the time. So the only valid question is the field delimiter, simply to make the output as easy as possible for scripts to parse. There is no human in front of a terminal who really likes to read this output all day long. Does that make the point clear?

On Tue, 28 Aug 2012 09:57:13 -0400 (EDT) Pranith Kumar Karampuri pkara...@redhat.com wrote: No. Output formats in that way generally start out nice, but as you start adding more fields, formatting them becomes difficult IMO. Pranith

- Original Message - From: Stephan von Krawczynski sk...@ithnet.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: gluster-users gluster-users@gluster.org, Gluster Devel gluster-de...@nongnu.org Sent: Tuesday, August 28, 2012 7:01:57 PM Subject: Re: [Gluster-devel] [Gluster-users] FeedBack Requested : Changes to CLI output of 'peer status'

Top posting and kidding is a bit exaggerated for one posting ... You are not seriously talking about 80-char terminals for output that is commonly used by scripts and stuff like nagios, are you?

On Tue, 28 Aug 2012 08:46:22 -0400 (EDT) Pranith Kumar Karampuri pkara...@redhat.com wrote: hi Amar, This is the format we considered initially, but we did not go with this because it may exceed 80 chars and wrap over for small terminals if we want to add more fields in future. Pranith.

- Original Message - From: Amar Tumballi ama...@redhat.com To: Gluster Devel gluster-de...@nongnu.org, gluster-users gluster-users@gluster.org Sent: Tuesday, August 28, 2012 4:36:07 PM Subject: [Gluster-users] FeedBack Requested : Changes to CLI output of 'peer status' Hi, Wanted to check if any one is using the gluster CLI output of 'peer status' in their scripts/programs? If yes, let me know. If not, we are trying to make it more script-friendly.
For example, the current output would look something like:

---
Hostname: 10.70.36.7
Uuid: c7283ee7-0e8d-4cb8-8552-a63ab05deaa7
State: Peer in Cluster (Connected)

Hostname: 10.70.36.6
Uuid: 5a2fdeb3-e63e-4e56-aebe-8b68a5abfcef
State: Peer in Cluster (Connected)
---

The new changes would make it look like:

---
UUID                                  Hostname    Status
c7283ee7-0e8d-4cb8-8552-a63ab05deaa7  10.70.36.7  Connected
5a2fdeb3-e63e-4e56-aebe-8b68a5abfcef  10.70.36.6  Connected
---

If anyone has a better format, or wants more information, let us know now. I would keep the timeout for this mail at 3 more working days; without any response, we will go ahead with the change. Regards, Amar

___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-devel mailing list gluster-de...@nongnu.org https://lists.nongnu.org/mailman/listinfo/gluster-devel

-- Regards, Stephan

-- MfG, Stephan von Krawczynski -- ith Kommunikationstechnik GmbH Lieferanschrift: Reiterstrasse 24, D-94447 Plattling Telefon: +49 9931 9188 0 Fax: +49 9931 9188 44 Geschaeftsfuehrer: Stephan von Krawczynski Registergericht: Deggendorf HRB 1625 --
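The whole point of the tabular format is script-friendliness. A minimal sketch of how such output could be consumed, assuming the proposed column layout above; the `peer_status` function here just simulates the command, on a real cluster you would pipe `gluster peer status` instead:

```shell
# Simulated 'peer status' output in the proposed tabular format:
# a header line, then one peer per line (UUID, hostname, status).
peer_status() {
cat <<'EOF'
UUID                                  Hostname    Status
c7283ee7-0e8d-4cb8-8552-a63ab05deaa7  10.70.36.7  Connected
5a2fdeb3-e63e-4e56-aebe-8b68a5abfcef  10.70.36.6  Connected
EOF
}

# Hostnames of peers that are NOT connected (skip the header line).
peer_status | awk 'NR > 1 && $3 != "Connected" { print $2 }'

# Number of connected peers, e.g. for a Nagios-style check.
peer_status | awk 'NR > 1 && $3 == "Connected" { n++ } END { print n+0 }'
```

Whitespace-delimited columns parse trivially with awk as long as no field itself contains spaces, which is exactly the delimiter question raised above.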
Re: [Gluster-users] Ownership changed to root
On Sun, 26 Aug 2012 20:01:20 +0100 Brian Candler b.cand...@pobox.com wrote:

> On Sun, Aug 26, 2012 at 03:50:16PM +0200, Stephan von Krawczynski wrote:
> > I'd like to point you to "[Gluster-devel] Specific bug question" dated a few days ago, where I describe a trivial situation in which owner changes on a brick can occur, asking if someone can point me to a patch for that.
> I guess this is http://lists.gnu.org/archive/html/gluster-devel/2012-08/msg00130.html ? This could be helpful, but as far as I can see a lot of important information is missing: e.g. what glusterfs version you are using, what operating system and kernel version, and what underlying filesystem is used for the bricks. Is the volume mounted on a separate client machine, or on one of the brick servers? "gluster volume info" would be useful too.

In fact I wrote the pieces of information that seemed really important to me, only they seem unclear. The setup has two independent hardware bricks and one client (on separate hardware). It is an all-Linux setup with ext4 on the bricks. The kernel versions are really of no use because I tested quite a few and the behaviour is always the same. The problem has to do with the load on the client, which is about the only sure thing I can say.

The gluster version is 2.X and cannot be changed. AFAIK the glusterfsd versions are not downward compatible to a point where one can build a setup with one brick on 2.X and the other on 3.X, which is, if true, a general design flaw among others.

I did in fact not intend to enter a big discussion about the point. I thought there must be at least one person who knows the code to an extent where my question can be answered immediately with one sentence. All you have to know is how it may be possible that a mv command overruns a former one that should in fact have already completed its job, because it exited successfully.

> Regards, Brian.
-- Regards, Stephan
Re: [Gluster-users] Ownership changed to root
On Sun, 26 Aug 2012 08:53:33 +0100 Brian Candler b.cand...@pobox.com wrote:

> On Fri, Aug 24, 2012 at 07:45:35PM -0600, Joe Topjian wrote:
> > This removed mdadm and LVM from the equation and the problem went away. I then tried with just LVM and still did not see this problem. Unfortunately I don't have enough hardware at the moment to create another RAID1 mirror, so I can't single that out. I will try when I get a chance -- unless anyone else knows if it would cause a problem? Or maybe it is the mdadm+LVM combination?
> This sounds extremely unlikely. mdadm and LVM both work at the block device layer, reading and writing 512-byte blocks. They have no understanding of filesystems and no understanding of user IDs. I suspect there were other differences between the tests. For example, did you do one with an ext4 filesystem and one with xfs? Or did you have a failed drive in your RAID1, which meant that some writes were timing out? FWIW, I've also seen the files owned by root occasionally in testing, but wasn't able to pin down the cause. Regards, Brian.

Hello, I'd like to point you to "[Gluster-devel] Specific bug question" dated a few days ago, where I describe a trivial situation in which owner changes on a brick can occur, asking if someone can point me to a patch for that. If you have no replication setup (unlike mine), where the other brick may help you around the owner change, then you may possibly see a real change on your fs. I don't know if your bug is the same, but its nature and cause may be the same. I have not received any answers on the topic so far. -- Regards, Stephan
Re: [Gluster-users] 3.2.2 Performance Issue
On Wed, 10 Aug 2011 12:08:39 -0700 Mohit Anchlia mohitanch...@gmail.com wrote:

> Did you run dd tests on all your servers? Could it be one of the disks is slower?

On Wed, Aug 10, 2011 at 10:51 AM, Joey McDonald j...@scare.org wrote:

> Hi Joe, thanks for your response! An order of magnitude slower with replication. What's going on, I wonder? Thanks for any suggestions.

> You are dealing with contention for Gigabit bandwidth. Replication will do that, and it will be pronounced over 1GbE. Much less of an issue over 10GbE or InfiniBand.

If that really were GBit contention, you could verify it by spreading your boxes over different switches. That should prevent a contention problem. Unfortunately I can tell you it did not help on our side, so we doubt the explanation. -- Regards, Stephan
Re: [Gluster-users] 3.2.2 Performance Issue
On Thu, 11 Aug 2011 09:13:53 -0400 Joe Landman land...@scalableinformatics.com wrote:

> On 08/11/2011 09:11 AM, Burnash, James wrote:
> > Cogently put and helpful, Joe. Thanks. I'm filing this under "good answers to frequently asked technical questions".
> You have a number of spots in that archive already :-) Thanks :)

Unfortunately he failed to understand my point. Obviously I was not talking about simply _supplying_ more switches; I talked about _spreading_ the network over several switches. This means you take a client that has at least two GBit ports and connect each of your two gluster servers (bricks) to one of them. Obviously you can do the same with a bigger number of bricks; it only depends on the number of interfaces your client has. This way, contention is not possible when accessing several bricks at the same time in a replication setup. But as told before, the problem of bad performance did not go away for us. -- Regards, Stephan
Re: [Gluster-users] gluster 3.2.0 - totally broken?
On Sat, 21 May 2011 13:27:38 +0200 Tomasz Chmielewski man...@wpkg.org wrote:

> If you found a bug, and even more, it's repeatable for you, please file a bug report and describe the way to reproduce it.

Ha, very sorry that the project is not an easy-go for a dev. Creating reproducible setups for software spread over 3 or more boxes is a pretty complex thing to do. And even if something is reproducible on my side, that does not mean it is with _other_ hardware and the same setup on the devs' side. Drop the idea that this can be debugged with the same strategy you debug "hello world".

I stopped looking at the bugs long ago because the software does not give you a chance to even find out when a problem started. If you want to see something where you can find out for yourself what is going on, look at netfilter. There you have tables and output in /proc about ongoing NATs and open connections (connection tracker). In glusterfs you have exactly nothing, and if you stop the replication setup at some point you need to ls terabytes of data to find the not-synced files. This is complete nonsense and not worth looking at.

If you need input, how about reading Udo? I already mentioned the bugs that seem to describe the same problems. I really do not think that creating new ones describing the same problems would help. Maybe the old ones should be reopened. The bugs mentioned in http://gluster.org/pipermail/gluster-users/2011-May/007619.html are basically the same. Currently I really do not know how to describe/analyze the problem further.

> Initiating flame discussions is not really a good development model.

I did not start the topic, but I can well imagine the feelings of the first poster. I was in the same situation more than a year ago and had to find out that nobody cares to improve the fundamental strategy. And that people are still finding out the same, months later, is the really bad news. I have no doubt that we will read the same topics with a new version number in a year.
-- Tomasz Chmielewski http://wpkg.org

-- Regards, Stephan
Re: [Gluster-users] gluster 3.2.0 - totally broken?
On Fri, 20 May 2011 17:01:22 +0200 Tomasz Chmielewski man...@wpkg.org wrote:

> On 20.05.2011 15:51, Stephan von Krawczynski wrote:
> > most of them are just an outcome of not being able to find a working, i.e. best, solution for a problem. cache-timeout? thread-count? quick-read? stat-prefetch? Gimme a break. Being a fs, I'd even say all the cache-size paras are bogus. When did you last tune the ext4 cache size or timeout? Don't come up with ext4 being a kernel vs. userspace fs. It was their decision to make it userspace, so don't blame me. As a fs with networking it has to take the comparison with nfs, as most interested users come from nfs.
> Ever heard of fsc (FS-Cache),

To my knowledge there is no persistent (disk-based) caching in glusterfs at all ...

> acreg*, acdir*, actimeo options for NFS?

... as well as options only dealing with caching of file/dir attributes. You are talking about completely different things here. If you want to argue about that, you should probably _request_ these types of options in addition to the already existing ones.

> Yes, they are related to cache, and oh, NFS is kernelspace. And yes, there are tunable timeout options for NFS as well.

The only reasonable configurable timeout in nfs is the rpc timeout.

> As for timeout options with ext4, or any other local filesystem - if you ever used iSCSI, you would also discover that it's recommended to set reasonable timeout options there as well, depending on your network infrastructure and usage/maintenance patterns. Incidentally, iSCSI is also kernelspace.

And is it incidentally as slow as glusterfs in the same environment? No? And did you ever manage to hard-freeze your boxes with it? To show double files? To be unable to open existing files? Wrong file dates? Wrong UIDs/GIDs? Shall I continue to name problems we saw through all tested versions of glusterfs? I don't, because I dropped the idea that it would be helpful at all.
If you want to share helpful information, tell us how you would default-configure glusterfs so that it performs equally to nfs in most cases. If you can't, what is your point then?

-- Tomasz Chmielewski http://wpkg.org

-- Regards, Stephan
Re: [Gluster-users] gluster 3.2.0 - totally broken?
On Wed, 18 May 2011 13:16:59 -0700 Anand Babu Periasamy a...@gluster.com wrote:

> GlusterFS is completely free. The same versions released to the community are used for commercial deployments too; their issues get higher priority though. Code related to other proprietary software such as VMware, AWS and RightScale is kept proprietary. We acknowledge that we have done a poor job when it comes to managing community, documentation and bug tracking. While we have improved a lot since the 2.x versions, I agree we are not there yet. We recently hired a lot of engineers to focus specifically on testing and bug fixes. The QA team is growing steadily. Lab size has been doubled. A new QA lead is joining us next month. The QA team will have closer interaction with the community moving forward. We also appointed Dave Garnett from HP as VP product manager and Vidya Sakar from Sun/Oracle as engineering manager. We fully understand the importance of community. Paid vs. non-paid should not matter when it comes to quality of software. Intangible contributions from the community are equally valuable to the success of the GlusterFS project. We have appointed John Mark Walker as community manager. We launched the community.gluster.org site recently. Starting next month, we will have regular community sessions. Problems raised by the community will also get prioritized. We are redoing the documentation completely; the new system will be based on Red Hat's Publican. The documentation team too will work closely with the community. *Criticisms are taken positively. So please don't hesitate.* Thanks! -ab

Sorry, this clearly shows the problem: understanding. It really does not help you a lot to hire a big number of people; you do not fail in terms of business relations. Your problem is the _code_. You need a filesystem expert. A _real_ one, not _some_ one. Like, let's say, Daniel Phillips, Theodore "Ted" Ts'o or the like.
-- Regards, Stephan
Re: [Gluster-users] gluster 3.2.0 - totally broken?
On Fri, 20 May 2011 08:35:35 -0400 Jeff Darcy jda...@redhat.com wrote:

> On 05/20/2011 05:15 AM, Stephan von Krawczynski wrote:
> > Sorry, this clearly shows the problem: understanding. It really does not help you a lot to hire a big number of people; you do not fail in terms of business relations. Your problem is the _code_. You need a filesystem expert. A _real_ one, not _some_ one. Like, let's say, Daniel Phillips, Theodore "Ted" Ts'o or the like.
> I know both Daniel and Ted professionally. As a member of the largest Linux filesystem group in the world, I am also privileged to work with many other world-class filesystem experts. I also know the Gluster folks quite well, and I can assure you that they have all the filesystem expertise they need. They also have a *second* kind of expertise - distributed systems - which is even more critical to this work and which the vast majority of filesystem developers lack. What Gluster needs is not more filesystem experts but more *other kinds* of experts, as well as non-experts and resources. The actions AB has mentioned are IMO exactly those Gluster should be taking, and should be appreciated as such by any knowledgeable observer. Your flames are not only counter-productive but factually incorrect as well. Please, if only for the sake of your own reputation, try to do better.

Forgive my ignorance, Jeff, but it is obvious to anyone who has used glusterfs for months or years that the guys have a serious software design issue. If you look at the tuning options configurable in glusterfs, you should notice that most of them are just an outcome of not being able to find a working, i.e. best, solution for a problem. cache-timeout? thread-count? quick-read? stat-prefetch? Gimme a break. Being a fs, I'd even say all the cache-size paras are bogus. When did you last tune the ext4 cache size or timeout? Don't come up with ext4 being a kernel vs. userspace fs. It was their decision to make it userspace, so don't blame me.
As a fs with networking it has to take the comparison with nfs, as most interested users come from nfs. The first thing they experience is that glusterfs is really slow compared to their old setups with nfs. And the cause is _not_ replication per se. And as long as they cannot match nfs performance, my argument stands: they have a problem, be it inferior per design or per coding.

As you see, I am not talking at all about things that I count as basics in a replication fs. I mean, really, I cannot express my feelings about the lack of information for the admin around replication. It's pretty much like a wheel of your car just fell off and you cannot find out which one. Would you trust that car?

Let me clearly state this: the idea is quite brilliant, but the coding is at the stage of a design study and could have been far better if they had only concentrated on the basics. If you want to build a house, you don't buy the TV set first...

-- Regards, Stephan
Re: [Gluster-users] gluster 3.2.0 - totally broken?
On Wed, 18 May 2011 14:45:19 +0200 Udo Waechter udo.waech...@uni-osnabrueck.de wrote:

> Hi there, after reporting some trouble with group access permissions, http://gluster.org/pipermail/gluster-users/2011-May/007619.html (which still persists, btw.), things get worse and worse with each day. [...] Currently our only option seems to be to move away from glusterfs to some other filesystem, which would be a bitter decision. Thanks for any help, udo.

Hello Udo, unfortunately I can only confirm your problems. The last known-to-work version we see is 2.0.9. Everything beyond that is just bogus. 3.X did not solve a single issue but brought quite a lot of new ones instead. The project only gained featurism but did not solve the very basic problems. Up to the current day there is no way to see a list of not-synced files in a replication setup; that is ridiculous. I have hoped ever since 2.0.9 that someone would do a fork and really attack the basics. IOW: good idea, pretty bad implementation, no will to listen or learn. Regards, Stephan

> -- Institute of Cognitive Science - System Administration Team Albrechtstrasse 28 - 49076 Osnabrueck - Germany Tel: +49-541-969-3362 - Fax: +49-541-969-3361 https://doc.ikw.uni-osnabrueck.de
Re: [Gluster-users] Seeking Feedback on Gluster Development Priorities/Roadmap
How about the _basics_ of such a fs? Create an answer to the still unresolved question: what files are currently not in sync? From the very first day of glusterfs there has been no answer to this fundamental question for the user, and no way to monitor the real state of a replicating fs up to the current day. -- Regards, Stephan
Re: [Gluster-users] very bad performance on small files
On Sun, 16 Jan 2011 02:45:50 +0530 Anand Avati anand.av...@gmail.com wrote:

> In any case, comparing local disk performance and network disk performance is never right and is always misleading. Avati

This statement is fundamentally broken. -- Regards, Stephan
Re: [Gluster-users] ReiserFS problems
On Sun, 2 Jan 2011 12:18:08 +0100 nurdin david duchn...@free.fr wrote:

> Hello, when I launch the glusterFS server on a ReiserFS partition I get this error:
>
> [2011-01-02 12:17:20.269951] C [posix.c:4313:init] posix: Extended attribute not supported, exiting.
> [2011-01-02 12:17:20.269973] E [xlator.c:909:xlator_init] posix: Initialization of volume 'posix' failed, review your volfile again
>
> And the strace is:
>
> stat64("/data/export", {st_mode=S_IFDIR|0755, st_size=48, ...}) = 0
> lsetxattr("/data/export", "trusted.glusterfs.test", "working", 8, 0) = -1 EDQUOT (Disk quota exceeded)
> gettimeofday({1293965111, 13398}, NULL) = 0
>
> I turned on xattr on the partition mount: /dev/mapper/pve-data on /var/lib/vz type reiserfs (rw,attrs,acl,user_xattr)
>
> Have you got an idea? Thanks

Yes. Don't use ReiserFS. Even if you manage to get it working (which _is_ possible), you will find out that its performance regarding xattrs is pretty bad. Take this advice: use ext3 for this case. -- Regards, Stephan
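A quick way to check whether a backend filesystem supports the extended attributes gluster needs is to set and read one back by hand, similar in spirit to the lsetxattr probe visible in the strace above. A minimal sketch, assuming the attr tools (setfattr/getfattr) are installed; it uses a user.* attribute on a temp directory so it works without root, whereas glusterfsd itself probes the trusted.* namespace on the actual brick path:

```shell
# Probe a directory for working extended attribute support.
# Replace the mktemp dir with your real brick path to test it.
brick=$(mktemp -d)

if setfattr -n user.glusterfs.test -v working "$brick" 2>/dev/null; then
    # Read the value back to make sure it round-trips.
    getfattr --only-values -n user.glusterfs.test "$brick"
    echo
    echo "xattrs OK"
else
    echo "xattrs NOT supported on $brick (check mount options and quota)"
fi

rm -rf "$brick"
```

Note the EDQUOT in the strace: the attribute write failed with "Disk quota exceeded", so on that system the problem may be a quota limit on xattr storage rather than missing user_xattr support.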
Re: [Gluster-users] GlusterFS replica question
Which is a regression compared to 2.X, btw...

On Wed, 1 Dec 2010 02:40:53 -0600 (CST) Raghavendra Bhat raghavendrab...@gluster.com wrote:

> If you create a volume with only one brick, and then add one more brick to the volume, the volume will be of distribute type and not replicate. If the replica feature is needed, then a replicate volume itself should be created, and to create a replicate volume a minimum of 2 bricks is needed.

- Original Message - From: Raghavendra G raghaven...@gluster.com To: raveenpl ravee...@gmail.com Cc: gluster-users@gluster.org Sent: Wednesday, December 1, 2010 12:52:03 PM Subject: Re: [Gluster-users] GlusterFS replica question

> Yes, it is possible in 3.1.x without downtime.

- Original Message - From: raveenpl ravee...@gmail.com To: gluster-users@gluster.org Sent: Sunday, November 28, 2010 2:54:13 AM Subject: [Gluster-users] GlusterFS replica question

> Hi, for a small lab environment I want to use GlusterFS with only ONE node. After some time I would like to add a second node as the redundant node (replica). Is it possible in GlusterFS 3.1 without downtime? Cheers, PK

-- MfG, Stephan von Krawczynski
Re: [Gluster-users] Gluster client 32bit
On Tue, 16 Nov 2010 16:54:07 -0800 Craig Carl cr...@gluster.com wrote:

> Stephan - Based on your feedback, and from other members of the community, we have opened discussions internally around adding support for a 32-bit client. We have not made a decision at this point, and I can't make any guarantees, but I will do my best to get it added to the next version of the product (3.1.2; 3.1.1 is feature locked). On the sync question you brought up: that is only an issue in the rare case of split brain (if I understand the scenario you've brought up). Split brain is a difficult problem with no answer right now. Gluster 3.1 added much more aggressive locking to reduce the possibility of split brain. The process you described as "...the daemons are talking with each other about whatever..." will also reduce the likelihood of split brain by eliminating the possibility that client or server vol files are not the same across the entire cluster, the cause of the vast majority of split brain issues with Gluster. Auto heal is slow; we have some processes along the lines you are thinking, please let me know if these address some of your ideas around stat:
>
> # cd <gluster mount>
> # find ./ -type f -exec stat '{}' \;
>
> This will heal only the files on that device. If you know when you had a failure you want to recover from, this is even faster:
>
> # cd <gluster mount>
> # find ./ -type f -mmin -<minutes since failure + some extra> -exec stat '{}' \;
>
> This will heal only the files on that device changed within the last x minutes. Thanks, Craig

Hello Craig, let me repeat a very old suggestion (in fact I believe it was from before your time at Gluster). I suggested creating a module (for the server) that does only one thing: maintain a special file in such a way that a filename (with path) is added to it when the server sets acls meaning the file is currently not in sync. When acls are set on the file that mean it is in sync, remove the filename from the list again.
Let's say this special file is named /.glusterfs-server-ip (in the root of the mounted glusterfs). That would allow you to look at _all_ files on _all_ servers that are not in sync, from the client's view. All you would have to do for healing is stat only these file lists and you are done. You could simply drop the auto-healing, because you could do a cronjob for that instead. As there is no find involved, the whole method uses virtually no resources on the servers and clients. You have full control; you know which files on which servers are out of sync. This solves all possible questions around replication. Regards, Stephan
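The suggestion above amounts to a trivial heal driver: given such a per-server list of out-of-sync paths, a cron job would only need to stat each entry through the client mount to trigger self-heal. A hypothetical sketch; the list file /.glusterfs-server-ip and its one-relative-path-per-line format are the proposal's assumption, not an existing GlusterFS feature:

```shell
# Hypothetical heal driver for the proposed out-of-sync list.
# Stat-ing a file through a GlusterFS client mount is what triggers
# self-heal, so walking the short list replaces a full find over
# terabytes of data.
heal_from_list() {
    mnt=$1                                   # client mount point
    list="$mnt/.glusterfs-server-ip"         # proposed per-server dirty list

    # Nothing to do if the proposed list doesn't exist or is empty.
    [ -s "$list" ] || return 0

    # Stat each listed path (relative to the volume root); errors
    # (e.g. files deleted in the meantime) are ignored.
    while IFS= read -r path; do
        stat "$mnt/$path" >/dev/null 2>&1
    done < "$list"
}

# Example cron usage: heal_from_list /mnt/gluster
```

The design point is exactly the one made above: the expensive part of healing is discovering what to heal, not the stat itself.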
Re: [Gluster-users] Gluster client 32bit
On Tue, 16 Nov 2010 08:51:17 -0800 Jeff Anderson-Lee jo...@eecs.berkeley.edu wrote:

> On 11/16/2010 05:36 AM, Stefano Baronio wrote:
> > Hi Martin, the XenServer Dom0 is 32bit whilst the hypervisor is 64bit. You need to know that when you install third-party sw on the host. http://forums.citrix.com/thread.jspa?threadID=269924&tstart=0 So I need the 32bit compiled version to be able to mount glusterfs directly from the XenServer host.
> The built-in NFS module is typically as fast as or faster than using the fuse wrapper on the client side. So the best way to support 32-bit clients is likely via NFS.

NFS is really something completely different. And, what is also ignored, the infrastructure usage is completely different when using nfs: nfs does not replicate at the client side, which means that the data paths explicitly built for client replication are useless for nfs. Using the nfs translator leads to server-server replication. For that case, a data path used exclusively for this server traffic would be best (because then it cannot interfere with 64-bit client replication). So if you happen to upgrade a 2.0.9 setup with 64-bit servers and both 64- and 32-bit clients, you have to redesign the network for best performance, _and_ glusterfsd on the servers has to use the shortest data path for the nfs data replication (which I don't know if they are able to do at all).

In other words: whereas the setup in 2.0.9 was clear and simple, the very same usage case in 3.X is a _mess_. Obviously nobody really thought about that; unbelievable to me, as it is really obvious. But I got accustomed to that situation, because up to the current day there is no solution for another most obvious problem: which files are not in sync in a replication setup? There is no trivial answer to this question, which I already brought up in the early 2.X development phase... How can you sell someone a storage platform if you're unable to answer such an essential question? Really, nobody needed auto-healing.
All you need is the answer to this question, and then you stat exactly this file list at a time _of your choice_. The good thing about 2.0.X was that you as an admin had quite full control over things. In 3.X you have exactly nothing; the daemons are talking with each other about whatever, and hopefully things work out. That is no setup I want to be the admin of. Regards, Stephan

> Cheers Stefano

2010/11/16 Deadpan110 deadpan...@gmail.com:

> My home testing environment also uses XenServer (again, Citrix, with a CentOS minimalistic core OS). Even though the Dom0 is 64bit, in any Xen setup (maybe even for other virtuali[s/z]ation solutions), performance is better using 32bit VMs (DomU). My production environment comprises Xen virtual machines (not XenServer, but still Xen), scattered around a remote datacenter. I too will be sharing my experiences, as GlusterFS offers exactly what I need and would like to deploy. Martin

On 16 November 2010 20:39, Stefano Baronio stefano.baro...@gmail.com wrote:

> From my point of view, 64bit on the server side is easy to handle, but the client side can have different needs and limitations. For example, we are using XenServer from Citrix; the Dom0 is taken from a CentOS 5 distro and it is 32bit. I cannot change that, because it is a Citrix design choice, and there might be lots of these situations around. Sorry, but I can't code any patches. Anyway, I will share what our experience with the 32bit client will be. Cheers Stefano

2010/11/16 Bernard Li bern...@vanhpc.org:

> Hi Christian: On Tue, Nov 16, 2010 at 1:34 AM, Christian Fischer christian.fisc...@easterngraphics.com wrote:
> > No statement from the developers about usability of the glusterfs client on 32bit systems. But this was probably discussed in earlier threads.
> I believe the official comment is that Gluster is not going to support 32-bit systems. However, it doesn't mean that the community cannot support it.
> If we find bugs and can code up patches, we should still file a bug and submit the patches, and hopefully they will be checked into the official repository. Cheers, Bernard
[Gluster-users] GlusterFS on mailservers
Hi all, I just read this on the dovecot web:

---
FUSE / GlusterFS

FUSE caches dentries and file attributes internally. If you're using multiple GlusterFS clients to access the same mailboxes, you're going to have problems. Worst of these problems can be avoided by using NFS cache flushes, which just happen to work with FUSE as well:

mail_nfs_index = yes
mail_nfs_storage = yes

These probably don't work perfectly.
---

Can someone comment on that? Does anybody use glusterfs as storage for mailboxes/mailfolders? -- Regards, Stephan
Re: [Gluster-users] GlusterFS on mailservers
On Mon, 15 Nov 2010 06:25:23 -0800 Craig Carl cr...@gluster.com wrote:

> On 11/15/2010 04:57 AM, Stephan von Krawczynski wrote:
> > Hi all, I just read this on the dovecot web: FUSE caches dentries and file attributes internally. If you're using multiple GlusterFS clients to access the same mailboxes, you're going to have problems. Worst of these problems can be avoided by using NFS cache flushes (mail_nfs_index = yes, mail_nfs_storage = yes), which just happen to work with FUSE as well. These probably don't work perfectly. Can someone comment on that? Does anybody use glusterfs as storage for mailboxes/mailfolders?
> Stephan - Dovecot has been a challenge in the past. We don't specifically test with it here. If you are interested in using it with Gluster, I would suggest testing with 3.1.1, and always keep the index files local; that makes a big difference. Thanks, Craig

Well, Craig, I cannot follow your advice, as these are 32-bit clients, and AFAIK you said 3.1.1 is not expected to be used in such an environment. Quite a lot of interesting setups for glusterfs revolve around mail servers; I judge it a major deficiency if the fs cannot be used for such purposes. You cannot expect votes for glusterfs if there are other options that have no problems with such a standard setup. I mean, is there anything more obvious than mail servers for such a fs? Honestly, I got the impression that you're heading away from mainstream fs usage to very special environments and usage patterns. I feel very sorry about that, because 2.X looked very promising. But I did not find a single setup where 3.X could be used at all.

> -- Craig Carl Senior Systems Engineer Gluster

-- Regards, Stephan
Re: [Gluster-users] GlusterFS on mailservers
On Mon, 15 Nov 2010 10:18:28 -0500 Joe Landman land...@scalableinformatics.com wrote:

On 11/15/2010 09:47 AM, Stephan von Krawczynski wrote: Stephan - Dovecot has been a challenge in the past. We don't specifically test with it here; if you are interested in using it with Gluster I would suggest testing with 3.1.1, and always keep the index files local, that makes a big difference. Thanks, Craig

Well, Craig, I cannot follow your advice as these are 32-bit clients and AFAIK you said 3.1.1 is not expected to be used in such an environment. Really quite a lot of interesting setups for glusterfs revolve around mail servers; I judge it a major deficiency if the fs cannot be used for such

Quick interjection here: We have some customers using Dovecot on our storage units with GlusterFS 3.0.x. There are some issues, usually interactions between dovecot and fuse/glusterfs. Nothing that can't be worked around.

Well, a work-around is not the same as just working. Do you really think it is no sign of a problem if you need a work-around for a pretty standard usage request?

We are seeing strong/growing interest from our customer base in this use case.

Well, that means I am right, no?

Craig's advice is spot on.

purposes. You cannot expect people to vote for glusterfs if there are other options that have no problems with such a standard setup. I mean, is there something more obvious than mail servers for such a fs?

Hmmm ... apart from NFS (which isn't a cluster file system), which has a number of its own issues, which other cluster file systems are you referring to that don't have these sorts of issues? Small-file and small-record performance on any sort of cluster file system is very hard. You have to get it right first, and then work on the performance side later.

I am not talking about performance currently (though that is arguable), I am talking about the sheer basic usage. Probably a lot of potential users come from NFS setups and want to make them redundant.
And none has ever heard of a fs problem with 32 bit clients (just as an example) ... So this is an obvious problem. Dovecot has been a challenge in the past, well, and how does the fs currently cope with this challenge? I am no supporter of the idea that fs tuning should be necessary just to make something work at all. For faster performance let there be tuning options, but for general support of a certain environment? I mean, did you ever tune fat,ntfs,extX or the like just to make email work? And don't argue about them not being network related: the simple truth is that this product is only a big hit if it is as easy to deploy as a local fs. That should be the primary goal. Honestly, I got the impression that you're heading away from the mainstream fs usage to very special environments and usage patterns. I feel very sorry about that because 2.X looked very promising. But I did not find a single setup where 3.X could be used at all. While I respect your opinion, I do disagree with it. In our opinion 3.1.x has gotten better than 3.0.x, which was a huge step up from 2.0.x. 2.0.x was something like a filesystem, 3.X is obviously heading to be a storage platform. That makes a big difference. And I'd say it did not get really better in general comparing apples to apples. glusterfs 2.0.x is a lot closer to a useable filesystem (lets say on linux boxes) than glusterfs 3.X is to netapp or emc storage platforms. There is nothing comparable to glusterfs 2.0.X on its boxes whereas one cannot really choose glusterfs storage in comparison to netapp. I mean you're trying to enter the wrong league because the big players will just crash you. Regards, Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. 
email: land...@scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615

-- Regards, Stephan
Re: [Gluster-users] GlusterFS on mailservers
On Mon, 15 Nov 2010 12:17:48 -0800 Craig Carl cr...@gluster.com wrote:

Please don't think we are not working hard to meet your expectations.

Really, Craig, I am not expecting _anything_ for _me_ from glusterfs. I only feel very sorry for an interesting project that offered a great vision but chose featurism over completely solving the basic requirements of a fs, not to mention trivial expectations concerning a replication setup - which should have been a true strength.

At a higher level Gluster is changing, and I think improving based on feedback from the community, our paid subscribers and the storage industry as a whole. Designing and writing a file system that is used on thousands of servers in less than 3 years was, and is, incredibly challenging, and expensive. Contrast Gluster with another excellent file system project, btrfs, which also has paid engineering resources and is still very experimental [1].

I really don't want to talk about btrfs here, because its problems are unrelated to glusterfs problems.

Our community asked for a couple of things from Gluster 3.1;

Well, honestly, whatever the community asked, you managed to create the first project I have seen in more than a decade that is not able to upgrade its older versions because trivial deployment setups have just been _dropped_. I cannot remember ever seeing something like this before. That is really outstanding.

Thanks, Craig -- Craig Carl Senior Systems Engineer Gluster

-- Regards, Stephan
Re: [Gluster-users] upgrading from 2.0.9 to 3.1, any gotchas?
On Fri, 12 Nov 2010 18:26:11 -0800 Liam Slusser lslus...@gmail.com wrote:

Hey Gluster Users, Been a while since I've posted here. I'm looking to upgrade our 150 TB, 10-brick cluster from 2.0.9 to 3.1. Are there any gotchas I should be aware of? Anybody run into any problems? Any suggestions or hints would be most helpful. I'm hoping the new Gluster will be a bit more forgiving on split-brain issues, and an increase in performance is always welcome. thanks, liam

You will lose your 32-bit clients if you have any... -- Regards, Stephan
Re: [Gluster-users] Gluster client 32bit
I can tell you that 3.1 does not compile under 32-bit on my box - I tried recently. Honestly I find it a bit strange not to support 32-bit clients as there are lots of them - and 2.0.9 did work on 32-bit. Which means you cannot upgrade such setups. Regards, Stephan

On Sat, 13 Nov 2010 01:17:05 +1030 Deadpan110 deadpan...@gmail.com wrote:

It should work... but it is very unsupported by the devs... USE AT YOUR OWN RISK... I successfully used glusterfs 3.1.0 for a while on Ubuntu Lucid 32-bit - the only problems I encountered are a few of the ones recently discussed in this mailing list for 64-bit. I will be implementing it again soon - I hope! Martin

On 13 November 2010 00:54, Christian Fischer christian.fisc...@easterngraphics.com wrote:

On Friday 12 November 2010 11:29:52 Bernard Li wrote: Hi Stefano: On Fri, Nov 12, 2010 at 2:18 AM, Stefano Baronio stefano.baro...@gmail.com wrote: is there a way to have a 32bit Glusterfs client? You can definitely build it yourself, but it is not officially supported by Gluster. They recommend you use GlusterFS on 64-bit architecture servers.

Does someone know the reason for it? Are problems to be expected on 32-bit architecture?

Cheers, Bernard
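Since the 32-bit vs. 64-bit question keeps coming up in this thread, a quick way to check the pointer width of a client's userland before planning an upgrade; a generic sketch, nothing Gluster-specific:

```python
import struct

def userland_bits() -> int:
    """Pointer width of the running userland/interpreter: 32 or 64."""
    return struct.calcsize("P") * 8

print(userland_bits())
```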
Re: [Gluster-users] Some client problems with TCP-only NFS in Gluster 3.1
On Fri, 22 Oct 2010 04:46:44 -0500 (CDT) Craig Carl cr...@gluster.com wrote:

[Resending due to incomplete response] Brent, Thanks for your feedback. To mount with a Solaris client use: `mount -o proto=tcp,vers=3 nfs://SERVER-ADDR:38467/EXPORT MNT-POINT`. As to UDP access, we want to force users to use TCP. Everything about Gluster is designed to be fast; as NFS over UDP approaches line speed it becomes increasingly inefficient [1], and we want to avoid that. I have updated our documentation to reflect the required tcp option and the Solaris instructions. [1] http://nfs.sourceforge.net/#faq_b10

Sorry to jump in at this point. If you read the FAQ you may have noticed that the problem only hits very ancient boxes with kernels lower than 2.4.20 (!). On the contrary, you gain a real problem with NFS over TCP if you experience even very minor packet loss. Your TCP-based server comes to a crawl in such a scenario. In fact we completely dropped the idea of NFS over TCP exactly for that reason. We never experienced any performance problem with NFS over UDP. Regards, Stephan

Thanks again, Craig -- Craig Carl Senior Systems Engineer Gluster

From: Brent A Nelson br...@phys.ufl.edu To: gluster-users@gluster.org Sent: Thursday, October 21, 2010 8:18:02 AM Subject: [Gluster-users] Some client problems with TCP-only NFS in Gluster 3.1

I see that the built-in NFS support registers mountd in portmap only with tcp and not udp. While this makes sense for a TCP-only NFS implementation, it does cause problems for some clients: Ubuntu 10.04 and 7.04 mount just fine. Ubuntu 8.04 gives "requested NFS version or transport protocol is not supported", unless you specify -o mountproto=tcp as a mount option, in which case it works just fine. Solaris 2.6 and 7 both give "RPC: Program not registered". Solaris apparently doesn't support the mountproto=tcp option, so there doesn't seem to be any way for Solaris clients to mount.
There may be other clients that assume mountd will be contactable via udp, even though they (otherwise) happily support TCP NFS... Thanks, Brent
Re: [Gluster-users] Some client problems with TCP-only NFS in Gluster 3.1
On Fri, 22 Oct 2010 15:18:09 +0200 Beat Rubischon b...@0x1b.ch wrote:

Hi Stephan! Quoting sk...@ithnet.com (22.10.10 15:05): We never experienced any performance problem with NFS over UDP.

Be careful when using NFS over UDP on recent networking hardware. It's simply too fast for the primitive reassembly algorithm in UDP. You will get silent data corruption. SuSE has warned about this fact for quite some years in their nfs manpage. You'll find a lot of copies when Googling the title "Using NFS over UDP on high-speed links such as Gigabit can cause silent data corruption". Beat

Hi Beat, you are talking of the problem with the identification field being only 16 bit, right? We found this scenario to be far less severe than TCP busted by packet drops. In fact we were not able to run NFS over TCP for more than 2 days without a complete service breakdown, whereas UDP has run for several years now without us seeing the corruption issue. We were very astonished about the bad TCP performance, but we had to accept it as a fact. -- Regards, Stephan
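The corruption risk Beat refers to comes from the 16-bit IP identification field: once it wraps while old fragments are still in flight, a stale fragment can be reassembled into the wrong datagram. A back-of-envelope sketch (assuming back-to-back full-size 1500-byte frames from one host) of how quickly the field wraps at different link speeds:

```python
def ip_id_wrap_seconds(link_bps: float, mtu_bytes: int = 1500) -> float:
    """Seconds until the 16-bit IP ID field wraps at a given link speed,
    assuming back-to-back full-MTU packets from a single host."""
    packets_per_second = link_bps / (mtu_bytes * 8)
    return 2**16 / packets_per_second

for name, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    print(f"{name}: IP ID wraps every {ip_id_wrap_seconds(bps):.2f} s")
```

At gigabit speed the field wraps in well under a second, which is why the risk shows up on "high-speed links such as Gigabit" and not on older networks.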
Re: [Gluster-users] Configuration suggestions (aka poor/slow performance on new hardware)
Can you check how things look when using ext3 instead of xfs?

On Fri, 26 Mar 2010 18:04:07 +0100 Ramiro Magallanes lis...@sabueso.org wrote:

Hello there! I'm working on a 6-node cluster with new SuperMicro hardware. The cluster has to store millions of JPGs (about 200 kB-4 MB) and small text files. Each node is: -Single Xeon(R) CPU E5405 @ 2.00GHz (4 cores) -4 GB RAM -64-bit distro (Debian Lenny) -3ware 9650 SATA-II RAID, with 1 logical drive in RAID-5 mode built from 3 SATA hard disks (2 TB WDC, 64 MB cache each) -XFS filesystem on each logical unit.

When I run the genfiles.sh test on each node locally (on the RAID-5 unit) I get the following results: -3143 files created in 60 seconds. And if I comment out the sync line in the script: -8947 files created in 60 seconds.

Now, with Gluster mounted (22TB) I run the test and the results are: -1370 files created in 60 seconds.

I'm running the cluster with the standard distributed configuration, and I made a significant number of changes in the test process, but I obtain the same number of written files all the time. Never more than 1400 files created, and 170 Mbit of network load (top). The switching layer is gigabit (obviously), and no resources are heavily used; everything is normal. I'm using the 3.0.3 version of Gluster.
Here is my configuration file (only the last part of the file):

##
volume distribute
  type cluster/distribute
  subvolumes 172.17.15.1-1 172.17.15.2-1 172.17.15.3-1 172.17.15.4-1 172.17.15.5-1 172.17.15.6-1
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 1MB
  option flush-behind on
  subvolumes distribute
end-volume

volume readahead
  type performance/read-ahead
  option page-count 4
  subvolumes writebehind
end-volume

volume iocache
  type performance/io-cache
  option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB
  option cache-timeout 1
  subvolumes readahead
end-volume

volume iothreads
  type performance/io-threads
  option thread-count 32 # default is 16
  subvolumes distribute
end-volume

volume quickread
  type performance/quick-read
  option cache-timeout 1
  option max-file-size 128kB
  subvolumes iocache
end-volume

volume statprefetch
  type performance/stat-prefetch
  subvolumes quickread
end-volume
##

Any idea or suggestion to make the performance go up? Thanks everyone!
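The backticked shell pipeline in the io-cache volume above computes 20% of physical memory, in whole MB, from /proc/meminfo. The same arithmetic as a small sketch (taking the meminfo text as input so it can be checked without a Linux /proc):

```python
def io_cache_size_mb(meminfo_text: str, fraction: float = 0.2) -> int:
    """20% of MemTotal (reported in kB) expressed in whole MB, mirroring:
    grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            total_kb = int(line.split()[1])
            return int(total_kb * fraction / 1024)
    raise ValueError("MemTotal not found in meminfo text")

# On a 4 GB node (as described above), this yields roughly 800 MB.
print(io_cache_size_mb("MemTotal:        4039412 kB"))
```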
Re: [Gluster-users] Setup for production - which one would you choose?
In fact, the background for my post is very trivial: glusterfs is really in a development stage. So there is a real difference between using 2.0.9, 3.0.2 or 3.0.3. In fact it might be the difference between go and no-go in your very special setup. That's why I judge the comparison to other rpm questions as not valid. This is not fetchmail where you can use almost any rpm flying around. And I did not tell you to compile your whole setup by hand. I am talking about glusterfs and using its latest version in favor of using some available rpm not containing the latest version. -- Regards, Stephan

On Wed, 24 Mar 2010 23:19:30 +0100 Steve stev...@gmx.net wrote:

Original Message: Date: Wed, 24 Mar 2010 23:01:55 +0100 From: Oliver Hoffmann o...@dom.de To: gluster-users@gluster.org Subject: Re: [Gluster-users] Setup for production - which one would you choose?

Yep, thanx. @Stephan: It is not a matter of knowing how to use tar and make, but if you have a bunch of servers then you want to do an apt-get update/upgrade once in a while without compiling this piece of software on that server and another one on another server, etc.

Not only that. On an RPM system (aka Red Hat, SuSE, Mandriva, etc.) where you have a support contract, installing packages that are not made by the vendor voids support. So there is a good reason to use vendor pre-built RPMs. A bunch of years ago I helped a big vendor virtualize the biggest Linux installation in northern Europe for one of their customers. There were over a thousand Red Hat Enterprise Servers installed in total. The customer followed ITIL Release To Production. Now you could jump up and down about a new release of application XYZ and that you could install it from a self-made RPM. The customer does not care. Installing self-made RPMs = no support from Red Hat. Now if your business depends on running systems and every second of downtime can cost you hundreds of €, then you don't even consider installing from source. You just don't do it.
It's that easy. Just compare the potential problem (downtime, loss of money, loss of trust from customers, etc.) to the potential benefit of a self-made RPM and you will quickly realize that it is a no-go. Stephan is probably a small shop doing all his stuff by hand. But there are situations where this handicraft stuff is just not the way to go.

It is hard to fully understand what you just wrote. If you are suggesting that someone else's personal preferences (or company objectives) are incorrect or misguided simply because they don't match your own, I'm trying to understand how your last post pertains to the user forum for Gluster. There are plenty of reasons to prefer packages over source installations, but that academic conversation is also not appropriate for this list. Cheers, Benjamin

-Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Stephan von Krawczynski Sent: Wednesday, March 24, 2010 4:37 PM To: Ian Rogers Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Setup for production - which one would you choose?

Ok, guys, honestly: it is allowed to learn (RMS fought for your right to do so) :-) Really rarely in the open source universe will you find a piece of software that is as easy to compile and run as glusterfs. All you have to know yourself is how to use tar. Then enter the source directory and do ./configure ; make ; make install. What exactly is difficult to do? Why would you install _some_ rpm that is outdated anyway (be it 2.0.9 or 3.0.2)? Please don't tell me you configure and drive LAMP but can't build glusterfs. The docs for 5 apache config options are longer than the whole glusterfs source... -- Regards, Stephan PS: yes, I know it's the user-list.
On Wed, 24 Mar 2010 17:14:32 + Ian Rogers ian.rog...@contactclean.com wrote: I've just done part one of a writeup of my EC2 gluster LAMP installation at http://www.sirgroane.net/2010/03/distributed-file-system-on-amazon-ec2/ - may or may not be useful to you :-) Ian On 24/03/2010 17:09, Oliver Hoffmann wrote: Yes, that's an idea. Thanx. That will be important for all the debian clients, mostly lenny. I think waiting and testing a month is quite ok though. To have glusterfs 3.0.3 on ubuntu 9.10 you can also just install the debian package for gluster 3.0.3 with dpkg -i. http://packages.debian.org/source/sid/glusterfs But then 10.04 is only a month away, so depends how much of a rush your in! On Wednesday 24 Mar 2010 16:45:40 Oliver Hoffmann wrote: Haha
Re: [Gluster-users] gluster local vs local = gluster x4 slower
Hi Jeremy, have you tried to reproduce with all performance options disabled? They are possibly no good idea on a local system. What local fs do you use? -- Regards, Stephan On Tue, 23 Mar 2010 19:11:28 -0500 Jeremy Enos je...@ncsa.uiuc.edu wrote: Stephan is correct- I primarily did this test to show a demonstrable overhead example that I'm trying to eliminate. It's pronounced enough that it can be seen on a single disk / single node configuration, which is good in a way (so anyone can easily repro). My distributed/clustered solution would be ideal if it were fast enough for small block i/o as well as large block- I was hoping that single node systems would achieve that, hence the single node test. Because the single node test performed poorly, I eventually reduced down to single disk to see if it could still be seen, and it clearly can be. Perhaps it's something in my configuration? I've pasted my config files below. thx- Jeremy ##glusterfsd.vol## volume posix type storage/posix option directory /export end-volume volume locks type features/locks subvolumes posix end-volume volume disk type performance/io-threads option thread-count 4 subvolumes locks end-volume volume server-ib type protocol/server option transport-type ib-verbs/server option auth.addr.disk.allow * subvolumes disk end-volume volume server-tcp type protocol/server option transport-type tcp/server option auth.addr.disk.allow * subvolumes disk end-volume ##ghome.vol## #---IB remotes-- volume ghome type protocol/client option transport-type ib-verbs/client # option transport-type tcp/client option remote-host acfs option remote-subvolume raid end-volume #Performance Options--- volume readahead type performance/read-ahead option page-count 4 # 2 is default option option force-atime-update off # default is off subvolumes ghome end-volume volume writebehind type performance/write-behind option cache-size 1MB subvolumes readahead end-volume volume cache type performance/io-cache option cache-size 1GB 
subvolumes writebehind end-volume ##END## On 3/23/2010 6:02 AM, Stephan von Krawczynski wrote: On Tue, 23 Mar 2010 02:59:35 -0600 (CST) Tejas N. Bhisete...@gluster.com wrote: Out of curiosity, if you want to do stuff only on one machine, why do you want to use a distributed, multi node, clustered, file system ? Because what he does is a very good way to show the overhead produced only by glusterfs and nothing else (i.e. no network involved). A pretty relevant test scenario I would say. -- Regards, Stephan Am I missing something here ? Regards, Tejas. - Original Message - From: Jeremy Enosje...@ncsa.uiuc.edu To: gluster-users@gluster.org Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] gluster local vs local = gluster x4 slower This test is pretty easy to replicate anywhere- only takes 1 disk, one machine, one tarball. Untarring to local disk directly vs thru gluster is about 4.5x faster. At first I thought this may be due to a slow host (Opteron 2.4ghz). But it's not- same configuration, on a much faster machine (dual 3.33ghz Xeon) yields the performance below. 
THIS TEST WAS TO A LOCAL DISK THRU GLUSTER [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m41.290s user0m14.246s sys 0m2.957s THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER) [r...@ac33 jenos]# cd /export/jenos/ [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m8.983s user0m6.857s sys 0m1.844s THESE ARE TEST FILE DETAILS [r...@ac33 jenos]# tar tzvf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l 109 [r...@ac33 jenos]# ls -l /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32 /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz [r...@ac33 jenos]# These are the relevant performance options I'm using in my .vol file: #Performance Options--- volume readahead type performance/read-ahead option page-count 4 # 2 is default option option force-atime-update off # default is off subvolumes ghome end-volume volume writebehind type performance/write-behind option cache-size 1MB subvolumes readahead end-volume volume cache type performance/io-cache option cache-size 1GB subvolumes writebehind end
Re: [Gluster-users] gluster local vs local = gluster x4 slower
On Tue, 23 Mar 2010 02:59:35 -0600 (CST) Tejas N. Bhise te...@gluster.com wrote: Out of curiosity, if you want to do stuff only on one machine, why do you want to use a distributed, multi node, clustered, file system ? Because what he does is a very good way to show the overhead produced only by glusterfs and nothing else (i.e. no network involved). A pretty relevant test scenario I would say. -- Regards, Stephan Am I missing something here ? Regards, Tejas. - Original Message - From: Jeremy Enos je...@ncsa.uiuc.edu To: gluster-users@gluster.org Sent: Tuesday, March 23, 2010 2:07:06 PM GMT +05:30 Chennai, Kolkata, Mumbai, New Delhi Subject: [Gluster-users] gluster local vs local = gluster x4 slower This test is pretty easy to replicate anywhere- only takes 1 disk, one machine, one tarball. Untarring to local disk directly vs thru gluster is about 4.5x faster. At first I thought this may be due to a slow host (Opteron 2.4ghz). But it's not- same configuration, on a much faster machine (dual 3.33ghz Xeon) yields the performance below. 
THIS TEST WAS TO A LOCAL DISK THRU GLUSTER [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m41.290s user0m14.246s sys 0m2.957s THIS TEST WAS TO A LOCAL DISK (BYPASS GLUSTER) [r...@ac33 jenos]# cd /export/jenos/ [r...@ac33 jenos]# time tar xzf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz real0m8.983s user0m6.857s sys 0m1.844s THESE ARE TEST FILE DETAILS [r...@ac33 jenos]# tar tzvf /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz |wc -l 109 [r...@ac33 jenos]# ls -l /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz -rw-r--r-- 1 jenos ac 804385203 2010-02-07 06:32 /scratch/jenos/intel/l_cproc_p_11.1.064_intel64.tgz [r...@ac33 jenos]# These are the relevant performance options I'm using in my .vol file: #Performance Options--- volume readahead type performance/read-ahead option page-count 4 # 2 is default option option force-atime-update off # default is off subvolumes ghome end-volume volume writebehind type performance/write-behind option cache-size 1MB subvolumes readahead end-volume volume cache type performance/io-cache option cache-size 1GB subvolumes writebehind end-volume What can I do to improve gluster's performance? Jeremy ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Re: [Gluster-users] How to re-sync
I love top-posting ;-) Generally, you are right. But in real life you cannot rely on this smartness. We tried exactly this point and had to find out that the clients do not always select the correct file version (i.e. the latest) automatically. Our idea in the test case was to bring down a node, update its kernel and revive it - just as you would like to do in the real world for a kernel update. We found out that some files were taken from the downed node afterwards, and the new contents on the other node got in fact overwritten. This does not happen generally, of course. But it does happen. We could only stop this behaviour by setting favorite-child. But that does not really help a lot, since we want to take down all nodes some other day. This is in fact one of our show-stoppers.

On Sun, 7 Mar 2010 01:33:14 -0800 Liam Slusser lslus...@gmail.com wrote:

Assuming you used raid1 (replicate), you DO bring up the new machine and start gluster. On one of your gluster mounts you run a ls -alR and it will resync the new node. The gluster clients are smart enough to get the files from the first node. liam

On Sat, Mar 6, 2010 at 11:48 PM, Chad ccolu...@hotmail.com wrote:

Ok, so assuming you have N glusterfsd servers (say 2, because it does not really matter). Now one of the servers dies. You repair the machine and bring it back up. I think 2 things: 1. You should not start glusterfsd on boot (you need to sync the HD first) 2. When it is up, how do you re-sync it? Do you rsync the underlying mount points? If it is a busy gluster cluster it will be getting new files all the time. So how do you sync and bring it back up safely so that clients don't connect to an incomplete server?
-- MfG, Stephan von Krawczynski -- ith Kommunikationstechnik GmbH Lieferanschrift : Reiterstrasse 24, D-94447 Plattling Telefon : +49 9931 9188 0 Fax : +49 9931 9188 44 Geschaeftsfuehrer: Stephan von Krawczynski Registergericht : Deggendorf HRB 1625 --
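The "run ls -alR on a client mount" trick discussed above works because every lookup/stat that passes through the client gives the replicate logic a chance to compare copies and heal stale ones. A sketch of the same recursive walk (the mount point path is whatever your client mount is; this only stat()s, it changes nothing):

```python
import os

def stat_everything(mountpoint: str) -> int:
    """Walk a client mount and lstat() every entry, which (like `ls -alR`)
    triggers lookup-driven self-heal on replicated volumes."""
    touched = 0
    for root, dirs, files in os.walk(mountpoint):
        for name in dirs + files:
            try:
                os.lstat(os.path.join(root, name))
                touched += 1
            except OSError:
                pass  # entry vanished mid-walk; skip it
    return touched
```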
Re: [Gluster-users] Migrate from an NFS storage to GlusterFS
On Tue, 16 Feb 2010 17:31:00 +0530 Vikas Gorur vi...@gluster.com wrote: Olivier Le Cam wrote: Thanks Vikas. BTW, might it be possible to have the same volume exported both by regular-NFS and GlusterFS at the same time in order to migrate my clients smoothly? Is there any risks to get GlusterFS confused and/or the ext3 volume damaged? That would be quite risky. If you have both GlusterFS clients and NFS clients operating on the same files or directories there are chances of race conditions which might lead to lost files, GlusterFS getting confused, NFS getting confused etc. I wouldn't recommend it. But isn't that a setup that every average user would expect to work? You can share data between nfs and a local (nfs-server) user, too. Is your file locking racy? Did you break atomic operations? Remember that long discussion about soft migrating data by just exporting already existing data via glusterfs without copying? This point is very similar. It is a common understanding in modern fs that multiple users of the same file should be managed by record- and/or file-locking. A network based fs on top of some other fs should behave just as it were some average local user - then your data should (and must) be safe. Vikas -- Regards, Stephan ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
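The locking discipline described above - multiple users of the same file coordinating via record- and/or file-locking - looks like this with POSIX advisory locks. A generic sketch, not specific to GlusterFS or NFS; whether such locks propagate correctly across NFS and GlusterFS clients on the same export is exactly the open question in this thread:

```python
import fcntl

def append_locked(path: str, line: str) -> None:
    """Append a line under an exclusive advisory lock, so concurrent
    writers on the same file do not interleave their records."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is acquired
        try:
            f.write(line + "\n")
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```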
Re: [Gluster-users] Bonded Gigabit
On Tue, 05 Jan 2010 21:39:58 +0530 Vikas Gorur vi...@gluster.com wrote:

Adrian Revill wrote: That sounds OK. So if I have a client on server A and I write a file on server A, would the file be copied to servers B, C and D all at the same time, or will the file first be copied to server B, then copied to C and D in turn?

It will be written to all servers simultaneously. Vikas

Forgive my ignorance, but I doubt that. Simultaneously would mean that you have parallel network paths to all servers; then your client would be able to copy data at almost the same time. If the network path to your servers is in fact bottlenecked at one client network card, then you might notice what I did, too: your servers look like they are processing the data linearly and not in parallel. The first one's HD blinks, then the second one's, and so on. I already noticed that with two servers and simple bonnie tests. So again: are you really sure about "simultaneous"? I'd say pushing large chunks of data per server through a single network path cannot be called that. -- Regards, Stephan
Re: [Gluster-users] Strange server locks isuess with 2.0.7 - updating
The problem we experienced was occasional packet loss (not high, only very occasional). You will see that in almost every LAN. If your ping packet is lost and you configured a low value, a brick will be marked offline quite fast, though there is no real problem. The bigger the timeout, the more chances you have that a following ping packet will make it and reset the wait time.

On Fri, 20 Nov 2009 14:18:46 +0100 Marek m...@kis.p.lodz.pl wrote:

Why do you suggest ping-timeout with that high a value? When some brick gets in trouble, the mounted fs on the client side is unusable (I/O is locked) and has to wait 120 sec. for the timeout to release the fs. Client I/O locked for 120 sec. is not acceptable. regards,

Stephan von Krawczynski wrote: Try setting your ping-timeout way higher; since we use 120 we have almost no issues in regular use. Nevertheless we do believe every problem will come back when some brick(s) die...

On Tue, 10 Nov 2009 14:59:07 +0100 Marek Blaszkowski m...@kis.p.lodz.pl wrote: OK, here go some more details: on the bad servers (with strange lockups) we got problems with opening/moving files. We are unable to open, move or even just ls files (file utilities just hang).

Marek wrote: Hello, we're testing a simple configuration of glusterfs 2.0.7 with 4 servers and 1 client (2+2 bricks, each pair replicated, with a distribute translator on top; configs below). During our tests (client side copying/moving a lot of small files on the glusterfs-mounted FS) we got strange lockups on two of the servers (bricks). I was unable to log in (via ssh) to the server; on already-started terminal sessions I couldn't spawn a top process (it just hangs), and vmstat exits with a floating point exception. Other file utility commands behave normally. There were no dmesg kernel messages (first guess was a kernel oops or other kernel-related problem). This server never had any CPU/memory problems under high load before. The problems start when we run glusterfsd on this server. We had to hard-reset the malfunctioning server (reboot doesn't work).
After a couple of hours of testing, another server disconnected from the client (according to the client debug log). The scenario was the same: 1. unable to log in to the server; the connection was established but sshd on the server side hung/timed out after entering the user password. 2. on previously established terminal sessions it was impossible to run the top or vmstat utilities (vmstat exits with a floating point exception). Copying/moving files was OK. Load was 0.00, 0.00, 0.00. What could be wrong? These servers never had problems before (simple terminal/proxy servers). The strange locking looks related to kernel VM structures (why do top/vmstat behave oddly??) or other kernel internals.

Server remote1 details: Linux version 2.6.26-1-686 (Debian 2.6.26-13lenny2) (da...@debian.org) (gcc version 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)) #1 SMP Fri Mar 13 18:08:45 UTC 2009, running Debian 5.0.

Server remote2 details: Linux version 2.6.22-14-server (bui...@palmer) (gcc version 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)) #1 SMP Sun Oct 14 23:34:23 GMT 2007, running Ubuntu.

Both run glusterfsd: /usr/local/sbin/glusterfsd -p /var/run/glusterfsd.pid -f /usr/local/etc/glusterfs/glusterfs-server.vol

Note that both servers run different OS versions and hit similar lockup problems, never having had problems before (without glusterfsd).
Server gluster config file (the same on all 4 servers):

-cut here-
volume brick
  type storage/posix
  option directory /var/gluster
end-volume

volume locks
  type features/posix-locks
  subvolumes brick
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option auth.ip.locks.allow *
  option auth.ip.brick-ns.allow *
  subvolumes locks
end-volume
-cut here-

Client gluster config below (please note remote1 and remote4 hit the problems mentioned above). The gluster client was started with the command:

glusterfs --log-file=/var/log/gluster-client -f /usr/local/etc/glusterfs/glusterfs-client.vol /var/glustertest

-client config, cut here-
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.184
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.195
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote3
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.145
  option ping-timeout 5
  option remote-subvolume locks
end-volume

volume remote4
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.2.193
  option ping-timeout 5
Re: [Gluster-users] Gluster in HTTP cluster
On Tue, 28 Jul 2009 16:31:44 -0500 Brian Koloszyc br...@creativemerch.com wrote:

Hi, I am in the process of building out a sandbox GlusterFS environment in Amazon's EC2 cloud. I have successfully configured the NFS clone, but I'm looking to transition over to gluster in order to get away from NFS in the first place. Our desired configuration would be to have x number of web slaves, each having a locally attached device for storage, with replication enabled between all 4 attached devices in order to keep dynamically generated content in sync. Can someone point me in the direction of the correct config for this? I've read over this: http://www.gluster.org/docs/index.php/Translators but I'm a bit confused. Is it even possible to have the client always read/write to the local disk? Or will each client round-robin between gluster server storage? My concern is that we want optimal read/write times (nfs is too slow), and we are worried that the tcp connection times will be as slow as nfs.

I'd be surprised if you manage to get even nfs performance. We never managed that in a real-world situation.

Thanks, --Brian. -- Regards, Stephan ___ Gluster-users mailing list Gluster-users@gluster.org http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
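For what it's worth, a client-side volfile for the layout Brian describes (each web slave mounting a volume replicated across all four storage bricks) might look roughly like the sketch below. This is a hypothetical 2.0-era config, not a tested recipe: the host names and volume names are invented, and the cluster/replicate options should be checked against the documentation for your release. Note that with replicate the client talks to all subvolumes, so reads are not guaranteed to hit the local disk.

```
volume web1
  type protocol/client
  option transport-type tcp/client
  option remote-host web1.example.com   # assumed host name
  option remote-subvolume brick
end-volume

# ... web2, web3 and web4 defined the same way ...

volume mirror
  type cluster/replicate
  subvolumes web1 web2 web3 web4
end-volume
```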
Re: [Gluster-users] Strange server lock issues with 2.0.7 - updating
-- Best regards (MfG), Stephan von Krawczynski
ith Kommunikationstechnik GmbH
Delivery address: Reiterstrasse 24, D-94447 Plattling
Phone: +49 9931 9188 0
Fax: +49 9931 9188 44
Managing Director: Stephan von Krawczynski
Commercial Register: Deggendorf HRB 1625
Re: [Gluster-users] Rsync
Remember, the gluster team does not like my way of data feeding. If your setup blows up, don't blame them (or me :-) I can only tell you what I am doing: simply move (or copy) the initial data to the primary server of the replication setup and then start glusterfsd for exporting. You will notice that the data gets replicated as soon as stat activity is going on (the first ls or the like). If you already exported the data via nfs before, you probably only need to set up glusterfs on the very same box and use it as the primary server. Then there is no data copying at all.

After months of experiments I can say that glusterfs runs pretty stable on _low_ performance setups. But you have to do one thing: lengthen the ping-timeout (something like option ping-timeout 120). If you do not do that you will lose some of your server(s) at some point, and that will turn your glusterfs setup into a mess. If your environment is ok, it works. If your environment fails, it will fail too, sooner or later. In other words: it exports data, but it does not fulfill the promise of keeping your setup alive during failures - at this stage. My advice for the team is to stop whatever they may be working on, take four physical boxes (2 servers, 2 clients), run a lot of bonnies and unplug/re-plug the servers non-deterministically. You can find all kinds of weirdos this way. Regards, Stephan

On Mon, 5 Oct 2009 16:49:53 +0100 Hiren Joshi j...@moonfruit.com wrote:

My users are more pitchfork, less shooting. I don't understand what you're saying: should I have copied all the files over locally, not using gluster, before attempting an rsync?

-Original Message- From: Stephan von Krawczynski [mailto:sk...@ithnet.com] Sent: 05 October 2009 14:13 To: Hiren Joshi Cc: Pavan Vilas Sondur; gluster-users@gluster.org Subject: Re: [Gluster-users] Rsync

It would be nice to remember my thread about _not_ copying data initially to gluster via the mountpoint. And one major reason for _local_ feed was: speed.
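For reference, the lengthened timeout mentioned above is set per protocol/client volume; a sketch of such a volume with the longer value (the host name and volume names here are invented for illustration):

```
volume remote1
  type protocol/client
  option transport-type tcp/client
  option remote-host server1.example.com
  option ping-timeout 120   # per the advice above; the default is much lower
  option remote-subvolume locks
end-volume
```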
Obviously a lot of cases are simply impossible because of the pure waiting time. If you had a live setup, people would have already shot you... This is why I talked about a feature and not an accepted bug behaviour. Regards, Stephan

On Mon, 5 Oct 2009 11:00:36 +0100 Hiren Joshi j...@moonfruit.com wrote:

Just a quick update: The rsync is *still* not finished.

-Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi Sent: 01 October 2009 16:50 To: Pavan Vilas Sondur Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Rsync

Thanks! I'm keeping a close eye on the "is glusterfs DHT really distributed?" thread =) I tried nodelay on and unhashd no. I tarred about 400G to the share in about 17 hours (~6MB/s?) and am running an rsync now. Will post the results when it's done.

-Original Message- From: Pavan Vilas Sondur [mailto:pa...@gluster.com] Sent: 01 October 2009 09:00 To: Hiren Joshi Cc: gluster-users@gluster.org Subject: Re: Rsync

Hi, We're looking into the problem on similar setups and working on it. Meanwhile, can you let us know if performance increases if you use this option: 'option transport.socket.nodelay on' in each of your protocol/client and protocol/server volumes. Pavan

On 28/09/09 11:25 +0100, Hiren Joshi wrote:

Another update: It took 1240 minutes (over 20 hours) to complete on the simplified system (without mirroring). What else can I do to debug?

-Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi Sent: 24 September 2009 13:05 To: Pavan Vilas Sondur Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Rsync

-Original Message- From: Pavan Vilas Sondur [mailto:pa...@gluster.com] Sent: 24 September 2009 12:42 To: Hiren Joshi Cc: gluster-users@gluster.org Subject: Re: Rsync

Can you let us know the following: * What is the exact directory structure?
/abc/def/ghi/jkl/[1-4] - abc, def, ghi and jkl are each one of a thousand dirs.

* How many files are there in each individual directory and of what size?

Each of the [1-4] dirs has about 100 files in it, all under 1MB.

* It looks like each server process has 6 export directories. Can you run one server process each for a single export directory and check if the rsync speeds up?

I had no idea you could do that. How? Would I need to create 6 config files and start gluster
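The split Pavan suggests can be approximated with one minimal volfile per export directory, each server process getting its own port and pid file. The sketch below only generates the volfiles and prints the start commands; the directory layout, port numbers and the exact listen-port option spelling are assumptions (option names varied between releases), so treat it as a starting point rather than a recipe.

```shell
# Hypothetical sketch: one minimal server volfile per export directory.
# Each glusterfsd instance needs a distinct port and pid file.
for i in 1 2 3 4 5 6; do
    vol="/tmp/export${i}.vol"
    cat > "$vol" <<EOF
volume brick${i}
  type storage/posix
  option directory /var/gluster/export${i}
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port $((6996 + i))
  subvolumes brick${i}
end-volume
EOF
    # Print (rather than run) the per-instance start command:
    echo "/usr/sbin/glusterfsd -f $vol -p /var/run/glusterfsd${i}.pid"
done
```

The clients would then point one protocol/client volume at each port.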
Re: [Gluster-users] Rsync
It would be nice to remember my thread about _not_ copying data initially to gluster via the mountpoint. And one major reason for _local_ feed was: speed. Obviously a lot of cases are simply impossible because of the pure waiting time. If you had a live setup, people would have already shot you... This is why I talked about a feature and not an accepted bug behaviour. Regards, Stephan

On Mon, 5 Oct 2009 11:00:36 +0100 Hiren Joshi j...@moonfruit.com wrote:

Just a quick update: The rsync is *still* not finished.

-Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi Sent: 01 October 2009 16:50 To: Pavan Vilas Sondur Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Rsync

Thanks! I'm keeping a close eye on the "is glusterfs DHT really distributed?" thread =) I tried nodelay on and unhashd no. I tarred about 400G to the share in about 17 hours (~6MB/s?) and am running an rsync now. Will post the results when it's done.

-Original Message- From: Pavan Vilas Sondur [mailto:pa...@gluster.com] Sent: 01 October 2009 09:00 To: Hiren Joshi Cc: gluster-users@gluster.org Subject: Re: Rsync

Hi, We're looking into the problem on similar setups and working on it. Meanwhile, can you let us know if performance increases if you use this option: 'option transport.socket.nodelay on' in each of your protocol/client and protocol/server volumes. Pavan

On 28/09/09 11:25 +0100, Hiren Joshi wrote:

Another update: It took 1240 minutes (over 20 hours) to complete on the simplified system (without mirroring). What else can I do to debug?
-Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi Sent: 24 September 2009 13:05 To: Pavan Vilas Sondur Cc: gluster-users@gluster.org Subject: Re: [Gluster-users] Rsync

-Original Message- From: Pavan Vilas Sondur [mailto:pa...@gluster.com] Sent: 24 September 2009 12:42 To: Hiren Joshi Cc: gluster-users@gluster.org Subject: Re: Rsync

Can you let us know the following: * What is the exact directory structure?

/abc/def/ghi/jkl/[1-4] - abc, def, ghi and jkl are each one of a thousand dirs.

* How many files are there in each individual directory and of what size?

Each of the [1-4] dirs has about 100 files in it, all under 1MB.

* It looks like each server process has 6 export directories. Can you run one server process each for a single export directory and check if the rsync speeds up?

I had no idea you could do that. How? Would I need to create 6 config files and start gluster: /usr/sbin/glusterfsd -f /etc/glusterfs/export1.vol or similar? I'll give this a go.

* Also, do you have any benchmarks with a similar setup on, say, NFS?

NFS will create the dir tree in about 20 minutes then start copying the files over; it takes about 2-3 hours. Pavan

On 24/09/09 12:13 +0100, Hiren Joshi wrote:

It's been running for over 24 hours now. Network traffic is nominal, top shows about 200-400% cpu (7 cores so it's not too bad). About 14G of memory used (the rest is being used as disk cache). Thoughts? snip An update: after running the rsync for a day, I killed it and remounted all the disks (the underlying filesystem, not the gluster) with noatime; the rsync completed in about 600 minutes. I'm now going to try one level up (about 1,000,000,000 dirs).

-Original Message- From: Pavan Vilas Sondur [mailto:pa...@gluster.com] Sent: 23 September 2009 07:55 To: Hiren Joshi Cc: gluster-users@gluster.org Subject: Re: Rsync

Hi Hiren, What glusterfs version are you using? Can you send us the volfiles and the log files.
Pavan On 22/09/09 16:01 +0100, Hiren Joshi wrote: I forgot to mention, the mount is mounted with direct-io, would this make a difference? -Original Message- From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hiren Joshi Sent: 22 September 2009 11:40 To: gluster-users@gluster.org Subject: [Gluster-users] Rsync Hello all,
Re: [Gluster-users] The continuing story ...
On Fri, 18 Sep 2009 10:35:22 +0200 Peter Gervai grin...@gmail.com wrote:

Funny thread we have. Just a sidenote on last week's part about "userspace cannot lock up the system": blocking resource waits / I/O waits can stall _all_ disk access, and try to imagine what you can do with a system without disk access. Obviously, you cannot log in, cannot start new programs, cannot load dynamic libraries. Yet the system pings, and your already logged-in shells may function more or less, especially if you have a statically linked one (like sash). As a bitter sidenote: google for 'xtreemfs', may be interesting if you only need shared redundant access with extreme network fault tolerance. (And yes, it can stall the system, too. :-))

I would not want to use it for exactly this reason (from the docs):

- XtreemFS implements an object-based file system architecture (Fig. 2.1). The name of this architecture comes from the fact that an object-based file system splits file content into a series of fixed-size objects and stores them on its storage servers. In contrast to block-based file systems, the size of such an object can vary from file to file. The metadata of a file (such as the file name or file size) is stored separately from the file content on a metadata server. This metadata server organizes file system metadata as a set of volumes, each of which implements a separate file system namespace in the form of a directory tree. -

That's exactly what we don't want. We want a disk layout that is accessible even if glusterfs (or call it the network fs) has a bad day and doesn't want to start.

Another sidenote: I tend to see FUSE as a low-speed toy nowadays. It doesn't seem to be able to handle any serious I/O load.

Really, I can't judge. I haven't opened (this) Pandora's box up to now ... -- byte-byte, grin -- Regards, Stephan
Re: [Gluster-users] booster
On Mon, 14 Sep 2009 15:03:05 +0530 Shehjar Tikoo shehj...@gluster.com wrote:

Stephan von Krawczynski wrote: On Mon, 14 Sep 2009 11:40:03 +0530 Shehjar Tikoo shehj...@gluster.com wrote:

We only tried to run some bash scripts with preloaded booster...

Do you mean the scripts contained commands with LD_PRELOADed booster? Or were you trying to run bash with LD_PRELOADed booster? The second scenario will not work at this point. Thanks -Shehjar

Oh, that's bad news. We tried to PRELOAD booster in front of bash (implicitly; we called the bash script with LD_PRELOAD set). Is this a general problem or a not-yet-implemented feature?

A general problem, I'd say. The last time, i.e. when we revamped booster, we tried running with bash but there was some clash with bash internals. We haven't done anything special to fix the problem since then because: 1. it requires changes deep inside GlusterFS and; 2. running bash wasn't a very useful scenario when the LD_PRELOAD variable can be added for the bash environment as a whole. For example, if you just do export LD_PRELOAD=blah on the command line, you can actually have every program started from that shell use booster. -Shehjar

Well, how about other interpreters like sh, csh, perl, python, php, name one? There are tons of perl applications out there; we use some, too. Is the problem only linked to bash? -- Regards, Stephan
Re: [Gluster-users] booster
2. running bash wasn't a very useful scenario when the LD_PRELOAD variable can be added for the bash environment as a whole. For example, if you just do export LD_PRELOAD=blah on the command line, you can actually have every program started from that shell use booster. -Shehjar

There is a problem with that: if your bash environment calls only one other bash script, it will fail as well. Another problem can be script-replaced binaries: if you replace some classical binary with a shell script for additional parameters (or any other thinkable reason), this general export approach will fail, too. Or let's say your favourite email client calls some script to mark spam... There are a lot of black holes in this ground ... -- Regards, Stephan
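The inheritance behaviour both sides are arguing about is easy to demonstrate without booster at all (the .so path below is deliberately fake): a child script sees LD_PRELOAD only if the environment it was started from carried it, which is exactly why a script invoked from an unprepared environment (cron, an email client's helper, a script-replaced binary) runs un-boosted.

```shell
# Write a trivial child script that reports what it sees.
cat > /tmp/child.sh <<'EOF'
#!/bin/sh
echo "LD_PRELOAD=${LD_PRELOAD:-<unset>}"
EOF
chmod +x /tmp/child.sh

# Started with LD_PRELOAD set, the child inherits it (the loader's
# "cannot be preloaded" warning about the fake .so goes to stderr).
LD_PRELOAD=/tmp/fake-booster.so /tmp/child.sh > /tmp/with.txt 2>/dev/null

# Started without it, the child sees nothing -- the failure mode above.
env -u LD_PRELOAD /tmp/child.sh > /tmp/without.txt 2>/dev/null
```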
Re: [Gluster-users] The continuing story ...
On Wed, 09 Sep 2009 19:43:15 -0400 Mark Mielke m...@mark.mielke.cc wrote:

On Wed, 9 Sep 2009 23:17:07 +0530 Anand Avati av...@gluster.com wrote:

Please reply back to this thread only after you have a response from the appropriate kernel developer indicating that the cause of this lockup is a misbehaving userspace application. After that, let us give you the benefit of the doubt that the misbehaving userspace process is glusterfsd and then continue any further debugging. It is not that we do not want to help you, but we really are pointing you to the right place where your problem can actually get fixed. You have all the necessary input they need.

This is the kind of statement that often drives listeners to think about a project fork...

Only if backed up. Has the trace been shown to the linux developers? What do they think? If the linux developers come back with "this is totally a userspace problem - go away", then yes, it can lead to people thinking about a project fork. But if the linux developers come back with "crap - yes, this is a kernel problem", then I think you might owe Anand an apology for pushing him... :-) In this case there are too many unknowns - but I agree with Anand's logic 100%. Gluster should not be able to cause a CPU lockup. It should be impossible. If it is not impossible, it means a kernel bug, and the best place to have this addressed is the kernel devel list, or, if you have purchased a subscription from a company such as RedHat, then this belongs in a ticket opened with RedHat.

You know, I am really bothered about the way the maintainers have been acting since I started reading this list. There is really a lot of ideology going on ("can't be", "is impossible for userspace" etc.) and very little real debugging. This application is not the only one in the world. People use heavily file- and network-intensive applications like firefox, apache, shell scripts, name one, on their boxes. None of them leads to the effects seen when you play with glusterfs.
If you really think it is a logical way of debugging to go out and simply tell people "userspace can't do that" while the rest of the application world does not show up with dead ends like those seen on this list, how can I change your mind? I hardly believe I can. I can only tell you what I would do: I would try to document _first_ that my piece of code really does behave well. But as you may have noticed, there is no real way to provide this information. And that is indeed part of the problem. Wouldn't it be a nice step if you could debug the goings-on of a glusterfs server on the client by simply reading an exported file (something like a server-dependent meta-debug file) that outputs something like strace does? Something that enables you to say: Ok, here you can see what the application did, and there you can see what the kernel made of it. As we noticed, a server logfile is not sufficient.

Is ideology really proof of anything in today's world? Do you really think it is possible to understand the complete world by seeing half of it and having the other half painted by ideology? What is wrong with _proving_ innocence? With acting defensively?

It is important to understand that this application is a kind of core technology for data storage. This means people want to be sure that their setup does not explode just because they made a kernel update or some other change where their experience tells them it should have no influence on the glusterfs service. You want to be sure, just like you are when using nfs. It just works (even being in kernel space!). Now answer for yourself whether you think glusterfs is as stable as nfs on the same box.

Cheers, mark -- Regards, Stephan
Re: [Gluster-users] The continuing story ...
Only if backed up. Has the trace been shown to the linux developers? What do they think?

Maybe we should just ask questions about the source before bothering others... From 2.0.6, transport/socket/src/socket.c line 867 ff (the -> and & characters were mangled in transit; restored here):

new_trans = CALLOC (1, sizeof (*new_trans));
new_trans->xl = this->xl;
new_trans->fini = this->fini;
memcpy (&new_trans->peerinfo.sockaddr, &new_sockaddr, addrlen);
new_trans->peerinfo.sockaddr_len = addrlen;
new_trans->myinfo.sockaddr_len = sizeof (new_trans->myinfo.sockaddr);
ret = getsockname (new_sock, SA (&new_trans->myinfo.sockaddr), &new_trans->myinfo.sockaddr_len);

CALLOC from libglusterfs/src/mem-pool.h:

#define CALLOC(cnt,size) calloc(cnt,size)

man calloc, RETURN VALUE: For calloc() and malloc(), the value returned is a pointer to the allocated memory, which is suitably aligned for any kind of variable, or NULL if the request fails.

Did I understand the source? What about calloc returning NULL? -- Regards, Stephan
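For illustration, a checked version of that allocation could look like the sketch below (simplified, hypothetical struct; not the project's actual fix): on allocation failure the function returns NULL to its caller instead of letting the subsequent member assignments dereference a null pointer.

```c
#include <stdlib.h>

/* Simplified stand-in for the transport struct in the excerpt. */
struct trans {
    void *xl;
    int (*fini)(struct trans *);
};

/* Allocate a transport, propagating calloc() failure to the caller
 * instead of crashing on the first member assignment. */
struct trans *new_trans_checked(void *xl, int (*fini)(struct trans *))
{
    struct trans *t = calloc(1, sizeof(*t));
    if (t == NULL)          /* calloc() may return NULL; see man calloc */
        return NULL;
    t->xl = xl;
    t->fini = fini;
    return t;
}
```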
Re: [Gluster-users] The continuing story ...
On Thu, 10 Sep 2009 21:20:04 +0530 Krishna Srinivas kris...@gluster.com wrote:

Now, failing to check for a NULL pointer here is a bug which we will fix in future releases (blame it on our laziness for not doing the check already!) Thanks for pointing it out.

Really, this was only _one_ quick example of which there are numerous in your code. Look at all the CALLOC/MALLOC calls. Most of them are not safe. Look at your documentation. It is quite a mess. There seems to be no intention to document which version knows which options; they are pretty different. It would have been easy if you had started that from the very first release and just copied all the docs to a new tree, deleting dead options and adding the new ones, linking the docs to version numbers. This would allow people to find out what is really a valid option. I will not stop posting every single case that looks bogus, even without understanding a single bit of the semantics.

Talking about analogy: in a car, assume that the engine is glusterfs and the tyres are the kernel. If you get flat tyres and the car doesn't move, you can't blame the engine!

Boy, you really entered cloud nine. To bring your example down to reality I'd rather suggest the kernel being the engine and glusterfs being the rear-view mirror. The car can live without one, nice to have one though. Thanks Krishna

And today's example of coding is in glusterfs-2.0.6/transport/socket/src/name.c (the > characters were mangled in transit; restored here):

# grep -n UNIX_PATH_MAX name.c
95:if (!path || strlen (path) > UNIX_PATH_MAX) {
281:if (strlen (connect_path) > UNIX_PATH_MAX) {
284:strlen (connect_path), UNIX_PATH_MAX);
321:#ifndef UNIX_PATH_MAX
322:#define UNIX_PATH_MAX 108
323:#endif
325:if (strlen (listen_path) > UNIX_PATH_MAX) {
329:strlen (listen_path), UNIX_PATH_MAX);

Now what does that mean? UNIX_PATH_MAX is used in lines 95, 281 and 284, and only in line 321 does it come to the programmer's mind that it may be undefined?
Ah well, things get more interesting:

libglusterfs/src/compat.h:#define UNIX_PATH_MAX 108
libglusterfs/src/compat.h:#define UNIX_PATH_MAX 104
libglusterfs/src/compat.h:#define UNIX_PATH_MAX 104
libglusterfs/src/compat.h:#define UNIX_PATH_MAX 108
libglusterfs/src/transport.h: char identifier[UNIX_PATH_MAX];

Ok, if you define it depending on the OS, how can it be an absolute 108 in socket/src/name.c (and elsewhere)? Remember, no semantics analyzed, just reading ... may as well be bs from me. -- Regards, Stephan
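One way to avoid both problems at once (the late #ifndef guard in name.c and the per-OS literals in compat.h drifting away from a hard-coded 108) is to derive the limit from the structure itself. A sketch, not the project's code:

```c
#include <sys/socket.h>
#include <sys/un.h>

/* Derive the limit from the actual sun_path member instead of a
 * hard-coded 108, so the value automatically matches the platform
 * (108 on Linux, 104 on the BSDs) and cannot disagree with a
 * literal maintained by hand elsewhere. */
#ifndef UNIX_PATH_MAX
#define UNIX_PATH_MAX (sizeof(((struct sockaddr_un *)0)->sun_path))
#endif
```

Putting this single guard in one shared header would replace both the per-OS table and the fallback define.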
Re: [Gluster-users] The continuing story ...
On Tue, 8 Sep 2009 10:13:17 +1000 (EST) Jeff Evans je...@tricab.com wrote:

- server was ping'able - glusterfsd was disconnected by the client because of missing ping-pong - no login possible - no fs action (no lights on the hd-stack) - no screen (was blank, stayed blank)

This is very similar to what I have seen many times (even back on 1.3), and have also commented on the list. It seems that we have quite a few ACKs on this or similar problems. The only thing different in my scenario is that the console doesn't stay blank. When attempting to log in I get the last-login message and nothing more; no prompt ever. Also, I can see that other processes are still listening on sockets etc., so it seems like the kernel just can't grab new FDs. I too found the hang happens more easily if a downed node from a replicate pair re-joins after some time. Following suggestions that this is all kernel related, I have just moved up to RHEL 5.4 in the hope that the new kernel will help. This fix stood out as potentially related for me: https://bugzilla.redhat.com/show_bug.cgi?id=44543

This is an ext3 fix; it is unlikely that we run into a similar effect on reiserfs3, as they are really very different in internals and coding.

We also have a broadcom network card, which had reports of hangs under load; the kernel has a patch for that too.

We used tg3 in this setup, but the load was not very high (below 10 MBit on a 1000 MBit link).

If I still run into the hangs, I'll try xfs.

I doubt that this can be a real solution. My guess is that glusterfsd runs into some race condition where it locks itself up completely. It is not funny to debug something like that on a production setup. Best would be to have debugging output sent from the servers' glusterfsd directly to a client to save the logs. I would not count on syslog in this case; if it survives, one could use a serial console for syslog output though.

Thanks, Jeff.
-- Regards, Stephan
Re: [Gluster-users] The continuing story ...
On Tue, 8 Sep 2009 03:23:37 -0700 Anand Avati anand.av...@gmail.com wrote:

I doubt that this can be a real solution. My guess is that glusterfsd runs into some race condition where it locks itself up completely. It is not funny to debug something like that on a production setup. Best would be to have debugging output sent from the servers' glusterfsd directly to a client to save the logs. I would not count on syslog in this case; if it survives, one could use a serial console for syslog output though.

Does the system which is locking up have a fuse mountpoint? Or is it a pure glusterfsd export server without a glusterfs mountpoint? Avati

The system acts as a pure server for both glusterfs and nfs. It has neither fuse nor nfs client mount points. -- Regards, Stephan
Re: [Gluster-users] The continuing story ...
On Tue, 8 Sep 2009 05:37:09 -0700 Anand Avati av...@gluster.com wrote:

I doubt that this can be a real solution. My guess is that glusterfsd runs into some race condition where it locks itself up completely. It is not funny to debug something like that on a production setup. Best would be to have debugging output sent from the servers' glusterfsd directly to a client to save the logs. I would not count on syslog in this case; if it survives, one could use a serial console for syslog output though.

I'm going to iterate through this yet again at the risk of frustrating you. glusterfsd (on the server side) is yet another process running only system calls. If glusterfsd has a race condition and locks itself up, then it locks _only its own process_ up. What you are having is a frozen system. There is no way glusterfsd can lock up your system through just VFS system calls, even if it wanted to, intentionally. It is a pure userspace process and has no power to lock up the system. The worst glusterfsd can do to your system is deadlock its own process, resulting in a glusterfs fuse mountpoint hang, or segfault and produce a core dump. Please consult system/kernel programmers you trust. Or ask on the kernel-devel mailing list. The system freeze you are facing is not something which can be caused by _any_ userspace application.

Please read carefully what I told you about the system's condition. The fact that I can ping the box means that the kernel is not messed up, i.e. this is no freeze. But the fact that I can neither log in nor use any other userspace software to get my hands on the box only means that an application can mess up userspace to an extent where every other application gets few to no timeslices, or some system resource is eaten up to an extent that others are simply locked out. That does not sound impossible to me, as it is just like a local DoS attack, which is possible. Maybe one only needs some messed-up pointers to create such a situation.
What really bothers me more is the fact that you continuously refuse to see what several people on the list have described. It is not our intention to waste someone's time; we try to give as much information as possible to go out and find the problem. Unfortunately we cannot do that job ourselves, because we don't have the background knowledge about your code. Since it is all userspace, maybe it would be helpful to have a version that just outputs logs to serial, so that we can trace where it went before things blew up. Maybe we can watch it cycling somewhere... Do you really deny that a local DoS attack is generally possible? -- Regards, Stephan
[Gluster-users] The continuing story ...
Hello all, last week we saw our first try to enable something like a real-world environment on glusterfs fail. Nevertheless we managed to get a working combination of _one_ server and _one_ client (using a replicate setup with a missing second server). This setup worked for about 4 days, so yesterday we tried to enable the second server. Within minutes the first one crashed. Well, really we do not know if it crashed in the true meaning of the word; the situation looked like this:

- server was ping'able
- glusterfsd was disconnected by the client because of missing ping-pong
- no login possible
- no fs action (no lights on the hd-stack)
- no screen (was blank, stayed blank)

This could also be a user-space hang or a busy/looping cpu. We don't know. The really interesting part is that the server worked for days being single, but as soon as dual-server fs action (obviously in combination with self-healing) started, it did not survive 10 minutes. Of course the second server went on, but we had to stop the whole thing because the data was not completely healed, so it made no sense to go on with old copies. This was glusterfs 2.0.6 with a minimal server setup (storage/posix, features/locks, performance/io-threads) on a linux kernel 2.6.25.2. Is there someone out there who has experienced something similar? Any ideas? -- Regards, Stephan
Re: [Gluster-users] NFS replacement
On Tue, 01 Sep 2009 11:33:38 +0530 Shehjar Tikoo shehj...@gluster.com wrote:

Stephan von Krawczynski wrote: On Mon, 31 Aug 2009 19:48:46 +0530 Shehjar Tikoo shehj...@gluster.com wrote: Stephan von Krawczynski wrote:

Hello all, after playing around for some weeks we decided to make some real-world tests with glusterfs. Therefore we took an nfs client and mounted the very same data with glusterfs. The client does some logfile processing every 5 minutes and needs around 3.5 minutes runtime in an nfs setup. We found out that it makes no sense to try this setup with gluster replicate as long as we do not have the same performance in a single-server setup with glusterfs. So now we have one server mounted (halfway replicate) and would like to tune performance. Does anyone have experience with some simple replacement like that? We had to find out that almost all performance options have exactly zero effect. The only thing that seems to make at least some difference is read-ahead on the server. We end up with around 4.5 - 5.5 minutes runtime of the scripts, which is on the edge as we need something quite below 5 minutes (just like nfs was). Our goal is to maximise performance in this setup and then try a real replication setup with two servers. The load itself looks like around 100 scripts starting at one time and processing their data. Any ideas?

What nfs server are you using? The in-kernel one?

Yes.

You could try the unfs3booster server, which is the original unfs3 with our modifications for bug fixes and slight performance improvements. It should give better performance in certain cases since it avoids the FUSE bottleneck on the server. For more info, do take a look at this page: http://www.gluster.org/docs/index.php/Unfs3boosterConfiguration When using unfs3booster, please use GlusterFS release 2.0.6 since that has the required changes to make booster work with NFS.

I read the docs, but I don't understand the advantage.
Why should we use nfs as a kind of transport layer to an underlying glusterfs server, when we can easily export the service (i.e. glusterfs) itself? Remember, we don't want nfs on the client any longer, but a replicate setup with two servers (though we do not use it right now, it nevertheless stays our primary goal).

Ok. My answer was simply under the impression that moving to NFS was the motive. unfs3booster-over-gluster is a better solution as opposed to having kernel-nfs-over-gluster because it avoids the FUSE layer completely.

Sorry. To make that one clear again: I don't want to use NFS if not ultimately necessary. I would be happy to use a complete glusterfs environment without any patches and glue to nfs, cifs or the like. It sounds obvious to me that nfs-over-gluster must be slower than pure kernel-nfs. On the other hand, glusterfs per se may even have some advantages on the network side, iff performance tuning (and of course the options themselves) is well designed. The first thing we noticed is that load dropped dramatically both on server and client when not using kernel-nfs. The client dropped from around 20 to around 4, the server from around 10 to around 5. Since all boxes are pretty much dedicated to their respective jobs, a lot of caching is going on anyway.

Thanks, that is useful information.

So I would not expect nfs to have advantages only because it is kernel-driven. And the current numbers (a loss of around 30% in performance) show that nfs performance is not completely out of reach.

That is true, we do have setups performing as well and in some cases better than kernel NFS despite the replication overhead. It is a matter of testing and arriving at a config that works for your setup.

What advantages would you expect from using unfs3booster at all?

To begin with, unfs3booster must be compared against kernel nfsd and not against a GlusterFS-only config.
So when comparing with kernel-nfsd, one should understand that knfsd involves the FUSE layer, the kernel's VFS and network layer, all of which have their advantages and also disadvantages, especially FUSE when used with the kernel nfsd. Those bottlenecks in the FUSE+knfsd interaction are well documented elsewhere. unfs3booster enables you to avoid the FUSE layer, the VFS, etc. and talk directly to the network and, through that, to the GlusterFS server. In our measurements, we found that we could perform better than kernel nfs-over-gluster by avoiding FUSE and using our own caching (io-cache), buffering (write-behind, read-ahead) and request scheduling (io-threads).

Another thing we really did not understand is the _negative_ effect of adding iothreads on client or server. Our nfs setup needs around 90 nfs kernel threads to run smoothly. Every number greater than 8 iothreads reduces the performance of glusterfs measurably.
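The iothreads count discussed above is set per translator in the 2.x volfile format. As a hedged sketch (the volume and subvolume names are placeholders, and the `thread-count` option name is assumed from the GlusterFS 2.x io-threads translator), lowering the count to the region the poster found workable would look like:

```
volume iothreads
  type performance/io-threads
  # counts greater than 8 reduced performance in the poster's tests
  option thread-count 8
  subvolumes locks           # placeholder subvolume name
end-volume
```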
[Gluster-users] NFS replacement
Hello all, after playing around for some weeks we decided to make some real-world tests with glusterfs. Therefore we took an nfs-client and mounted the very same data with glusterfs. The client does some logfile processing every 5 minutes and needs around 3.5 minutes runtime in an nfs setup. We found out that it makes no sense to try this setup with gluster replicate as long as we do not have the same performance in a single-server setup with glusterfs. So now we have one server mounted (halfway replicate) and would like to tune performance. Does anyone have experience with a simple replacement like that?

We had to find out that almost all performance options have exactly zero effect. The only thing that seems to make at least some difference is read-ahead on the server. We end up with around 4.5 to 5.5 minutes runtime of the scripts, which is on the edge as we need something quite below 5 minutes (just like nfs was). Our goal is to maximise performance in this setup and then try a real replication setup with two servers. The load itself looks like around 100 scripts starting at one time and processing their data. Any ideas?

--
Regards,
Stephan
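The load pattern described (around 100 scripts starting at once) can be approximated with a small harness when comparing runtimes between the nfs and glusterfs mounts. This is a rough sketch under stated assumptions: the mount path and worker count are placeholders, and each worker here just reads every file once rather than doing real logfile processing.

```shell
#!/bin/sh
# Sketch: start N readers at once against DIR and report wall-clock time.
# DIR and N are placeholders; point DIR at the nfs or glusterfs mount.
DIR="${1:-/mnt/data}"
N="${2:-100}"

start=$(date +%s)
i=0
while [ "$i" -lt "$N" ]; do
    # each worker reads every file in DIR, discarding the data
    ( cat "$DIR"/* >/dev/null 2>&1 ) &
    i=$((i + 1))
done
wait
end=$(date +%s)
echo "elapsed: $((end - start))s"
```

Running it once against each mount with the same file set gives a crude but repeatable comparison of the two setups.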
Re: [Gluster-users] NFS replacement
On Mon, 31 Aug 2009 19:48:46 +0530 Shehjar Tikoo shehj...@gluster.com wrote:

Stephan von Krawczynski wrote:

Hello all, after playing around for some weeks we decided to make some real-world tests with glusterfs. Therefore we took an nfs-client and mounted the very same data with glusterfs. The client does some logfile processing every 5 minutes and needs around 3.5 minutes runtime in an nfs setup. We found out that it makes no sense to try this setup with gluster replicate as long as we do not have the same performance in a single-server setup with glusterfs. So now we have one server mounted (halfway replicate) and would like to tune performance. Does anyone have experience with a simple replacement like that? We had to find out that almost all performance options have exactly zero effect. The only thing that seems to make at least some difference is read-ahead on the server. We end up with around 4.5 to 5.5 minutes runtime of the scripts, which is on the edge as we need something quite below 5 minutes (just like nfs was). Our goal is to maximise performance in this setup and then try a real replication setup with two servers. The load itself looks like around 100 scripts starting at one time and processing their data. Any ideas?

What nfs server are you using? The in-kernel one?

Yes.

You could try the unfs3booster server, which is the original unfs3 with our modifications for bug fixes and slight performance improvements. It should give better performance in certain cases since it avoids the FUSE bottleneck on the server. For more info, do take a look at this page: http://www.gluster.org/docs/index.php/Unfs3boosterConfiguration When using unfs3booster, please use GlusterFS release 2.0.6 since that has the required changes to make booster work with NFS.

I read the docs, but I don't understand the advantage. Why should we use nfs as a kind of transport layer to an underlying glusterfs server, when we can easily export the service (i.e. glusterfs) itself?
Remember, we don't want nfs on the client any longer, but a replicate setup with two servers (though we do not use it right now, it nevertheless stays our primary goal). It sounds obvious to me that nfs-over-gluster must be slower than pure kernel-nfs. On the other hand, glusterfs per se may even have some advantages on the network side, iff performance tuning (and of course the options themselves) is well designed.

The first thing we noticed is that load dropped dramatically both on server and client when not using kernel-nfs. The client dropped from around 20 to around 4, the server from around 10 to around 5. Since all boxes are pretty much dedicated to their respective jobs, a lot of caching is going on anyway. So I would not expect nfs to have advantages only because it is kernel-driven. And the current numbers (a loss of around 30% in performance) show that nfs performance is not completely out of reach. What advantages would you expect from using unfs3booster at all?

Another thing we really did not understand is the _negative_ effect of adding iothreads on client or server. Our nfs setup needs around 90 nfs kernel threads to run smoothly. Every number greater than 8 iothreads reduces the performance of glusterfs measurably.

-Shehjar

--
Regards,
Stephan
Re: [Gluster-users] NFS replacement, rest stopped
Hello all, as told earlier we tried to replace an nfs-server/client combination in a semi-production environment with a trivial one-server gluster setup. We thought at first that this pretty simple setup would allow some more testing. Unfortunately we have to stop those tests, because it turns out that the client system has troubles with networking as soon as we start glusterfs.

The client has three network cards: the first is for internet use, the second is for the connection to the glusterfs server, the third is for collecting data from several other boxes. It turned out that the third interface had troubles soon after we started to work with glusterfs. We could not ping several hosts on the same lan, or packet delay was very high (up to 20 s). The effects were pretty weird and looked like a bad interface card. But switching back to kernel-nfs, everything went back to normal. It really looks like the glusterfs client has some problems, too. It looks like buffer re-usage or mem thrashing or pointer mixup or the like. Interestingly, no problems were visible on the interface where the glusterfs traffic was happening; I have no idea how something like this happens. Anyway, the story looks like someone will tell me it is the kernel networking that has troubles, just like reiserfs that has troubles, or ext3 :-(

To give you an idea what the ugly things look like:

Aug 31 08:20:16 heather kernel: [ cut here ]
Aug 31 08:20:16 heather kernel: WARNING: at net/ipv4/tcp.c:1405 tcp_recvmsg+0x1c7/0x7b6()
Aug 31 08:20:16 heather kernel: Hardware name: empty
Aug 31 08:20:16 heather kernel: Modules linked in: nfs lockd nfs_acl sunrpc fuse loop i2c_i801 e100 i2c_core e1000e
Aug 31 08:20:16 heather kernel: Pid: 31500, comm: netcat Not tainted 2.6.30.5 #1
Aug 31 08:20:16 heather kernel: Call Trace:
Aug 31 08:20:16 heather kernel: [80431497] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [80431497] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [8023282d] ? warn_slowpath_common+0x77/0xa3
Aug 31 08:20:16 heather kernel: [80431497] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel: [80401340] ? sock_common_recvmsg+0x30/0x45
Aug 31 08:20:16 heather kernel: [8029b3d8] ? mnt_drop_write+0x25/0x12e
Aug 31 08:20:16 heather kernel: [803fee67] ? sock_aio_read+0x109/0x11d
Aug 31 08:20:16 heather kernel: [80287131] ? do_sync_read+0xce/0x113
Aug 31 08:20:16 heather kernel: [80244348] ? autoremove_wake_function+0x0/0x2e
Aug 31 08:20:16 heather kernel: [80293243] ? poll_select_copy_remaining+0xd0/0xf3
Aug 31 08:20:16 heather kernel: [80287b83] ? vfs_read+0xbd/0x133
Aug 31 08:20:16 heather kernel: [80287cb5] ? sys_read+0x45/0x6e
Aug 31 08:20:16 heather kernel: [8020ae6b] ? system_call_fastpath+0x16/0x1b
Aug 31 08:20:16 heather kernel: ---[ end trace 31e61d5bab6e7cc0 ]---

Hopefully you would not tell me that netcat has problems, would you? Hopefully we can agree on the fact that there are nasty things going on inside this code, and someone with a better brain and more kernel knowledge than me should give it a very close look.

--
Regards,
Stephan
Re: [Gluster-users] Known Issues : Replicate will only self-heal if the files exist on the first subvolume. Server A- B works, Server A -B does not work.
On Sat, 29 Aug 2009 03:46:04 +0200 supp...@citytoo.com supp...@citytoo.com wrote:

Hello,

Known Issues: Replicate will only self-heal if the files exist on the first subvolume. Server A- B works, Server A -B does not work. When will this problem be fixed? It's very important.

Ben

Cordialement

Hi Ben, really, don't push too hard in this direction, because this is easily solvable by running find on server B and stat'ing the file list on server A. You may call that inconvenient, but at least there is a trivial solution.

--
Regards,
Stephan
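The workaround described above, walking the tree and stat'ing every entry through the client mount to trigger replicate self-heal, can be scripted in one line. A minimal sketch, assuming the glusterfs client mount lives at a placeholder path:

```shell
#!/bin/sh
# Sketch: trigger replicate self-heal by stat'ing every entry under the
# glusterfs client mount point. /mnt/glusterfs is a placeholder path.
MOUNT="${1:-/mnt/glusterfs}"
find "$MOUNT" -noleaf -print0 | xargs -0 -r stat >/dev/null
```

Each lookup forces replicate to compare the copies and heal the missing ones, so running this after restoring a server repopulates it from the surviving subvolume.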
Re: [Gluster-users] Replication not working on server hang
[...] The glusterfs log only shows lines like these:

[2009-08-28 09:19:28] E [client-protocol.c:292:call_bail] data2: bailing out frame LOOKUP(32) frame sent = 2009-08-28 08:49:18. frame-timeout = 1800
[2009-08-28 09:23:38] E [client-protocol.c:292:call_bail] data2: bailing out frame LOOKUP(32) frame sent = 2009-08-28 08:53:28. frame-timeout = 1800

Once server2 has been rebooted, all gluster filesystems become available again on all clients and the hanging df and ls processes terminate. But it is difficult to understand why a replicated share that should survive the failure of one server does not.

You are suffering from the problem we talked about a few days ago on the list. If your local fs somehow produces a deadlock on one server, glusterfs is currently unable to cope with the situation and just _waits_ for things to come. This deadlocks your clients, too, without any need. Your experience backs my criticism of the handling of these situations.

--
Regards,
Stephan
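The "bailing out frame" messages in the log come from the client-side frame timeout, which the log shows at its 1800-second setting. In the GlusterFS 2.x volfile format this window is an option on the protocol/client volume, so a hung server can be given up on sooner. A hedged sketch (host and subvolume names are placeholders; the option names are assumed from the 2.x client protocol):

```
volume data2
  type protocol/client
  option transport-type tcp
  option remote-host server2          # placeholder host
  option remote-subvolume brick       # placeholder subvolume
  # fail outstanding calls sooner than the default 1800 seconds
  option frame-timeout 600
end-volume
```

Shortening the timeout does not fix the underlying deadlock; it only bounds how long clients hang before the call is bailed out.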
Re: [Gluster-users] 2.0.6
Hello Avati,

back to our original problem of all-hanging glusterfs servers and clients. Today we got another hang with the same look and feel, but this time we got something in the logs; please read and tell us how to proceed further. The configuration is as before. I send the whole log since boot; the crash is visible at the end. We did the same testing as before, running two bonnies on two clients.

Linux version 2.6.30.5 (r...@linux-tnpx) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Tue Aug 18 12:06:06 CEST 2009
Command line: root=/dev/sda3 resume=/dev/sda1 splash=silent console=ttyS0,9600 console=tty0
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: - 0009dc00 (usable)
 BIOS-e820: 0009dc00 - 000a (reserved)
 BIOS-e820: 000ca000 - 000cc000 (reserved)
 BIOS-e820: 000e4000 - 0010 (reserved)
 BIOS-e820: 0010 - d7e8 (usable)
 BIOS-e820: d7e8 - d7e8a000 (ACPI data)
 BIOS-e820: d7e8a000 - d7f0 (ACPI NVS)
 BIOS-e820: d7f0 - d800 (reserved)
 BIOS-e820: e000 - f000 (reserved)
 BIOS-e820: fec0 - fec1 (reserved)
 BIOS-e820: fee0 - fee01000 (reserved)
 BIOS-e820: ff00 - 0001 (reserved)
 BIOS-e820: 0001 - 00012800 (usable)
DMI present.
Phoenix BIOS detected: BIOS may corrupt low RAM, working around it.
last_pfn = 0x128000 max_arch_pfn = 0x1
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
last_pfn = 0xd7e80 max_arch_pfn = 0x1
init_memory_mapping: -d7e8
init_memory_mapping: 0001-00012800
ACPI: RSDP 000f6390 00014 (v00 PTLTD )
ACPI: RSDT d7e822bb 0003C (v01 PTLTDRSDT 0604 LTP )
ACPI: FACP d7e89e54 00074 (v01 INTEL 0604 PTL 0003)
ACPI: DSDT d7e83b29 0632B (v01 INTEL MUKLTEO2 0604 MSFT 010E)
ACPI: FACS d7e8afc0 00040
ACPI: MCFG d7e89ec8 0003C (v01 PTLTDMCFG 0604 LTP )
ACPI: APIC d7e89f04 00084 (v01 PTLTD APIC 0604 LTP )
ACPI: BOOT d7e89f88 00028 (v01 PTLTD $SBFTBL$ 0604 LTP 0001)
ACPI: SPCR d7e89fb0 00050 (v01 PTLTD $UCRTBL$ 0604 PTL 0001)
ACPI: SSDT d7e822f7 013EC (v01 PmRefCpuPm 3000 INTL 20050228)
(7 early reservations) == bootmem [00 - 012800]
  #0 [00 - 001000] BIOS data page == [00 - 001000]
  #1 [006000 - 008000] TRAMPOLINE == [006000 - 008000]
  #2 [20 - 6e5778] TEXT DATA BSS == [20 - 6e5778]
  #3 [09dc00 - 10] BIOS reserved == [09dc00 - 10]
  #4 [6e6000 - 6e6174] BRK == [6e6000 - 6e6174]
  #5 [01 - 014000] PGTABLE == [01 - 014000]
  #6 [014000 - 015000] PGTABLE == [014000 - 015000]
found SMP MP-table at [880f63c0] f63c0
Zone PFN ranges:
  DMA    0x0010 - 0x1000
  DMA32  0x1000 - 0x0010
  Normal 0x0010 - 0x00128000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
  0: 0x0010 - 0x009d
  0: 0x0100 - 0x000d7e80
  0: 0x0010 - 0x00128000
ACPI: PM-Timer IO Port: 0x1008
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] high edge lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] high edge lint[0x1])
ACPI: IOAPIC (id[0x04] address[0xfec0] gsi_base[0])
IOAPIC[0]: apic_id 4, version 0, address 0xfec0, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Using ACPI (MADT) for SMP configuration information
SMP: Allowing 4 CPUs, 0 hotplug CPUs
PM: Registered nosave memory: 0009d000 - 0009e000
PM: Registered nosave memory: 0009e000 - 000a
PM: Registered nosave memory: 000a - 000ca000
PM: Registered nosave memory: 000ca000 - 000cc000
PM: Registered nosave memory: 000cc000 - 000e4000
PM: Registered nosave memory: 000e4000 - 0010
PM: Registered nosave memory: d7e8 - d7e8a000
PM: Registered nosave memory: d7e8a000 - d7f0
PM: Registered nosave memory: d7f0 - d800
PM: Registered nosave memory: d800 - e000
PM: Registered nosave memory: e000 -