Re: Subversion Windows Performance compared to Linux
Branko Čibej said the following, on 25-04-14, 4:26 PM:

On 25.04.2014 19:09, Roman Naumenko wrote:

That was a known consequence of moving to SQLite for storage of the metadata. SVN 1.8 offers a solution for those that can use it: http://subversion.apache.org/docs/release-notes/1.8.html#exclusivelocking

Mark, thanks for the link. There is indeed a nice performance boost to the client with exclusive access.

Anyone who insists on using Subversion on NFS, whether as client or server, should be aware of two things:

* File locking is, at best, flaky on NFS (even NFSv4+); and it's always slow. This will affect the working copy.
* NFS does not guarantee that all clients see renames as atomic operations, which affects both working copy and repository, and in the worst case, can cause corruption. This is more likely if you allow both local and remote access to the same files.

In short, no-one should ever assume that NFS behaves as a local file system; and even less complain when it doesn't. To be fair, CIFS isn't much better. Furthermore, these limitations and caveats are not specific to Subversion.

If you absolutely must put your working copies or repositories on non-local storage, you should use a SAN with a real, multi-homed distributed filesystem. Anything else is half-baked, at least as far as data integrity is concerned.

But git clients are doing pretty well on NFS, no?

--Roman
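The exclusive-locking feature linked above is a client-side runtime setting in Subversion 1.8. Here is a minimal sketch of requesting it for a single checkout. It assumes the option is named `exclusive-locking` and lives in the `[working-copy]` section of the runtime `config` file, which is my reading of the 1.8 release notes; verify it against your client. The helper name `exclusive_checkout_argv` is hypothetical. The function only builds the argument list and does not run svn.

```python
def exclusive_checkout_argv(url, path):
    """Build an `svn checkout` command line that asks for exclusive
    working-copy locking for this one invocation (Subversion 1.8+).

    Assumption: the option is `exclusive-locking` in the [working-copy]
    section of the runtime config file, passed on the command line with
    --config-option FILE:SECTION:OPTION=VALUE.
    """
    return [
        "svn", "checkout",
        "--config-option", "config:working-copy:exclusive-locking=true",
        url, path,
    ]
```

The resulting list can be handed to `subprocess.run`. The trade-off is that while one client holds the working copy exclusively, other clients cannot use it at the same time, so this suits build machines better than working copies shared with an IDE.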
Re: Subversion Windows Performance compared to Linux
- Original Message -
On Fri, Apr 25, 2014 at 11:10 AM, Roman Naumenko ro...@naumenko.ca wrote:

- Original Message -
On Tue, Apr 22, 2014 at 9:53 AM, Mark Phippard markp...@gmail.com

I remember this. The deadly operation was the initial checkout on network-based file systems, especially CIFS on the Windows boxes. The few servers that ran NFS acted much more like Linux hosts, or like Linux hosts using NFS. A number of changes in Subversion, over time, reduced the perfidious chattiness that hampered CIFS-based checkouts, and all Windows users with network-mounted working copies became *much* happier.

Let's do be careful to draw distinctions between local file systems, like NTFS and ext4, and network file systems like CIFS and NFS. I'm afraid it's common to handwave those away as not making a difference, and they really do.

Maybe Windows users are happier (they are not), but Linux users are just scratching their heads over svn performance.

svn, version 1.7.8 (r1419691), standard Red Hat VM.

NFS:

A benchmark-svn/trunk/notes/tree-conflicts/scratch-pad.txt
A benchmark-svn/trunk/notes/tree-conflicts/use-cases-resolution.txt
A benchmark-svn/trunk/notes/tree-conflicts/design-overview.txt
A benchmark-svn/trunk/notes/tree-conflicts/detection.txt
^Csvn: E200015: Caught signal

real 0m26.980s
user 0m0.454s
sys 0m1.281s

[11:02:30 user@host:~/svn_tests ] $ du -sh benchmark-svn
12M benchmark-svn

Local:

A /tmp/benchmark-svn/branches/1.6.x/subversion/libsvn_fs_base/bdb/reps-table.c
A /tmp/benchmark-svn/branches/1.6.x/subversion/libsvn_fs_base/bdb/bdb_compat.h
^Csvn: E200015: Caught signal

real 0m13.241s
user 0m3.939s
sys 0m4.731s

[11:02:30 user@host:~/svn_tests ] $ du -sh /tmp/benchmark-svn
144M /tmp/benchmark-svn

What have we got here, 20x or something?

That was a known consequence of moving to SQLite for storage of the metadata. SVN 1.8 offers a solution for those that can use it: http://subversion.apache.org/docs/release-notes/1.8.html#exclusivelocking

Mark, thanks for the link. There is indeed a nice performance boost to the client with exclusive access.

--Roman
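The two interrupted checkouts above ran for different lengths of time and wrote different amounts of data, so the "20x or something" estimate is easier to check as throughput. The sketch below uses only the `du -sh` sizes and `real` times from the transcript. The function name is made up for illustration, and `du` counts everything on disk (including `.svn` metadata), so treat the result as a rough figure.

```python
def throughput_mb_per_s(size_mb, elapsed_s):
    """Average amount of data written to disk per second of wall-clock time."""
    return size_mb / elapsed_s


# Figures copied from the transcript: `du -sh` size and `real` time.
nfs = throughput_mb_per_s(12, 26.980)
local = throughput_mb_per_s(144, 13.241)
ratio = local / nfs

print(f"NFS:   {nfs:.2f} MB/s")
print(f"local: {local:.2f} MB/s")
print(f"local is roughly {ratio:.0f}x faster")
```

On these numbers the local checkout comes out a little over 24 times faster, which is in line with Roman's estimate.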
Re: Subversion Windows Performance compared to Linux
Florian Ludwig said the following, on 16-04-14, 1:13 PM:

Hi,

this topic was raised several times in the past - the answers range from will be better/solved in the next version 1.7 or it is due to ntfs vs ext3/4 or it's the AV, network setup or the Windows file indexing service. After disabling all those and running a test checkout on Linux and Windows on the same machine I still get a result of Linux being 7.3 times faster. Any ideas why?

Commands used to test:

* Linux: $ time svn co svn://10.0.0.1/test > /dev/null
* Windows: PS> Measure-Command { svn co svn://10.0.0.1/test > $null }

Results (tests run twice, better result taken):

* Linux on ext4 (journaling enabled): 1m 16s
* Linux on NTFS*: 3m 29s
* Windows 7 on NTFS*: 9m 19s

I can confirm these results (the gap was even bigger when I tested it, easily 10x). And this has nothing to do with Windows or anything underlying; it's just inefficient software design. The reason for this conclusion is that when you copy the repo data locally, it's almost equally fast on Windows and Linux.

--Roman
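The test above times the checkout with a different tool on each OS (`time` on Linux, `Measure-Command` on Windows). One way to remove that variable is to time the same command from one script on both systems. This is a rough sketch, not Florian's method: the function names are hypothetical, each run checks out into a fresh empty directory so a leftover working copy cannot shorten a later run, and the "best of N" rule mirrors the "tests run twice, better result taken" note.

```python
import subprocess
import tempfile
import time


def time_command(argv):
    """Run a command with stdout discarded; return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def best_checkout_time(url, parent_dir, runs=2):
    """Check out `url` `runs` times and return the fastest time.

    Each run gets its own new directory under `parent_dir`; the
    directories are left behind for the caller to clean up.
    """
    timings = []
    for _ in range(runs):
        target = tempfile.mkdtemp(dir=parent_dir)  # fresh, empty target
        timings.append(time_command(["svn", "checkout", url, target]))
    return min(timings)
```

Calling `best_checkout_time("svn://10.0.0.1/test", some_dir)` with `some_dir` on ext4, NTFS or an NFS mount would give numbers that differ only in the storage and OS under test.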
Re: Subversion Windows Performance compared to Linux
Johan Corveleyn said the following, on 22-04-14, 9:30 AM:

On Tue, Apr 22, 2014 at 2:55 PM, Florian Ludwig vierzigundz...@gmail.com wrote:

From your numbers I deduce that the performance degradation can be attributed partly to NTFS vs. ext4, and partly to Windows 7 vs. Linux:

* NTFS vs. ext4: roughly a factor 3 slower.
* Windows 7 vs. Linux: roughly a factor 2.5 slower.

You assume that the file operation performance of Windows on NTFS and Linux on NTFS is the same - which I am sure it is not. First of all, the NTFS driver on Linux is FUSE-based, so it runs in userspace and is therefore slower than kernel-based drivers such as ext4. Also, ext4 is one of the most used file systems on Linux, so I expect its code to be much more optimized.

Okay, I handwaved a bit too much. Maybe we should just take the Linux+NTFS numbers with a grain of salt then, and mainly focus on Linux+ext4 vs. Windows+NTFS. The fact remains that there are two variables changing. But maybe it's not a big issue for this comparison, and it's almost unavoidable.

And NFS as well, please (sorry for hijacking the thread). Performance on NFS is just terrible (for all svn client versions). Take any Linux box, check out to a local fs and check out to an NFS volume: you're going to be amazed.

The NFS thing should be a big deal, since build servers (Jenkins and other such) are severely impacted by this design.

--Roman
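The two factors quoted above can be recomputed from Florian's three timings earlier in the thread (1m 16s, 3m 29s, 9m 19s). The sketch below does only that arithmetic, and the helper name `to_seconds` is made up for illustration. As Florian points out, the first factor mixes NTFS itself with the FUSE driver, so the split is indicative at best.

```python
import re


def to_seconds(text):
    """Convert a duration written like '3m 29s' into seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + int(match.group(2))


linux_ext4 = to_seconds("1m 16s")
linux_ntfs = to_seconds("3m 29s")
windows_ntfs = to_seconds("9m 19s")

fs_factor = linux_ntfs / linux_ext4      # NTFS vs. ext4, both on Linux
os_factor = windows_ntfs / linux_ntfs    # Windows 7 vs. Linux, both on NTFS
overall = windows_ntfs / linux_ext4      # product of the two factors

print(f"NTFS vs. ext4:       {fs_factor:.2f}x")
print(f"Windows 7 vs. Linux: {os_factor:.2f}x")
print(f"overall:             {overall:.2f}x")
```

The factors come out near 2.75 and 2.67, and they multiply to the overall gap of a bit over 7.3x that Florian reported.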
Re: Subversion Windows Performance compared to Linux
Grierson, David said the following, on 23-04-14, 5:47 AM:

Latency Numbers Every Programmer Should Know: https://gist.github.com/jboner/2841832

Always useful to have in mind when considering your benchmarking environment.

Looks like svn on Windows checks out repos strictly via the Netherlands.

--Roman
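To show why those latency numbers matter for a benchmark, here is a back-of-the-envelope model in which a checkout is bound by synchronous round trips and not by bandwidth. The two round-trip times are the gist's figures for a round trip within one datacenter and for a packet from California to the Netherlands and back. The file count and the round trips per file are purely hypothetical inputs, and the function name is invented.

```python
def latency_bound_seconds(files, round_trips_per_file, rtt_ns):
    """Lower bound on elapsed time when every round trip is synchronous."""
    return files * round_trips_per_file * rtt_ns / 1e9


SAME_DATACENTER_RTT_NS = 500_000        # 0.5 ms, from the linked gist
CA_NETHERLANDS_RTT_NS = 150_000_000     # 150 ms, from the linked gist

# Hypothetical workload: 10,000 files, 3 synchronous round trips each.
nearby = latency_bound_seconds(10_000, 3, SAME_DATACENTER_RTT_NS)
far_away = latency_bound_seconds(10_000, 3, CA_NETHERLANDS_RTT_NS)

print(f"same datacenter: {nearby:.0f} s")
print(f"via Netherlands: {far_away / 60:.0f} min")
```

Under those assumptions the same work takes seconds in one case and over an hour in the other, without a single byte of extra data being transferred.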
Re: Balancing and proxing
Ryan Schmidt said the following, on 09-08-13 7:12 PM:

On Aug 9, 2013, at 15:40, Naumenko, Roman wrote:

I wanted to check if it's possible to configure subversion in master-slave mode with some sort of common URL on the proxy server or loadbalancer, so end users wouldn't bother with different names for slave/master/readonly and geolocal names.

You can configure any number of read-only slaves which maintain copies of the master repository with a very slight delay. The mirroring and keeping in sync would be accomplished using svnsync. To access the repositories, users would use the hostname of a mirror near to them. For read operations, they would occur on the mirror and therefore be faster than accessing the farther-away master. For write operations, you configure the mirror to proxy those requests back to the master. (Search for write-through proxy for more on this.) In this way the users only need to know the address of their closest mirror; they do not need to know which is the master or to know its address.

I wanted to have a universal URL, which might resolve to a different IP based on location - for performance. But more importantly, I'd like to have a few nodes handling writes.

Of course, it would be ideal if subversion nodes could just share a storage, so any sort of requests from a load balancer can be processed by any node without need to replicate changes over network.

If your storage is robust (i.e. a cluster filesystem, such as Xsan) and you want to run multiple Subversion servers that each have access to the same repositories on the same storage, then yes, you can do that instead.

The storage is robust enough - NetApp or possibly SAN with all enterprise bells and whistles.

Ok, so if multiple nodes are accessing the same mount point with repos data, will they be able to handle writes from multiple clients correctly? Thinking out loud: yes, they should - since it makes no difference to a repository whether multiple clients are committing through the same server or through a few distributed nodes. Or is it different when the same process handles all requests?

Does it mean that HA and load balancing should be pretty easy to set up? It should be, yet there is almost no information about, or examples of, such an architecture. I must be missing something here.

--Roman
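The svnsync-based mirroring Ryan describes is usually driven from the master: each mirror is initialised once, then brought up to date after every commit. Below is a minimal sketch of the command lines involved. It only builds argument lists; the function names and any URLs passed in are hypothetical, and a real setup also needs a `pre-revprop-change` hook on each mirror and the write-through proxy configured in the mirror's web server.

```python
def svnsync_init_argv(mirror_url, master_url):
    """One-time setup: tie an empty mirror repository to its master."""
    return ["svnsync", "initialize", "--non-interactive", mirror_url, master_url]


def svnsync_sync_argv(mirror_url):
    """Copy any revisions the mirror is missing from its master."""
    return ["svnsync", "synchronize", "--non-interactive", mirror_url]


def post_commit_sync_commands(mirror_urls):
    """Commands a master's post-commit hook could run, one per mirror."""
    return [svnsync_sync_argv(url) for url in mirror_urls]
```

A post-commit hook on the master could loop over `post_commit_sync_commands([...])` and pass each list to `subprocess.run`, which is where the "very slight delay" on the mirrors comes from.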
Re: Balancing and proxing
Nico Kadel-Garcia said the following, on 09-08-13 6:45 PM:

On Fri, Aug 9, 2013 at 4:40 PM, Naumenko, Roman roman.naume...@rbccm.com wrote:

Hi,

I wanted to check if it's possible to configure subversion in master-slave mode with some sort of common URL on the proxy server or loadbalancer, so end users wouldn't bother with different names for slave/master/readonly and geolocal names.

Of course, it would be ideal if subversion nodes could just share a storage, so any sort of requests from a load balancer can be processed by any node without need to replicate changes over network.

Wandisco publishes a multiple master toolkit that might solve your issues. They charge money for it, but it seems to be quite intelligent and has good reports here of its high availability behavior.

You mean this one (svn clustering)? http://www.wandisco.com/get?f=documentation/datasheets/DataSheet-Clustering.pdf

It doesn't look like it's a simple load-balancing architecture with a shared storage for repositories. There is some replication and synchronization involved, automatic failover, etc.

Is anybody using it? What is it like?

--Roman
Re: Balancing and proxing
Ryan Schmidt said the following, on 09-08-13 9:15 PM:

On Aug 9, 2013, at 19:00, Roman Naumenko wrote:

Ryan Schmidt said the following, on 09-08-13 7:12 PM:

You can configure any number of read-only slaves which maintain copies of the master repository with a very slight delay. The mirroring and keeping in sync would be accomplished using svnsync. To access the repositories, users would use the hostname of a mirror near to them. For read operations, they would occur on the mirror and therefore be faster than accessing the farther-away master. For write operations, you configure the mirror to proxy those requests back to the master. (Search for write-through proxy for more on this.) In this way the users only need to know the address of their closest mirror; they do not need to know which is the master or to know its address.

I wanted to have a universal URL, which might resolve to a different IP based on location - for performance.

I'm not familiar with how to set that up at the DNS level but if you are then go for it.

Views in BIND or something similar: the DNS server will reply with an IP that depends on the request's originating network.

But more importantly, I'd like to have a few nodes handling writes.

Ah yes. Well then that's different. You must have one heck of a large svn installation for that to be a bottleneck.

One day it might grow to that, but even with a moderate load it is still a huge convenience to have a pair or more of frontends available to handle the load: you can take one down for maintenance at any time. VMs can be used instead of physical boxes too, and sized more appropriately.

Of course, it would be ideal if subversion nodes could just share a storage, so any sort of requests from a load balancer can be processed by any node without need to replicate changes over network.

If your storage is robust (i.e. a cluster filesystem, such as Xsan) and you want to run multiple Subversion servers that each have access to the same repositories on the same storage, then yes, you can do that instead.

The storage is robust enough - NetApp or possibly SAN with all enterprise bells and whistles.

It would need to be not just a SAN but a SAN with a cluster filesystem, based on previous conversations (see below).

Yeah, of course - SAN storage will require its own layer to handle data sharing. A few people mentioned that GFS worked.

Ok, so if multiple nodes are accessing the same mount point with repos data, will they be able to handle writes from multiple clients correctly? Thinking out loud: yes, they should - since it makes no difference to a repository whether multiple clients are committing through the same server or through a few distributed nodes. Or is it different when the same process handles all requests?

I have not set it up myself, but I participated in discussions about it on this list some years ago:

http://svn.haxx.se/users/archive-2006-10/0195.shtml
http://svn.haxx.se/users/archive-2007-05/0214.shtml

You may want to read those threads completely and carefully to get all the nuances. And of course information may have changed since then.

Tom Mornini tmornini_at_engineyard.com confirmed that GFS works in that thread and the other too, http://svn.haxx.se/users/archive-2007-01/1307.shtml

But again, there is no official confirmation or reference architecture. It seems like the number of repositories or the load on a server is never large enough to make administrators (or, to some extent, Subversion developers) design or implement a load-balancing cluster. Or maybe it does get close to huge, but in most cases svnsync + write-through solves the problem.

On the other side, there are commercial solutions available, so demand must be there :)

--Roman
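Whether several nodes can safely write to one repository on shared storage depends heavily on whether a file lock taken by one process is actually refused to another. Below is a rough probe for that, assuming POSIX advisory locks via `fcntl` (Unix only). It is not a description of how Subversion locks internally. The function name and the `lock-probe` file name are hypothetical. As written, both processes run on the same host, so a meaningful cluster test would run the parent and child halves on different nodes against the same mount.

```python
import fcntl
import os
import subprocess
import sys

# Child program: try to take the same lock without blocking.
# Exit code 0 means the lock was correctly refused.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "r+") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        sys.exit(0)
    sys.exit(1)
"""


def exclusive_lock_is_honoured(directory):
    """Hold an exclusive lock on a file in `directory` and check that a
    second process cannot obtain it at the same time."""
    path = os.path.join(directory, "lock-probe")
    with open(path, "w") as f:
        fcntl.lockf(f, fcntl.LOCK_EX)  # held until the file is closed
        result = subprocess.run([sys.executable, "-c", CHILD, path])
    os.remove(path)
    return result.returncode == 0
```

A `True` result on a local directory is expected. A `False` result on the shared mount would be a strong hint that concurrent commits from several nodes are not safe there.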