Re: Subversion Windows Performance compared to Linux

2014-04-26 Thread Roman Naumenko

Branko Čibej said the following, on 25-04-14, 4:26 PM:

On 25.04.2014 19:09, Roman Naumenko wrote:

That was a known consequence of moving to SQLite for storage of the
metadata. SVN 1.8 offers a solution for those that can use it:
http://subversion.apache.org/docs/release-notes/1.8.html#exclusivelocking

Mark, thanks for the link.  There is indeed a nice performance boost to the 
client with exclusive access.

Anyone who insists on using Subversion on NFS, whether as client or 
server, should be aware of two things:


  * File locking is, at best, flaky on NFS (even NFSv4+); and it's
always slow. This will affect the working copy.
  * NFS does not guarantee that all clients see renames as atomic
operations, which affects both working copy and repository, and in
the worst case, can cause corruption. This is more likely if you
allow both local and remote access to the same files.

In short, no-one should ever assume that NFS behaves as a local file 
system; and even less complain when it doesn't. To be fair, CIFS isn't 
much better. Furthermore, these limitations and caveats are not 
specific to Subversion.


If you absolutely must put your working copies or repositories on 
non-local storage, you should use a SAN with a real, multi-homed 
distributed filesystem. Anything else is half-baked, at least as far 
as data integrity is concerned.

But git clients do pretty well on NFS, no?

--Roman


Re: Subversion Windows Performance compared to Linux

2014-04-25 Thread Roman Naumenko
- Original Message - 

 On Fri, Apr 25, 2014 at 11:10 AM, Roman Naumenko  ro...@naumenko.ca
  wrote:

  - Original Message -
 
   On Tue, Apr 22, 2014 at 9:53 AM, Mark Phippard 
   markp...@gmail.com
   
   I remember this. The deadly operation was the initial checkout on
   network based file systems, especially CIFS on the Windows boxes.
   The few servers that ran NFS acted much more like Linux hosts, or
   like Linux hosts using NFS. A number of changes in Subversion, over
   time, reduced the perfidious chattiness that hampered CIFS based
   checkouts, and all Windows users with network mounted working
   copies became *much* happier.

   Let's do be careful to draw distinctions between local file
   systems, like NTFS and ext4, and network file systems like CIFS and
   NFS. I'm afraid it's common to handwave those away as not making a
   difference, and they really do.

  Maybe windows users are happier (they are not), but Linux users are
  just scratching their heads over svn performance.
 

  svn, version 1.7.8 (r1419691), standard redhat vm.
 

  NFS:
  A benchmark-svn/trunk/notes/tree-conflicts/scratch-pad.txt
  A benchmark-svn/trunk/notes/tree-conflicts/use-cases-resolution.txt
  A benchmark-svn/trunk/notes/tree-conflicts/design-overview.txt
  A benchmark-svn/trunk/notes/tree-conflicts/detection.txt
  ^Csvn: E200015: Caught signal

  real 0m26.980s
  user 0m0.454s
  sys 0m1.281s
  [11:02:30 user@host:~/svn_tests ] $ du -sh benchmark-svn
  12M benchmark-svn

  Local:
  A /tmp/benchmark-svn/branches/1.6.x/subversion/libsvn_fs_base/bdb/reps-table.c
  A /tmp/benchmark-svn/branches/1.6.x/subversion/libsvn_fs_base/bdb/bdb_compat.h
  ^Csvn: E200015: Caught signal

  real 0m13.241s
  user 0m3.939s
  sys 0m4.731s
  [11:02:30 user@host:~/svn_tests ] $ du -sh /tmp/benchmark-svn
  144M /tmp/benchmark-svn
 

  What we've got here, 20x or something?
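
For reference, the "20x or something" estimate can be checked from the quoted numbers. Both checkouts were interrupted, so the fair comparison is data written per second of wall-clock time rather than total time. The sketch below assumes the `du -sh` sizes (which include the `.svn` metadata) are a reasonable proxy for work done; the variable and function names are made up for illustration.

```python
# Figures copied from the `time` and `du -sh` output quoted above.
runs = {
    "nfs": {"megabytes": 12, "seconds": 26.980},
    "local": {"megabytes": 144, "seconds": 13.241},
}


def throughput(run):
    """Megabytes on disk per second of wall-clock ("real") time."""
    return run["megabytes"] / run["seconds"]


nfs_rate = throughput(runs["nfs"])
local_rate = throughput(runs["local"])
ratio = local_rate / nfs_rate

print(f"NFS:   {nfs_rate:.2f} MB/s")
print(f"local: {local_rate:.2f} MB/s")
print(f"local is roughly {ratio:.0f}x faster")
```

On these two runs the ratio comes out somewhat above 20x. It is a single sample per file system, so it should be read as an order of magnitude and not as a benchmark.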
 

 That was a known consequence of moving to SQLite for storage of the
 metadata. SVN 1.8 offers a solution for those that can use it:

 http://subversion.apache.org/docs/release-notes/1.8.html#exclusivelocking

Mark, thanks for the link.  There is indeed a nice performance boost to the 
client with exclusive access.

--Roman


Re: Subversion Windows Performance compared to Linux

2014-04-24 Thread Roman Naumenko

Florian Ludwig said the following, on 16-04-14, 1:13 PM:

Hi,

this topic was raised several times in the past - the answers range 
from "will be better/solved in the next version 1.7" to "it is due to 
NTFS vs ext3/4" or "it's the AV, network setup or the Windows file 
indexing service". After disabling all those and running a test 
checkout on Linux and Windows on the same machine I still get a result 
of Linux being 7.3 times faster. Any ideas why?


Commands used to test:
 * Linux: $ time svn co svn://10.0.0.1/test > /dev/null
 * Windows: PS> Measure-Command { svn co svn://10.0.0.1/test > $null }


Results (tests run twice, better result taken):
  * Linux on ext4 (journaling enabled): 1m 16s
  * Linux on NTFS*: 3m 29s
  * Windows 7 on NTFS*: 9m 19s
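
As a quick sanity check on the figures above, the three timings can be converted to seconds and compared pairwise. This is only a sketch: the dictionary keys and the helper name are invented for illustration, and the ratios say nothing about which layer (file system driver, OS, or client) is responsible.

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Ys' timing to plain seconds."""
    return minutes * 60 + seconds


# Checkout timings as reported in the results list above.
timings = {
    "linux_ext4": to_seconds(1, 16),
    "linux_ntfs": to_seconds(3, 29),
    "windows_ntfs": to_seconds(9, 19),
}

# Same OS, different file system.
ntfs_vs_ext4 = timings["linux_ntfs"] / timings["linux_ext4"]
# Same file system, different OS.
windows_vs_linux_on_ntfs = timings["windows_ntfs"] / timings["linux_ntfs"]
# Both variables changed at once.
overall = timings["windows_ntfs"] / timings["linux_ext4"]

print(f"NTFS vs ext4 (both on Linux):  {ntfs_vs_ext4:.2f}x")
print(f"Windows vs Linux (both NTFS):  {windows_vs_linux_on_ntfs:.2f}x")
print(f"Windows+NTFS vs Linux+ext4:    {overall:.2f}x")
```

The overall ratio lands a little over 7.3, matching the figure quoted in the message, and the two partial ratios multiply to the overall one.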

I can confirm these results (the difference was even bigger when I 
tested it, easily a 10x gap).
And this has nothing to do with Windows or anything underlying, it's 
just inefficient software design.


The reason for this conclusion is that when you copy the repo data locally, 
it's almost equally fast on Windows and Linux.


--Roman


Re: Subversion Windows Performance compared to Linux

2014-04-24 Thread Roman Naumenko

Johan Corveleyn said the following, on 22-04-14, 9:30 AM:

On Tue, Apr 22, 2014 at 2:55 PM, Florian Ludwig
vierzigundz...@gmail.com wrote:

 From your numbers I deduce that the performance degradation can be
attributed partly to NTFS vs. ext4, and partly to Windows7 vs. Linux:
* NTFS vs. ext4: roughly a factor 3 slower.
* Windows 7 vs. Linux: roughly a factor 2.5 slower.

You assume that the file operation performance of Windows on NTFS and Linux
on NTFS is the same - which I am sure it is not.  First of all the NTFS
driver on Linux is FUSE-based, so it runs in userspace and is therefore slower
than kernel-based drivers such as ext4.  Also ext4 is one of the most used
file systems on Linux, so I expect its code to be much more optimized.

Okay, I handwaved a bit too much. Maybe we should just take the
Linux+NTFS numbers with a grain of salt then, and mainly focus on
Linux+ext4 vs. Windows+NTFS. The fact remains that there are two
variables changing. But maybe it's not a big issue for this
comparison, and it's almost unavoidable.


And nfs as well, please (sorry for hijacking the thread).

Performance on NFS is just terrible (for all svn client versions).
Take any Linux box, check out to a local fs and check out to an NFS volume: 
you're gonna be amazed.


The nfs thing should be a big deal, since build servers (jenkins and 
other such) are severely impacted by this design.


--Roman


Re: Subversion Windows Performance compared to Linux

2014-04-24 Thread Roman Naumenko

Grierson, David said the following, on 23-04-14, 5:47 AM:

Latency Numbers Every Programmer Should Know:

https://gist.github.com/jboner/2841832

Always useful to have in mind when considering your benchmarking environment.
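
To illustrate why those numbers matter for a checkout: if a client performs many small operations one after another, the per-operation latency dominates the total time. The sketch below uses three values from that list (datacenter round trip, disk seek, and the California to Netherlands and back round trip); the operation count is purely hypothetical and is not a measurement of what the svn client actually does.

```python
# Latencies in nanoseconds, taken from the "Latency Numbers" list linked above.
LATENCY_NS = {
    "round trip within same datacenter": 500_000,
    "disk seek": 10_000_000,
    "packet CA -> Netherlands -> CA": 150_000_000,
}

# Hypothetical number of strictly sequential operations in one checkout.
OPERATIONS = 30_000


def serial_cost_seconds(operations, latency_ns):
    """Total wall-clock time if every operation waits for the previous one."""
    return operations * latency_ns / 1e9


for name, latency_ns in LATENCY_NS.items():
    cost = serial_cost_seconds(OPERATIONS, latency_ns)
    print(f"{name}: {cost:,.0f} s")
```

The absolute numbers mean little, but the ratios between the rows show how quickly a chatty, serialized access pattern degrades once each step has to cross a network.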


Looks like svn checks out repos on Windows strictly via the Netherlands.

--Roman


Re: Balancing and proxing

2013-08-09 Thread Roman Naumenko

Ryan Schmidt said the following, on 09-08-13 7:12 PM:

On Aug 9, 2013, at 15:40, Naumenko, Roman wrote:

I wanted to check if it's possible to configure subversion in
master-slave mode with some sort of common URL on the proxy server or
loadbalancer, so end users wouldn't bother with different names for
slave/master/readonly and geolocal names.

You can configure any number of read-only slaves which maintain copies of the master 
repository with a very slight delay. The mirroring and keeping in sync would be 
accomplished using svnsync. To access the repositories, users would use the hostname of a 
mirror near to them. For read operations, they would occur on the mirror and therefore be 
faster than accessing the farther-away master. For write operations, you configure the 
mirror to proxy those requests back to the master. (Search for write-through 
proxy for more on this.) In this way the users only need to know the address of 
their closest mirror; they do not need to know which is the master or to know its address.

I wanted to have a universal URL, which might resolve to a different IP 
based on location - for performance.

But more important, I'd like to have a few nodes handling writes.

Of course, it would be ideal if subversion nodes could just share
storage, so any sort of request from a load balancer can be processed by
any node without the need to replicate changes over the network.

If your storage is robust (i.e. a cluster filesystem, such as Xsan) and you 
want to run multiple Subversion servers that each have access to the same 
repositories on the same storage, then yes, you can do that instead.
The storage is robust enough - NetApp or possibly SAN with all 
enterprise bells and whistles.


Ok, so if multiple nodes are accessing the same mount point with repos 
data, will they be able to handle writes from multiple clients 
correctly? Thinking out loud: yes, they should - since it makes no 
difference to a repository whether multiple clients are committing 
through the same server or through a few distributed nodes. Or is it 
different when the same process handles all requests?

Does it mean that HA and load balancing should be pretty easy to set up? 
It should be, yet there is almost no information about examples of 
such an architecture. I must be missing something here.

--Roman


Re: Balancing and proxing

2013-08-09 Thread Roman Naumenko

Nico Kadel-Garcia said the following, on 09-08-13 6:45 PM:

On Fri, Aug 9, 2013 at 4:40 PM, Naumenko, Roman
roman.naume...@rbccm.com wrote:

Hi,

I wanted to check if it's possible to configure subversion in
master-slave mode with some sort of common URL on the proxy server or
loadbalancer, so end users wouldn't bother with different names for
slave/master/readonly and geolocal names.

Of course, it would be ideal if subversion nodes could just share
storage, so any sort of request from a load balancer can be processed by
any node without the need to replicate changes over the network.

Wandisco publishes a multiple master toolkit that might solve your
issues. They charge money for it, but it seems to be quite intelligent
and has good reports here of its high availability behavior.

You mean this one (svn clustering)?
http://www.wandisco.com/get?f=documentation/datasheets/DataSheet-Clustering.pdf

It doesn't look like it's a simple loadbalancing architecture with a 
shared storage for repositories.
There is some replication and synchronization involved, automatic 
failover, etc.

Is anybody using it? What is it like?

--Roman


Re: Balancing and proxing

2013-08-09 Thread Roman Naumenko


Ryan Schmidt said the following, on 09-08-13 9:15 PM:

On Aug 9, 2013, at 19:00, Roman Naumenko wrote:

Ryan Schmidt said the following, on 09-08-13 7:12 PM:

You can configure any number of read-only slaves which maintain copies of the master 
repository with a very slight delay. The mirroring and keeping in sync would be 
accomplished using svnsync. To access the repositories, users would use the hostname of a 
mirror near to them. For read operations, they would occur on the mirror and therefore be 
faster than accessing the farther-away master. For write operations, you configure the 
mirror to proxy those requests back to the master. (Search for write-through 
proxy for more on this.) In this way the users only need to know the address of 
their closest mirror; they do not need to know which is the master or to know its address.

I wanted to have a universal URL, which might resolve to a different IP based on 
location - for performance.

I'm not familiar with how to set that up at the DNS level but if you are then 
go for it.

Views in BIND or something similar: the DNS server replies with an IP that 
depends on the network the request originates from.



But more important, I'd like to have a few nodes handling writes.

Ah yes. Well then that's different.

You must have one heck of a large svn installation for that to be a bottleneck.


One day it might grow to that, but even with moderate load it is still a huge 
convenience to have a pair or more of frontends available to handle the load: 
you can take one down for maintenance at any time. VMs can be used instead of 
physical boxes too, and sized more appropriately.

Of course, it would be ideal if subversion nodes could just share
storage, so any sort of request from a load balancer can be processed by
any node without the need to replicate changes over the network.


If your storage is robust (i.e. a cluster filesystem, such as Xsan) and you 
want to run multiple Subversion servers that each have access to the same 
repositories on the same storage, then yes, you can do that instead.

The storage is robust enough - NetApp or possibly SAN with all enterprise bells 
and whistles.

It would need to be not just a SAN but a SAN with a cluster filesystem, based 
on previous conversations (see below).

Yeah, of course - SAN storage will require its own layer to handle data 
sharing.

A few people mentioned that GFS worked.

Ok, so if multiple nodes are accessing the same mount point with repos data, 
will they be able to handle writes from multiple clients correctly? Thinking 
out loud: yes, they should - since it makes no difference to a repository 
whether multiple clients are committing through the same server or through a 
few distributed nodes. Or is it different when the same process handles all 
requests?

I have not set it up myself, but I participated in discussions about it on this 
list some years ago:

http://svn.haxx.se/users/archive-2006-10/0195.shtml

http://svn.haxx.se/users/archive-2007-05/0214.shtml

You may want to read those threads completely and carefully to get all the 
nuances. And of course information may have changed since then.

Tom Mornini (tmornini_at_engineyard.com) confirmed that GFS works, both in 
that thread and in another one: 
http://svn.haxx.se/users/archive-2007-01/1307.shtml

But again, there is no official confirmation or reference architecture.

It seems like the number of repositories or the load on a server is 
never large enough to make administrators (or, to some extent, subversion 
developers) design or implement a load-balancing cluster. Or maybe 
it does get close to huge, but in most cases svnsync + write-through 
proxying solves the problem.
On the other hand, there are commercial solutions available, so the demand 
must be there :)


--Roman