Re: [Discuss] Dev Ops - architecture (local not cloud)
On 12/07/2013 09:47 PM, Edward Ned Harvey (blu) wrote:

> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Greg Rundlett (freephile)
>
> I think it's pretty obvious why it's not performing: user home directories (where developers compile) should not be NFS mounted. [1] The source repositories themselves should also not be stored on a NAS.

For high-performance hybrid distributed/monolithic environments, I have used, at a few companies, systems that were generally interchangeable clones of each other, but each one has a local /scratch (or /no-backup) directory, and each one can access the others via NFS automount at /scratches/machine1/ (or /no-backup-automount/machine1/).

If I had to do it over now, I would look at ceph or gluster to provide a unified namespace while leaving the underlying storage distributed. But it will take quite a bit of experimentation and configuration to get the desired performance characteristics. Autofs has the advantage of being simple to configure.

I highly recommend moosefs instead of ceph or gluster: http://www.moosefs.org/

A problem I've seen IT folks (including myself, until I learned better) make over and over: they use raid5 or raid6 or raid-DP, believing they get redundancy plus performance. But when you benchmark different configurations, you find they only perform well for large sequential operations. They perform like a single disk (sometimes worse) when you have small random IO, which is unfortunately the norm. I strongly recommend building your storage out of something more similar to Raid-10. It performs much, much better for random IO, which is the typical case.

Also: a single disk's performance is about 1Gbit. So you need to make your storage network something much faster. The next logical step up would be 10Gb ether, but in terms of bang for buck, you get a LOT more if you go to Infiniband or Fibrechannel instead.
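The /scratch automount layout described above can be sketched with an autofs map along these lines (hostnames, paths, and mount options are illustrative assumptions, not the actual configuration from any of those companies):

```
# /etc/auto.master -- mount peer machines' scratch areas under /scratches
/scratches  /etc/auto.scratch  --timeout=300

# /etc/auto.scratch -- one entry per build machine's local /scratch
machine1  -fstype=nfs,rw,hard,intr  machine1:/scratch
machine2  -fstype=nfs,rw,hard,intr  machine2:/scratch
```

With that in place, `/scratches/machine1/` automounts on first access and unmounts after the idle timeout, so every box sees every other box's local disk under one namespace.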
___ Discuss mailing list Discuss@blu.org http://lists.blu.org/mailman/listinfo/discuss
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Sun, 15 Dec 2013, Edward Ned Harvey (blu) wrote:

> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Edward Ned Harvey (blu)
>
> I quit about 1.5 years ago from a 9,000 person Fortune 500 company that still used 10/100 hubs in the closet. So I guess it's all a matter of perspective. To me, that was very relevant, and to the people still working there, for all I know, it probably still is.

As hard as it is for me to believe that anybody out there is still using hubs, based on all my personal experience seeing things I couldn't believe at that company and others, it's even more difficult for me to believe that the aforementioned company is unique in such matters.

The great thing about hubs is that if you plug both ends of the same ethernet cable into two ports on the same hub, it doesn't disturb the rest of the network. That means allowing desktop hubs is pretty safe, while allowing desktop dumb (meaning no Spanning Tree) switches is courting danger from careless users. So it is too bad hubs are no longer available, and haven't been for many years. We even bought an HP "Hub", but it turned out to be a switch in reality, just advertised as a hub for reasons unknown.

The only desktop switch (meaning small, inexpensive, and 8 or fewer ports) I know of that supports Spanning Tree is a Cisco Series 200 Model 8. But that box doesn't auto-sense full/half duplex correctly with the Dell or Netgear switches in our wiring closets. So that one is out. Does anyone know of another?

Daniel Feenberg
NBER
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Edward Ned Harvey (blu)
>
> I quit about 1.5 years ago from a 9,000 person Fortune 500 company that still used 10/100 hubs in the closet. So I guess it's all a matter of perspective. To me, that was very relevant, and to the people still working there, for all I know, it probably still is.

As hard as it is for me to believe that anybody out there is still using hubs, based on all my personal experience seeing things I couldn't believe at that company and others, it's even more difficult for me to believe that the aforementioned company is unique in such matters.
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: Bill Bogstad [mailto:bogs...@pobox.com]
> Sent: Saturday, December 14, 2013 5:44 PM
> To: Edward Ned Harvey (blu)
>
> Mostly I was trying to suggest that talking about collisions in the context of Ethernet in this century is not particularly useful.

I quit about 1.5 years ago from a 9,000 person Fortune 500 company that still used 10/100 hubs in the closet. So I guess it's all a matter of perspective. To me, that was very relevant, and to the people still working there, for all I know, it probably still is.
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Sat, Dec 14, 2013 at 12:48 AM, Edward Ned Harvey (blu) wrote:

>> This is the behavior of other network switching topologies (in particular IB and FC) but it is not the behavior of Ethernet. Because Ethernet is asynchronous, buffered, store and forward, with flow control packets and collisions... Sure, the most intelligent switches can eliminate collisions, but flow control is still necessary, buffering is still necessary... You have network overhead, and congestion leads to degradation of efficiency. Each of the 10 clients might be getting 5% of the bandwidth, which is an ungraceful degradation.
>>
>> Ed: Can you define what you mean by "collision" in the context of an Ethernet switch where twisted pair wiring is being used? (i.e. any of the commonly used *BaseT wiring systems)
>
> Did you stop reading at the first instance of the word "collision?" Because I think I went into that immediately thereafter. Switches eliminate collisions (although hubs did not) but everything else is still relevant.

I read everything that you wrote, including the statement "most intelligent switches". I assumed that the modifiers you applied to switches meant something. I was hoping that you would tell me how they differed from regular generic Ethernet switches. Or perhaps you were referring to ancient thinnet or thicknet Ethernet switches where collisions could take place on the individual segments.

Mostly I was trying to suggest that talking about collisions in the context of Ethernet in this century is not particularly useful. It may very well be that all of the other ills that you ascribe to Ethernet are true, but there is really no reason to talk about collisions anymore.

Bill Bogstad
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: Bill Bogstad [mailto:bogs...@pobox.com]
> Sent: Friday, December 13, 2013 5:49 PM
> To: Edward Ned Harvey (blu)
> Cc: GNHLUG; blu
> Subject: Re: [Discuss] Dev Ops - architecture (local not cloud)
>
> On Fri, Dec 13, 2013 at 1:42 PM, Edward Ned Harvey (blu) wrote:
>
>>> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Kent Borg
>>>
>>> Something else I long ago observed: Because ethernet degrades gracefully it always operates degraded.
>>
>> Ethernet does NOT degrade gracefully. A graceful degradation would be: You have 11 machines on a network together. 1 is a server, and 10 are clients. All 10 clients hammer the server, and all 10 of them each get 10% of the bandwidth that the server can sustain. This is the behavior of other network switching topologies (in particular IB and FC) but it is not the behavior of Ethernet. Because Ethernet is asynchronous, buffered, store and forward, with flow control packets and collisions... Sure, the most intelligent switches can eliminate collisions, but flow control is still necessary, buffering is still necessary... You have network overhead, and congestion leads to degradation of efficiency. Each of the 10 clients might be getting 5% of the bandwidth, which is an ungraceful degradation.
>
> Ed: Can you define what you mean by "collision" in the context of an Ethernet switch where twisted pair wiring is being used? (i.e. any of the commonly used *BaseT wiring systems)

Did you stop reading at the first instance of the word "collision?" Because I think I went into that immediately thereafter. Switches eliminate collisions (although hubs did not) but everything else is still relevant.
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Fri, Dec 13, 2013 at 1:42 PM, Edward Ned Harvey (blu) wrote:

>> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Kent Borg
>>
>> Something else I long ago observed: Because ethernet degrades gracefully it always operates degraded.
>
> Ethernet does NOT degrade gracefully. A graceful degradation would be: You have 11 machines on a network together. 1 is a server, and 10 are clients. All 10 clients hammer the server, and all 10 of them each get 10% of the bandwidth that the server can sustain. This is the behavior of other network switching topologies (in particular IB and FC) but it is not the behavior of Ethernet. Because Ethernet is asynchronous, buffered, store and forward, with flow control packets and collisions... Sure, the most intelligent switches can eliminate collisions, but flow control is still necessary, buffering is still necessary... You have network overhead, and congestion leads to degradation of efficiency. Each of the 10 clients might be getting 5% of the bandwidth, which is an ungraceful degradation.

Ed: Can you define what you mean by "collision" in the context of an Ethernet switch where twisted pair wiring is being used? (i.e. any of the commonly used *BaseT wiring systems) The definition I use makes collisions impossible and therefore irrelevant to virtually any discussion of Ethernet taking place in this century.

Thanks,
Bill Bogstad
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: Kent Borg [mailto:kentb...@borg.org]
>
> Whenever IT gets beyond engineers managing their own machines, it tends towards bad. Thankless job, that is not trivial, but usually tries to run on lists of inflexible policies and procedures.

Only at companies with crap IT. (Which I admit is quite common out there.) Mordak the Preventer exists for a reason. I take exception. I refuse to work at such companies, unless it's my job to turn it around.
Re: [Discuss] Dev Ops - architecture (local not cloud)
On 12/13/2013 01:42 PM, Edward Ned Harvey (dcu) wrote:

>> Whenever the power blinks at my job my computer stays happy, because I have a tiny UPS that can ride out short outages. But the rest of the services on our network seem to take the better part of an hour to all come back.
>
> Sounds like a symptom of bad IT.

No contradiction from me on that. Whenever IT gets beyond engineers managing their own machines, it tends towards bad. Thankless job, that is not trivial, but usually tries to run on lists of inflexible policies and procedures.

>> Something else I long ago observed: Because ethernet degrades gracefully it always operates degraded.
>
> Ethernet does NOT degrade gracefully. A graceful degradation would be: You have 11 machines on a network together. 1 is a server, and 10 are clients. All 10 clients hammer the server, and all 10 of them each get 10% of the bandwidth that the server can sustain.

That would be "resists ambush robustly". I mean screwed-up configurations, with broadcast packets going where they shouldn't and collisions happening when they shouldn't, and no one notices unless it gets significantly bad, so it is always a little bad.

-kb
Re: [Discuss] Dev Ops - architecture (local not cloud)
On 12/13/2013 01:33 PM, Richard Pieri wrote:

> What do you do for backups and long-term archives? How do you ensure that, for example, every user leaves their workstation turned on 24/7?

You have a point there, but this is different from what offers higher performance.

Personally, at work, I hibernate my Linux development machine when I leave for the night. There are NFS mounts for home directories that I think are on Netapp machines. I use a local disk for much better performance. I have a cron job that ping-pong backs up parts of my tree to usually-non-mounted volumes, and this is also on software raid 1. If my physical PC takes a bad hit that takes out both disks, or software goes crazy, I will lose things, but all my real work is under source code control, on a Perforce server in I-know-not-what timezone. Because my machine is usually off, anacron is what fires off that backup when I do resume my machine. I don't know whether there is any anacron-like way to back up a MS Windows PC when it is up.

As for others at my work, it seems every piece of equipment is on 24x7, monitors included. Maybe you would have better on-compliance with no notes at all.

Back to larger questions of how to best design a system for general-purpose computing by mixed users... maybe Google Chrome makes more sense than I had been thinking. I have been reading that Microsoft is going through conniptions over Dell and HP looking like they are having a good time with new non-Microsoft machines.

-kb, the Kent who will have to look into AFS again.
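A minimal sketch of the ping-pong backup idea Kent describes, suitable for firing from anacron. All paths here are placeholders, not his actual setup, and the rsync line is left commented so the sketch is safe to run as-is:

```shell
#!/bin/sh
# Alternate between two backup targets by day parity, so one previous
# snapshot always survives a backup run gone wrong.
src=${1:-$HOME/work}           # tree to protect (placeholder path)
dst_base=${2:-/backup/work}    # usually-non-mounted volume (placeholder)
day=$(( $(date +%s) / 86400 % 2 ))    # flips between 0 and 1 each day
echo "would back up $src -> $dst_base-$day/"
# rsync -a --delete "$src/" "$dst_base-$day/"   # uncomment with real paths
```

Pointed at by an /etc/anacrontab entry such as `1 15 tree-backup /usr/local/sbin/pingpong-backup.sh`, anacron runs the job once per day and catches up after a resume if the machine was hibernated at the scheduled time.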
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Kent Borg
>
> Sure, the server will have somewhat faster parts, but it might also have more than one user. And the network might have some congestion.

Depends on the users. Suppose you have 24 engineers who all perform maximum-intensity simulations all the time. Then you're not going to gain anything by building a centralized server 24x as powerful as a single workstation.

But suppose you have 24 engineers who are sometimes drawing pictures, sometimes doing email, browsing the web, attending meetings... You build a centralized server in the closet 24x as powerful as a workstation, and you provision 24 VM's that each have 12x the processing capacity of a single workstation. (You're obviously overprovisioning your hardware.) Now, whenever some individual tries to execute a big simulation run, they're able to go 12x faster than they would have been able to go on their individual workstation. And in the worst case, every individual hammers the system simultaneously, so the system load-balances and they all get the meager performance of an individual workstation. So in the worst case they're no worse off than they otherwise would have been, but in the best case, they're able to get much more done, much faster.

> Whenever the power blinks at my job my computer stays happy, because I have a tiny UPS that can ride out short outages. But the rest of the services on our network seem to take the better part of an hour to all come back.

Sounds like a symptom of bad IT.

> Something else I long ago observed: Because ethernet degrades gracefully it always operates degraded.

Ethernet does NOT degrade gracefully. A graceful degradation would be: You have 11 machines on a network together. 1 is a server, and 10 are clients. All 10 clients hammer the server, and all 10 of them each get 10% of the bandwidth that the server can sustain.
This is the behavior of other network switching topologies (in particular IB and FC) but it is not the behavior of Ethernet. Because Ethernet is asynchronous, buffered, store and forward, with flow control packets and collisions... Sure, the most intelligent switches can eliminate collisions, but flow control is still necessary, buffering is still necessary... You have network overhead, and congestion leads to degradation of efficiency. Each of the 10 clients might be getting 5% of the bandwidth, which is an ungraceful degradation.
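The over-provisioning arithmetic in the 24-engineer example can be written down explicitly. This is only a toy model of that argument (the capacity units and the even load-balancing assumption are mine, not measurements):

```shell
#!/bin/sh
# Toy model: server has 24x one workstation's capacity; each of the
# 24 VMs is capped at 12x. How fast does each busy user go?
server=24
cap=12

per_user_speed() {
    # server splits capacity evenly among active users, up to the VM cap
    share=$(( server / $1 ))
    [ "$share" -gt "$cap" ] && share=$cap
    echo "$share"
}

per_user_speed 1    # lone heavy user: 12x a workstation
per_user_speed 24   # everyone hammering at once: 1x, no worse than local
```

The worst case matches a dedicated workstation and every lighter-load case beats it, which is the whole argument for consolidating.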
Re: [Discuss] Dev Ops - architecture (local not cloud)
Kent Borg wrote:

> Maybe because I was young and impressionable during the early personal computer era, it seems better to me to give users their own hardware rather than servers... unless there is a real economy of scale that kicks in for the server. Now that the PC era is coming to a close, this might

What do you do for backups and long-term archives? How do you ensure that, for example, every user leaves their workstation turned on 24/7? Answer: you can't. This is an actual issue that I have with a group of users. Regardless of what they're told, regardless of labels attached to their workstations, they shut them off when they leave their offices. These workstations are never backed up because they can't be backed up. We could implement local backups for these users; however, beyond a certain point this becomes more costly than a centralized system.

Users' AFS home directories? Backed up nightly, all automatic except for swapping in new tapes every month. It costs nothing to add more users and workstations to the system since we haven't reached absolute capacity.

I generally prefer AFS to NFS. It has a much better security model. It uses a local cache for improved performance. It has a robust snapshot and backup mechanism built in. Pretty much anything bad one can say about NFS is addressed in AFS.

-- Rich P.
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-bounces+blu=nedharvey@blu.org] On Behalf Of Derek Martin
>
> Now, these days, it's actually hard to buy a disk that won't give you more than 8MB/s sustained transfer rate (which is roughly what you could expect over 100Mb network). But Gigabit networks are common now, and if your NFS server is built for it (i.e. it isn't just yet another desktop with a single local disk) you should easily be able to far exceed the performance of a workstation's cheap local disk.

I've done a lot of benchmarking over the last decade, and I'll say this: all disks perform at approximately 1.0 Gbit/sec sustained transfer. This is true regardless of rpm's, regardless of SAS or SATA, and even for SSD's. The highest-performance enterprise disks sometimes do around 1.2, but even the cheapest commodity SATA 5400 RPM disks sustain 1.0. So even a single commodity disk can max out a 1 Gbit ethernet connection. I am perfectly aware that many SSD's advertise themselves as sustaining 500MB/sec (4 Gbit) but in practice, it's complete hogwash.

So, given a single 1Gbit ethernet connection, you will NOT exceed the performance of a single local disk, regardless of how great the storage array is at the other end of the ethernet cable. However, there do exist much faster and more efficient buses out there - Fibre Channel, Infiniband, 10GigE, and SAS - which are able to carry several (or even several dozen) fully-utilized disks' worth of performance. So your storage network architecture definitely makes a big difference. As does your RAID topology, and your decision to use hard/soft raid, and everything else you can think of.
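A quick way to sanity-check the sustained-transfer numbers being thrown around is a crude `dd` run. This is a rough sketch only (tools like `fio` or `bonnie++` give far more trustworthy numbers, and the target path is a placeholder):

```shell
# Write 64 MiB with an fdatasync at the end so the figure reflects the
# disk rather than the page cache; GNU dd prints MB/s on its last line.
f=$(mktemp /tmp/ddtest.XXXXXX)
dd if=/dev/zero of="$f" bs=1M count=64 conv=fdatasync
rm -f "$f"
```

Around 100-125 MB/s from a single spindle is consistent with the ~1.0 Gbit/sec figure above; anything sequential is also the best case, so random-IO workloads will come in far lower.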
Re: [Discuss] Dev Ops - architecture (local not cloud)
On 12/13/2013 12:07 PM, Derek Martin wrote:

> if your NFS server is built for it (i.e. it isn't just yet another desktop with a single local disk) you should easily be able to far exceed the performance of a workstation's cheap local disk.

Maybe because I was young and impressionable during the early personal computer era, it seems better to me to give users their own hardware rather than servers... unless there is a real economy of scale that kicks in for the server. Now that the PC era is coming to a close, this might change, but at the moment isn't the sweet spot for disk performance per dollar drawing on the same technology for servers as for individual computers? Sure, the server will have somewhat faster parts, but it might also have more than one user. And the network might have some congestion.

Whenever the power blinks at my job my computer stays happy, because I have a tiny UPS that can ride out short outages. But the rest of the services on our network seem to take the better part of an hour to all come back. Because my local computer is local it can be simpler and more reliable.

Something else I long ago observed: Because ethernet degrades gracefully it always operates degraded.

-kb, the Kent who is skeptical that NFS is really the better way.
Re: [Discuss] Dev Ops - architecture (local not cloud)
I get 100MB/s sustained writes with desktop-class disks and 80MB/s sustained writes with notebook disks (3Gb/s devices; I don't have any 6Gb/s yet). Even a relatively slow SATA 1.5Gb/s disk should get you at least 50MB/s throughput.

GigE without jumbo frames caps at about 80MB/s sustained transfer. Jumbo frames offer up to a 50% throughput improvement for traffic like NFS, so you're looking at up to 120MB/s sustained throughput. That will put you ahead of consumer-grade local disk performance. You do need network infrastructure that supports GigE jumbo frames; most consumer-grade equipment does not.

-- Rich P.
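For reference, the jumbo-frame change described here is a one-liner on a Linux host, though the interface name and server hostname below are placeholders, and every switch and NIC in the path must support 9000-byte frames or large packets will be dropped:

```
# set a 9000-byte MTU on the NFS-facing interface (needs root)
ip link set dev eth0 mtu 9000

# confirm, then check that large frames actually pass end-to-end
ip link show eth0
ping -M do -s 8972 nfs-server    # 8972 = 9000 minus 28 bytes of IP+ICMP headers
```

The `ping -M do` test is worth running because a path with one non-jumbo hop will silently fall back to fragmentation or drop traffic, erasing the throughput gain.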
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Fri, Dec 06, 2013 at 11:16:32AM -0500, ma...@mohawksoft.com wrote:

> NFS is not as fast as a local disk, but it should not be that slow.

As JABR points out, that's really a misconception. It depends a great deal on all the hardware involved. Now, these days, it's actually hard to buy a disk that won't give you more than 8MB/s sustained transfer rate (which is roughly what you could expect over a 100Mb network). But Gigabit networks are common now, and if your NFS server is built for it (i.e. it isn't just yet another desktop with a single local disk) you should easily be able to far exceed the performance of a workstation's cheap local disk.

-- Derek D. Martin   http://www.pizzashack.org/   GPG Key ID: 0xDFBEAD02

-=-=-=-=- This message is posted from an invalid address. Replying to it will result in undeliverable mail due to spam prevention. Sorry for the inconvenience.
Re: [Discuss] Dev Ops - architecture (local not cloud)
Daniel Feenberg wrote:

> How do you then share the disk among multiple machines? We went to 10GB ethernet so that multiple computers could access the same file system on a NAS box. With a Fibrechannel SAN, I couldn't figure out how to share the file system, except to have one of the SAN clients be an NFS server, which means we'd need 10GBE to get the good performance anyway. Was I wrong?

Not wrong in your reasoning. Wrong, perhaps, in your conclusions. GigE is not gigabit throughput. It's 500Mbit throughput in each direction. You won't ever get performance near local disk over GigE without lots of very specific optimizations for a very limited set of I/O operations.

If consolidation is a requirement and performance is a requirement, then I'd take a serious look at hybrid 10GigE NICs that can do TCP/IP and FCoE, use fibre channel to access disk volumes, use a cluster-aware file system for shared volumes when possible, and figure out what to do about the rest of the nodes. Then use lots of fast spindles on the storage system.

> We also have a strong need for a very fast /tmp local to each machine. I put 2 new Samsung SSD drives in a RAID 0, but for long sequential data (our situation) the performance was similar to local 7,200 RPM drives.

As a data point: I've repeatedly stated on this list that flash SSD sustained write performance is terrible. The only way to get reasonable flash write performance is to use LOTS of flash chips in large, distributed arrays. And by "lots" I mean many dozens, maybe hundreds of chips, not the 4 or 8 or 16 you find in consumer SSDs, with a price tag around 100 times higher than what you paid for those Samsung disks.

-- Rich P.
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: Daniel Feenberg [mailto:feenb...@nber.org]
>
> On Sun, 8 Dec 2013, Edward Ned Harvey (blu) wrote:
>
>> A single disk performance is about 1Gbit. So you need to make your storage network something much faster. The next logical step up would be 10Gb ether, but in terms of bang for buck, you get a LOT more if you go to Infiniband or Fibrechannel instead.
>
> How do you then share the disk among multiple machines? We went to 10GB ethernet so that multiple computers could access the same file system on a NAS box. With a Fibrechannel SAN, I couldn't figure out how to share the file system, except to have one of the SAN clients be an NFS server, which means we'd need 10GBE to get the good performance anyway. Was I wrong?

It sounds like what you're looking for is a unified namespace to access a distributed filesystem amongst multiple machines. There are several ways to do that...

The NFS solution is relatively easy to deploy and tune, but as described, has the disadvantage of performance overhead. You might be able to enhance that performance with 10G ether, in LACP bonding modes, or with multiple interfaces having different subnets, so that each client gets some dedicated bandwidth, etc.

Another way would be, as you described, fiber channel or infiniband, which present block devices to the client machines. If you do that, you'll need to use a clustering filesystem. (You cannot, for example, share an ext4 volume on a single block device amongst multiple simultaneous clients.) The precise selection of *which* clustering filesystem depends on the clients involved... Given that you're currently using NFS, I'm assuming the clients are linux, so I would guess that GFS would be your choice. In a clustered filesystem, you lose some performance due to synchronization overhead, but in comparison with ethernet & NFS, I think you'll find that it works out to your advantage on the whole.
I am personally biased toward Infiniband over Fibre Channel, but I don't have any solid metrics to back that up. Most likely, considering the present state of the world with regards to multipath and SAS, there's probably some way to connect SAS directly, over a SAS bus, which is probably considerably cheaper and faster than ether, and probably comparable to FC and IB. I think it's worth exploring. You can actually do the same thing over ethernet, but you should expect ethernet will have substantially more overhead caused by the way they do switching, signaling, and DMA. So the ethernet performance would likely be several times lower.

Another way would be to use ceph or gluster. I think, by default, these are meant to be distributed redundant filesystems, for high availability rather than high performance, so in the default configurations, I would expect performance to be worse than NFS. But BLU a few months ago had a talk given by ... I forget his name, a Red Hat developer for gluster, who confirmed that you have the ability to tune in such a way that the unified namespace still exists across all the machines, even while you tune the filesystem to access local disk by default and without redundant copies, for maximum performance and maximum distribution.

> We also have a strong need for a very fast /tmp local to each machine. I put 2 new Samsung SSD drives in a RAID 0, but for long sequential data (our situation) the performance was similar to local 7,200 RPM drives. They were attached to the motherboard SATA ports - would a RAID controller make any difference? Would more drives make a difference? Would SAS make a difference? The NAS box is much faster, but I don't want to overload the network with all the /tmp traffic..

Correct. For sustained throughput, SSD's are comparable to HDD's. Also, the number of RPM's makes little to no difference.
You would think higher RPM's would mean higher throughput, because you get more surface passing under the head per second, but in reality, the bandwidth is limited by the frequency response of the head, which is around 1Gbit regardless of the rpm's. The RPM's help reduce the rotational latency, but the rotational latency is already down to approx 0.1ms, which is very small compared to the head seek. The SSD performance advantage comes from eliminating the need for head seek. In fact, if you have a very small data set and you're able to "short stroke" HDD's, then you can get HDD performance very comparable to SSD's. Generally speaking, that's not very realistic, but in some cases it's possible.

If you want high-performance sequential throughput, I would recommend raid-5 or similar. If you want high-performance sequential throughput and also high-performance random IO, I would recommend raid-10 or similar. In your case with SSD's, it won't make much difference if you have a RAID card, or if you use soft raid in the OS. In fact, for sequential IO, it won't make much
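For the raid-10 recommendation above, a typical software-raid setup on Linux looks like the sketch below. The device names are placeholders and the commands are destructive, so treat this as illustrative only:

```
# four-disk RAID-10 with mdadm; random IO scales with the spindle count
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda /dev/sdb /dev/sdc /dev/sdd
mkfs.ext4 /dev/md0
mount /dev/md0 /scratch
```

Reads can be serviced by either mirror of each pair and writes avoid the parity read-modify-write penalty of raid-5/6, which is where the small-random-IO advantage comes from.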
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Sun, Dec 08, 2013 at 07:28:52AM -0500, Daniel Feenberg wrote:

> We also have a strong need for a very fast /tmp local to each machine. I put 2 new Samsung SSD drives in a RAID 0, but for long sequential data (our situation) the performance was similar to local 7,200 RPM drives. They were attached to the motherboard SATA ports - would a RAID controller make any difference? Would more drives make a difference? Would SAS make a difference? The NAS box is much faster, but I don't want to overload the network with all the /tmp traffic..

Can you afford to pay more for your /tmp? LSI and Fusion-IO, among others, offer PCIe-attached SSDs in capacities up to 2TB with claims of gigabyte-per-second transfer rates. They are, of course, expensive.

-dsr-
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Sun, 8 Dec 2013, Edward Ned Harvey (blu) wrote:

> A single disk performance is about 1Gbit. So you need to make your storage network something much faster. The next logical step up would be 10Gb ether, but in terms of bang for buck, you get a LOT more if you go to Infiniband or Fibrechannel instead.

How do you then share the disk among multiple machines? We went to 10GB ethernet so that multiple computers could access the same file system on a NAS box. With a Fibrechannel SAN, I couldn't figure out how to share the file system, except to have one of the SAN clients be an NFS server, which means we'd need 10GBE to get the good performance anyway. Was I wrong?

We also have a strong need for a very fast /tmp local to each machine. I put 2 new Samsung SSD drives in a RAID 0, but for long sequential data (our situation) the performance was similar to local 7,200 RPM drives. They were attached to the motherboard SATA ports - would a RAID controller make any difference? Would more drives make a difference? Would SAS make a difference? The NAS box is much faster, but I don't want to overload the network with all the /tmp traffic.

Daniel Feenberg
NBER
Re: [Discuss] Dev Ops - architecture (local not cloud)
> From: discuss-bounces+blu=nedharvey@blu.org [mailto:discuss-
> bounces+blu=nedharvey@blu.org] On Behalf Of Greg Rundlett (freephile)
>
> I think it's pretty obvious why it's not performing: user home
> directories (where developers compile) should not be NFS mounted. [1]
> The source repositories themselves should also not be stored on a NAS.

For high-performance hybrid distributed/monolithic environments, I've
used, at a few companies, systems that were generally interchangeable
clones of each other, but each one has a local /scratch (or /no-backup)
directory, and each one can access the others via NFS automount as
/scratches/machine1/ (or /no-backup-automount/machine1/).

If I had it to do over now, I would look at ceph or gluster to provide a
unified namespace while leaving the underlying storage distributed. But
it will take quite a bit of experimentation / configuration to get the
desired performance characteristics. Autofs has the advantage of
simplicity to configure.

A problem I've seen IT folks (including myself, until I learned better)
make over and over is this: they use raid5 or raid6 or raid-DP,
believing they get redundancy plus performance, but when you benchmark
different configurations, you find they only perform well for large
sequential operations. They perform like a single disk (sometimes worse)
when you have small random IO, which is unfortunately the norm. I
highly, strongly recommend building your storage out of something more
similar to Raid-10. This performs much, much better for random IO, which
is the typical case.

Also: a single disk's performance is about 1Gbit. So you need to make
your storage network something much faster. The next logical step up
would be 10Gb ether, but in terms of bang for buck, you get a LOT more
if you go to Infiniband or Fibrechannel instead.
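The random-write gap between parity RAID and RAID-10 can be sketched
with back-of-envelope arithmetic (the ~100 IOPS per 7,200 RPM spindle
figure and the 8-disk array are illustrative assumptions, not
benchmarks):

```shell
# Small random writes: RAID-10 pays 2 physical I/Os per logical write
# (both halves of a mirror pair); RAID-5 pays 4 (read old data, read
# old parity, write new data, write new parity).
DISK_IOPS=100   # rough figure for one 7,200 RPM disk
N=8             # disks in the array
echo "raid10 write IOPS: $((N * DISK_IOPS / 2))"
echo "raid5  write IOPS: $((N * DISK_IOPS / 4))"
```

Under these assumptions the RAID-10 array sustains roughly twice the
random-write rate of the same spindles in RAID-5, which matches the
"performs like a single disk" experience once a parity array is also
serving reads.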
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Fri, Dec 6, 2013 at 10:56 AM, Richard Pieri wrote:
> Greg Rundlett (freephile) wrote:
>> I think it's pretty obvious why it's not performing: user home
>> directories (where developers compile) should not be NFS mounted. [1]
>> The source repositories themselves should also not be stored on a NAS.
>
> Neither of these statements is true.
>
> User home directories are one of the best things you can do with NFS.
> It's what it was designed to do, after all.

User home directories should be NFS/NAS, yes. But their checkout /
build area need not be in the home filesystem, even if it's symlinked or
temporarily mounted there.

> Your performance problem is simple. Every Unix and Linux vendor in the
> world these days defaults to setting NFS write caching off. This makes
> NFS performance excruciatingly poor for lots of small writes, the kind
> of behavior you see when someone compiles lots of little C files.
> Enable write caching on the home directories and watch your
> performance improve dramatically.

There are many other performance traps in NFS/NAS too. That is one.

> Storing repositories on NAS has less to do with yes/no and more to do
> with how the code servers -- the servers that users check out and in
> through -- talk to NAS and what you've done (or not done) to optimize
> that performance.

Having the repository on NAS can work, although I'd prefer it behind a
real server rather than trust multiple writers to update it
concurrently and correctly.

--
Bill @n1vux bill.n1...@gmail.com
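The "home on NFS, build area elsewhere" arrangement can be sketched like
this (all paths are hypothetical stand-ins: the demo directories under
/tmp play the roles of the NFS-mounted home and a local scratch disk):

```shell
#!/bin/sh
# Hypothetical paths: $nfs_home stands in for the NFS-mounted home
# directory, $scratch for a local, non-NFS disk.
nfs_home=/tmp/demo-home
scratch=/tmp/demo-scratch
mkdir -p "$nfs_home" "$scratch/checkout"
# Only the symlink lives in the home directory; the working copy and
# all compile I/O land on the local disk.
ln -sfn "$scratch/checkout" "$nfs_home/checkout"
readlink "$nfs_home/checkout"
```

Developers keep the convenience of a path under their home directory
while the small-file write traffic that hurts NFS stays local.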
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Fri, Dec 6, 2013 at 12:41 PM, Patrick Flaherty wrote:
> Most of our devs do their dev work on their desktops. When 1TB SATA
> drives are 600 bucks it made more sense to let devs have "perishable"
> work environments. We're moving some of our compile/testing/deploy to
> Jenkins, but that's really just a pilot program. I'm playing with
> docker/vagrant as well for testing environments.

I love Jenkins and am setting it up for a large role in the software
engineering process, but individual devs still have to do compiles
before checking in code - though even that could be handled by Jenkins.

> That being said, what are your mount options for home directories?
> Sounds an awful lot like you aren't async.

Home dirs in our environment are not on desktops, as the developers are
(mostly) remote and the compile environment has to be controlled. Here
are the mount options:

storage-nfs-svr1:/home on /home type nfs (rw,hard,intr,addr=172.16.0.31)
storage-nfs-svr1:/usrlocal on /usr/local type nfs (rw,hard,intr,addr=172.16.0.31)
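For reference, an fstab entry along the lines Patrick hints at would add
client-side async and some transfer-size tuning (a hypothetical sketch:
the option values are illustrative, and async trades crash-consistency
for speed, so test against your own workload before deploying):

```
# /etc/fstab -- hypothetical NFS home mount with client-side tuning
storage-nfs-svr1:/home  /home  nfs  rw,hard,intr,async,noatime,rsize=32768,wsize=32768  0 0
```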
Re: [Discuss] Dev Ops - architecture (local not cloud)
> On Fri, Dec 6, 2013 at 11:16 AM, wrote:
>> NFS is not as fast as a local disk, but it should not be that slow.
>
> I remember the first time I set up a NetApp fileserver, back in 1999.
> I expected that NFS would be slower than local disk, but I was hoping
> the performance would still be acceptable.
>
> We had one of the heaviest users run his overnight jobs both on his
> local workstation and on the NetApp NFS share to compare times, and we
> discovered that the NetApp's NFS share gave much *faster* throughput
> than his local disks.
>
> His local desktop was a high-end Sun UltraSPARC workstation with the
> RAM maxed out and with fast SAS disks, tuned for maximum performance,
> yet over a 100Mb Ethernet, the NetApp outperformed his workstation's
> local disks.

That's impressive, especially over 100Mb Ethernet.
Re: [Discuss] Dev Ops - architecture (local not cloud)
On Fri, Dec 6, 2013 at 11:16 AM, wrote:
> NFS is not as fast as a local disk, but it should not be that slow.

I remember the first time I set up a NetApp fileserver, back in 1999. I
expected that NFS would be slower than local disk, but I was hoping the
performance would still be acceptable.

We had one of the heaviest users run his overnight jobs both on his
local workstation and on the NetApp NFS share to compare times, and we
discovered that the NetApp's NFS share gave much *faster* throughput
than his local disks.

His local desktop was a high-end Sun UltraSPARC workstation with the RAM
maxed out and with fast SAS disks, tuned for maximum performance, yet
over a 100Mb Ethernet, the NetApp outperformed his workstation's local
disks.

--
John Abreau / Executive Director, Boston Linux & Unix
Email: abre...@gmail.com / WWW http://www.abreau.net / PGP-Key-ID 0x920063C6
PGP-Key-Fingerprint A5AD 6BE1 FEFE 8E4F 5C23 C2D0 E885 E17C 9200 63C6
Re: [Discuss] Dev Ops - architecture (local not cloud)
Greg Rundlett (freephile) wrote:
> I think it's pretty obvious why it's not performing: user home
> directories (where developers compile) should not be NFS mounted. [1]
> The source repositories themselves should also not be stored on a NAS.

Neither of these statements is true.

User home directories are one of the best things you can do with NFS.
It's what it was designed to do, after all.

Your performance problem is simple. Every Unix and Linux vendor in the
world these days defaults to setting NFS write caching off. This makes
NFS performance excruciatingly poor for lots of small writes, the kind
of behavior you see when someone compiles lots of little C files. Enable
write caching on the home directories and watch your performance improve
dramatically.

Storing repositories on NAS has less to do with yes/no and more to do
with how the code servers -- the servers that users check out and in
through -- talk to NAS and what you've done (or not done) to optimize
that performance.

--
Rich P.
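On a Linux NFS server, the write caching Rich describes corresponds to
the `async` export option (a hypothetical example - `async` lets the
server acknowledge writes before they reach disk, so weigh the
crash-safety trade-off for your data):

```
# /etc/exports -- hypothetical: server-side write caching for home dirs
/export/home  172.16.0.0/24(rw,async,no_subtree_check)
```

After editing, `exportfs -ra` reloads the table and `exportfs -v`
confirms which options are active.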
Re: [Discuss] Dev Ops - architecture (local not cloud)
It's hard to quantify what's going on here. Yes it is slow, and we can
make guesses as to why, but without a whole-system diagnostic it is hard
to know.

NFS server:
  Network connectivity: 100M, 1G, 10G?
  Sync or async?
  OS (Solaris, FreeBSD, [any BSD], Linux, etc.)
  File system
  NFS server daemon
  Describe the NFS server in detail: OS, NFS server, storage, etc.

Client:
  Network connectivity: 100M, 1G, 10G?

Infrastructure:
  How many hops? Routers/firewall in between?

NFS is not as fast as a local disk, but it should not be that slow.

> Performance comparison:
>
> svn checkout single repository on old infrastructure
> real    5m44.100s
> user    0m36.957s
> sys     0m14.757s
>
> svn checkout single repository on new infrastructure, but only using
> NFS for "read" (local working copy stored on local disk)
> real    3m15.057s
> user    1m18.195s
> sys     0m53.796s
>
> svn checkout same repository on new infrastructure, with writes stored
> on NFS volume
> real    28m53.220s
> user    1m45.713s
> sys     3m26.948s
>
> Greg Rundlett
>
> On Fri, Dec 6, 2013 at 8:35 AM, Greg Rundlett (freephile) <
> g...@freephile.com> wrote:
>> [original post snipped; quoted in full below]
Re: [Discuss] Dev Ops - architecture (local not cloud)
Performance comparison:

svn checkout single repository on old infrastructure
real    5m44.100s
user    0m36.957s
sys     0m14.757s

svn checkout single repository on new infrastructure, but only using NFS
for "read" (local working copy stored on local disk)
real    3m15.057s
user    1m18.195s
sys     0m53.796s

svn checkout same repository on new infrastructure, with writes stored
on NFS volume
real    28m53.220s
user    1m45.713s
sys     3m26.948s

Greg Rundlett

On Fri, Dec 6, 2013 at 8:35 AM, Greg Rundlett (freephile) <
g...@freephile.com> wrote:
> [original post snipped; quoted in full below]
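To put a number on the gap in those checkout timings, here is the pure
arithmetic on the `real` figures from the post:

```shell
# Convert the two 'real' times to seconds and take the ratio.
awk 'BEGIN {
  nfs_write = 28*60 + 53.220   # working copy written to the NFS volume
  local_wc  =  3*60 + 15.057   # working copy on local disk
  printf "%.1fx slowdown\n", nfs_write / local_wc
}'
```

Writing the working copy to NFS makes the same checkout roughly nine
times slower than keeping it on local disk, which is what points at the
small-synchronous-write behavior discussed elsewhere in the thread.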
[Discuss] Dev Ops - architecture (local not cloud)
We are replacing a monolithic software development IT infrastructure
where source code control, development, and compiling all take place on
a single machine with something more manageable, scalable, redundant,
etc. The goal is to provide more enterprise features like manageability
and scalability, with failover and disaster recovery.

Let's call these architectures System A and System B. System A is
"monolithic" because everything is literally housed and managed on a
single hardware platform. System B is modular and virtualized, but
still running in a traditional IT environment (aka not in the cloud).
The problem is that the new system does not come close to the old
system in performance. I think it's pretty obvious why it's not
performing: user home directories (where developers compile) should not
be NFS mounted. [1] The source repositories themselves should also not
be stored on a NAS.

What does your (software development) IT infrastructure look like?

One of the specific problems that prompted this re-architecture was
disk space. Not the repository per se, but with 100+ developers each
having one or more checkouts of the repos (home directories), we have
maxed out a 4.5TB volume.

More specifically, here is what we have:

System A (old system):
  single host
  standard Unix user accounts
  svn server using the file:/// RA protocol
  4.5TB local disk storage (maxed out)
  NFS-mounted NAS for "tools" - e.g. Wind River Linux for compiling our OS

System B (new system):
  series of hosts managed by VMware ESX 5.1 (version control host +
  build servers) connected via a 10Gb link to an EMC VNXe NAS for home
  directories, tools, and source repos
  standard Unix user accounts controlled by a NIS server (adds
  manageability across the domain)
  svn server using the http:// RA protocol (adds repository access
  control and management)
  NFS-mounted NAS for "tools", the repositories, and the home directories

Notes:
The repos we're dealing with are multiple "large" repositories, e.g.
2GB, 43,203 files, 2,066 directories.
We're dealing with 100+ users.

[1]
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.prftungd/doc/prftungd/misuses_nfs_perf.htm

Greg Rundlett