Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
> I'll say this once more, in case it got lost in the volume before. You should definitely have a look at GlusterFS. It's free with commercial support available (they even have try-and-buy 30 day support packages).

In fact, I have a punch list of possibilities to follow up on, which are:

(promising)
. Lustre
. PVFS www.pvfs.org
. GlusterFS www.gluster.org with local affinity turned on
. NFS share with unionfs

(not so promising)
. afs
. using drbd to duplicate a block device then running ocfs2 as the filesystem

Thanks for your feedback. I'll surely post here again when I know more.

___
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
Edward Ned Harvey wrote:
> I have a bunch of compute servers. They all have local disks mounted as /scratch to use for computation scratch space. This ensures maximum performance on all systems, and no competition for a shared resource during crunch time. At present, all of their /scratch directories are local, separate and distinct. I think it would be awesome if /scratch looked the same on all systems. Does anyone know of a way to “unify” this storage, without compromising performance? Of course, if some files reside on server A, and they are requested from server B, then the files must go across the network, but I don’t want the files to go across the network unless they are requested. And yet, if you do something like “ls /scratch” you would ideally get the same results regardless of which machine you’re on.
>
> Due to the nature of heavy runtime IO (read, seek, write, repeat…) it’s not well suited to NFS or any network filesystem… Due to the nature of many systems all doing the same thing at the same time, it’s not well suited to a SAN using shared disks…
>
> I looked at gfs (the cluster filesystem) – but – it seems gfs assumes a shared disk (like a san) in which case there is competition for a shared resource.
>
> I looked at gfs (the google filesystem) – but – it seems they constantly push all the data across the network, which is good for redundancy and mostly-just-read operations, and not good for heavy computation IO.
>
> Not sure what else I should look at. Any ideas?

I'll say this once more, in case it got lost in the volume before. You should definitely have a look at GlusterFS. It's free with commercial support available (they even have try-and-buy 30 day support packages). It's *really* fast given our 1GE testing. It's a little rough around some edges, but its bones are very good. They're actively working on getting a feature-replete software suite.
It does pretty much what your requirements say, plus a little bit more. With local affinity turned on, if you write data on machine X to the global FS, it will try to store it on the local disk so your read of the same data will be serviced locally. It also can do block or file level mirroring to other hosts for redundancy.

It's definitely worth taking a look for scratch space. We're about to deploy it on some of our smaller clusters after fairly successful sandbox testing.
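[Editor's sketch: Gluster's local-affinity behavior is wired up through translators in the volume spec files. The volume names below are made up, and the exact translator/option spellings vary between GlusterFS releases, so treat this purely as an illustration of the shape of the configuration, not a working volfile.]

```
# Client-side volfile sketch -- "local-brick" / "remote-brick*" are
# hypothetical subvolume names defined elsewhere in the spec file.
volume scratch
  type cluster/nufa                      # "non-uniform file access": prefer one brick
  option local-volume-name local-brick   # new files land on this machine's own disk
  subvolumes local-brick remote-brick1 remote-brick2
end-volume
```

With a graph like this, new files are created on the local brick, while reads of files created elsewhere go over the network on demand, which is exactly the /scratch access pattern described above.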
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Jul 2, 2009, at 11:53 , Atom Powers wrote:

> I didn't see AFS mentioned yet. My, admittedly incomplete, understanding of afs is that it provides a single namespace (directory tree) to all clients but the files themselves may be stored on a local or remote server; a bit like Microsoft DFS.

Standard AFS doesn't quite work like that; even if you're running on the fileserver (a bad idea) you're still going through the protocol (AFS filesystems are not directly accessible via *nix APIs) and the local disk cache, and if you want anything like local speeds you need a cache large enough to hold the entire working set.

There *is* an experimental hostafs that enables individual workstations to share out their normal disk space in the way you're talking about, but I gather the code doesn't work with current versions of AFS. You could post a query to openafs-i...@openafs.org if you're desperate.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university KF8NH
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
Eric Sorenson wrote:
> On Jul 2, 2009, at 8:53 AM, Atom Powers wrote:
>
>> I didn't see AFS mentioned yet. My, admittedly incomplete, understanding of afs is that it provides a single namespace (directory tree) to all clients but the files themselves may be stored on a local or remote server; a bit like Microsoft DFS.
>>
>> Would this not unify the storage while maintaining local access speeds for files created locally, and still allow any host/client the ability to access any file on any server?
>
> You really don't want to make every server a fileserver in AFS though, the way the OP requested. File servers are special in AFS in that you have to carefully (and manually) manage backup copies of volumes so there's always n+1; if a volume with no backup copies goes away the whole cell comes to a screeching halt.

You don't have to. The advantage of AFS is that it uses a local disk as cache. And it can work in offline mode. Though, I don't know its limits in doing that. It might be RO when offline. Moose! care to comment? :)

-- END OF LINE --MCP
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Jul 2, 2009, at 8:53 AM, Atom Powers wrote:

> I didn't see AFS mentioned yet. My, admittedly incomplete, understanding of afs is that it provides a single namespace (directory tree) to all clients but the files themselves may be stored on a local or remote server; a bit like Microsoft DFS.
>
> Would this not unify the storage while maintaining local access speeds for files created locally, and still allow any host/client the ability to access any file on any server?

You really don't want to make every server a fileserver in AFS though, the way the OP requested. File servers are special in AFS in that you have to carefully (and manually) manage backup copies of volumes so there's always n+1; if a volume with no backup copies goes away the whole cell comes to a screeching halt.

- Eric Sorenson - N37 17.255 W121 55.738 - http://twitter.com/ahpook -
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On 2009-07-02 at 11:31 -0400, Edward Ned Harvey wrote:

> Until today, I only had one idea which came close - google gfs (not global gfs) does exactly what I want except that it always writes to 3 or more peers. If google gfs is available for use (can install and be used on linux) and if it's configurable to take that number down to 1, then it might do what I want.

I've double-checked the published paper on this, to be sure I'm not revealing anything not already published, but am just drawing your attention to things you've perhaps overlooked.

GFS almost certainly doesn't do what you want. It gets some of its advantages by not being a regular file-system; you can't mount it, and if you try to hack in support via FUSE then you can find some normal POSIX ops being rather unhealthy for the GFS cell. You'd need all of your applications to link against GFS client libraries.

> Clarification - Suppose I have local raid 0+1, and I do random read/write to those disks.

Google GFS is not what you want. Files are append-only. You can hack around that, but it's probably more development work than you want to spend on it.

-Phil, who has run GFS cells for a living
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, Jul 1, 2009 at 11:16 AM, Edward Ned Harvey wrote:

> I have a bunch of compute servers. They all have local disks mounted as /scratch to use for computation scratch space. This ensures maximum performance on all systems, and no competition for a shared resource during crunch time. At present, all of their /scratch directories are local, separate and distinct. I think it would be awesome if /scratch looked the same on all systems. Does anyone know of a way to “unify” this storage, without compromising performance? Of course, if some files reside on server A, and they are requested from server B, then the files must go across the network, but I don’t want the files to go across the network unless they are requested. And yet, if you do something like “ls /scratch” you would ideally get the same results regardless of which machine you’re on.
> ...
> Not sure what else I should look at. Any ideas?

I didn't see AFS mentioned yet. My, admittedly incomplete, understanding of afs is that it provides a single namespace (directory tree) to all clients but the files themselves may be stored on a local or remote server; a bit like Microsoft DFS.

Would this not unify the storage while maintaining local access speeds for files created locally, and still allow any host/client the ability to access any file on any server?

-- 
Perfection is just a word I use occasionally with mustard.
--Atom Powers--
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
> I'm not sure that what he wants does exist (and neither is he, that's why he's asking)

Precisely - although there are some promising leads to check out. Lustre in general seems like it would be good to know, even if it doesn't solve this goal. Gluster with local affinity turned on sounds like a real likely candidate. Local ext3, plus NFS unionfs, is another possibility.

> what he's wanting is a cluster filesystem where any system can read data that lives on any drive, but when he creates a file it should be created on the local drive.
>
> the idea is to give local-drive performance for anything created by this box, while falling back to NFS-like performance if you access files created on a different box.

Precisely.

> he's not planning to have a high-bandwidth network in place to share this data.

Well, 1Gb Ethernet. I'd call that medium. It's not a fibrechannel san, but it's also not a 1.5Mb wan.
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
> On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:
>> The goal I'm trying to accomplish - it's expected that some amount of network traffic is required, but it should be minimal. If a 1G file is created on some machine, then a few packets should fly across the network, just to say the file exists. But 1G should not go anywhere.
>
> I'm not sure I understand your goal then. There's no FS I know of that will do what you're asking.

Until today, I only had one idea which came close - google gfs (not global gfs) does exactly what I want except that it always writes to 3 or more peers. If google gfs is available for use (can install and be used on linux) and if it's configurable to take that number down to 1, then it might do what I want.

Another person gave me a promising suggestion - Export /scratch from each machine, and then use unionfs to mount all the various machines' scratches onto a single scratch. Haven't tried this yet - it may still go horribly wrong.

> Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber channel) running GFS (global FS, not google FS), network file system (NFS) or distributed file system (lustre, GPFS, AFS).

For this purpose, I'm interested in a distributed filesystem, which allows writes to be performed on a single host, to allow any individual host to work on new files at local disk speeds. Thanks to this thread, I have the names of some distributed filesystems (as you mentioned, lustre etc) ... but I haven't had the chance to research any of them more thoroughly yet.

> Anything beyond local disk will require communication over the network. With GFS, you'll mostly be speaking with the lock manager over the network. So that would accomplish your goal of "write locally and only send limited amount of data over the network". However, GFS isn't one of the shining stars in distributed/parallel processing or HPC (high performance computing).
Plus gfs (global gfs, not google) requires iscsi / san / shared disk.

>> I think you're saying - it writes across more than one machine, which would slow down the write operation
>
> Actually, writing across multiple hosts speeds it up. Much along the lines of a RAID 0 striping pattern, the data is spread across multiple destinations.

Clarification - Suppose I have local raid 0+1, and I do random read/write to those disks. It's going very fast, perhaps 3Gb sata bus speed. Now I write to the fastest NFS server in the world. I am limited to 1Gb Ethernet which adds overhead. Not nearly as fast.

I understand, in google gfs for example, if most of your operations are read, and most of your operations come across the network, then you're able to gain performance by distributing writes across multiple hosts, so later you have multiple hosts available to handle the read requests. But if your usage is (as is the case for me) mostly random read/write on a single host, with occasional reads from some other host, then positively the best performance will come from local disk.
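[Editor's sketch: the unionfs-over-NFS experiment mentioned above might look roughly like this on each compute node. The hostnames, export paths, and the exact union mount syntax are assumptions -- in-kernel unionfs and unionfs-fuse take different options, so check the implementation you actually have before trying this.]

```
# Mount every peer's exported scratch (node1/node2 are hypothetical hosts):
mount -t nfs node1:/scratch /net/node1
mount -t nfs node2:/scratch /net/node2

# Union them under /scratch, with the local disk as the only writable
# branch, so new files always land on local disk:
mount -t unionfs -o dirs=/local-scratch=rw:/net/node1=ro:/net/node2=ro none /scratch
```

One caveat worth testing before trusting it with real jobs: union mounts of this era tend to be fragile about coherency when a read-only branch changes underneath them, so files created on another node after mount time may not show up until a remount.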
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, 1 Jul 2009 19:29:43 -0400 Edward Ned Harvey wrote:

Ned> > You seem to have conflicting wants/needs. First you said this:
Ned> > --- On Wed, 7/1/09, Edward Ned Harvey wrote:
Ned> > > I have a bunch of compute servers. They all have local disks mounted as /scratch to use for computation scratch space. This ensures maximum performance on all systems, and no competition for a shared resource during crunch time. At present, all of their /scratch directories are local, separate and distinct.
Ned> >
Ned> > Then this:
Ned> > > I think it would be awesome if /scratch looked the same on all systems.
Ned> >
Ned> > Does "look the same" mean configured the same? You didn't really expand on this statement and clarify the goal, which I'm not sure is uniformity, accessibility, or a combo of both.

Ned> You're right - although it was clear in my mind, I see how that was confusing. Let me try again:

Ned> If you go into some directory and do "ls" (or whatever), then the results should be the same regardless of which machine you're on. I do not want a centralized network file server, because of the bandwidth and diskspace bottleneck. I want a distributed filesystem, which would provide the aforementioned ubiquity of namespace, but also allows you to do heavy IO on some machine without necessitating heavy network traffic. A minimal amount of traffic is probably required, just so the other machines all have awareness of the existence of some file, but the file contents themselves do not need to traverse the network until some other machine requests the contents of the file.

Well, there are two ways to go about it. Either you have a single namespace backed by a single canonical version (which can be done with something like cachefs to help improve performance via local disk caching), or you have a single namespace backed by a distributed canonical version.
There are a pile of filesystems that sort of work this way, with a variety of tradeoffs. PVFS and Lustre do this, as does GPFS (which costs $$ but works well in our experience). Distributed filesystems get tricky due to both metadata issues (for example, walking the directory hierarchy can involve talking to multiple servers, depending on how metadata is distributed) and locking, where you need to start worrying about things like consensus protocols. PVFS doesn't do locking, so this latter issue isn't a problem for it; it gets improved performance and robustness in exchange. Any of these will usually stripe files across multiple servers, to improve performance.

A big issue that you would run into on a filesystem that works as you described above is that the failure of any machine participating would potentially take a chunk out of the filesystem. This could include either partial or complete files, or worse, directories out of the directory hierarchy. So once you start ramping up the client count, you want to start looking at replication management.

So, you could also look at something like Hadoop's hdfs. I don't know a ton about it, but it works hard to be robust (like ensuring multiple copies of files exist across your network) so I would assume (tho I don't know) that it ends up being slower than the options above.

There are probably a variety of solutions that would give you 70% of what you want, but I don't know of any filesystem that would give you everything. This is a surprisingly hard problem to solve well. Tradeoffs abound.

-nld
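[Editor's note: for what it's worth, HDFS's replica count is a standard tunable (per cluster, and overridable per file), so the "multiple copies" behavior isn't fixed. A sketch of the relevant setting; the surrounding file layout is the usual Hadoop convention:]

```
<!-- hdfs-site.xml: dfs.replication controls how many copies of each
     block HDFS keeps; 1 disables replication entirely -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```

Note that HDFS still isn't a mountable POSIX filesystem out of the box, so like Google GFS it would need the applications to speak its API.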
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
da...@lang.hm wrote:
> On Wed, 1 Jul 2009, Edward Ned Harvey wrote:
>
>>> It sounds like you're looking at a clustered file system. Where on the Fast/Cheap/Reliable triangle do you want to land?
>>
>> Well, of course, all three. ;-)
>> Fast and cheap are requirements. For reliability, it is acceptable to use simple disk mirroring on a single host. It needs to be protected against a single disk failure, but does not need to be protected against a machine failure or scsi/sata bus failure.
>>
>> At present, each machine's /scratch disk is either a mirror or a raid5, depending on which machine in question.
>
> actually, what you want is a bit more nuanced than that.
>
> you want new files that are created to be created on the local disk so that they have the same performance as they have today.
>
> but you are willing to lose substantial performance in accessing a file that was created by another machine that lives on its local disk.
>
> you just want the different filesystems to be transparently glued together.

Sounds like GlusterFS with local affinity turned on.

(apologies for not trimming context more)
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, 1 Jul 2009, John Reddy wrote:

> On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:
>> The goal I'm trying to accomplish - it's expected that some amount of network traffic is required, but it should be minimal. If a 1G file is created on some machine, then a few packets should fly across the network, just to say the file exists. But 1G should not go anywhere.
>
> I'm not sure I understand your goal then. There's no FS I know of that will do what you're asking. Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber channel) running GFS (global FS, not google FS), network file system (NFS) or distributed file system (lustre, GPFS, AFS).
>
> Anything beyond local disk will require communication over the network. With GFS, you'll mostly be speaking with the lock manager over the network. So that would accomplish your goal of "write locally and only send limited amount of data over the network". However, GFS isn't one of the shining stars in distributed/parallel processing or HPC (high performance computing).

I'm not sure that what he wants does exist (and neither is he, that's why he's asking)

what he's wanting is a cluster filesystem where any system can read data that lives on any drive, but when he creates a file it should be created on the local drive.

the idea is to give local-drive performance for anything created by this box, while falling back to NFS-like performance if you access files created on a different box.

he's not planning to have a high-bandwidth network in place to share this data.

>> I think you're saying - it writes across more than one machine, which would slow down the write operation
>
> Actually, writing across multiple hosts speeds it up. Much along the lines of a RAID 0 striping pattern, the data is spread across multiple destinations.
that assumes that your network is faster than your local disks

David Lang
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
> You seem to have conflicting wants/needs. First you said this:
> --- On Wed, 7/1/09, Edward Ned Harvey wrote:
>> I have a bunch of compute servers. They all have local disks mounted as /scratch to use for computation scratch space. This ensures maximum performance on all systems, and no competition for a shared resource during crunch time. At present, all of their /scratch directories are local, separate and distinct.
>
> Then this:
>> I think it would be awesome if /scratch looked the same on all systems.
>
> Does "look the same" mean configured the same? You didn't really expand on this statement and clarify the goal, which I'm not sure is uniformity, accessibility, or a combo of both.

You're right - although it was clear in my mind, I see how that was confusing. Let me try again:

If you go into some directory and do "ls" (or whatever), then the results should be the same regardless of which machine you're on. I do not want a centralized network file server, because of the bandwidth and diskspace bottleneck. I want a distributed filesystem, which would provide the aforementioned ubiquity of namespace, but also allows you to do heavy IO on some machine without necessitating heavy network traffic. A minimal amount of traffic is probably required, just so the other machines all have awareness of the existence of some file, but the file contents themselves do not need to traverse the network until some other machine requests the contents of the file.

> You named the storage "/scratch", implying it is just a temporary usage space. Are you possibly adding requirements here that are unnecessary?

I did not mean to imply scratch is temporary - You see, we already have a NFS server, which is backed up, so I named the local directory "scratch" so users know it's not backed up.

> We have similar HPC systems that write results to local disk space.
> When the computation is completely done, the results are rsynced to separate network accessible storage space; the local space is then reclaimed for the next job. The rsync is controlled by LSF scripts, but any job management system will have similar capabilities. The network available results can then be perused by engineers. If they want to keep the results around permanently, they move the results at their discretion to longer term storage. Anything that isn't moved by the engineers after 7 days is considered unimportant, and deleted after 7 days.
>
> Would that paradigm work for you?

I have done exactly the same in the past - er - I should say the users have done the same. It's acceptable. In fact, what we have now is also acceptable. I'm just trying to make it better. (and learn more). The two downsides of the above are limitation of disk space on the nfs server, and actually since a bunch of machines are all pushing their results up to the nfs server, performance does become an issue.

What we have now, is as follows:
Each machine has a local disk mounted as /scratch.
Each machine exports it.
Each machine has an automount directory, /scratches
So you can access any scratch directory from any machine.
For example, /scratches/machinename is the nfs mountpoint for machinename:/scratch

This is almost ideal - except - in order to access the data that was generated last night, the user must know on which machine that data was created. Acceptable, but still could be cooler. :-)
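[Editor's sketch: the /scratches automount layout described above can be expressed as a single autofs wildcard map, so the machines never have to be enumerated by hand. The file names and NFS options are site choices:]

```
# /etc/auto.master -- hand the /scratches tree to autofs:
/scratches  /etc/auto.scratches

# /etc/auto.scratches -- wildcard map: "&" expands to the key being
# looked up, so /scratches/<host> mounts <host>:/scratch on demand:
*  -fstype=nfs,rw  &:/scratch
```

With this map in place, `ls /scratches/machinename` triggers the NFS mount of machinename:/scratch automatically the first time it is accessed.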
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:

> The goal I'm trying to accomplish - it's expected that some amount of network traffic is required, but it should be minimal. If a 1G file is created on some machine, then a few packets should fly across the network, just to say the file exists. But 1G should not go anywhere.

I'm not sure I understand your goal then. There's no FS I know of that will do what you're asking. Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber channel) running GFS (global FS, not google FS), network file system (NFS) or distributed file system (lustre, GPFS, AFS).

Anything beyond local disk will require communication over the network. With GFS, you'll mostly be speaking with the lock manager over the network. So that would accomplish your goal of "write locally and only send limited amount of data over the network". However, GFS isn't one of the shining stars in distributed/parallel processing or HPC (high performance computing).

> I think you're saying - it writes across more than one machine, which would slow down the write operation

Actually, writing across multiple hosts speeds it up. Much along the lines of a RAID 0 striping pattern, the data is spread across multiple destinations.

> I've heard of lustre, but never seen it or read anything about it before.

The Supercomputing Top 500 was recently released. Lustre is in use on two thirds of the top 500, and 7 out of the top 10 supercomputers. It's designed for massively parallel access at incredibly high speeds. It's certainly not a swiss army knife, but your requirements of "fast" and "visible to all nodes" would suggest that it might do what you want.

How many client computers are you talking about, though?

-John
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, 1 Jul 2009, Edward Ned Harvey wrote:

>> It sounds like you're looking at a clustered file system. Where on the Fast/Cheap/Reliable triangle do you want to land?
>
> Well, of course, all three. ;-)
> Fast and cheap are requirements. For reliability, it is acceptable to use simple disk mirroring on a single host. It needs to be protected against a single disk failure, but does not need to be protected against a machine failure or scsi/sata bus failure.
>
> At present, each machine's /scratch disk is either a mirror or a raid5, depending on which machine in question.

actually, what you want is a bit more nuanced than that.

you want new files that are created to be created on the local disk so that they have the same performance as they have today.

but you are willing to lose substantial performance in accessing a file that was created by another machine that lives on its local disk.

you just want the different filesystems to be transparently glued together.

>> Keep in mind: you should accept that if you want to have all systems see the same unified file system and not have a shared storage media (fibre, iSCSI, etc), then you will have your reads and writes go across the network.
>
> The goal I'm trying to accomplish - it's expected that some amount of network traffic is required, but it should be minimal. If a 1G file is created on some machine, then a few packets should fly across the network, just to say the file exists. But 1G should not go anywhere.

right.

David Lang
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
> It sounds like you're looking at a clustered file system. Where on the Fast/Cheap/Reliable triangle do you want to land?

Well, of course, all three. ;-)

Fast and cheap are requirements. For reliability, it is acceptable to use simple disk mirroring on a single host. It needs to be protected against a single disk failure, but does not need to be protected against a machine failure or scsi/sata bus failure.

At present, each machine's /scratch disk is either a mirror or a raid5, depending on which machine in question.

> Keep in mind: you should accept that if you want to have all systems see the same unified file system and not have a shared storage media (fibre, iSCSI, etc), then you will have your reads and writes go across the network.

The goal I'm trying to accomplish - it's expected that some amount of network traffic is required, but it should be minimal. If a 1G file is created on some machine, then a few packets should fly across the network, just to say the file exists. But 1G should not go anywhere.

> If you're looking for fast/cheap, then you might want to look at Lustre (http://www.lustre.org/). It works well on RHEL & SUSE derivatives, makes use of distributed resources. It does have support for Infiniband if you've got that. Lustre writes across multiple nodes and multiple partitions in order to gain speed.

I think you're saying - it writes across more than one machine, which would slow down the write operation, but then when a read request comes in, the read could go faster because there's more than one machine available. So "in order to gain speed" assumes a usage pattern of reading more times than you write. Similar to the google gfs behavior. Unfortunately not accurate for the users I support. :-( Perhaps it's a configuration parameter? Perhaps you could, if you want to, tell the system to write files on a single host, and then when some other host reads the file, it finally goes across the network?
I've heard of Lustre, but never seen it or read anything about it before.

> Additionally, it uses block-level locking instead of file-level locking
> which can also speed things up. Be warned: there is a fairly steep
> learning curve for Lustre.

Acknowledged. Thanks for the heads-up.

___
Tech mailing list
Tech@lopsa.org
http://lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
In fact, this is what we already do:

Mount ext3 /scratch
Mount autofs /scratches

So /scratches/machinename is the mountpoint for NFS machinename:/scratch. If you want to see the scratch directory of some other machine, you just go into /scratches/machinename. This is not horrible, and it is the best I've come up with so far, but it has the downside that you need to know which machine your job ran on. It would be attractive to make it ubiquitous, without the need to know the machine name. I'm hoping to do better.

From: tech-boun...@lopsa.org [mailto:tech-boun...@lopsa.org] On Behalf Of Jacob Kenner
Sent: Wednesday, July 01, 2009 2:28 PM
To: Edward Ned Harvey
Cc: tech@lopsa.org
Subject: Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

On 1 Jul 2009, at 13:16, Edward Ned Harvey wrote:

> Not sure what else I should look at. Any ideas?

I know you mentioned no NFS, but have you considered automount NFS, so that your /scratch is the automount top-level folder and the next folder is your machine name, mapping to a standard local folder? As a loopback NFS mount, you would get close to local filesystem speeds, and even from machine "bar", /scratch/foo would still exist (on machine "foo").

Jacob
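For anyone wanting to replicate the setup described above, a wildcard automounter map keeps it to two lines. A minimal sketch, assuming Linux autofs (the mount options are just examples; tune to taste):

```
# /etc/auto.master: hand the /scratches directory to autofs
/scratches  /etc/auto.scratches

# /etc/auto.scratches: wildcard map; the key (&) is the machine name,
# so "ls /scratches/foo" mounts foo:/scratch on demand
*  -rw,hard,intr  &:/scratch
```

With this map, accessing /scratches/<hostname> triggers an NFS mount of that host's /scratch, including a loopback mount of your own.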
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
You seem to have conflicting wants/needs. First you said this:

--- On Wed, 7/1/09, Edward Ned Harvey wrote:

> I have a bunch of compute servers. They all have local disks mounted
> as /scratch to use for computation scratch space. This ensures maximum
> performance on all systems, and no competition for a shared resource
> during crunch time. At present, all of their /scratch directories are
> local, separate and distinct.

Then this:

> I think it would be awesome if /scratch looked the same on all systems.

Does "look the same" mean configured the same? You didn't really expand on this statement and clarify the goal, which I'm not sure is uniformity, accessibility, or a combination of both. The minute you move the computation space to networked storage, you've undermined your first goal, i.e., "maximum performance on all systems, and no competition for a shared resource during crunch time".

You named the storage "/scratch", implying it is just temporary usage space. Are you possibly adding requirements here that are unnecessary? We have similar HPC systems that write results to local disk space. When the computation is completely done, the results are rsynced to separate network-accessible storage space; the local space is then reclaimed for the next job. The rsync is controlled by LSF scripts, but any job management system will have similar capabilities. The network-available results can then be perused by engineers. If they want to keep the results around permanently, they move them at their discretion to longer-term storage. Anything that isn't moved by the engineers within 7 days is considered unimportant and deleted.

Would that paradigm work for you?
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
It sounds like you're looking at a clustered file system. Where on the Fast/Cheap/Reliable triangle do you want to land? Keep in mind: you should accept that if you want to have all systems see the same unified file system and not have a shared storage media (fibre, iSCSI, etc), then you will have your reads and writes go across the network.

If you're looking for fast/cheap, then you might want to look at Lustre (http://www.lustre.org/). It works well on RHEL & SUSE derivatives and makes use of distributed resources. It does have support for Infiniband if you've got that. Lustre writes across multiple nodes and multiple partitions in order to gain speed. Additionally, it uses block-level locking instead of file-level locking, which can also speed things up. Be warned: there is a fairly steep learning curve for Lustre.

Also in the fast/cheap camp is Gluster (http://www.gluster.org/), though I don't know anyone using Gluster in the supercomputing community.

If you want fast/reliable and are willing to spend money, looking at commercial products like GPFS, BlueArc, or Panasas would be the way to go.

-John Reddy
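On the Gluster option: the "local affinity" behavior mentioned earlier in this thread is GlusterFS's NUFA (non-uniform file access) translator, which prefers the local brick for new files and only reaches across the network for files stored elsewhere. A client-side volume-file fragment might look roughly like this — translator and option names varied between GlusterFS releases, and the brick-* subvolume names are hypothetical (they would be defined elsewhere as protocol/client volumes), so treat this as the shape rather than a working config:

```
# Aggregate every node's scratch brick, preferring the local one
volume scratch
  type cluster/nufa
  option local-volume-name brick-local
  subvolumes brick-local brick-node2 brick-node3
end-volume
```

This matches the asker's constraint fairly closely: writes stay local, and data only crosses the network when another node actually requests it.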
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On Wed, Jul 01, 2009 at 02:16:25PM -0400, Edward Ned Harvey wrote:

> Not sure what else I should look at. Any ideas?

For HPC scratch space, I've used PVFS2 (http://www.pvfs.org/), which meets a lot of the needs you've mentioned. One of the things the researchers liked was that you could write code to access the data directly rather than go through the filesystem layer. I do recall one of the problems was that you couldn't run executables from PVFS2, which caused some confusion. I've also heard of people using GlusterFS (http://www.gluster.org/) in a similar manner.

-- Jonathan Billings
Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
On 1 Jul 2009, at 13:16, Edward Ned Harvey wrote:

> Not sure what else I should look at. Any ideas?

I know you mentioned no NFS, but have you considered automount NFS, so that your /scratch is the automount top-level folder and the next folder is your machine name, mapping to a standard local folder? As a loopback NFS mount, you would get close to local filesystem speeds, and even from machine "bar", /scratch/foo would still exist (on machine "foo").

Jacob
[lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...
I have a bunch of compute servers. They all have local disks mounted as /scratch to use for computation scratch space. This ensures maximum performance on all systems, and no competition for a shared resource during crunch time. At present, all of their /scratch directories are local, separate and distinct. I think it would be awesome if /scratch looked the same on all systems. Does anyone know of a way to "unify" this storage, without compromising performance? Of course, if some files reside on server A, and they are requested from server B, then the files must go across the network, but I don't want the files to go across the network unless they are requested. And yet, if you do something like "ls /scratch" you would ideally get the same results regardless of which machine you're on.

Due to the nature of heavy runtime IO (read, seek, write, repeat...) it's not well suited to NFS or any network filesystem. Due to the nature of many systems all doing the same thing at the same time, it's not well suited to a SAN using shared disks.

I looked at gfs (the cluster filesystem) - but - it seems gfs assumes a shared disk (like a SAN), in which case there is competition for a shared resource.

I looked at gfs (the google filesystem) - but - it seems they constantly push all the data across the network, which is good for redundancy and mostly-just-read operations, and not good for heavy computation IO.

Not sure what else I should look at. Any ideas? TIA.