Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-03 Thread Edward Ned Harvey
> I'll say this once more, in case it got lost in the volume before. You
> should definitely have a look at GlusterFS. It's free with commercial
> support available (they even have try-and-buy 30 day support packages).

In fact, I have a punch list of possibilities to follow-up on, which are:

(promising)
.   Lustre
.   PVFS        www.pvfs.org
.   GlusterFS   www.gluster.org   with local affinity turned on
.   NFS share with unionfs

(not so promising)
.   afs
.   using drbd to duplicate a block device then running ocfs2 as the
filesystem

Thanks for your feedback.  I'll surely post here again when I know more.



Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-03 Thread Doug Hughes
Edward Ned Harvey wrote:
> I have a bunch of compute servers.  They all have local disks mounted as 
> /scratch to use for computation scratch space.  This ensures maximum 
> performance on all systems, and no competition for a shared resource 
> during crunch time.  At present, all of their /scratch directories are 
> local, separate and distinct.  I think it would be awesome if /scratch 
> looked the same on all systems.  Does anyone know of a way to “unify” 
> this storage, without compromising performance?  Of course, if some 
> files reside on server A, and they are requested from server B, then the 
> files must go across the network, but I don’t want the files to go 
> across the network unless they are requested.  And yet, if you do 
> something like “ls /scratch” you would ideally get the same results 
> regardless of which machine you’re on.
> 
>  
> 
> Due to the nature of heavy runtime IO (read, seek, write, repeat…) it’s 
> not well suited to NFS or any network filesystem…  Due to the nature of 
> many systems all doing the same thing at the same time, it’s not well 
> suited to a SAN using shared disks…
> 
>  
> 
> I looked at gfs (the cluster filesystem) – but – it seems gfs assumes a 
> shared disk (like a san) in which case there is competition for a shared 
> resource.
> 
>  
> 
> I looked at gfs (the google filesystem) – but – it seems they constantly 
> push all the data across the network, which is good for redundancy and 
> mostly-just-read operations, and not good for heavy computation IO.
> 
>  
> 
> Not sure what else I should look at.  Any ideas?

I'll say this once more, in case it got lost in the volume before. You 
should definitely have a look at GlusterFS. It's free with commercial 
support available (they even have try-and-buy 30 day support packages).

It's *really* fast in our 1GbE testing. It's a little rough around some 
edges, but its bones are very good. They're actively working on getting 
a feature-replete software suite. It does pretty much what your 
requirements say, plus a little bit more. With local affinity turned on, 
if you write data on machine X to the global FS, it will try to store it 
on the local disk so your read of the same data will be serviced 
locally. It can also do block- or file-level mirroring to other hosts 
for redundancy.

It's definitely worth taking a look for scratch space. We're about to 
deploy it on some of our smaller clusters after fairly successful 
sandbox testing.
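
The affinity piece is the nufa ("non-uniform file access") translator in 
the volume spec.  A minimal sketch, with brick names made up and option 
names from memory, so double-check them against the release you install:

    volume scratch
      type cluster/nufa
      option local-volume-name brick-local   # create new files on this node's own brick
      subvolumes brick-local brick-node2 brick-node3
    end-volume

New files get created on the brick named as local; everything else is only 
read over the network when some other node asks for it.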


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Brandon S. Allbery KF8NH

On Jul 2, 2009, at 11:53 , Atom Powers wrote:

I didn't see AFS mentioned yet. My, admittedly incomplete,
understanding of afs is that it provides a single namespace (directory
tree) to all clients but the files themselves may be stored on a local
or remote server; a bit like Microsoft DFS.


Standard AFS doesn't quite work like that; even if you're running on  
the fileserver (a bad idea) you're still going through the protocol  
(AFS filesystems are not directly accessible via *nix APIs) and the  
local disk cache, and if you want anything like local speeds you need a  
cache large enough to hold the entire working set.


There *is* an experimental hostafs that enables individual  
workstations to share out their normal disk space in the way you're  
talking about, but I gather the code doesn't work with current  
versions of AFS.  You could post a query to openafs-i...@openafs.org  
if you're desperate.


--
brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allb...@kf8nh.com
system administrator [openafs,heimdal,too many hats] allb...@ece.cmu.edu
electrical and computer engineering, carnegie mellon university    KF8NH






Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Robert Hajime Lanning
Eric Sorenson wrote:
> On Jul 2, 2009, at 8:53 AM, Atom Powers wrote:
> 
>> I didn't see AFS mentioned yet. My, admittedly incomplete,
>> understanding of afs is that it provides a single namespace (directory
>> tree) to all clients but the files themselves may be stored on a local
>> or remote server; a bit like Microsoft DFS.
>>
>> Would this not unify the storage while maintaining local access speeds
>> for files created locally, and still allow any host/client the ability
>> to access any file on any server?
> 
> 
> You really don't want to make every server a fileserver in AFS though,  
> the way the OP requested. File servers are special in AFS in that you  
> have to carefully (and manually) manage backup copies of volumes so  
> there's always n+1; if a volume with no backup copies goes away the  
> whole cell comes to a screeching halt.

You don't have to.  The advantage of AFS is that it uses a local disk as
cache.  And it can work in offline mode.  Though, I don't know its
limits in doing that.  It might be RO when offline.

Moose!  care to comment? :)

-- 
END OF LINE
  --MCP


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Eric Sorenson

On Jul 2, 2009, at 8:53 AM, Atom Powers wrote:

> I didn't see AFS mentioned yet. My, admittedly incomplete,
> understanding of afs is that it provides a single namespace (directory
> tree) to all clients but the files themselves may be stored on a local
> or remote server; a bit like Microsoft DFS.
>
> Would this not unify the storage while maintaining local access speeds
> for files created locally, and still allow any host/client the ability
> to access any file on any server?


You really don't want to make every server a fileserver in AFS though,  
the way the OP requested. File servers are special in AFS in that you  
have to carefully (and manually) manage backup copies of volumes so  
there's always n+1; if a volume with no backup copies goes away the  
whole cell comes to a screeching halt.

  - Eric Sorenson - N37 17.255 W121 55.738 - http://twitter.com/ahpook -



Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Phil Pennock
On 2009-07-02 at 11:31 -0400, Edward Ned Harvey wrote:
> Until today, I only had one idea which came close - google gfs (not
> global gfs) does exactly what I want except that it always writes to 3
> or more peers.  If google gfs is available for use (can install and be
> used on linux) and if it's configurable to take that number down to 1,
> then it might do what I want.

I've double-checked the published paper on this, to be sure I'm not
revealing anything not already published, but am just drawing your
attention to things you've perhaps overlooked.

GFS almost certainly doesn't do what you want.  It gets some of its
advantages by not being a regular filesystem: you can't mount it, and if
you try to hack in support via FUSE then you can find some normal POSIX
ops being rather unhealthy for the GFS cell.

You'd need all of your applications to link against GFS client
libraries.

> Clarification - Suppose I have local raid 0+1, and I do random
> read/write to those disks.

Google GFS is not what you want.  Files are append-only.  You can hack
around that, but it's probably more development work than you want to
spend on it.

-Phil, who has run GFS cells for a living


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Atom Powers
On Wed, Jul 1, 2009 at 11:16 AM, Edward Ned Harvey wrote:
> I have a bunch of compute servers.  They all have local disks mounted as
> /scratch to use for computation scratch space.  This ensures maximum
> performance on all systems, and no competition for a shared resource during
> crunch time.  At present, all of their /scratch directories are local,
> separate and distinct.  I think it would be awesome if /scratch looked the
> same on all systems.  Does anyone know of a way to “unify” this storage,
> without compromising performance?  Of course, if some files reside on server
> A, and they are requested from server B, then the files must go across the
> network, but I don’t want the files to go across the network unless they are
> requested.  And yet, if you do something like “ls /scratch” you would
> ideally get the same results regardless of which machine you’re on.
>
...
>
> Not sure what else I should look at.  Any ideas?
>

I didn't see AFS mentioned yet. My, admittedly incomplete,
understanding of afs is that it provides a single namespace (directory
tree) to all clients but the files themselves may be stored on a local
or remote server; a bit like Microsoft DFS.

Would this not unify the storage while maintaining local access speeds
for files created locally, and still allow any host/client the ability
to access any file on any server?



-- 
Perfection is just a word I use occasionally with mustard.
--Atom Powers--



Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Edward Ned Harvey
> I'm not sure that what he wants does exist (and neither is he, that's
> why he's asking)

Precisely - although there are some promising leads to check out.  Lustre in
general seems like it would be good to know, even if it doesn't solve this
goal.  Gluster with local affinity turned on sounds like a really likely
candidate.  Local ext3 plus NFS unionfs is another possibility.


> what he's wanting is a cluster filesystem where any system can read
> data that lives on any drive, but when he creates a file it should be
> created on the local drive.
> 
> the idea is to give local-drive performance for anything created by
> this box, while falling back to NFS-like performance if you access
> files created on a different box.

Precisely.


> he's not planning to have a high-bandwidth network in place to share
> this data.

Well, 1Gb Ethernet.  I'd call that medium.  It's not a Fibre Channel SAN,
but it's also not a 1.5Mb WAN.




Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-02 Thread Edward Ned Harvey
> On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:
> The goal I'm trying to accomplish - it's expected that some amount of
> network traffic is required, but it should be minimal.  If a 1G file is
> created on some machine, then a few packets should fly across the
> network, just to say the file exists.  But 1G should not go anywhere.
> 
> I'm not sure I understand your goal then.  There's no FS I know of that
> will do what you're asking.

Until today, I only had one idea which came close - google gfs (not global gfs) 
does exactly what I want except that it always writes to 3 or more peers.  If 
google gfs is available for use (can install and be used on linux) and if it's 
configurable to take that number down to 1, then it might do what I want.

Another person gave me a promising suggestion - Export /scratch from each 
machine, and then use unionfs to mount all the various machines' scratches onto 
a single scratch.  Haven't tried this yet - it may still go horribly wrong.
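
If it pans out, the mount itself is presumably something along these lines 
(untested sketch; option syntax varies between unionfs releases, and the 
host names are made up):

    # each peer's /scratch is already NFS-mounted under /scratches/<host>
    mount -t unionfs \
        -o dirs=/scratch=rw:/scratches/nodeB=ro:/scratches/nodeC=ro \
        none /scratch-union

New files should land in the first (local, read-write) branch, so writes 
stay on local disk, while reads of other machines' files fall through to 
the read-only NFS branches.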


> Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber
> channel) running GFS (global FS, not google FS), network file system
> (NFS) or distributed file system (lustre, GPFS, AFS).

For this purpose, I'm interested in a distributed filesystem, which allows 
writes to be performed on a single host, to allow any individual host to work 
on new files at local disk speeds.  Thanks to this thread, I have the names of 
some distributed filesystems (as you mentioned, lustre etc) ... but I haven't 
had the chance to research any of them more thoroughly yet.


> Anything beyond local disk will require communication over the
> network.  With GFS, you'll mostly be speaking with the lock manager
> over the network.  So that would accomplish your goal of "write locally
> and only send limited amount of data over the network".  However, GFS
> isn't one of the shining stars in distributed/parallel processing or
> HPC (high performance computing).

Plus gfs (global gfs, not google) requires iSCSI / SAN / shared disk.


> I think you're saying - it writes across more than one machine, which
> would slow down the write operation
> 
> Actually, writing across multiple hosts speeds it up.  Much along the
> lines of a RAID 0 striping pattern, the data is spread across multiple
> destinations.

Clarification - Suppose I have local raid 0+1, and I do random read/write to 
those disks.  It's going very fast, perhaps 3Gb sata bus speed.  Now I write to 
the fastest NFS server in the world.  I am limited to 1Gb Ethernet which adds 
overhead.  Not nearly as fast.

I understand, in google gfs for example, if most of your operations are read, 
and most of your operations come across the network, then you're able to gain 
performance by distributing writes across multiple hosts, so later you have 
multiple hosts available to handle the read requests.  But if your usage is (as 
is the case for me) mostly random read/write on a single host, with occasional 
reads from some other host, then positively the best performance will come from 
local disk.





Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Narayan Desai
On Wed, 1 Jul 2009 19:29:43 -0400 Edward Ned Harvey wrote:

  Ned> > You seem to have conflicting wants/needs. First you said this:
  Ned> > --- On Wed, 7/1/09, Edward Ned Harvey  wrote:
  Ned> > > I have a bunch of compute servers.  They all have local disks
  Ned> > > mounted as /scratch to use for computation scratch space.  This
  Ned> > ensures
  Ned> > > maximum performance on all systems, and no competition for a shared
  Ned> > > resource during crunch time.  At present, all of their /scratch
  Ned> > > directories are local, separate and distinct.
  Ned> > 
  Ned> > Then this:
  Ned> > > I think it would be awesome
  Ned> > > if /scratch looked the same on all systems.
  Ned> > 
  Ned> > Does "look the same" mean configured the same? You didn't really expand
  Ned> > on this statement and clarify the goal, which I'm not sure is
  Ned> > uniformity, accessibility, or a combo of both.

  Ned> You're right - although it was clear in my mind, I see how that was
  Ned> confusing.  Let me try again:

  Ned> If you go into some directory and do "ls" (or whatever), then the results
  Ned> should be the same regardless of which machine you're on.  I do not want a
  Ned> centralized network file server, because of the bandwidth and diskspace
  Ned> bottleneck.  I want a distributed filesystem, which would provide the
  Ned> aforementioned ubiquitousness of namespace, but also allows you to do heavy
  Ned> IO on some machine without necessitating heavy network traffic.  A minimal
  Ned> amount of traffic is probably required, just so the other machines all have
  Ned> awareness of the existence of some file, but the file contents themselves
  Ned> are not needed to traverse the network until some other machine requests the
  Ned> contents of the file.

Well, there are two ways to go about it. Either you have a single
namespace backed by a single canonical version (which can be done with
something like cachefs to help improve performance via local disk
caching), or you have a single namespace backed by a distributed
canonical version. There are a pile of filesystems that sort of work
this way, with a variety of tradeoffs. PVFS and Lustre do this, as does
GPFS (which costs $$ but works well in our experience). Distributed
filesystems get tricky due to both metadata issues (for example, walking
the directory hierarchy can involve talking to multiple servers,
depending on how metadata is distributed) and locking, where you need to
start worrying about things like consensus protocols. PVFS doesn't do
locking, so this latter issue isn't a problem for it; it gets improved
performance and robustness in exchange. Any of these will usually stripe
files across multiple servers, to improve performance.
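
(On the caching side: recent Linux kernels can back an NFS mount with a
local disk cache via FS-Cache - run cachefilesd and add the fsc mount
option, roughly (server name illustrative):

    # assumes a kernel with FS-Cache support and the cachefilesd daemon running
    mount -t nfs -o fsc,rw fileserver:/scratch /scratch

That only helps repeated reads; writes still cross the wire.)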

A big issue that you would run into on a filesystem that works as you
described above is that the failure of any machine participating would
potentially take a chunk out of the filesystem. This could include
either partial or complete files, or worse, directories out of the
directory hierarchy. So once you start ramping up the client count, you
want to start looking at replication management. 

So, you could also look at something like Hadoop's HDFS. I don't know a ton
about it, but it works hard to be robust (like ensuring multiple copies
of files exist across your network), so I would assume (tho I don't know)
that it ends up being slower than the options above.
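
(The copy count is at least just a configuration knob - dfs.replication in
hdfs-site.xml, which defaults to 3 - so in principle it can be dialed down:

    <!-- hdfs-site.xml: replicas kept per block; the stock default is 3 -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>

Even then the client writes through a datanode socket rather than straight
to the local filesystem, so I wouldn't expect local-disk write speeds.)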

There are probably a variety of solutions that would give you 70% of what
you want, but I don't know of any filesystem that would give you
everything. This is a surprisingly hard problem to solve well. Tradeoffs
abound. 
 -nld



Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Doug Hughes
da...@lang.hm wrote:
> On Wed, 1 Jul 2009, Edward Ned Harvey wrote:
> 
>>> It sounds like you're looking at a clustered file system.  Where on the
>>> Fast/Cheap/Reliable triangle do you want to land?
>> Well, of course, all three.  ;-)
>> Fast and cheap are requirements.  For reliability, it is acceptable to 
>> use simple disk mirroring on a single host.  It needs to be protected 
>> against a single disk failure, but does not need to be protected against 
>> a machine failure or scsi/sata bus failure.
>>
>> At present, each machine's /scratch disk is either a mirror or a raid5, 
>> depending on which machine in question.
> 
> actually, what you want is a bit more nuanced than that.
> 
> you want new files that are created to be created on the local disk so 
> that they have the same performance as they have today.
> 
> but you are willing to lose substantial performance in accessing a file 
> that was created by another machine and lives on its local disk.
> 
> you just want the different filesystems to be transparently glued 
> together.
> 
Sounds like GlusterFS with local affinity turned on.
(apologies for not trimming context more)


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread david

On Wed, 1 Jul 2009, John Reddy wrote:


On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:


The goal I'm trying to accomplish - it's expected that some amount of
network traffic is required, but it should be minimal.  If a 1G file is
created on some machine, then a few packets should fly across the network,
just to say the file exists.  But 1G should not go anywhere.



I'm not sure I understand your goal then.  There's no FS I know of that will
do what you're asking.

Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber channel)
running GFS (global FS, not google FS), network file system (NFS) or
distributed file system (lustre, GPFS, AFS).

Anything beyond local disk will require communication over the network.
With GFS, you'll mostly be speaking with the lock manager over the network.
So that would accomplish your goal of "write locally and only send limited
amount of data over the network".  However, GFS isn't one of the shining
stars in distributed/parallel processing or HPC (high performance
computing).


I'm not sure that what he wants does exist (and neither is he, that's why 
he's asking)


what he's wanting is a cluster filesystem where any system can read data 
that lives on any drive, but when he creates a file it should be created 
on the local drive.


the idea is to give local-drive performance for anything created by this 
box, while falling back to NFS-like performance if you access files 
created on a different box.


he's not planning to have a high-bandwidth network in place to share this 
data.



I think you're saying - it writes across more than one machine, which would

slow down the write operation



Actually, writing across multiple hosts speeds it up.  Much along the lines
of a RAID 0 striping pattern, the data is spread across multiple
destinations.


that assumes that your network is faster than your local disks

David Lang


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Edward Ned Harvey
> You seem to have conflicting wants/needs. First you said this:
> --- On Wed, 7/1/09, Edward Ned Harvey  wrote:
> > I have a bunch of compute servers.  They all have local disks
> > mounted as /scratch to use for computation scratch space.  This
> ensures
> > maximum performance on all systems, and no competition for a shared
> > resource during crunch time.  At present, all of their /scratch
> > directories are local, separate and distinct.
> 
> Then this:
> > I think it would be awesome
> > if /scratch looked the same on all systems.
> 
> Does "look the same" mean configured the same? You didn't really expand
> on this statement and clarify the goal, which I'm not sure is
> uniformity, accessibility, or a combo of both.

You're right - although it was clear in my mind, I see how that was
confusing.  Let me try again:

If you go into some directory and do "ls" (or whatever), then the results
should be the same regardless of which machine you're on.  I do not want a
centralized network file server, because of the bandwidth and diskspace
bottleneck.  I want a distributed filesystem, which would provide the
aforementioned ubiquitousness of namespace, but also allows you to do heavy
IO on some machine without necessitating heavy network traffic.  A minimal
amount of traffic is probably required, just so the other machines all have
awareness of the existence of some file, but the file contents themselves
are not needed to traverse the network until some other machine requests the
contents of the file.


> You named the storage "/scratch", implying it is just a temporary usage
> space. Are you possibly adding requirements here that are unnecessary?

I did not mean to imply scratch is temporary - you see, we already have an
NFS server, which is backed up, so I named the local directory "scratch" so
users know it's not backed up.


> We have similar HPC systems that write results to local disk space.
> When the computation is completely done, the results are rsynced to
> separate network accessible storage space; the local space is then
> reclaimed for the next job. The rsync is controlled by LSF scripts, but
> any job management system will have similar capabilities. The network
> available results can then be perused by engineers. If they want to
> keep the results around permanently, they move the results at their
> discretion to longer term storage. Anything that isn't moved by the
> engineers after 7 days is considered unimportant, and deleted after 7
> days.
> 
> Would that paradigm work for you?

I have done exactly the same in the past - er - I should say the users have
done the same.  It's acceptable.  In fact, what we have now is also
acceptable.  I'm just trying to make it better.  (and learn more).  The two
downsides of the above are the limited disk space on the NFS server, and
the fact that, with a bunch of machines all pushing their results up to
the NFS server at once, performance does become an issue.

What we have now, is as follows:
Each machine has a local disk mounted as /scratch.
Each machine exports it.
Each machine has an automount directory, /scratches
So you can access any scratch directory from any machine.  For example,
/scratches/machinename is the nfs mountpoint for machinename:/scratch

This is almost ideal - except - in order to access the data that was
generated last night, the user must know on which machine that data was
created.  Acceptable, but still could be cooler.  :-)
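
For anyone who wants to replicate it, the moving parts are just an NFS
export plus a wildcard automount map, something like this (domain and map
file name are illustrative):

    # /etc/exports on every compute node
    /scratch    *.example.com(rw,no_root_squash)

    # /etc/auto.master
    /scratches  /etc/auto.scratches

    # /etc/auto.scratches -- '&' expands to the key looked up (the machine name)
    *    -fstype=nfs,rw    &:/scratch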




Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread John Reddy
On Wed, Jul 1, 2009 at 6:53 PM, Edward Ned Harvey wrote:

> The goal I'm trying to accomplish - it's expected that some amount of
> network traffic is required, but it should be minimal.  If a 1G file is
> created on some machine, then a few packets should fly across the network,
> just to say the file exists.  But 1G should not go anywhere.


I'm not sure I understand your goal then.  There's no FS I know of that will
do what you're asking.

Your options are: local disk (ext3, xfs), shared disk (iSCSI, fiber channel)
running GFS (global FS, not google FS), network file system (NFS) or
distributed file system (lustre, GPFS, AFS).

Anything beyond local disk will require communication over the network.
With GFS, you'll mostly be speaking with the lock manager over the network.
So that would accomplish your goal of "write locally and only send limited
amount of data over the network".  However, GFS isn't one of the shining
stars in distributed/parallel processing or HPC (high performance
computing).


I think you're saying - it writes across more than one machine, which would
> slow down the write operation


Actually, writing across multiple hosts speeds it up.  Much along the lines
of a RAID 0 striping pattern, the data is spread across multiple
destinations.




> I've heard of lustre, but never seen it or read anything about it before.
>

The Supercomputing Top 500 was recently released.  Lustre is in use on two
thirds of the top 500, and 7 out of the top 10 supercomputers.  It's
designed for massively parallel access at incredibly high speeds.  It's
certainly not a swiss army knife, but your requirements of "fast" and
"visible to all nodes" would suggest that it might do what you want.  How
many client computers are you talking about, though?

-John


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread david
On Wed, 1 Jul 2009, Edward Ned Harvey wrote:

>> It sounds like you're looking at a clustered file system.  Where on the
>> Fast/Cheap/Reliable triangle do you want to land?
>
> Well, of course, all three.  ;-)
> Fast and cheap are requirements.  For reliability, it is acceptable to 
> use simple disk mirroring on a single host.  It needs to be protected 
> against a single disk failure, but does not need to be protected against 
> a machine failure or scsi/sata bus failure.
>
> At present, each machine's /scratch disk is either a mirror or a raid5, 
> depending on which machine in question.

actually, what you want is a bit more nuanced than that.

you want new files that are created to be created on the local disk so 
that they have the same performance as they have today.

but you are willing to lose substantial performance in accessing a file 
that was created by another machine and lives on its local disk.

you just want the different filesystems to be transparently glued 
together.

>> Keep in mind: you
>> should accept that if you want to have all systems see the same unified
>> file system and not have a shared storage media (fibre, iSCSI, etc),
>> then you will have your reads and writes go across the network.
>
> The goal I'm trying to accomplish - it's expected that some amount of 
> network traffic is required, but it should be minimal.  If a 1G file is 
> created on some machine, then a few packets should fly across the 
> network, just to say the file exists.  But 1G should not go anywhere.

right.

David Lang


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Edward Ned Harvey
> It sounds like you're looking at a clustered file system.  Where on the
> Fast/Cheap/Reliable triangle do you want to land?  

Well, of course, all three.  ;-)
Fast and cheap are requirements.  For reliability, it is acceptable to use 
simple disk mirroring on a single host.  It needs to be protected against a 
single disk failure, but does not need to be protected against a machine 
failure or scsi/sata bus failure.

At present, each machine's /scratch disk is either a mirror or a raid5, 
depending on which machine in question.


> Keep in mind: you
> should accept that if you want to have all systems see the same unified
> file system and not have a shared storage media (fibre, iSCSI, etc),
> then you will have your reads and writes go across the network.

The goal I'm trying to accomplish - it's expected that some amount of network 
traffic is required, but it should be minimal.  If a 1G file is created on some 
machine, then a few packets should fly across the network, just to say the file 
exists.  But 1G should not go anywhere.


> If you're looking for fast/cheap, then you might want to look at Lustre
> (http://www.lustre.org/).  It works well on RHEL & SUSE derivatives,
> makes use of distributed resources.  It does have support for
> Infiniband if you've got that.  Lustre writes across multiple nodes and
> multiple partitions in order to gain speed.  

I think you're saying - it writes across more than one machine, which would 
slow down the write operation, but then when a read request comes in, the read 
could go faster because there's more than one machine available.  So "in order 
to gain speed" assumes a usage pattern of reading more times than you write.  
Similar to the google gfs behavior.  Unfortunately not accurate for the users I 
support.  :-(

Perhaps it's a configuration parameter?  Perhaps you could, if you want to, 
tell the system to write files on a single host, and then when some other host 
reads the file, it finally goes across the network?

I've heard of lustre, but never seen it or read anything about it before.


> Additionally, it uses
> block-level locking instead of file-level locking which can also speed
> things up.  Be warned: there is a fairly steep learning curve for
> Lustre.

Acknowledged.  Thanks for the heads-up.





Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Edward Ned Harvey
In fact, this is what we already do.

Mount ext3 /scratch
Mount autofs /scratches

So, /scratches/machinename is the mountpoint for NFS machinename:/scratch

If you want to see the scratch directory of some other machine, you just go
into /scratches/machinename.  This is not horrible, and it is the best I've
come up with so far, but it has the downside that you need to know which
machine your job ran on.  It is attractive to make it ubiquitous, without
the need to know the machine name.  I'm hoping to do better.

 

 

 

 

From: tech-boun...@lopsa.org [mailto:tech-boun...@lopsa.org] On Behalf Of
Jacob Kenner
Sent: Wednesday, July 01, 2009 2:28 PM
To: Edward Ned Harvey
Cc: tech@lopsa.org
Subject: Re: [lopsa-tech] shared network disks - vs gfs - vs distributed
filesystem - vs ...

 

On 1 Jul 2009, at 13:16, Edward Ned Harvey wrote:

Not sure what else I should look at.  Any ideas?

 

I know you mentioned no NFS, but have you considered automount NFS so that
your /scratch is the automount top level folder and the next folder is your
machine name, mapping to a standard local folder?  As a loopback NFS mount,
you would get close to local filesystem speeds, and even from machine "bar",
/scratch/foo would still exist (on machine "foo").

 

Jacob

 



Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread unix_fan


You seem to have conflicting wants/needs. First you said this:
--- On Wed, 7/1/09, Edward Ned Harvey  wrote:
> I have a bunch of compute servers.  They all have local disks
> mounted as /scratch to use for computation scratch space.  This ensures
> maximum performance on all systems, and no competition for a shared
> resource during crunch time.  At present, all of their /scratch
> directories are local, separate and distinct.  

Then this:
> I think it would be awesome
> if /scratch looked the same on all systems.  

Does "look the same" mean configured the same? You didn't really expand on this 
statement and clarify the goal, which I'm not sure is uniformity, 
accessibility, or a combo of both.

The minute you move the computation space to networked storage, you've 
undermined your first goal, i.e., "maximum performance on all systems, and no 
competition for a shared resource during crunch time".

You named the storage "/scratch", implying it is just a temporary usage space. 
Are you possibly adding requirements here that are unnecessary?

We have similar HPC systems that write results to local disk space. When the 
computation is completely done, the results are rsynced to separate network 
accessible storage space; the local space is then reclaimed for the next job. 
The rsync is controlled by LSF scripts, but any job management system will have 
similar capabilities. The network available results can then be perused by 
engineers. If they want to keep the results around permanently, they move the 
results at their discretion to longer term storage. Anything that isn't moved 
by the engineers after 7 days is considered unimportant, and deleted after 7 
days. 
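
For illustration, a minimal version of that kind of job script might look
like the following (solver command, paths, and the results host are
placeholders; the real scripts carry more error handling):

    #!/bin/sh
    #BSUB -o %J.out                      # LSF directive; %J is the job id
    WORK=/scratch/$LSB_JOBID             # LSF exports LSB_JOBID to the job
    mkdir -p "$WORK"
    run_solver --workdir "$WORK"         # placeholder for the actual computation
    # push the results to network storage, then reclaim the local scratch space
    rsync -a "$WORK/" results-host:/results/$LSB_JOBID/ && rm -rf "$WORK"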

Would that paradigm work for you?





Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread John Reddy
It sounds like you're looking at a clustered file system.  Where on the
Fast/Cheap/Reliable triangle do you want to land?  Keep in mind: you should
accept that if you want to have all systems see the same unified file system
and not have a shared storage media (fibre, iSCSI, etc), then you will have
your reads and writes go across the network.

If you're looking for fast/cheap, then you might want to look at Lustre (
http://www.lustre.org/).  It works well on RHEL & SUSE derivatives, makes
use of distributed resources.  It does have support for Infiniband if you've
got that.  Lustre writes across multiple nodes and multiple partitions in
order to gain speed.  Additionally, it uses block-level locking instead of
file-level locking which can also speed things up.  Be warned: there is a
fairly steep learning curve for Lustre.
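
(Striping is also tunable per file or directory; something like the
following - syntax from memory, so check lfs(1), and the path is
illustrative - keeps each file on a single OST instead of spreading it
across servers:

    # files created under this directory get a stripe count of 1 (one OST each)
    lfs setstripe -c 1 /mnt/lustre/scratch

That doesn't give write-local affinity, but it does stop every write from
touching every server.)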

Also in the fast/cheap camp is Gluster (http://www.gluster.org/), though I
don't know anyone using Gluster in the supercomputing community.

If you want fast/reliable and are willing to spend money, maybe looking at
commercial products like GPFS, BlueArc, Panasas would be the way to go.

-John Reddy


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Jonathan Billings
On Wed, Jul 01, 2009 at 02:16:25PM -0400, Edward Ned Harvey wrote:
> Not sure what else I should look at.  Any ideas?

For HPC scratch space, I've used PVFS2 (http://www.pvfs.org/), which
provides a lot of the needs you've mentioned.  One of the things the
researchers liked was that you could write code to access the data
directly rather than go through the filesystem layer.  

I do recall one of the problems was that you couldn't run executables
from PVFS2, which causes some confusion.

I've also heard of people using GlusterFS (http://www.gluster.org/) in
a similar manner.

-- 
Jonathan Billings 


Re: [lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Jacob Kenner

On 1 Jul 2009, at 13:16, Edward Ned Harvey wrote:

Not sure what else I should look at.  Any ideas?


I know you mentioned no NFS, but have you considered automount NFS so  
that your /scratch is the automount top level folder and the next  
folder is your machine name, mapping to a standard local folder?  As a  
loopback NFS mount, you would get close to local filesystem speeds,  
and even from machine "bar", /scratch/foo would still exist (on  
machine "foo").


Jacob



[lopsa-tech] shared network disks - vs gfs - vs distributed filesystem - vs ...

2009-07-01 Thread Edward Ned Harvey
I have a bunch of compute servers.  They all have local disks mounted as
/scratch to use for computation scratch space.  This ensures maximum
performance on all systems, and no competition for a shared resource during
crunch time.  At present, all of their /scratch directories are local,
separate and distinct.  I think it would be awesome if /scratch looked the
same on all systems.  Does anyone know of a way to "unify" this storage,
without compromising performance?  Of course, if some files reside on server
A, and they are requested from server B, then the files must go across the
network, but I don't want the files to go across the network unless they are
requested.  And yet, if you do something like "ls /scratch" you would
ideally get the same results regardless of which machine you're on.

 

Due to the nature of heavy runtime IO (read, seek, write, repeat...) it's not
well suited to NFS or any network filesystem.  Due to the nature of many
systems all doing the same thing at the same time, it's not well suited to a
SAN using shared disks. 

 

I looked at gfs (the cluster filesystem) - but - it seems gfs assumes a
shared disk (like a san) in which case there is competition for a shared
resource.

 

I looked at gfs (the google filesystem) - but - it seems they constantly
push all the data across the network, which is good for redundancy and
mostly-just-read operations, and not good for heavy computation IO.

 

Not sure what else I should look at.  Any ideas?

 

TIA.
