Re: [Oscar-users] Question for NFS in cluster

Jeff Squyres Wed, 05 Feb 2003 07:37:52 -0800

On Mon, 3 Feb 2003, Weirong Zhu wrote:

> Assume I have two Linux boxes on an Ethernet network, A and B. A is an
> NFS server, and B is a client, which mounts a partition from A over NFS.
> In that NFS partition, there is a big data file; let's call it FILE.
>
> Now, let me compare two different conditions:
>
> 1. A opens FILE locally, reads data from FILE into a buffer, then uses
> a standard TCP/IP socket to transfer this buffer to B's buffer.
>
> 2. B opens FILE from the NFS partition and reads data from FILE into a
> buffer.
>
> What I am concerned about is which case is faster. Which case is more
> costly in terms of wall time?


There are many factors to consider here, even if we're only talking about
raw performance (e.g., disregarding extra cost in terms of code complexity
for your software).

Looking at it very simply: sending the data over a TCP socket is probably
vastly simpler than going through the NFS protocols.  That means the
overhead of your TCP socket (assuming that you do very large write()'s and
send max-sized packets, that the kernel can write faster than the network
can send, etc.) will probably be far less than the NFS protocol overhead.
So from this standpoint, opening a socket and sending the data yourself is
probably [at least somewhat] faster.
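
To make the "very large write()'s" point concrete, here is a minimal sketch
of the sending side in C.  The function name send_all is just an
illustrative choice, and it assumes the socket is already connected.  The
main thing it shows is that send() may accept fewer bytes than you asked
for, so you have to loop until the whole buffer is gone:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send exactly len bytes from buf on a connected socket.
 * send_all is a hypothetical helper name, not a library call.
 * Returns 0 on success, -1 on error (errno is left as set by send()). */
int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                  /* skip past what the kernel accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Handing the kernel the largest buffer you reasonably can (rather than many
small writes) keeps the number of system calls down and lets the TCP stack
fill its packets.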

That being said, it's probably not that simple.  :-)  The above probably
holds when there are only two machines and you are sending a single large
chunk of data.

If you need to send a lot of data to a lot of machines, it becomes more
complicated.  Having a single server becomes a bottleneck.  You may want
to consider alternate solutions, such as a distributed sending pattern
(e.g., a binomial tree or a binary tree, or perhaps something that closely
matches your network architecture).  Code to do this becomes complicated,
though.  Cluster-based filesystems may help here (e.g., GPFS) -- I'm not
knowledgeable in this area, though, so you'll probably want to look into
those and see if they'll help.
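
As a rough illustration of the binomial tree pattern, here is a sketch in C
that only computes who sends to whom; the actual transfer is left out.  It
assumes the nodes are numbered 0..nranks-1 with node 0 holding the data,
and the function names are made up for this example:

```c
/* Parent of `rank` in a binomial tree rooted at rank 0,
 * or -1 if `rank` is the root. */
int binomial_parent(int rank)
{
    if (rank <= 0)
        return -1;
    int high = 1;
    while (high * 2 <= rank)
        high *= 2;               /* largest power of two <= rank */
    return rank - high;
}

/* Store the children of `rank` in out[] (at most max_out of them),
 * in the order that rank would send to them.  Returns the count. */
int binomial_children(int rank, int nranks, int *out, int max_out)
{
    int count = 0;
    int mask = 1;

    while (mask <= rank)
        mask *= 2;               /* smallest power of two > rank */
    for (; rank + mask < nranks && count < max_out; mask *= 2)
        out[count++] = rank + mask;
    return count;
}
```

With 8 nodes, node 0 sends to 1, 2, and 4; node 1 sends to 3 and 5; node 2
sends to 6; node 3 sends to 7.  Every node that already has the data
forwards it in each round, so all 8 nodes have it after 3 rounds instead of
the 7 sends a single server would have to do by itself.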

Two notable things:

- MPI implementations typically add very little overhead, and may provide
what you need.  They can potentially save you all the headache of setting
up TCP sockets (or whatever your native protocol is).  MPI_Send using
MPI_BYTE (so that there's no byte order translation), especially for very
large messages, can effectively operate at native "wire speed".  Even
better, do a bunch of MPI_Isend calls (non-blocking sends) sending data
chunks to different destinations, and then MPI_Waitall on them, letting
MPI make progress on all of them.  Or do MPI_Scatter to let MPI handle the
distribution to multiple destinations, and potentially automatically do
the hierarchical distribution for you (as mentioned above).  Or... (and so
on)

- The Linux system call sendfile() was recently introduced to send a file
across a socket without the data crossing the kernel/user-space boundary.
Hence, you can send a file across a socket without all of the data needing
to be copied from the kernel into user space and back into kernel space.
This may be helpful to you.

-----

In short: NFS is very convenient.  For small to mid-sized data sets, it's
probably your best bet.  But if you're going to have larger data sets
and/or many nodes requesting large amounts of data, there are many better
ways, some of which are easy, some of which are not.  It really depends on
how much data you're talking about, and to how many destination nodes.

When it comes down to it, if we're talking about the difference between 5
minutes and 10 minutes, it's probably not worth the code complexity and
debugging.  But if we're talking about 30 minutes vs. 2 hours, then it's
probably worth investigating these other kinds of solutions.

> I assume there are 3 steps in Case 2: (1) open file on A, (2) A reads
> data, (3) A sends data to B through some protocol. All 3 steps are
> transparent to the user. Am I correct in this assumption? And if 2 is done

Correct.

> frequently, will B buffer FILE locally after the first read?

I'm not 100% sure, but I do not believe so.  Can someone else comment more
definitively here?

-- 
{+} Jeff Squyres
{+} [EMAIL PROTECTED]
{+} http://www.lam-mpi.org/


