Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-30 Thread Davi Vercillo C. Garcia
Hi,

Sorry, but I made a mistake... I'm not trying to use PVFS over NFS but
PVFS over EXT3. I still don't understand this error message...

On Thu, May 29, 2008 at 5:33 PM, Robert Latham  wrote:
> On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote:
>> > Oh, I see you want to use ordered i/o in your application.  PVFS
>> > doesn't support that mode.  However, since you know how much data each
>> > process wants to write, a combination of MPI_Scan (to compute each
>> > process's offset) and MPI_File_write_at_all (to carry out the
>> > collective i/o) will give you the same result with likely better
>> > performance (and has the nice side effect of working with pvfs).
>>
>> I don't understand this very well... What do I need to change in my code?
>
> MPI_File_write_ordered has an interesting property (which you probably
> know since you use it, but i'll spell it out anyway):  writes end up
> in the file in rank-order, but are not necessarily carried out in
> rank-order.
>
> Once each process knows the offsets and lengths of the writes the
> other processes will do, it can write its own data.  Observe that
> rank 0 can write immediately.  Rank 1 only needs to know how much data
> rank 0 will write.  And so on.
>
> Rank N can compute its offset by knowing how much data the preceding
> N-1 processes want to write.  The most efficient way to collect this is
> to use MPI_Scan and collect a sum of data:
>
> http://www.mpi-forum.org/docs/mpi-11-html/node84.html#Node84
>
> Once you've computed these offsets, MPI_File_write_at_all has enough
> information to carry out a collective write of the data.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
> Argonne National Lab, IL USA B29D F333 664A 4280 315B



-- 
Davi Vercillo Carneiro Garcia

Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project

"A computer is like air conditioning: it becomes useless when you open
windows." - Linus Torvalds

"Há duas coisas infinitas, o universo e a burrice humana. E eu estou
em dúvida quanto o primeiro." - Albert Einstein



Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote:
> > Oh, I see you want to use ordered i/o in your application.  PVFS
> > doesn't support that mode.  However, since you know how much data each
> > process wants to write, a combination of MPI_Scan (to compute each
> > process's offset) and MPI_File_write_at_all (to carry out the
> > collective i/o) will give you the same result with likely better
> > performance (and has the nice side effect of working with pvfs).
> 
> I don't understand this very well... What do I need to change in my code?

MPI_File_write_ordered has an interesting property (which you probably
know since you use it, but i'll spell it out anyway):  writes end up
in the file in rank-order, but are not necessarily carried out in
rank-order.   

Once each process knows the offsets and lengths of the writes the
other processes will do, it can write its own data.  Observe that
rank 0 can write immediately.  Rank 1 only needs to know how much data
rank 0 will write.  And so on.

Rank N can compute its offset by knowing how much data the preceding
N-1 processes want to write.  The most efficient way to collect this is
to use MPI_Scan and collect a sum of data:

http://www.mpi-forum.org/docs/mpi-11-html/node84.html#Node84

Once you've computed these offsets, MPI_File_write_at_all has enough
information to carry out a collective write of the data.
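
For concreteness, here is a minimal sketch of that pattern (not from the
original thread; the file name "output.dat" and the per-rank buffer contents
are made up for illustration): each rank computes its write size, MPI_Scan
produces the prefix sum, and MPI_File_write_at_all performs the collective
write at the resulting offset.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, len = 0, i;
    char buf[1024];
    long long mylen, end = 0;
    MPI_Offset offset;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank writes a different amount of data (here, rank+1 text lines). */
    for (i = 0; i <= rank; i++)
        len += sprintf(buf + len, "rank %d line %d\n", rank, i);

    /* Prefix sum of the write sizes: after MPI_Scan, 'end' holds the sum of
     * 'mylen' over ranks 0..rank, so this rank's offset is end - mylen. */
    mylen = len;
    MPI_Scan(&mylen, &end, 1, MPI_LONG_LONG_INT, MPI_SUM, MPI_COMM_WORLD);
    offset = (MPI_Offset)(end - mylen);

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write at explicit offsets: same rank-ordered file layout as
     * MPI_File_write_ordered, but without the lock-based ordered mode. */
    MPI_File_write_at_all(fh, offset, buf, len, MPI_CHAR, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}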

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B


Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Brock Palen
That I don't know; we use Lustre for this stuff now, and our users
don't use parallel IO (though I hope to change that).
Sorry I can't help more.  I would really use 'just' PVFS2 for your IO.
The other reply pointed out that you can have both and not use NFS at all
for your IO, but leave it mounted if that's what users are expecting.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On May 29, 2008, at 3:46 PM, Davi Vercillo C. Garcia wrote:

Hi,

I'm already using the "noac" option in my /etc/fstab but this error is
still happening. Do I need to put this in another file?

On Thu, May 29, 2008 at 4:33 PM, Brock Palen  wrote:
Well, don't run like this.  Have PVFS, have NFS, but don't mix them like
that; you're asking for pain, my 2 cents.

I get this error all the time also; you have to disable a large portion of
the caching that NFS does to make sure that all MPI-IO clients get true data
on the file they are all trying to access.  To make this work, check your
/etc/fstab and see if you have the 'noac' option.  This is attribute
caching; it must be disabled.

On that note, PVFS2 is made for doing MPI-IO to multiple hosts (thus no need
for NFS); because it was made with MPI-IO in mind, it should work out of the
box.
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On May 29, 2008, at 3:24 PM, Davi Vercillo C. Garcia wrote:

Hi,
I'm trying to run my program in my environment and I'm running into some
problems. My environment is based on PVFS2 over NFS (PVFS is mounted on an
NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO and
BZ2 development libraries. When I try to run it, this message appears:
File locking failed in ADIOI_Set_lock. If the file system is NFS, you
need to use NFS version 3, ensure that the lockd daemon is running on
all the machines, and mount the directory with the 'noac' option (no
attribute caching).
[campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
communicator MPI_COMM_WORLD with errorcode 1
mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
exited on signal 15 (Terminated).
Why?!
--
Davi Vercillo Carneiro Garcia
Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br
"Good things come to those who... wait." - Debian Project
"A computer is like air conditioning: it becomes useless when you  
open

windows." - Linus Torvalds
"Há duas coisas infinitas, o universo e a burrice humana. E eu estou
em dúvida quanto o primeiro." - Albert
Einstein___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users






--
Davi Vercillo Carneiro Garcia

Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project

"A computer is like air conditioning: it becomes useless when you open
windows." - Linus Torvalds

"Há duas coisas infinitas, o universo e a burrice humana. E eu estou
em dúvida quanto o primeiro." - Albert Einstein








Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Davi Vercillo C. Garcia
Hi,

> Oh, I see you want to use ordered i/o in your application.  PVFS
> doesn't support that mode.  However, since you know how much data each
> process wants to write, a combination of MPI_Scan (to compute each
> process's offset) and MPI_File_write_at_all (to carry out the
> collective i/o) will give you the same result with likely better
> performance (and has the nice side effect of working with pvfs).

I don't understand this very well... What do I need to change in my code?

-- 
Davi Vercillo Carneiro Garcia

Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project

"A computer is like air conditioning: it becomes useless when you open
windows." - Linus Torvalds

"Há duas coisas infinitas, o universo e a burrice humana. E eu estou
em dúvida quanto o primeiro." - Albert Einstein



Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Robert Latham
On Thu, May 29, 2008 at 04:24:18PM -0300, Davi Vercillo C. Garcia wrote:
> Hi,
> 
> I'm trying to run my program in my environment and I'm running into some
> problems. My environment is based on PVFS2 over NFS (PVFS is mounted on an
> NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO and
> BZ2 development libraries. When I try to run it, this message appears:
> 
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3, ensure that the lockd daemon is running on
> all the machines, and mount the directory with the 'noac' option (no
> attribute caching).
> [campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
> communicator MPI_COMM_WORLD with errorcode 1
> mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
> exited on signal 15 (Terminated).

Hi.

NFS has some pretty sloppy consistency semantics.  If you want
parallel I/O to NFS you have to turn off some caches (the 'noac'
option in your error message) and work pretty hard to flush
client-side caches (which ROMIO does for you using fcntl locks).  If
you do this, note that your performance will be really bad, but you'll
get correct results.
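
As an illustration (the server name and paths below are hypothetical, and the
exact option set depends on your setup), an NFS mount entry with attribute
caching disabled might look like:

  nfsserver:/export/data  /mnt/data  nfs  vers=3,noac,hard,intr  0  0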

Your nfs-exported PVFS volumes will give you pretty decent serial i/o
performance since NFS caching only helps in that case.

I'd suggest, though, that you try using straight PVFS for your MPI-IO
application, as long as the parallel clients have access to all of
the pvfs servers (if tools like pvfs2-ping and pvfs2-ls work, then you
do).  You'll get better performance for a variety of reasons and can
continue to keep your NFS-exported PVFS volumes up at the same time. 

Oh, I see you want to use ordered i/o in your application.  PVFS
doesn't support that mode.  However, since you know how much data each
process wants to write, a combination of MPI_Scan (to compute each
process's offset) and MPI_File_write_at_all (to carry out the
collective i/o) will give you the same result with likely better
performance (and has the nice side effect of working with pvfs).

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B


[OMPI users] Problem with NFS + PVFS2 + OpenMPI

2008-05-29 Thread Davi Vercillo C. Garcia
Hi,

I'm trying to run my program in my environment and I'm running into some
problems. My environment is based on PVFS2 over NFS (PVFS is mounted on an
NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO and
BZ2 development libraries. When I try to run it, this message appears:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you
need to use NFS version 3, ensure that the lockd daemon is running on
all the machines, and mount the directory with the 'noac' option (no
attribute caching).
[campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
communicator MPI_COMM_WORLD with errorcode 1
mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
exited on signal 15 (Terminated).

Why?!

-- 
Davi Vercillo Carneiro Garcia

Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project

"A computer is like air conditioning: it becomes useless when you open
windows." - Linus Torvalds

"Há duas coisas infinitas, o universo e a burrice humana. E eu estou
em dúvida quanto o primeiro." - Albert Einstein
/**
 * - Remember when running with MPI that the users MUST have the same ID.
 */
/* NOTE: the names of the system headers (inside <...>) were stripped by the
   list archive; the includes below are the ones the visible code needs. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include "bzlib.h"
#include "mpi.h"

#define FILE_NAME_LEN 1034
#define BENCH   1

typedef unsigned char uchar;
typedef char Char;
typedef unsigned char Bool;
typedef unsigned char UChar;
typedef int Int32;
typedef unsigned int UInt32;
typedef short Int16;
typedef unsigned short UInt16;

#define True  ((Bool)1)
#define False ((Bool)0)

/**
 * Verbose mode flag
 */
int VERBOSE = 1;

/*--
 IntNative is your platform's `native' int size.
 Only here to avoid probs with 64-bit platforms.
 --*/
typedef int IntNative;

Int32 blockSize100k = 9;
Int32 verbosity = 0;
Int32 workFactor= 30;

/**
 * Maximum queue size
 */
long TAM_FILA = 10;
/**
 * Size of the block read by each thread
 */
long M_BLOCK  = 900*1000;
#define M_BLOCK_OUT (M_BLOCK + M_BLOCK)


/**
 * MPI Variables
 */

int  nProcs= 0;
int  rank  = 0;
int  nfiles= 0;
int  nBlocosPorProc= 0;
int  nBlocosResto  = 0;
long nBlocos   = 0;
long long filesize = 0;
long long tamComprimidoPorProc = 0;

typedef struct SBloco{
   UChar* dado;
   long int id;
} Bloco;


typedef struct s_OutputBuffer{
   long size;
   uchar *zbuf;
} OutputBuffer;


/**
 * TODO: implementation in progress
 */
static void comprime( MPI_File stream, MPI_File zStream )
{
   // 1 reader thread, 1 writer thread, the rest are compressor threads
   // NOTE: there must be at least 3 threads
   #define NUM_THREADS 4
   MPI_Status status;
   //MPI_Offset offset; [DAVI]

   uchar *zbuf;
   long r, count;
   unsigned int nZ;
   long nIdBlock;
   UChar *ibuf[TAM_FILA]; // input buffer
   OutputBuffer **obuf; // output buffer
   Int32 nIbuf[TAM_FILA];
   Int32 block_in_use[TAM_FILA];

   long nLeituraAtual;
   long nProcAtual;
   long nGravacaoAtual;
   Int32 erro;
   Int32 endRead;
   long long nTamOBuf = ( filesize / M_BLOCK ) + 1;

   // initialize output buffer
   obuf = (OutputBuffer**)malloc( sizeof(OutputBuffer*)*nTamOBuf );

   for( count = 0; count < nTamOBuf; count++ )
   {
       if( count < TAM_FILA )
           ibuf[count] = (UChar*)malloc( sizeof(UChar) * M_BLOCK );
       obuf[count] = (OutputBuffer*)malloc( sizeof(OutputBuffer) );
       obuf[count]->size = -1;
       obuf[count]->zbuf = NULL;
   }

   // Configure the number of threads
   omp_set_num_threads( NUM_THREADS );

   erro   = 0;
   nLeituraAtual  = 0;
   nProcAtual = 0;
   nGravacaoAtual = 0;
   endRead= 0;
   nIdBlock   = -1;
//  char str[10];
   //int nPrinted = 0;
   int tsleep   = 0;

   for (count = 0; count < TAM_FILA; ++count) {
   block_in_use[count] = 0;
   }

   MPI_File_set_view( stream,  0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL );
   MPI_File_set_view( zStream, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL );

// Start of parallel region
#pragma omp parallel default(shared) private(zbuf, nZ, r, nIdBlock )
{
   zbuf = (uchar*)malloc( (M_BLOCK + 600 + (M_BLOCK / 100)) * sizeof(uchar) );

   while ( !erro && omp_get_thread_num() != 1 )
   {
   //printf( "PROCESSO %d\n", rank );
   if( omp_get_thread_num() == 0 ) // Reader thread
   {
   if( VERBOSE )printf( "Processo %d Thread Leitora\n", rank );

   if ( (rank + nLeituraAtual*nProcs) >= nBlocos &&
nLeituraAtual > 0