Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI
Hi,

Sorry, but I made a mistake... I'm not trying to use PVFS over NFS but PVFS over EXT3. I still don't understand this error message...

On Thu, May 29, 2008 at 5:33 PM, Robert Latham wrote:
> On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote:
>> > Oh, I see you want to use ordered i/o in your application. PVFS
>> > doesn't support that mode. However, since you know how much data each
>> > process wants to write, a combination of MPI_Scan (to compute each
>> > process's offset) and MPI_File_write_at_all (to carry out the
>> > collective i/o) will give you the same result with likely better
>> > performance (and has the nice side effect of working with PVFS).
>>
>> I don't understand this very well... what do I need to change in my code?
>
> MPI_File_write_ordered has an interesting property (which you probably
> know since you use it, but I'll spell it out anyway): writes end up
> in the file in rank order, but are not necessarily carried out in
> rank order.
>
> Once each process knows the offsets and lengths of the writes the
> other processes will do, that process can write its data. Observe that
> rank 0 can write immediately. Rank 1 only needs to know how much data
> rank 0 will write, and so on.
>
> Rank N can compute its offset by knowing how much data the preceding
> N-1 processes want to write. The most efficient way to collect this is
> to use MPI_Scan to compute a sum of the data:
>
> http://www.mpi-forum.org/docs/mpi-11-html/node84.html#Node84
>
> Once you've computed these offsets, MPI_File_write_at_all has enough
> information to carry out a collective write of the data.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
> Argonne National Lab, IL USA                 B29D F333 664A 4280 315B

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Davi Vercillo Carneiro Garcia
Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project

"A computer is like air conditioning: it becomes useless when you open windows." - Linus Torvalds

"There are two infinite things: the universe and human stupidity. And I'm in doubt about the first." - Albert Einstein
Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI
On Thu, May 29, 2008 at 04:48:49PM -0300, Davi Vercillo C. Garcia wrote:
> > Oh, I see you want to use ordered i/o in your application. PVFS
> > doesn't support that mode. However, since you know how much data each
> > process wants to write, a combination of MPI_Scan (to compute each
> > process's offset) and MPI_File_write_at_all (to carry out the
> > collective i/o) will give you the same result with likely better
> > performance (and has the nice side effect of working with PVFS).
>
> I don't understand this very well... what do I need to change in my code?

MPI_File_write_ordered has an interesting property (which you probably
know since you use it, but I'll spell it out anyway): writes end up
in the file in rank order, but are not necessarily carried out in
rank order.

Once each process knows the offsets and lengths of the writes the
other processes will do, that process can write its data. Observe that
rank 0 can write immediately. Rank 1 only needs to know how much data
rank 0 will write, and so on.

Rank N can compute its offset by knowing how much data the preceding
N-1 processes want to write. The most efficient way to collect this is
to use MPI_Scan to compute a sum of the data:

http://www.mpi-forum.org/docs/mpi-11-html/node84.html#Node84

Once you've computed these offsets, MPI_File_write_at_all has enough
information to carry out a collective write of the data.

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI
That I don't know; we use Lustre for this stuff now, and our users don't use parallel IO (though I hope to change that). Sorry I can't help more. I would really use 'just' PVFS2 for your IO. The other reply pointed out you can have both and not use NFS at all for your IO, but leave it mounted if that's what users are expecting.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985

On May 29, 2008, at 3:46 PM, Davi Vercillo C. Garcia wrote:

> Hi,
>
> I'm already using the "noac" option in my /etc/fstab but this error is
> still happening. Do I need to put this in another file?
>
> On Thu, May 29, 2008 at 4:33 PM, Brock Palen wrote:
>> Well, don't run like this. Have PVFS or have NFS; don't mix them like
>> that, you're asking for pain, my 2 cents.
>>
>> I get this error all the time also. You have to disable a large portion
>> of the caching that NFS does to make sure that all MPI-IO clients get
>> true data on the file they are all trying to access. To make this work,
>> check your /etc/fstab and see if you have the 'noac' option. This is
>> attribute caching; it must be disabled.
>>
>> On that note, PVFS2 is made for doing MPI-IO to multiple hosts (thus no
>> need for NFS); because it was made with MPI-IO in mind, it should work
>> out of the box.
>>
>> On May 29, 2008, at 3:24 PM, Davi Vercillo C. Garcia wrote:
>>> Hi,
>>>
>>> I'm trying to run my program in my environment and some problems are
>>> happening. My environment is based on PVFS2 over NFS (PVFS is mounted
>>> over an NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO
>>> and BZ2 development libraries. When I tried to run, a message appeared:
>>>
>>> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
>>> need to use NFS version 3, ensure that the lockd daemon is running on
>>> all the machines, and mount the directory with the 'noac' option (no
>>> attribute caching).
>>> [campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
>>> communicator MPI_COMM_WORLD with errorcode 1
>>> mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
>>> exited on signal 15 (Terminated).
>>>
>>> Why?!
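[Editor's note: the 'noac' option discussed above goes in the options field of the NFS entry in /etc/fstab; the server path and mount point below are made-up examples:

```
# NFS v3 export mounted with attribute caching disabled for MPI-IO
nfsserver:/export/scratch  /mnt/scratch  nfs  vers=3,noac  0  0
```

The option takes effect only after the filesystem is remounted.]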
Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI
Hi,

> Oh, I see you want to use ordered i/o in your application. PVFS
> doesn't support that mode. However, since you know how much data each
> process wants to write, a combination of MPI_Scan (to compute each
> process's offset) and MPI_File_write_at_all (to carry out the
> collective i/o) will give you the same result with likely better
> performance (and has the nice side effect of working with PVFS).

I don't understand this very well... what do I need to change in my code?
Re: [OMPI users] Problem with NFS + PVFS2 + OpenMPI
On Thu, May 29, 2008 at 04:24:18PM -0300, Davi Vercillo C. Garcia wrote:
> Hi,
>
> I'm trying to run my program in my environment and some problems are
> happening. My environment is based on PVFS2 over NFS (PVFS is mounted
> over an NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO
> and BZ2 development libraries. When I tried to run, a message appeared:
>
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3, ensure that the lockd daemon is running on
> all the machines, and mount the directory with the 'noac' option (no
> attribute caching).
> [campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
> communicator MPI_COMM_WORLD with errorcode 1
> mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
> exited on signal 15 (Terminated).

Hi. NFS has some pretty sloppy consistency semantics. If you want
parallel I/O to NFS, you have to turn off some caches (the 'noac' option
in your error message) and work pretty hard to flush client-side caches
(which ROMIO does for you using fcntl locks). If you do this, note that
your performance will be really bad, but you'll get correct results.

Your NFS-exported PVFS volumes will give you pretty decent serial I/O
performance, since NFS caching only helps in that case.

I'd suggest, though, that you try using straight PVFS for your MPI-IO
application, as long as the parallel clients have access to all of the
PVFS servers (if tools like pvfs2-ping and pvfs2-ls work, then you do).
You'll get better performance for a variety of reasons and can continue
to keep your NFS-exported PVFS volumes up at the same time.

Oh, I see you want to use ordered i/o in your application. PVFS doesn't
support that mode. However, since you know how much data each process
wants to write, a combination of MPI_Scan (to compute each process's
offset) and MPI_File_write_at_all (to carry out the collective i/o) will
give you the same result with likely better performance (and has the
nice side effect of working with PVFS).

==rob

--
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
[OMPI users] Problem with NFS + PVFS2 + OpenMPI
Hi,

I'm trying to run my program in my environment and some problems are
happening. My environment is based on PVFS2 over NFS (PVFS is mounted
over an NFS partition), OpenMPI and Ubuntu. My program uses the MPI-IO
and BZ2 development libraries. When I tried to run, a message appeared:

File locking failed in ADIOI_Set_lock. If the file system is NFS, you
need to use NFS version 3, ensure that the lockd daemon is running on
all the machines, and mount the directory with the 'noac' option (no
attribute caching).
[campogrande05.dcc.ufrj.br:05005] MPI_ABORT invoked on rank 0 in
communicator MPI_COMM_WORLD with errorcode 1
mpiexec noticed that job rank 1 with PID 5008 on node campogrande04
exited on signal 15 (Terminated).

Why?!

/**
 * - Remember when running with MPI that the users MUST have the same ID.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>
#include "bzlib.h"
#include "mpi.h"

#define FILE_NAME_LEN 1034
#define BENCH 1

typedef unsigned char uchar;
typedef char Char;
typedef unsigned char Bool;
typedef unsigned char UChar;
typedef int Int32;
typedef unsigned int UInt32;
typedef short Int16;
typedef unsigned short UInt16;

#define True  ((Bool)1)
#define False ((Bool)0)

/** Verbose mode flag */
int VERBOSE = 1;

/*-- IntNative is your platform's `native' int size. Only here to avoid
     probs with 64-bit platforms. --*/
typedef int IntNative;

Int32 blockSize100k = 9;
Int32 verbosity = 0;
Int32 workFactor = 30;

/** Maximum queue size */
long TAM_FILA = 10;

/** Block size read by each thread */
long M_BLOCK = 900*1000;

#define M_BLOCK_OUT (M_BLOCK + M_BLOCK)

/** MPI variables */
int nProcs = 0;
int rank = 0;
int nfiles = 0;
int nBlocosPorProc = 0;
int nBlocosResto = 0;
long nBlocos = 0;
long long filesize = 0;
long long tamComprimidoPorProc = 0;

typedef struct SBloco {
    UChar *dado;
    long int id;
} Bloco;

typedef struct s_OutputBuffer {
    long size;
    uchar *zbuf;
} OutputBuffer;

/**
 * TODO: in progress
 */
static void comprime( MPI_File stream, MPI_File zStream )
{
    // 1 reader thread, 1 writer thread, the rest are compressors
    // NOTE: at least 3 threads must exist
#define NUM_THREADS 4
    MPI_Status status;
    //MPI_Offset offset; [DAVI]
    uchar *zbuf;
    long r, count;
    unsigned int nZ;
    long nIdBlock;
    UChar *ibuf[TAM_FILA];       // input buffer
    OutputBuffer **obuf;         // output buffer
    Int32 nIbuf[TAM_FILA];
    Int32 block_in_use[TAM_FILA];
    long nLeituraAtual;
    long nProcAtual;
    long nGravacaoAtual;
    Int32 erro;
    Int32 endRead;
    long long nTamOBuf = ( filesize / M_BLOCK ) + 1;

    // initialize the output buffer
    obuf = (OutputBuffer**)malloc( sizeof(OutputBuffer*)*nTamOBuf );
    for( count = 0; count < nTamOBuf; count++ ) {
        if( count < TAM_FILA )
            ibuf[count] = (UChar*)malloc( sizeof(UChar) * M_BLOCK );
        obuf[count] = (OutputBuffer*)malloc( sizeof(OutputBuffer) );
        obuf[count]->size = -1;
        obuf[count]->zbuf = NULL;
    }

    // set the number of threads
    omp_set_num_threads( NUM_THREADS );

    erro = 0;
    nLeituraAtual = 0;
    nProcAtual = 0;
    nGravacaoAtual = 0;
    endRead = 0;
    nIdBlock = -1;
    // char str[10];
    //int nPrinted = 0;
    int tsleep = 0;

    for (count = 0; count < TAM_FILA; ++count) {
        block_in_use[count] = 0;
    }

    MPI_File_set_view( stream, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL );
    MPI_File_set_view( zStream, 0, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL );

    // Start of the parallel region
#pragma omp parallel default(shared) private(zbuf, nZ, r, nIdBlock)
    {
        zbuf = (uchar*)malloc( (M_BLOCK + 600 + (M_BLOCK / 100)) * sizeof(uchar) );

        while ( !erro && omp_get_thread_num() != 1 ) {
            //printf( "PROCESSO %d\n", rank );
            if( omp_get_thread_num() == 0 )   // reader thread
            {
                if( VERBOSE ) printf( "Processo %d Thread Leitora\n", rank );
                if ( (rank + nLeituraAtual*nProcs) >= nBlocos && nLeituraAtual > 0