I have developed a methodology to desort based on the script that I previously mentioned. The speed of desorting has been improved. For a system of 100K atoms the setup and processing for mdrun_mpi (grompp's trjconv's g_desort's etc.) took me 30sec. For a system of 750K atoms the time was 90sec.

Benifits:
When running in 200ps segments on NP=4 in single precision I get a speedup of about 1.4 times vs without -shuffle and speedup of 1.15 times vs. with shuffle but without sort. It is only with sort that my scaling to NP=4 or NP=16 matches the gromacs benchmarks.

Testing:
1. I have run 20 x 20ps mdrun_mpi iteratively using -shuffle -sort and this g_desort program and the system "looks normal".. no quantitative test though. 2. I have compared the -deshuf.ndx file produced by grompp to a g_desort output .ndx file (both produced using -shuffle but not -sort) and they match exactly.

I have not posted this tool to the uploads section since I don't want anybody to run into unexpected problems in case it is incorrect since it affects the mdrun directly.

I welcome any suggestions for ways to test if this is correct. Also, if anybody has some long-time sorted trajectories out there then they could use this to recover individual positions and "follow their favourite water molecule". If anybody does do this, please post something about your success.

This was written from template.c and compilation of this program is done similarly.

Before using this, please read the "static char *desc[] = {" completely. Especially for the warnings that discuss what number your index files begin with and the exact implementation that is a bit of a hassle based on the order that grompp does things but it can all be scripted.

Chris Neale.

/*
* $Id: template.c,v 1.4 2001/07/23 15:28:29 lindahl Exp $
*
*                This source code is part of
*
*                 G   R   O   M   A   C   S
*
*          GROningen MAchine for Chemical Simulations
*
*                        VERSION 3.0
*
* Copyright (c) 1991-2001
* BIOSON Research Institute, Dept. of Biophysical Chemistry
* University of Groningen, The Netherlands
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* If you want to redistribute modifications, please consider that
* scientific software is very special. Version control is crucial -
* bugs must be traceable. We will be happy to consider code for
* inclusion in the official distribution, but derived work must not
* be called official GROMACS. Details are found in the README & COPYING
* files - if they are missing, get the official version at www.gromacs.org.
*
* To help us fund GROMACS development, we humbly ask that you cite
* the papers on the package - you can find them in the top README file.
*
* Do check out http://www.gromacs.org , or mail us at [EMAIL PROTECTED] .
*
* And Hey:
* Gyas ROwers Mature At Cryogenic Speed
*/

/* This line is only for CVS version info */
static char *SRCID_template_c = "$Id: template.c,v 1.4 2001/07/23 15:28:29 lindahl Exp $";

#include "statutil.h"
#include "typedefs.h"
#include "smalloc.h"
#include "vec.h"
#include "copyrite.h"
#include "statutil.h"
#include "tpxio.h"
#include "math.h"

int main(int argc,char *argv[])
{
 static char *desc[] = {
"g_desort takes two coordinate files as input and outputs a .ndx file that ",
   "can be used to properly sort/shuffle or desort/deshuffle. ",
"The -f option takes two coordinate files as input and the order is important: ", "the first file is the desired order and the second file is the current order. ", "To desort coordinates g_desort -f unsorted.gro sorted.gro and to resort ", "coordinates g_desort -f sorted.gro unsorted.gro. In order to use the -sort option ", "of grompp you will only need to desort your trajectory and g_desort makes this trivial. ",
   "However, the benifit of sorting ",
"decreases over time as the particles move away from their sorted positions. Therefore ", "regular re-sorting is desirable. The usage section below describes how to do this. ", "The basic idea is to pre-sort the .trr file for loading into grompp based on a resort ", ".ndx file generated by g_desort after a preliminary grompp without the .trr. An additional ", "sorting step is added to ensure that the grompp sorting did not change based on the loaded i", ".trr file. The mdrun in then completed and desorted prior to starting again. \n",
   "USAGE: \n",
   "1.   grompp -shuffle -sort -c original.gro -o sorted_temp.tpr \n",
   "2.   editconf -f sorted_temp.tpr -o sorted_temp.gro \n",
   "3.   g_desort -f sorted_temp.gro original.gro -o resort_temp.ndx \n",
   "4.   trjconv -f original.trr -n resort_temp.ndx -o sorted.trr \n",
"5. grompp -shuffle -sort -c original.gro -t sorted.trr -o sorted.tpr \n",
   "6.   editconf -f sorted.tpr -o sorted.gro \n",
   "7.   g_desort -f sorted.gro original.gro -o resort.ndx \n",
"8. look=`diff -q resort_temp.ndx resort.ndx`; if [ -n ${look} ]; then exit; fi \n",
   "9.   g_desort -f original.gro sorted.gro -o desort.ndx \n",
   "10.  mdrun_mpi -s sorted.tpr \n",
   "11a. trjconv -f output.xtc -n desort.ndx -o output_desorted.xtc \n",
   "11b. trjconv -f output.trr -n desort.ndx -o output_desorted.trr \n",
   "11c. trjconv -f output.gro -n desort.ndx -o output_desorted.gro \n",
"12. Use these output_desorted files as input to step 1 as the original.xtc/trr/gro files \n",
   "       - the .edr file does not need to be desorted \n",
" - g_desort takes care of sorting and shuffling at the same time \n",
   "           - Do not use the -deshuf deshuf.ndx file from grompp \n",
   "WARNINGS:\n",
" - This is a beta version. It might not work properly. Don't trust it \n", " - Expecting your index files to start from the number 1. If they start from the number 0 then ", "you must modify the fprintf output lines to remove the +1 from the integer that is output \n",
   "BUGS: \n",
"The determination of the expected number of particles per grid is sub-optimal ", "and if the ratio of system volume over grid points is too small then the program will not allocate ", "enough memory. This is detected, but it should be fixed (perhaps by actually counting before allocating\n",
 };

 static char *outputfnam;
 static real rDISTCUT=0.005;
 static int iNUMBINS=10;

 /* Extra arguments - but note how you always get the begin/end
  * options when running the program, without mentioning them here!
  */

 //
 t_pargs pa[] = {
        { "-c", FALSE, etREAL, {&rDISTCUT},
               "Cut-off distance for atom identification."
        },
        { "-n", FALSE, etINT, {&iNUMBINS},
               "Number of grids in each dim (>=1)."
        }

 };

 t_topology top;
 char       title[STRLEN];
 t_trxframe fra;
 t_trxframe frb;
 rvec       *xtop;
 matrix     box;
 int        status;
 int        flags = TRX_READ_X;
 char        **fnms;
 int        nfile_in,ata,atb,found_flag;
 real    dab;
real max[3]={0.0,0.0,0.0},min[3]={0.0,0.0,0.0},maxminusmin[3]={0.0,0.0,0.0},minovermaxminusmin[3]={0.0,0.0,0.0};
 int    **bxyz; //[natoms][3]
 int    twobxyz[3]={0.0,0.0,0.0};
int ****twobinlist; //[xdim][ydim][zdim][list] *dim is the cells [list=0] is the number in that list
 int maxcellnum;
 int i,j,k;
 FILE *fout;


 t_filenm fnm[] = {
   { efTRX, "-f", NULL, ffRDMULT },      /* and this for the trajectory */
   { efNDX, "-o", "deshuffledesort", ffWRITE }    //output
 };

#define NFILE asize(fnm)

 CopyRight(stderr,argv[0]);

 /* This is the routine responsible for adding default options,
  * calling the X/motif interface, etc. */
 parse_common_args(&argc,argv,PCA_CAN_TIME | PCA_CAN_VIEW,
           NFILE,fnm,asize(pa),pa,asize(desc),desc,0,NULL);

 nfile_in = opt2fns(&fnms,"-f",NFILE,fnm);
 if (nfile_in==0)
   gmx_fatal(FARGS,"");
 if (nfile_in!=2)
   gmx_fatal(FARGS,"There must be exactly 2 input files!");
 outputfnam=opt2fn("-o",NFILE,fnm);
 if(iNUMBINS<=0)
gmx_fatal(FARGS,"The number of bins must be greater than or equal to 1\n");
 /* The first time we read data is a little special */
 read_first_frame(&status,fnms[0],&fra,flags);
 read_first_frame(&status,fnms[1],&frb,flags);

 if(fra.natoms!=frb.natoms){
printf("Error: unequal number of atoms (%d and %d)\n",fra.natoms,frb.natoms);
   exit(1);
 }

printf("WARNING: This is a beta version. It might not work properly. Don't trust it\n");
 fout=fopen(outputfnam,"w");
 fprintf(fout,"[ DeShuffle ]");

//Could use a smarter determination of maxcellnum based on loaded dimensions and on knowing how many grids unfilled
 if(iNUMBINS<=3)
   maxcellnum=fra.natoms+1;
 else
   maxcellnum=2*fra.natoms/((iNUMBINS-2)*(iNUMBINS-2)*(iNUMBINS-2)) + 1;
 bxyz=(int **)malloc((fra.natoms+1)*sizeof(int *));
 if(bxyz==NULL)exit(1);
 for(i=0; i<=fra.natoms; ++i){
   bxyz[i]=(int *)malloc(sizeof(int[3]));
   if(bxyz[i]==NULL)exit(1);
 }
 twobinlist=(int ****)malloc(iNUMBINS*sizeof(int ***));
 if(twobinlist==NULL)exit(1);
 for(i=0; i<iNUMBINS; ++i){
   twobinlist[i]=(int ***)malloc(iNUMBINS*sizeof(int **));
   if(twobinlist[i]==NULL)exit(1);
   for(j=0; j<iNUMBINS; ++j){
     twobinlist[i][j]=(int **)malloc(iNUMBINS*sizeof(int *));
     if(twobinlist[i][j]==NULL)exit(1);
     for(k=0; k<iNUMBINS; ++k){
       twobinlist[i][j][k]=(int *)malloc(maxcellnum*sizeof(int));
       if(twobinlist[i][j][k]==NULL)exit(1);
       twobinlist[i][j][k][0]=0;
     }
   }
 }
//printf("About to divide data\n");fflush(stdout);

 //Divide each dimension into the boxes
 for(ata=0; ata<fra.natoms; ++ata){
   for(i=0; i<=2; ++i){
     if(fra.x[ata][i]>max[i])max[i]=fra.x[ata][i];
     if(fra.x[ata][i]<min[i])min[i]=fra.x[ata][i];
   }
 }
 for(i=0; i<=2; ++i){
maxminusmin[i]=(max[i]-min[i])/(real)iNUMBINS; minovermaxminusmin[i]=min[i]/maxminusmin[i];
 }

 //We only need to know which bin the first one is in
 for(ata=0; ata<fra.natoms; ++ata){
   for(i=0; i<=2; ++i){
bxyz[ata][i]=(int)floor(fra.x[ata][i]/maxminusmin[i] - minovermaxminusmin[i]); bxyz[ata][i]=(bxyz[ata][i]<0)?0:(bxyz[ata][i]>(iNUMBINS-1))?(iNUMBINS-1):bxyz[ata][i];
   }
 }

 //For the second list, we need to group them by which cell
 for(atb=0; atb<frb.natoms; ++atb){
   for(i=0; i<=2; ++i){
twobxyz[i]=(int)floor(frb.x[atb][i]/maxminusmin[i] - minovermaxminusmin[i]); twobxyz[i]=(twobxyz[i]<0)?0:(twobxyz[i]>(iNUMBINS-1))?(iNUMBINS-1):twobxyz[i];
   }
//printf("%d in %d,%d,%d as %d entry\n",atb,twobxyz[XX],twobxyz[YY],twobxyz[ZZ],twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]);fflush(stdout);

   if(++twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]>maxcellnum){
printf("Error: too many atoms in one of the cells (exceeded max of %d). Try reducing the number of grids that you are using\n",maxcellnum);
     exit(1);
   }
twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][twobinlist[twobxyz[XX]][twobxyz[YY]][twobxyz[ZZ]][0]]=atb;
 }

 //printf("About to find matches\n");fflush(stdout);
 /* IMPORTANT NOTE:
  * Printed values must be +1 to value
  * Also, rDISTCUT is squared here to avoid sqrt of dist
  */
 rDISTCUT*=rDISTCUT;
 for(ata=0; ata<fra.natoms; ++ata){
   if(div(ata,10).rem==0)fprintf(fout,"\n");
   found_flag=0;
   //try first to see if it is in the same box
for(atb=1; atb<=twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][0]; ++atb){ dab=distance2(fra.x[ata],frb.x[twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]]);
     if(dab<rDISTCUT){
//printf("%d\t-ON GRID-\t%d %d %d\n",twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]+1,bxyz[ata][XX],bxyz[ata][YY],bxyz[ata][ZZ]); fprintf(fout," %d",twobinlist[bxyz[ata][XX]][bxyz[ata][YY]][bxyz[ata][ZZ]][atb]+1);
       found_flag=1;
       break;
     }
   }
   if(found_flag)continue;
   //Look through the entire list
   for(atb=0; atb<frb.natoms; ++atb){
     dab=distance2(fra.x[ata],frb.x[atb]);
     if(dab<rDISTCUT){
       fprintf(fout,"  %d",atb+1);
       found_flag=1;
break; }
   }
   if(!found_flag){
     printf("Error: could not find pair for %d\n",ata);
     exit(1);
   }
 }
 fprintf(fout,"\n");
 fclose(fout);
 thanx(stderr);

 return 0;
}

_______________________________________________
gmx-users mailing list    gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users
Please don't post (un)subscribe requests to the list. Use the www interface or send it to [EMAIL PROTECTED]
Can't post? Read http://www.gromacs.org/mailing_lists/users.php

Reply via email to