On Wed, Mar 28, 2012 at 4:12 PM, David Fuentes <fuentesdt at gmail.com> wrote:
> works!
>

Excellent. Now, my thinking was that GPUs are most useful doing single
precision work, but I can see the utility of double accuracy for a residual.
My inclination is to define another type, say GPUReal, and use it for all
kernels. Would that do what you want?

   Matt

> SCRGP2$ make ex52
> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
> nvcc -G -O0 -g -arch=sm_10 -c --compiler-options="-O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/include -I/opt/apps/cuda/4.0/cuda/include -I/usr/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced
> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced
>
> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -lpetsc -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-single-dbg/lib -ltriangle -lX11 -lpthread -lmetis -Wl,-rpath,/opt/apps/cuda/4.0/cuda/lib64 -L/opt/apps/cuda/4.0/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>
> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
> N_t: 3, N_cb: 1
> Residual:
> Vector Object: 1 MPI processes
>   type: seq
> -0.25
> -0.5
> 0.25
> -0.5
> -1
> 0.5
> 0.25
> 0.5
> 0.75
> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
> Residual:
> Vector Object: 1 MPI processes
>   type: seq
> -0.25
> -0.5
> 0.25
> -0.5
> -1
> 0.5
> 0.25
> 0.5
> 0.75
>
> On Wed, Mar 28, 2012 at 1:37 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>
>> sure. will do.
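
[Editor's note: a minimal sketch of the GPUReal idea Matt floats above, assuming a
real-valued (non-complex) PetscScalar and a hand-rolled switch macro. GPUReal,
PETSC_GPU_USE_DOUBLE, and CopyCoefficientsToDevice are illustrative names only,
not part of ex52 or the PETSc API.]

#include <stdlib.h>
#include <petscsys.h>
#include <cuda_runtime.h>

#if defined(PETSC_GPU_USE_DOUBLE)   /* hypothetical switch, not a real PETSc macro */
typedef double GPUReal;
#else
typedef float  GPUReal;             /* default: all kernels stay in single precision */
#endif

/* Repack host coefficients (PetscScalar, possibly double) into the kernel's
   GPUReal type before the device copy, so the byte layout always matches
   what the kernels were compiled for. */
static cudaError_t CopyCoefficientsToDevice(const PetscScalar *host, GPUReal *d_buf, int n)
{
  GPUReal    *tmp = (GPUReal *) malloc((size_t) n * sizeof(GPUReal));
  cudaError_t err;
  int         i;

  for (i = 0; i < n; ++i) tmp[i] = (GPUReal) host[i];
  err = cudaMemcpy(d_buf, tmp, (size_t) n * sizeof(GPUReal), cudaMemcpyHostToDevice);
  free(tmp);
  return err;
}

[Every kernel argument and device buffer would then be declared as GPUReal, so the
device precision is chosen in one place regardless of how PETSc was configured.]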
>>
>> On Wed, Mar 28, 2012 at 1:23 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>
>>> On Wed, Mar 28, 2012 at 1:14 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>
>>>> thanks! it's running, but I seem to be getting a different answer for cpu/gpu?
>>>> I had some floating-point problems on this Tesla M2070 gpu before, but
>>>> adding the '-arch=sm_20' option seemed to fix it last time.
>>>>
>>>> is the assembly in single precision? my 'const PetscReal jacobianInverse'
>>>> being passed in are doubles
>>>
>>> Yep, that is the problem. I have not tested anything in double. I have
>>> not decided exactly how to handle it. Can you make another ARCH with
>>> --with-precision=single and make sure it works, and then we can fix the
>>> double issue?
>>>
>>>    Thanks,
>>>
>>>       Matt
>>>
>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>> GPU layout grid(1,2,1) block(3,1,1) with 1 batches
>>>> N_t: 3, N_cb: 1
>>>> Residual:
>>>> Vector Object: 1 MPI processes
>>>>   type: seq
>>>> 0
>>>> 755712
>>>> 0
>>>> -58720
>>>> -2953.13
>>>> 0.375
>>>> 1.50323e+07
>>>> 0.875
>>>> 0
>>>> SCRGP2$
>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>> Residual:
>>>> Vector Object: 1 MPI processes
>>>>   type: seq
>>>> -0.25
>>>> -0.5
>>>> 0.25
>>>> -0.5
>>>> -1
>>>> 0.5
>>>> 0.25
>>>> 0.5
>>>> 0.75
>>>>
>>>> On Wed, Mar 28, 2012 at 11:55 AM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>
>>>>> On Wed, Mar 28, 2012 at 11:45 AM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>>>
>>>>>> The example seems to be running on cpu with '-batch' but I'm getting
>>>>>> errors at line 323 with the '-gpu' option
>>>>>>
>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>
>>>>>> should this possibly be PetscScalar?
>>>>>
>>>>> No.
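
[Editor's note: the garbage residual entries above (755712, 1.50323e+07, ...) are
consistent with 8-byte doubles being read back as 4-byte floats: the bytes are
reinterpreted, not rounded. A tiny standalone illustration of that failure mode,
not taken from ex52:]

#include <stdio.h>

int main(void)
{
  double       coeffs[3] = {-0.25, -0.5, 0.25};
  /* Viewing double-precision storage through a float pointer, which is roughly
     what happens when a double build of PETSc feeds data to a kernel compiled
     for float, reinterprets the bit patterns instead of converting the values. */
  const float *asFloat   = (const float *) coeffs;
  int          i;

  for (i = 0; i < 3; ++i)
    printf("double %g seen by a float kernel as %g\n", coeffs[i], (double) asFloat[i]);
  return 0;
}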
>>>>>
>>>>>> - ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * sizeof(float));CHKERRQ(ierr);
>>>>>> + ierr = cudaMalloc((void**) &d_coefficients, Ne*N_bt * sizeof(PetscScalar));CHKERRQ(ierr);
>>>>>>
>>>>>> SCRGP2$ python $PETSC_DIR/bin/pythonscripts/PetscGenerateFEMQuadrature.py 2 1 1 1 laplacian ex52.h
>>>>>> ['/opt/apps/PETSC/petsc-dev/bin/pythonscripts/PetscGenerateFEMQuadrature.py', '2', '1', '1', '1', 'laplacian', 'ex52.h']
>>>>>> 2 1 1 1 laplacian
>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]
>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>> Perm: [0, 1, 2]
>>>>>> Creating /home/fuentes/snestutorial/ex52.h
>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu.h
>>>>>> [{(-1.0, -1.0): [(1.0, ())]}, {(1.0, -1.0): [(1.0, ())]}, {(-1.0, 1.0): [(1.0, ())]}]
>>>>>> {0: {0: [0], 1: [1], 2: [2]}, 1: {0: [], 1: [], 2: []}, 2: {0: []}}
>>>>>> Perm: [0, 1, 2]
>>>>>> Creating /home/fuentes/snestutorial/ex52_gpu_inline.h
>>>>>>
>>>>>> SCRGP2$ make ex52
>>>>>> /usr/bin/mpicxx -o ex52.o -c -O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve -I/opt/MATLAB/R2011a/extern/include -I/usr/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/ ex52.c
>>>>>> nvcc -O0 -g -arch=sm_20 -c --compiler-options="-O0 -g -fPIC -I/opt/apps/PETSC/petsc-dev/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/include -I/opt/apps/cuda/4.1/cuda/include -I/opt/apps/PETSC/petsc-dev/include/sieve -I/opt/MATLAB/R2011a/extern/include -I/usr/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/cbind/include -I/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/forbind/include -I/usr/include/mpich2 -D__INSDIR__=src/snes/examples/tutorials/" ex52_integrateElement.cu
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(7): warning: variable "points_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(13): warning: variable "weights_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(21): warning: variable "Basis_0" was declared but never referenced
>>>>>> ex52_gpu_inline.h(28): warning: variable "BasisDerivatives_0" was declared but never referenced
>>>>>>
>>>>>> /usr/bin/mpicxx -O0 -g -o ex52 ex52.o ex52_integrateElement.o -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -L/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -lpetsc -Wl,-rpath,/opt/apps/PETSC/petsc-dev/gcc-4.4.3-mpich2-1.2-epd-sm_20-dbg/lib -ltriangle -lX11 -lpthread -lsuperlu_dist_3.0 -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lparmetis -lmetis -lscalapack -lblacs -Wl,-rpath,/opt/apps/cuda/4.1/cuda/lib64 -L/opt/apps/cuda/4.1/cuda/lib64 -lcufft -lcublas -lcudart -lcusparse -Wl,-rpath,/opt/MATLAB/R2011a/sys/os/glnxa64:/opt/MATLAB/R2011a/bin/glnxa64:/opt/MATLAB/R2011a/extern/lib/glnxa64 -L/opt/MATLAB/R2011a/bin/glnxa64 -L/opt/MATLAB/R2011a/extern/lib/glnxa64 -leng -lmex -lmx -lmat -lut -licudata -licui18n -licuuc -Wl,-rpath,/opt/epd-7.1-2-rh5-x86_64/lib -L/opt/epd-7.1-2-rh5-x86_64/lib -lmkl_rt -lmkl_intel_thread -lmkl_core -liomp5 -lexoIIv2for -lexodus -lnetcdf_c++ -lnetcdf -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.3 -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -lmpichf90 -lgfortran -lm -lm -lmpichcxx -lstdc++ -lmpichcxx -lstdc++ -ldl -lmpich -lopa -lpthread -lrt -lgcc_s -ldl
>>>>>> /bin/rm -f ex52.o ex52_integrateElement.o
>>>>>>
>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch
>>>>>> Residual:
>>>>>> Vector Object: 1 MPI processes
>>>>>>   type: seq
>>>>>> -0.25
>>>>>> -0.5
>>>>>> 0.25
>>>>>> -0.5
>>>>>> -1
>>>>>> 0.5
>>>>>> 0.25
>>>>>> 0.5
>>>>>> 0.75
>>>>>> SCRGP2$ ./ex52 -dim 2 -compute_function -show_residual -batch -gpu
>>>>>> [0]PETSC ERROR: IntegrateElementBatchGPU() line 323 in src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>> [0]PETSC ERROR: FormFunctionLocalBatch() line 679 in src/snes/examples/tutorials/ex52.c
>>>>>> [0]PETSC ERROR: SNESDMComplexComputeFunction() line 431 in src/snes/utils/damgsnes.c
>>>>>> [0]PETSC ERROR: main() line 1021 in src/snes/examples/tutorials/ex52.c
>>>>>> application called MPI_Abort(MPI_COMM_WORLD, 35) - process 0
>>>>>
>>>>> This is failing on cudaMalloc(), which means your card is not available
>>>>> for running. Are you trying to run on your laptop? If so, applications
>>>>> like Preview can lock up the GPU. I know of no way to test this in CUDA
>>>>> while running. I just close apps until it runs.
>>>>>
>>>>>    Thanks,
>>>>>
>>>>>       Matt
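
[Editor's note: a quick way to check the diagnosis above, i.e. whether the card can
be reached and allocated on at all, is a standalone probe like the sketch below.
This is plain CUDA runtime API usage, not something ex52 provides.]

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
  int         count = 0;
  void       *probe = NULL;
  cudaError_t err   = cudaGetDeviceCount(&count);

  if (err != cudaSuccess || count == 0) {
    fprintf(stderr, "no usable CUDA device: %s\n", cudaGetErrorString(err));
    return 1;
  }
  /* A small cudaMalloc() fails in the same way as the one inside
     IntegrateElementBatchGPU() if another process is holding the card. */
  err = cudaMalloc(&probe, 1024);
  printf("cudaMalloc probe on %d device(s): %s\n", count, cudaGetErrorString(err));
  if (err == cudaSuccess) cudaFree(probe);
  return (err == cudaSuccess) ? 0 : 1;
}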
>>>>>
>>>>>> On Tue, Mar 27, 2012 at 8:37 PM, Matthew Knepley <knepley at gmail.com> wrote:
>>>>>>
>>>>>>> On Tue, Mar 27, 2012 at 2:10 PM, Blaise Bourdin <bourdin at lsu.edu> wrote:
>>>>>>>
>>>>>>>> On Mar 27, 2012, at 1:23 PM, Matthew Knepley wrote:
>>>>>>>>
>>>>>>>> On Tue, Mar 27, 2012 at 12:58 PM, David Fuentes <fuentesdt at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I had a question about the status of example 52.
>>>>>>>>>
>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52.c
>>>>>>>>> http://petsc.cs.iit.edu/petsc/petsc-dev/file/a8e2f2c19319/src/snes/examples/tutorials/ex52_integrateElement.cu
>>>>>>>>>
>>>>>>>>> Can this example be used with a DM object created from an unstructured
>>>>>>>>> exodusII mesh, DMMeshCreateExodus, and the FEM assembly done on GPU?
>>>>>>>>
>>>>>>>> 1) I have pushed many more tests for it now. They can be run using the
>>>>>>>> Python build system
>>>>>>>>
>>>>>>>>   ./config/builder2.py check src/snes/examples/tutorials/ex52.c
>>>>>>>>
>>>>>>>> In fact, you can build any set of files this way.
>>>>>>>>
>>>>>>>> 2) The Exodus creation has to be converted to DMComplex from DMMesh.
>>>>>>>> That should not take me very long. Blaise maintains that, so maybe
>>>>>>>> there will be help :) You will just replace DMComplexCreateBoxMesh()
>>>>>>>> with DMComplexCreateExodus(). If you request it, I will bump it up the list.
>>>>>>>>
>>>>>>>> DMMeshCreateExodusNG is much more flexible than DMMeshCreateExodus in
>>>>>>>> that it can read meshes with multiple element types and should have a
>>>>>>>> much lower memory footprint. The code should be fairly easy to read.
>>>>>>>> You can email me directly if you have specific questions. I had looked
>>>>>>>> at creating a DMComplex and it did not look too difficult, as long as
>>>>>>>> interpolation is not needed. I have plans to write DMComplexCreateExodus,
>>>>>>>> but haven't had time to so far. Updating the Vec viewers and readers may
>>>>>>>> be a bit more involved. In a perfect world, one would write an EXODUS
>>>>>>>> viewer following the lines of the VTK and HDF5 ones.
>>>>>>>
>>>>>>> David and Blaise, I have converted this function, now
>>>>>>> DMComplexCreateExodus(). It's not tested, but I think Blaise has some
>>>>>>> stuff we can use to test it.
>>>>>>>
>>>>>>>    Thanks,
>>>>>>>
>>>>>>>       Matt
>>>>>>>
>>>>>>>> Blaise
>>>>>>>>
>>>>>>>> Let me know if you can run the tests.
>>>>>>>>
>>>>>>>>    Thanks
>>>>>>>>
>>>>>>>>       Matt
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>> --
>>>>>>>> Department of Mathematics and Center for Computation & Technology
>>>>>>>> Louisiana State University, Baton Rouge, LA 70803, USA
>>>>>>>> Tel. +1 (225) 578 1612, Fax +1 (225) 578 4276
>>>>>>>> http://www.math.lsu.edu/~bourdin
>>>>>>>>

--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
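
[Editor's note: a rough sketch of the mesh-creation swap Matt describes
(DMComplexCreateBoxMesh -> DMComplexCreateExodus). The argument lists below are
guesses based on the petsc-dev interfaces of that period and the later
DMPlexCreateExodus; verify against the headers before relying on them.]

#include <petsc.h>   /* umbrella header; the DMComplex-specific header name varied in petsc-dev */

/* Assumed signatures (check petsc-dev):
     DMComplexCreateBoxMesh(MPI_Comm, PetscInt dim, PetscBool interpolate, DM *dm)
     DMComplexCreateExodus(MPI_Comm, PetscInt exoid, PetscBool interpolate, DM *dm)
   where exoid is a handle from an already opened ExodusII file. */
static PetscErrorCode CreateMeshFromExodus(MPI_Comm comm, PetscInt exoid, DM *dm)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  /* Previously (structured test mesh):
       ierr = DMComplexCreateBoxMesh(comm, 2, PETSC_FALSE, dm);CHKERRQ(ierr); */
  ierr = DMComplexCreateExodus(comm, exoid, PETSC_FALSE, dm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}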